Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:10 UTC
[01/51] [partial] impala git commit: [DOCS] Impala doc site update for 3.0
Repository: impala
Updated Branches:
refs/heads/asf-site 52b8807de -> fae51ec24
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_string_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_string_functions.html b/docs/build3x/html/topics/impala_string_functions.html
new file mode 100644
index 0000000..b623a47
--- /dev/null
+++ b/docs/build3x/html/topics/impala_string_functions.html
@@ -0,0 +1,1719 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="string_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala String Functions</title></head><body id="string_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala String Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <div class="p">
+ String functions are classified as those primarily accepting or returning <code class="ph codeph">STRING</code>,
+ <code class="ph codeph">VARCHAR</code>, or <code class="ph codeph">CHAR</code> data types, for example to measure the length of a string
+ or concatenate two strings together.
+ <ul class="ul">
+ <li class="li">
+ All the functions that accept <code class="ph codeph">STRING</code> arguments also accept the <code class="ph codeph">VARCHAR</code>
+ and <code class="ph codeph">CHAR</code> types introduced in Impala 2.0.
+ </li>
+
+ <li class="li">
+ Whenever <code class="ph codeph">VARCHAR</code> or <code class="ph codeph">CHAR</code> values are passed to a function that returns a
+ string value, the return type is normalized to <code class="ph codeph">STRING</code>. For example, a call to
+ <code class="ph codeph">concat()</code> with a mix of <code class="ph codeph">STRING</code>, <code class="ph codeph">VARCHAR</code>, and
+ <code class="ph codeph">CHAR</code> arguments produces a <code class="ph codeph">STRING</code> result.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The string functions operate mainly on these data types: <a class="xref" href="impala_string.html#string">STRING Data Type</a>,
+ <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>, and <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Function reference:</strong>
+ </p>
+
+ <p class="p">
+ Impala supports the following string functions:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="string_functions__ascii">
+ <code class="ph codeph">ascii(string str)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the numeric ASCII code of the first character of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
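+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        Only the first character of the argument is examined; any characters
+        after it are ignored:
+      </p>
+<pre class="pre codeblock"><code>
+select ascii('A'), ascii('Apple');
++------------+----------------+
+| ascii('A') | ascii('Apple') |
++------------+----------------+
+| 65         | 65             |
++------------+----------------+
+</code></pre>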
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__base64decode">
+ <code class="ph codeph">base64decode(string str)</code>
+ </dt>
+
+ <dd class="dd">
+
+          <strong class="ph b">Purpose:</strong> Decodes the argument from its Base64 representation back to the
+          original string value. Acts as the inverse of <code class="ph codeph">base64encode()</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ For general information about Base64 encoding, see
+ <a class="xref" href="https://en.wikipedia.org/wiki/Base64" target="_blank">Base64 article on Wikipedia</a>.
+ </p>
+ <p class="p">
+ The functions <code class="ph codeph">base64encode()</code> and
+ <code class="ph codeph">base64decode()</code> are typically used
+        together, to store string data in an Impala table when that data
+        is problematic to store or transmit directly. For example, you could use
+ these functions to store string data that uses an encoding
+ other than UTF-8, or to transform the values in contexts that
+ require ASCII values, such as for partition key columns.
+ Keep in mind that base64-encoded values produce different results
+ for string functions such as <code class="ph codeph">LENGTH()</code>,
+ <code class="ph codeph">MAX()</code>, and <code class="ph codeph">MIN()</code> than when
+ those functions are called with the unencoded string values.
+ </p>
+ <p class="p">
+ The set of characters that can be generated as output
+ from <code class="ph codeph">base64encode()</code>, or specified in
+ the argument string to <code class="ph codeph">base64decode()</code>,
+        consists of the ASCII uppercase and lowercase letters (A-Z, a-z),
+ digits (0-9), and the punctuation characters
+ <code class="ph codeph">+</code>, <code class="ph codeph">/</code>, and <code class="ph codeph">=</code>.
+ </p>
+ <p class="p">
+ All return values produced by <code class="ph codeph">base64encode()</code>
+ are a multiple of 4 bytes in length. All argument values
+ supplied to <code class="ph codeph">base64decode()</code> must also be a
+ multiple of 4 bytes in length. If a base64-encoded value
+ would otherwise have a different length, it can be padded
+ with trailing <code class="ph codeph">=</code> characters to reach a length
+ that is a multiple of 4 bytes.
+ </p>
+ <p class="p">
+ If the argument string to <code class="ph codeph">base64decode()</code> does
+ not represent a valid base64-encoded value, subject to the
+ constraints of the Impala implementation such as the allowed
+ character set, the function returns <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">base64encode()</code>
+ and <code class="ph codeph">base64decode()</code> together to store and retrieve
+ string values:
+<pre class="pre codeblock"><code>
+-- An arbitrary string can be encoded in base 64.
+-- The length of the output is a multiple of 4 bytes,
+-- padded with trailing = characters if necessary.
+select base64encode('hello world') as encoded,
+ length(base64encode('hello world')) as length;
++------------------+--------+
+| encoded | length |
++------------------+--------+
+| aGVsbG8gd29ybGQ= | 16 |
++------------------+--------+
+
+-- Passing an encoded value to base64decode() produces
+-- the original value.
+select base64decode('aGVsbG8gd29ybGQ=') as decoded;
++-------------+
+| decoded |
++-------------+
+| hello world |
++-------------+
+</code></pre>
+
+ These examples demonstrate incorrect encoded values that
+ produce <code class="ph codeph">NULL</code> return values when decoded:
+
+<pre class="pre codeblock"><code>
+-- The input value to base64decode() must be a multiple of 4 bytes.
+-- In this case, leaving off the trailing = padding character
+-- produces a NULL return value.
+select base64decode('aGVsbG8gd29ybGQ') as decoded;
++---------+
+| decoded |
++---------+
+| NULL |
++---------+
+WARNINGS: UDF WARNING: Invalid base64 string; input length is 15,
+ which is not a multiple of 4.
+
+-- The input to base64decode() can only contain certain characters.
+-- The $ character in this case causes a NULL return value.
+select base64decode('abc$');
++----------------------+
+| base64decode('abc$') |
++----------------------+
+| NULL |
++----------------------+
+WARNINGS: UDF WARNING: Could not base64 decode input in space 4; actual output length 0
+</code></pre>
+
+ These examples demonstrate <span class="q">"round-tripping"</span> of an original string to an
+ encoded string, and back again. This technique is applicable if the original
+ source is in an unknown encoding, or if some intermediate processing stage
+      might cause non-ASCII characters to be misrepresented:
+
+<pre class="pre codeblock"><code>
+select 'circumflex accents: â, ê, î, ô, û' as original,
+ base64encode('circumflex accents: â, ê, î, ô, û') as encoded;
++-----------------------------------+------------------------------------------------------+
+| original | encoded |
++-----------------------------------+------------------------------------------------------+
+| circumflex accents: â, ê, î, ô, û | Y2lyY3VtZmxleCBhY2NlbnRzOiDDoiwgw6osIMOuLCDDtCwgw7s= |
++-----------------------------------+------------------------------------------------------+
+
+select base64encode('circumflex accents: â, ê, î, ô, û') as encoded,
+ base64decode(base64encode('circumflex accents: â, ê, î, ô, û')) as decoded;
++------------------------------------------------------+-----------------------------------+
+| encoded | decoded |
++------------------------------------------------------+-----------------------------------+
+| Y2lyY3VtZmxleCBhY2NlbnRzOiDDoiwgw6osIMOuLCDDtCwgw7s= | circumflex accents: â, ê, î, ô, û |
++------------------------------------------------------+-----------------------------------+
+</code></pre>
+ </div>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__base64encode">
+ <code class="ph codeph">base64encode(string str)</code>
+ </dt>
+
+ <dd class="dd">
+
+          <strong class="ph b">Purpose:</strong> Encodes the argument string into its Base64 representation,
+          using only characters that are safe to store and transmit as ASCII text.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ For general information about Base64 encoding, see
+ <a class="xref" href="https://en.wikipedia.org/wiki/Base64" target="_blank">Base64 article on Wikipedia</a>.
+ </p>
+ <p class="p">
+ The functions <code class="ph codeph">base64encode()</code> and
+ <code class="ph codeph">base64decode()</code> are typically used
+        together, to store string data in an Impala table when that data
+        is problematic to store or transmit directly. For example, you could use
+ these functions to store string data that uses an encoding
+ other than UTF-8, or to transform the values in contexts that
+ require ASCII values, such as for partition key columns.
+ Keep in mind that base64-encoded values produce different results
+ for string functions such as <code class="ph codeph">LENGTH()</code>,
+ <code class="ph codeph">MAX()</code>, and <code class="ph codeph">MIN()</code> than when
+ those functions are called with the unencoded string values.
+ </p>
+ <p class="p">
+ The set of characters that can be generated as output
+ from <code class="ph codeph">base64encode()</code>, or specified in
+ the argument string to <code class="ph codeph">base64decode()</code>,
+        consists of the ASCII uppercase and lowercase letters (A-Z, a-z),
+ digits (0-9), and the punctuation characters
+ <code class="ph codeph">+</code>, <code class="ph codeph">/</code>, and <code class="ph codeph">=</code>.
+ </p>
+ <p class="p">
+ All return values produced by <code class="ph codeph">base64encode()</code>
+ are a multiple of 4 bytes in length. All argument values
+ supplied to <code class="ph codeph">base64decode()</code> must also be a
+ multiple of 4 bytes in length. If a base64-encoded value
+ would otherwise have a different length, it can be padded
+ with trailing <code class="ph codeph">=</code> characters to reach a length
+ that is a multiple of 4 bytes.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">base64encode()</code>
+ and <code class="ph codeph">base64decode()</code> together to store and retrieve
+ string values:
+<pre class="pre codeblock"><code>
+-- An arbitrary string can be encoded in base 64.
+-- The length of the output is a multiple of 4 bytes,
+-- padded with trailing = characters if necessary.
+select base64encode('hello world') as encoded,
+ length(base64encode('hello world')) as length;
++------------------+--------+
+| encoded | length |
++------------------+--------+
+| aGVsbG8gd29ybGQ= | 16 |
++------------------+--------+
+
+-- Passing an encoded value to base64decode() produces
+-- the original value.
+select base64decode('aGVsbG8gd29ybGQ=') as decoded;
++-------------+
+| decoded |
++-------------+
+| hello world |
++-------------+
+</code></pre>
+
+ These examples demonstrate incorrect encoded values that
+ produce <code class="ph codeph">NULL</code> return values when decoded:
+
+<pre class="pre codeblock"><code>
+-- The input value to base64decode() must be a multiple of 4 bytes.
+-- In this case, leaving off the trailing = padding character
+-- produces a NULL return value.
+select base64decode('aGVsbG8gd29ybGQ') as decoded;
++---------+
+| decoded |
++---------+
+| NULL |
++---------+
+WARNINGS: UDF WARNING: Invalid base64 string; input length is 15,
+ which is not a multiple of 4.
+
+-- The input to base64decode() can only contain certain characters.
+-- The $ character in this case causes a NULL return value.
+select base64decode('abc$');
++----------------------+
+| base64decode('abc$') |
++----------------------+
+| NULL |
++----------------------+
+WARNINGS: UDF WARNING: Could not base64 decode input in space 4; actual output length 0
+</code></pre>
+
+ These examples demonstrate <span class="q">"round-tripping"</span> of an original string to an
+ encoded string, and back again. This technique is applicable if the original
+ source is in an unknown encoding, or if some intermediate processing stage
+      might cause non-ASCII characters to be misrepresented:
+
+<pre class="pre codeblock"><code>
+select 'circumflex accents: â, ê, î, ô, û' as original,
+ base64encode('circumflex accents: â, ê, î, ô, û') as encoded;
++-----------------------------------+------------------------------------------------------+
+| original | encoded |
++-----------------------------------+------------------------------------------------------+
+| circumflex accents: â, ê, î, ô, û | Y2lyY3VtZmxleCBhY2NlbnRzOiDDoiwgw6osIMOuLCDDtCwgw7s= |
++-----------------------------------+------------------------------------------------------+
+
+select base64encode('circumflex accents: â, ê, î, ô, û') as encoded,
+ base64decode(base64encode('circumflex accents: â, ê, î, ô, û')) as decoded;
++------------------------------------------------------+-----------------------------------+
+| encoded | decoded |
++------------------------------------------------------+-----------------------------------+
+| Y2lyY3VtZmxleCBhY2NlbnRzOiDDoiwgw6osIMOuLCDDtCwgw7s= | circumflex accents: â, ê, î, ô, û |
++------------------------------------------------------+-----------------------------------+
+</code></pre>
+ </div>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__btrim">
+ <code class="ph codeph">btrim(string a)</code>,
+ <code class="ph codeph">btrim(string a, string chars_to_trim)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Removes all instances of one or more characters
+ from the start and end of a <code class="ph codeph">STRING</code> value.
+ By default, removes only spaces.
+ If a non-<code class="ph codeph">NULL</code> optional second argument is specified, the function removes all
+ occurrences of characters in that second argument from the beginning and
+ end of the string.
+ <p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the default <code class="ph codeph">btrim()</code> behavior,
+ and what changes when you specify the optional second argument.
+ All the examples bracket the output value with <code class="ph codeph">[ ]</code>
+ so that you can see any leading or trailing spaces in the <code class="ph codeph">btrim()</code> result.
+        By default, the function removes any number of both leading and trailing spaces.
+        When the second argument is specified, any number of occurrences of any
+        character in the second argument are removed from the start and end of the
+        input string; in this case, spaces are not removed (unless they are part of the
+        second argument), and occurrences of those characters are left in place when
+        they do not come right at the beginning or end of the string.
+ </p>
+<pre class="pre codeblock"><code>-- Remove multiple spaces before and one space after.
+select concat('[',btrim(' hello '),']');
++---------------------------------------+
+| concat('[', btrim(' hello '), ']') |
++---------------------------------------+
+| [hello] |
++---------------------------------------+
+
+-- Remove any instances of x or y or z at beginning or end. Leave spaces alone.
+select concat('[',btrim('xy hello zyzzxx','xyz'),']');
++------------------------------------------------------+
+| concat('[', btrim('xy hello zyzzxx', 'xyz'), ']') |
++------------------------------------------------------+
+| [ hello ] |
++------------------------------------------------------+
+
+-- Remove any instances of x or y or z at beginning or end.
+-- Leave x, y, z alone in the middle of the string.
+select concat('[',btrim('xyhelxyzlozyzzxx','xyz'),']');
++----------------------------------------------------+
+| concat('[', btrim('xyhelxyzlozyzzxx', 'xyz'), ']') |
++----------------------------------------------------+
+| [helxyzlo] |
++----------------------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__char_length">
+ <code class="ph codeph">char_length(string a), <span class="ph" id="string_functions__character_length">character_length(string a)</span></code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Returns the length in characters of the argument string, including any
+ trailing spaces that pad a <code class="ph codeph">CHAR</code> value.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ When applied to a <code class="ph codeph">STRING</code> value, it returns the
+ same result as the <code class="ph codeph">length()</code> function. When applied
+ to a <code class="ph codeph">CHAR</code> value, it might return a larger value
+ than <code class="ph codeph">length()</code> does, to account for trailing spaces
+ in the <code class="ph codeph">CHAR</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <div class="p">
+ The following example demonstrates how <code class="ph codeph">length()</code>
+ and <code class="ph codeph">char_length()</code> sometimes produce the same result,
+ and sometimes produce different results depending on the type of the
+ argument and the presence of trailing spaces for <code class="ph codeph">CHAR</code>
+ values. The <code class="ph codeph">S</code> and <code class="ph codeph">C</code> values are
+ displayed with enclosing quotation marks to show any trailing spaces.
+<pre class="pre codeblock" id="string_functions__d6e2627"><code>create table length_demo (s string, c char(5));
+insert into length_demo values
+ ('a',cast('a' as char(5))),
+ ('abc',cast('abc' as char(5))),
+ ('hello',cast('hello' as char(5)));
+
+select concat('"',s,'"') as s, concat('"',c,'"') as c,
+ length(s), length(c),
+ char_length(s), char_length(c)
+from length_demo;
++---------+---------+-----------+-----------+----------------+----------------+
+| s | c | length(s) | length(c) | char_length(s) | char_length(c) |
++---------+---------+-----------+-----------+----------------+----------------+
+| "a" | "a " | 1 | 1 | 1 | 5 |
+| "abc" | "abc " | 3 | 3 | 3 | 5 |
+| "hello" | "hello" | 5 | 5 | 5 | 5 |
++---------+---------+-----------+-----------+----------------+----------------+
+</code></pre>
+ </div>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__chr">
+ <code class="ph codeph">chr(int character_code)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a character specified by a decimal code point value.
+ The interpretation and display of the resulting character depends on your system locale.
+ Because consistent processing of Impala string values is only guaranteed
+ for values within the ASCII range, only use this function for values
+ corresponding to ASCII characters.
+ In particular, parameter values greater than 255 return an empty string.
+ <p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Can be used as the inverse of the <code class="ph codeph">ascii()</code> function, which
+ converts a character to its numeric ASCII code.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>SELECT chr(65);
++---------+
+| chr(65) |
++---------+
+| A |
++---------+
+
+SELECT chr(97);
++---------+
+| chr(97) |
++---------+
+| a |
++---------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__concat">
+ <code class="ph codeph">concat(string a, string b...)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a single string representing all the argument values joined together.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+ concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+ joins together values from different rows.
+ </p>
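+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        The argument values are joined in the order given, with no separator
+        between them:
+      </p>
+<pre class="pre codeblock"><code>
+select concat('abc', 'def', 'ghi');
++-----------------------------+
+| concat('abc', 'def', 'ghi') |
++-----------------------------+
+| abcdefghi                   |
++-----------------------------+
+</code></pre>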
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__concat_ws">
+ <code class="ph codeph">concat_ws(string sep, string a, string b...)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a single string representing the second and following argument values joined
+ together, delimited by a specified separator.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+ concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+ joins together values from different rows.
+ </p>
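+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        The first argument supplies the separator that is placed between each
+        pair of the remaining arguments:
+      </p>
+<pre class="pre codeblock"><code>
+select concat_ws('/', '2018', '05', '09');
++------------------------------------+
+| concat_ws('/', '2018', '05', '09') |
++------------------------------------+
+| 2018/05/09                         |
++------------------------------------+
+</code></pre>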
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__find_in_set">
+ <code class="ph codeph">find_in_set(string str, string strList)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a specified string
+ within a comma-separated string. Returns <code class="ph codeph">NULL</code> if either argument is
+ <code class="ph codeph">NULL</code>, 0 if the search string is not found, or 0 if the search string contains a comma.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
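+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        Positions in the comma-separated list are numbered starting from 1.
+        A search string that is not present, or that itself contains a comma,
+        yields 0:
+      </p>
+<pre class="pre codeblock"><code>
+select find_in_set('b', 'a,b,c') as found,
+  find_in_set('z', 'a,b,c') as not_found,
+  find_in_set('a,b', 'a,b,c') as has_comma;
++-------+-----------+-----------+
+| found | not_found | has_comma |
++-------+-----------+-----------+
+| 2     | 0         | 0         |
++-------+-----------+-----------+
+</code></pre>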
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__group_concat">
+ <code class="ph codeph">group_concat(string s [, string sep])</code>
+ </dt>
+
+ <dd class="dd">
+
+          <strong class="ph b">Purpose:</strong> Returns a single string containing the argument values from each
+          row of the result set, concatenated together. If the optional separator string is specified, the separator is added between each
+          pair of concatenated values.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+ concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+ joins together values from different rows.
+ </p>
+ <p class="p">
+ By default, returns a single string covering the whole result set. To include other columns or values
+ in the result set, or to produce multiple concatenated strings for subsets of rows, include a
+ <code class="ph codeph">GROUP BY</code> clause in the query.
+ </p>
+ <p class="p">
+ Strictly speaking, <code class="ph codeph">group_concat()</code> is an aggregate function, not a scalar
+ function like the others in this list.
+ For additional details and examples, see <a class="xref" href="impala_group_concat.html#group_concat">GROUP_CONCAT Function</a>.
+ </p>
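+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        The following sketch assumes a hypothetical table <code class="ph codeph">t1</code>
+        with a single <code class="ph codeph">STRING</code> column <code class="ph codeph">s</code>.
+        Because <code class="ph codeph">group_concat()</code> is an aggregate function, the
+        whole result set is collapsed into one concatenated value:
+      </p>
+<pre class="pre codeblock"><code>
+-- Hypothetical table for illustration:
+create table t1 (s string);
+insert into t1 values ('a'), ('b'), ('c');
+
+-- Produces one row containing a value such as 'a, b, c'.
+-- The order of the concatenated values is not guaranteed.
+select group_concat(s, ', ') from t1;
+</code></pre>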
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__initcap">
+ <code class="ph codeph">initcap(string str)</code>
+ </dt>
+
+ <dd class="dd">
+
+          <strong class="ph b">Purpose:</strong> Returns the input string with the first letter of each word capitalized and all other letters in lowercase.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
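+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select initcap('hello world');
++------------------------+
+| initcap('hello world') |
++------------------------+
+| Hello World            |
++------------------------+
+</code></pre>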
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__instr">
+ <code class="ph codeph">instr(string str, string substr <span class="ph">[, bigint position [, bigint occurrence ] ]</span>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a substring within a
+ longer string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If the substring is not present in the string, the function returns 0:
+ </p>
+
+<pre class="pre codeblock"><code>
+select instr('foo bar bletch', 'z');
++------------------------------+
+| instr('foo bar bletch', 'z') |
++------------------------------+
+| 0 |
++------------------------------+
+</code></pre>
+
+ <p class="p">
+ The optional third and fourth arguments let you find instances of the substring
+ other than the first instance starting from the left:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The third argument lets you specify a starting point within the string
+ other than 1:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Restricting the search to positions 7..end,
+-- the first occurrence of 'b' is at position 9.
+select instr('foo bar bletch', 'b', 7);
++---------------------------------+
+| instr('foo bar bletch', 'b', 7) |
++---------------------------------+
+| 9 |
++---------------------------------+
+
+-- If there are no more occurrences after the
+-- specified position, the result is 0.
+select instr('foo bar bletch', 'b', 10);
++----------------------------------+
+| instr('foo bar bletch', 'b', 10) |
++----------------------------------+
+| 0 |
++----------------------------------+
+</code></pre>
+
+ <p class="p">
+ If the third argument is negative, the search works right-to-left
+ starting that many characters from the right. The return value still
+ represents the position starting from the left side of the string.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Scanning right to left, the first occurrence of 'o'
+-- is at position 8. (8th character from the left.)
+select instr('hello world','o',-1);
++-------------------------------+
+| instr('hello world', 'o', -1) |
++-------------------------------+
+| 8 |
++-------------------------------+
+
+-- Scanning right to left, starting from the 6th character
+-- from the right, the first occurrence of 'o' is at
+-- position 5 (5th character from the left).
+select instr('hello world','o',-6);
++-------------------------------+
+| instr('hello world', 'o', -6) |
++-------------------------------+
+| 5 |
++-------------------------------+
+
+-- If there are no more occurrences after the
+-- specified position, the result is 0.
+select instr('hello world','o',-10);
++--------------------------------+
+| instr('hello world', 'o', -10) |
++--------------------------------+
+| 0 |
++--------------------------------+
+</code></pre>
+
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The fourth argument lets you specify an occurrence other than the first:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- 2nd occurrence of 'b' is at position 9.
+select instr('foo bar bletch', 'b', 1, 2);
++------------------------------------+
+| instr('foo bar bletch', 'b', 1, 2) |
++------------------------------------+
+| 9 |
++------------------------------------+
+
+-- Negative position argument means scan right-to-left.
+-- This example finds second instance of 'b' from the right.
+select instr('foo bar bletch', 'b', -1, 2);
++-------------------------------------+
+| instr('foo bar bletch', 'b', -1, 2) |
++-------------------------------------+
+| 5 |
++-------------------------------------+
+</code></pre>
+
+ <p class="p">
+ If the fourth argument is greater than the number of matching occurrences,
+ the function returns 0:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- There is no 3rd occurrence within the string.
+select instr('foo bar bletch', 'b', 1, 3);
++------------------------------------+
+| instr('foo bar bletch', 'b', 1, 3) |
++------------------------------------+
+| 0 |
++------------------------------------+
+
+-- There is not even 1 occurrence when scanning
+-- the string starting at position 10.
+select instr('foo bar bletch', 'b', 10, 1);
++-------------------------------------+
+| instr('foo bar bletch', 'b', 10, 1) |
++-------------------------------------+
+| 0 |
++-------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The fourth argument cannot be negative or zero. A non-positive value for
+ this argument causes an error:
+ </p>
+
+<pre class="pre codeblock"><code>
+select instr('foo bar bletch', 'b', 1, 0);
+ERROR: UDF ERROR: Invalid occurrence parameter to instr function: 0
+
+select instr('aaaaaaaaa','aa', 1, -1);
+ERROR: UDF ERROR: Invalid occurrence parameter to instr function: -1
+</code></pre>
+
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If either of the optional arguments is <code class="ph codeph">NULL</code>,
+ the function also returns <code class="ph codeph">NULL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>
+select instr('foo bar bletch', 'b', null);
++------------------------------------+
+| instr('foo bar bletch', 'b', null) |
++------------------------------------+
+| NULL |
++------------------------------------+
+
+select instr('foo bar bletch', 'b', 1, null);
++---------------------------------------+
+| instr('foo bar bletch', 'b', 1, null) |
++---------------------------------------+
+| NULL |
++---------------------------------------+
+</code></pre>
+ </li>
+
+ </ul>
+
+ </dd>
+
+
+
+ <dt class="dt dlterm" id="string_functions__left">
+ <code class="ph codeph">left(string a, int num_chars)</code>
+ </dt>
+ <dd class="dd">
+ See the <code class="ph codeph">strleft</code> function.
+ </dd>
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__length">
+ <code class="ph codeph">length(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the length in characters of the argument string,
+ ignoring any trailing spaces in <code class="ph codeph">CHAR</code> values.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ When applied to a <code class="ph codeph">STRING</code> value, it returns the
+ same result as the <code class="ph codeph">char_length()</code> function. When applied
+ to a <code class="ph codeph">CHAR</code> value, it might return a smaller value
+ than <code class="ph codeph">char_length()</code> does, because <code class="ph codeph">length()</code>
+ ignores any trailing spaces in the <code class="ph codeph">CHAR</code>.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because the behavior of <code class="ph codeph">length()</code> with <code class="ph codeph">CHAR</code>
+ values containing trailing spaces is not standardized across the industry,
+ when porting code from other database systems, evaluate the behavior of
+ <code class="ph codeph">length()</code> on the source system and switch to
+ <code class="ph codeph">char_length()</code> for Impala if necessary.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <div class="p">
+ The following example demonstrates how <code class="ph codeph">length()</code>
+ and <code class="ph codeph">char_length()</code> sometimes produce the same result,
+ and sometimes produce different results depending on the type of the
+ argument and the presence of trailing spaces for <code class="ph codeph">CHAR</code>
+ values. The <code class="ph codeph">S</code> and <code class="ph codeph">C</code> values are
+ displayed with enclosing quotation marks to show any trailing spaces.
+<pre class="pre codeblock" id="string_functions__d6e2627"><code>create table length_demo (s string, c char(5));
+insert into length_demo values
+ ('a',cast('a' as char(5))),
+ ('abc',cast('abc' as char(5))),
+ ('hello',cast('hello' as char(5)));
+
+select concat('"',s,'"') as s, concat('"',c,'"') as c,
+ length(s), length(c),
+ char_length(s), char_length(c)
+from length_demo;
++---------+---------+-----------+-----------+----------------+----------------+
+| s       | c       | length(s) | length(c) | char_length(s) | char_length(c) |
++---------+---------+-----------+-----------+----------------+----------------+
+| "a"     | "a    " | 1         | 1         | 1              | 5              |
+| "abc"   | "abc  " | 3         | 3         | 3              | 5              |
+| "hello" | "hello" | 5         | 5         | 5              | 5              |
++---------+---------+-----------+-----------+----------------+----------------+
+</code></pre>
+ </div>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__locate">
+ <code class="ph codeph">locate(string substr, string str[, int pos])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a substring within a
+ longer string, optionally after a particular position.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
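+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following queries illustrate typical results; the return value is 0 when
+ the substring is not found:
+ </p>
+<pre class="pre codeblock"><code>
+select locate('b', 'foo bar bletch');
++-------------------------------+
+| locate('b', 'foo bar bletch') |
++-------------------------------+
+| 5                             |
++-------------------------------+
+
+-- With a starting position, the search skips earlier occurrences.
+select locate('b', 'foo bar bletch', 6);
++----------------------------------+
+| locate('b', 'foo bar bletch', 6) |
++----------------------------------+
+| 9                                |
++----------------------------------+
+
+select locate('z', 'foo bar bletch');
++-------------------------------+
+| locate('z', 'foo bar bletch') |
++-------------------------------+
+| 0                             |
++-------------------------------+
+</code></pre>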
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__lower">
+ <code class="ph codeph">lower(string a), <span class="ph" id="string_functions__lcase">lcase(string a)</span> </code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string converted to all-lowercase.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can simplify queries that
+ use many <code class="ph codeph">UPPER()</code> and <code class="ph codeph">LOWER()</code> calls
+ to do case-insensitive comparisons, by using the <code class="ph codeph">ILIKE</code>
+ or <code class="ph codeph">IREGEXP</code> operators instead. See
+ <a class="xref" href="../shared/../topics/impala_operators.html#ilike">ILIKE Operator</a> and
+ <a class="xref" href="../shared/../topics/impala_operators.html#iregexp">IREGEXP Operator</a> for details.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__lpad">
+ <code class="ph codeph">lpad(string str, int len, string pad)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a string of a specified length, based on the first argument string. If the
+ specified string is too short, it is padded on the left with a repeating sequence of the characters from
+ the pad string. If the specified string is too long, it is truncated on the right.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
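+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following queries illustrate the typical padding and truncation behavior:
+ </p>
+<pre class="pre codeblock"><code>
+-- The pad string repeats on the left until the result reaches the length.
+select lpad('hello', 8, 'xy');
++------------------------+
+| lpad('hello', 8, 'xy') |
++------------------------+
+| xyxhello               |
++------------------------+
+
+-- A length shorter than the original string truncates on the right.
+select lpad('hello', 3, 'xy');
++------------------------+
+| lpad('hello', 3, 'xy') |
++------------------------+
+| hel                    |
++------------------------+
+</code></pre>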
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__ltrim">
+ <code class="ph codeph">ltrim(string a [, string chars_to_trim])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string with all occurrences
+ of characters specified by the second argument removed from
+ the left side. Removes spaces if the second argument is not specified.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
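+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following queries illustrate typical results. The first value is wrapped in
+ brackets with <code class="ph codeph">concat()</code> to make the remaining trailing
+ spaces visible:
+ </p>
+<pre class="pre codeblock"><code>
+-- Only leading spaces are removed.
+select concat('[', ltrim('  hello  '), ']');
++--------------------------------------+
+| concat('[', ltrim('  hello  '), ']') |
++--------------------------------------+
+| [hello  ]                            |
++--------------------------------------+
+
+-- With a second argument, the specified leading characters are removed.
+select ltrim('abcdefg', 'abc');
++-------------------------+
+| ltrim('abcdefg', 'abc') |
++-------------------------+
+| defg                    |
++-------------------------+
+</code></pre>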
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__parse_url">
+ <code class="ph codeph">parse_url(string urlString, string partToExtract [, string keyToExtract])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the portion of a URL corresponding to a specified part. The part argument can be
+ <code class="ph codeph">'PROTOCOL'</code>, <code class="ph codeph">'HOST'</code>, <code class="ph codeph">'PATH'</code>, <code class="ph codeph">'REF'</code>,
+ <code class="ph codeph">'AUTHORITY'</code>, <code class="ph codeph">'FILE'</code>, <code class="ph codeph">'USERINFO'</code>, or
+ <code class="ph codeph">'QUERY'</code>. Uppercase is required for these literal values. When requesting the
+ <code class="ph codeph">QUERY</code> portion of the URL, you can optionally specify a key to retrieve just the
+ associated value from the key-value pairs in the query string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> This function is important for the traditional Hadoop use case of interpreting web
+ logs. For example, if the web traffic data features raw URLs not divided into separate table columns,
+ you can count visitors to a particular page by extracting the <code class="ph codeph">'PATH'</code> or
+ <code class="ph codeph">'FILE'</code> field, or analyze search terms by extracting the corresponding key from the
+ <code class="ph codeph">'QUERY'</code> field.
+ </p>
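+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following queries, using a made-up URL, illustrate extracting different
+ parts:
+ </p>
+<pre class="pre codeblock"><code>
+select parse_url('http://example.com/docs/intro.html?lang=en#top', 'HOST');
++----------------------------------------------------------------------+
+| parse_url('http://example.com/docs/intro.html?lang=en#top', 'HOST')  |
++----------------------------------------------------------------------+
+| example.com                                                          |
++----------------------------------------------------------------------+
+
+-- For the QUERY part, a key retrieves just the associated value.
+select parse_url('http://example.com/docs/intro.html?lang=en#top', 'QUERY', 'lang');
++-------------------------------------------------------------------------------+
+| parse_url('http://example.com/docs/intro.html?lang=en#top', 'QUERY', 'lang')  |
++-------------------------------------------------------------------------------+
+| en                                                                            |
++-------------------------------------------------------------------------------+
+</code></pre>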
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__regexp_escape">
+ <code class="ph codeph">regexp_escape(string source)</code>
+ </dt>
+
+ <dd class="dd">
+ <strong class="ph b">Purpose:</strong> The <code class="ph codeph">regexp_escape</code> function returns
+ a string with the special characters of the RE2 library escaped, so that
+ those characters are interpreted literally rather than as regular
+ expression metacharacters. The following special characters are escaped
+ by the function:
+<pre class="pre codeblock"><code>.\+*?[^]$(){}=!<>|:-</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong>
+ <code class="ph codeph">string</code>
+ </p>
+
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+ <p class="p">
+ Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+ use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+ that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character
+ class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to
+ escape as <code class="ph codeph">\\d</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ This example shows escaping one of the special characters in RE2.
+ </p>
+<pre class="pre codeblock"><code>
++------------------------------------------------------+
+| regexp_escape('Hello.world') |
++------------------------------------------------------+
+| Hello\.world |
++------------------------------------------------------+
+</code></pre>
+ <p class="p">
+ This example shows escaping all the special characters in RE2.
+ </p>
+<pre class="pre codeblock"><code>
++------------------------------------------------------------+
+| regexp_escape('a.b\\c+d*e?f[g]h$i(j)k{l}m=n!o<p>q|r:s-t') |
++------------------------------------------------------------+
+| a\.b\\c\+d\*e\?f\[g\]h\$i\(j\)k\{l\}m\=n\!o\<p\>q\|r\:s\-t |
++------------------------------------------------------------+
+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__regexp_extract">
+ <code class="ph codeph">regexp_extract(string subject, string pattern, int index)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified <code class="ph codeph">()</code> group from a string based on a regular expression pattern. Group
+ 0 refers to the entire extracted string, while groups 1, 2, and so on refer to the first, second, and so
+ on <code class="ph codeph">(...)</code> portions.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+ <p class="p">
+ Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+ use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+ that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character
+ class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to
+ escape as <code class="ph codeph">\\d</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ This example shows how group 0 matches the full pattern string, including the portion outside any
+ <code class="ph codeph">()</code> group:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',0);
++------------------------------------------------------+
+| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 0) |
++------------------------------------------------------+
+| abcdef123ghi456 |
++------------------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+ <p class="p">
+ This example shows how group 1 matches just the contents inside the first <code class="ph codeph">()</code> group in
+ the pattern string:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',1);
++------------------------------------------------------+
+| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 1) |
++------------------------------------------------------+
+| 456 |
++------------------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+ <p class="p">
+ Unlike in earlier Impala releases, the regular expression library used in Impala 2.0 and later supports
+ the <code class="ph codeph">.*?</code> idiom for non-greedy matches. This example shows how a pattern string starting
+ with <code class="ph codeph">.*?</code> matches the shortest possible portion of the source string, returning the
+ rightmost set of lowercase letters. A pattern string both starting and ending with <code class="ph codeph">.*?</code>
+ finds two potential matches of equal length, and returns the first one found (the leftmost set of
+ lowercase letters).
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+)',1);
++--------------------------------------------------------+
+| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+)', 1) |
++--------------------------------------------------------+
+| def |
++--------------------------------------------------------+
+[localhost:21000] > select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+).*?',1);
++-----------------------------------------------------------+
+| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+).*?', 1) |
++-----------------------------------------------------------+
+| bcd |
++-----------------------------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__regexp_like">
+ <code class="ph codeph">regexp_like(string source, string pattern[, string options])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns <code class="ph codeph">true</code> or <code class="ph codeph">false</code> to indicate
+ whether the source string contains a match for the regular expression given by the pattern
+ anywhere inside it. The optional third argument consists of letter flags that change how the match is performed,
+ such as <code class="ph codeph">i</code> for case-insensitive matching.
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+ <p class="p">
+ The flags that you can include in the optional third argument are:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">c</code>: Case-sensitive matching (the default).
+ </li>
+ <li class="li">
+ <code class="ph codeph">i</code>: Case-insensitive matching. If multiple instances of <code class="ph codeph">c</code> and <code class="ph codeph">i</code>
+ are included in the third argument, the last such option takes precedence.
+ </li>
+ <li class="li">
+ <code class="ph codeph">m</code>: Multi-line matching. The <code class="ph codeph">^</code> and <code class="ph codeph">$</code>
+ operators match the start or end of any line within the source string, not the
+ start and end of the entire string.
+ </li>
+ <li class="li">
+ <code class="ph codeph">n</code>: Newline matching. The <code class="ph codeph">.</code> operator can match the
+ newline character. A repetition operator such as <code class="ph codeph">.*</code> can
+ match a portion of the source string that spans multiple lines.
+ </li>
+ </ul>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">boolean</code>
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+ <p class="p">
+ Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+ use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+ that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character
+ class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to
+ escape as <code class="ph codeph">\\d</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ This example shows how <code class="ph codeph">regexp_like()</code> can test for the existence
+ of various kinds of regular expression patterns within a source string:
+ </p>
+<pre class="pre codeblock"><code>
+-- Matches because the 'f' appears somewhere in 'foo'.
+select regexp_like('foo','f');
++-------------------------+
+| regexp_like('foo', 'f') |
++-------------------------+
+| true |
++-------------------------+
+
+-- Does not match because the comparison is case-sensitive by default.
+select regexp_like('foo','F');
++-------------------------+
+| regexp_like('foo', 'f') |
++-------------------------+
+| false |
++-------------------------+
+
+-- The 3rd argument can change the matching logic, such as 'i' meaning case-insensitive.
+select regexp_like('foo','F','i');
++------------------------------+
+| regexp_like('foo', 'f', 'i') |
++------------------------------+
+| true |
++------------------------------+
+
+-- The familiar regular expression notations work, such as ^ and $ anchors...
+select regexp_like('foo','f$');
++--------------------------+
+| regexp_like('foo', 'f$') |
++--------------------------+
+| false |
++--------------------------+
+
+select regexp_like('foo','o$');
++--------------------------+
+| regexp_like('foo', 'o$') |
++--------------------------+
+| true |
++--------------------------+
+
+-- ...and repetition operators such as * and +
+select regexp_like('foooooobar','fo+b');
++-----------------------------------+
+| regexp_like('foooooobar', 'fo+b') |
++-----------------------------------+
+| true |
++-----------------------------------+
+
+select regexp_like('foooooobar','fx*y*o*b');
++---------------------------------------+
+| regexp_like('foooooobar', 'fx*y*o*b') |
++---------------------------------------+
+| true |
++---------------------------------------+
+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__regexp_replace">
+ <code class="ph codeph">regexp_replace(string initial, string pattern, string replacement)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the initial argument with the regular expression pattern replaced by the final
+ argument string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+ <p class="p">
+ Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+ use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+ that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character
+ class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to
+ escape as <code class="ph codeph">\\d</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ These examples show how you can replace parts of a string matching a pattern with replacement text,
+ which can include backreferences to any <code class="ph codeph">()</code> groups in the pattern string. The
+ backreference numbers start at 1, and any <code class="ph codeph">\</code> characters must be escaped as
+ <code class="ph codeph">\\</code>.
+ </p>
+ <p class="p">
+ Replace a character pattern with new text:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_replace('aaabbbaaa','b+','xyz');
++------------------------------------------+
+| regexp_replace('aaabbbaaa', 'b+', 'xyz') |
++------------------------------------------+
+| aaaxyzaaa |
++------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+ <p class="p">
+ Replace a character pattern with substitution text that includes the original matching text:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_replace('aaabbbaaa','(b+)','<\\1>');
++----------------------------------------------+
+| regexp_replace('aaabbbaaa', '(b+)', '<\\1>') |
++----------------------------------------------+
+| aaa<bbb>aaa |
++----------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+ <p class="p">
+ Remove all characters that are not digits:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_replace('123-456-789','[^[:digit:]]','');
++---------------------------------------------------+
+| regexp_replace('123-456-789', '[^[:digit:]]', '') |
++---------------------------------------------------+
+| 123456789 |
++---------------------------------------------------+
+Returned 1 row(s) in 0.12s</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__repeat">
+ <code class="ph codeph">repeat(string str, int n)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string repeated a specified number of times.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
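+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following query illustrates a typical result:
+ </p>
+<pre class="pre codeblock"><code>
+select repeat('abc', 3);
++------------------+
+| repeat('abc', 3) |
++------------------+
+| abcabcabc        |
++------------------+
+</code></pre>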
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__replace">
+ <code class="ph codeph">replace(string initial, string target, string replacement)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the initial argument with all occurrences of the target string
+ replaced by the replacement string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Because this function does not use any regular expression patterns, it is typically faster
+ than <code class="ph codeph">regexp_replace()</code> for simple string substitutions.
+ </p>
+ <p class="p">
+ If any argument is <code class="ph codeph">NULL</code>, the return value is <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ Matching is case-sensitive.
+ </p>
+ <p class="p">
+ If the replacement string contains another instance of the target
+ string, the expansion is only performed once, instead of
+ applying again to the newly constructed string.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>-- Replace one string with another.
+select replace('hello world','world','earth');
++------------------------------------------+
+| replace('hello world', 'world', 'earth') |
++------------------------------------------+
+| hello earth |
++------------------------------------------+
+
+-- All occurrences of the target string are replaced.
+select replace('hello world','o','0');
++----------------------------------+
+| replace('hello world', 'o', '0') |
++----------------------------------+
+| hell0 w0rld |
++----------------------------------+
+
+-- If no match is found, the original string is returned unchanged.
+select replace('hello world','xyz','abc');
++--------------------------------------+
+| replace('hello world', 'xyz', 'abc') |
++--------------------------------------+
+| hello world |
++--------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__reverse">
+ <code class="ph codeph">reverse(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string with characters in reversed order.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
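+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following query illustrates a typical result:
+ </p>
+<pre class="pre codeblock"><code>
+select reverse('hello');
++------------------+
+| reverse('hello') |
++------------------+
+| olleh            |
++------------------+
+</code></pre>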
+ </dd>
+
+
+
+ <dt class="dt dlterm" id="string_functions__right">
+ <code class="ph codeph">right(string a, int num_chars)</code>
+ </dt>
+ <dd class="dd">
+ See the <code class="ph codeph">strright</code> function.
+ </dd>
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__rpad">
+ <code class="ph codeph">rpad(string str, int len, string pad)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a string of a specified length, based on the first argument string. If the
+ specified string is too short, it is padded on the right with a repeating sequence of the characters from
+ the pad string. If the specified string is too long, it is truncated on the right.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
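+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following queries illustrate the typical padding and truncation behavior:
+ </p>
+<pre class="pre codeblock"><code>
+-- The pad string repeats on the right until the result reaches the length.
+select rpad('hello', 8, 'xy');
++------------------------+
+| rpad('hello', 8, 'xy') |
++------------------------+
+| helloxyx               |
++------------------------+
+
+-- A length shorter than the original string truncates on the right.
+select rpad('hello', 3, 'xy');
++------------------------+
+| rpad('hello', 3, 'xy') |
++------------------------+
+| hel                    |
++------------------------+
+</code></pre>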
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__rtrim">
+ <code class="ph codeph">rtrim(string a [, string chars_to_trim])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string with all occurrences
+ of characters specified by the second argument removed from
+ the right side. Removes spaces if the second argument is not specified.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
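+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following queries illustrate typical results. The first value is wrapped in
+ brackets with <code class="ph codeph">concat()</code> to make the remaining leading
+ spaces visible:
+ </p>
+<pre class="pre codeblock"><code>
+-- Only trailing spaces are removed.
+select concat('[', rtrim('  hello  '), ']');
++--------------------------------------+
+| concat('[', rtrim('  hello  '), ']') |
++--------------------------------------+
+| [  hello]                            |
++--------------------------------------+
+
+-- With a second argument, the specified trailing characters are removed.
+select rtrim('abcdefg', 'efg');
++-------------------------+
+| rtrim('abcdefg', 'efg') |
++-------------------------+
+| abcd                    |
++-------------------------+
+</code></pre>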
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__space">
+ <code class="ph codeph">space(int n)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a concatenated string of the specified number of spaces. Shorthand for
+ <code class="ph codeph">repeat(' ',<var class="keyword varname">n</var>)</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
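+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ In the following query, the result is wrapped in brackets with
+ <code class="ph codeph">concat()</code> to make the spaces visible:
+ </p>
+<pre class="pre codeblock"><code>
+select concat('[', space(3), ']');
++----------------------------+
+| concat('[', space(3), ']') |
++----------------------------+
+| [   ]                      |
++----------------------------+
+</code></pre>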
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__split_part">
+ <code class="ph codeph">split_part(string source, string delimiter, bigint n)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the nth field within a delimited string. The
+ fields are numbered starting from 1. The delimiter can consist of
+ multiple characters, not just a single character. All matching of the
+ delimiter is done exactly, not using any regular expression patterns.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ These examples show how to retrieve the nth field from a delimited
+ string:
+ </p>
+<pre class="pre codeblock"><code>
+select split_part('x,y,z',',',1);
++-----------------------------+
+| split_part('x,y,z', ',', 1) |
++-----------------------------+
+| x |
++-----------------------------+
+
+select split_part('x,y,z',',',2);
++-----------------------------+
+| split_part('x,y,z', ',', 2) |
++-----------------------------+
+| y |
++-----------------------------+
+
+select split_part('x,y,z',',',3);
++-----------------------------+
+| split_part('x,y,z', ',', 3) |
++-----------------------------+
+| z |
++-----------------------------+
+
+</code></pre>
+ <p class="p">
+ These examples show what happens for out-of-range field positions.
+ Specifying a value less than 1 produces an error. Specifying a value
+ greater than the number of fields returns a zero-length string
+ (which is not the same as <code class="ph codeph">NULL</code>).
+ </p>
+<pre class="pre codeblock"><code>
+select split_part('x,y,z',',',0);
+ERROR: Invalid field position: 0
+
+with t1 as (select split_part('x,y,z',',',4) nonexistent_field)
+  select
+      nonexistent_field
+    , concat('[',nonexistent_field,']')
+    , length(nonexistent_field)
+  from t1;
++-------------------+-------------------------------------+---------------------------+
+| nonexistent_field | concat('[', nonexistent_field, ']') | length(nonexistent_field) |
++-------------------+-------------------------------------+---------------------------+
+|                   | []                                  | 0                         |
++-------------------+-------------------------------------+---------------------------+
+
+</code></pre>
+ <p class="p">
+ These examples show how the delimiter can be a multi-character value:
+ </p>
+<pre class="pre codeblock"><code>
+select split_part('one***two***three','***',2);
++-------------------------------------------+
+| split_part('one***two***three', '***', 2) |
++-------------------------------------------+
+| two |
++-------------------------------------------+
+
+select split_part('one\|/two\|/three','\|/',3);
++-------------------------------------------+
+| split_part('one\|/two\|/three', '\|/', 3) |
++-------------------------------------------+
+| three |
++-------------------------------------------+
+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__strleft">
+ <code class="ph codeph">strleft(string a, int num_chars)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the leftmost characters of the string. Shorthand for a call to
+ <code class="ph codeph">substr()</code> with 2 arguments.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+
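+ <p class="p">
+ For example:
+ </p>
+<pre class="pre codeblock"><code>
+-- Returns 'hello', the leftmost 5 characters.
+select strleft('hello world', 5);
+</code></pre>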
+ </dd>
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__strright">
+ <code class="ph codeph">strright(string a, int num_chars)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the rightmost characters of the string. Shorthand for a call to
+ <code class="ph codeph">substr()</code> with 2 arguments.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
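+ <p class="p">
+ For example:
+ </p>
+<pre class="pre codeblock"><code>
+-- Returns 'world', the rightmost 5 characters.
+select strright('hello world', 5);
+</code></pre>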
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__substr">
+ <code class="ph codeph">substr(string a, int start [, int len]), <span class="ph" id="string_functions__substring">substring(string a, int start [, int
+ len])</span></code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the portion of the string starting at a specified point, optionally with a
+ specified maximum length. The characters in the string are indexed starting at 1.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
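+ <p class="p">
+ For example:
+ </p>
+<pre class="pre codeblock"><code>
+-- Returns 'world'; the substring starts at position 7 (positions are 1-based).
+select substr('hello world', 7);
+
+-- Returns 'hello'; at most 5 characters starting at position 1.
+select substr('hello world', 1, 5);
+</code></pre>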
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__translate">
+ <code class="ph codeph">translate(string input, string from, string to)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the input string with a set of characters replaced by another set of characters.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
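+ <p class="p">
+ For example, each character in the <code class="ph codeph">from</code>
+ string is mapped to the character at the same position in the
+ <code class="ph codeph">to</code> string:
+ </p>
+<pre class="pre codeblock"><code>
+-- Returns 'he001'; 'l' maps to '0' and 'o' maps to '1'.
+select translate('hello', 'lo', '01');
+</code></pre>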
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__trim">
+ <code class="ph codeph">trim(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the input string with both leading and trailing spaces removed. The same as
+ passing the string through both <code class="ph codeph">ltrim()</code> and <code class="ph codeph">rtrim()</code>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Often used during data cleansing operations during the ETL cycle, if input values might still have surrounding spaces.
+ For a more general-purpose function that can remove other leading and trailing characters besides spaces, see <code class="ph codeph">btrim()</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
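+ <p class="p">
+ For example:
+ </p>
+<pre class="pre codeblock"><code>
+-- Returns '[hello]'; the surrounding spaces are removed before concatenation.
+select concat('[', trim('  hello  '), ']');
+</code></pre>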
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__upper">
+ <code class="ph codeph">upper(string a), <span class="ph" id="string_functions__ucase">ucase(string a)</span></code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string converted to all-uppercase.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can simplify queries that
+ use many <code class="ph codeph">UPPER()</code> and <code class="ph codeph">LOWER()</code> calls
+ to do case-insensitive comparisons, by using the <code class="ph codeph">ILIKE</code>
+ or <code class="ph codeph">IREGEXP</code> operators instead. See
+ <a class="xref" href="../shared/../topics/impala_operators.html#ilike">ILIKE Operator</a> and
+ <a class="xref" href="../shared/../topics/impala_operators.html#iregexp">IREGEXP Operator</a> for details.
+ </p>
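+ <p class="p">
+ For example (the table <code class="ph codeph">t1</code> and column
+ <code class="ph codeph">s</code> are hypothetical):
+ </p>
+<pre class="pre codeblock"><code>
+-- Returns 'IMPALA'.
+select upper('impala');
+
+-- These two case-insensitive comparisons are equivalent;
+-- the second avoids the upper() call.
+select count(*) from t1 where upper(s) = 'IMPALA';
+select count(*) from t1 where s ilike 'impala';
+</code></pre>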
+ </dd>
+
+
+ </dl>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_shell_options.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_shell_options.html b/docs/build3x/html/topics/impala_shell_options.html
new file mode 100644
index 0000000..4d61196
--- /dev/null
+++ b/docs/build3x/html/topics/impala_shell_options.html
@@ -0,0 +1,618 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shell_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>impala-shell Configuration Options</title></head><body id="shell_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">impala-shell Configuration Options</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can specify the following options when starting the <code class="ph codeph">impala-shell</code> command to change how
+ shell commands are executed. The table shows the format to use when specifying each option on the command
+ line, or through the <span class="ph filepath">$HOME/.impalarc</span> configuration file.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ These options are different from the configuration options for the <code class="ph codeph">impalad</code> daemon itself.
+ For the <code class="ph codeph">impalad</code> options, see <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="shell_options__shell_option_summary">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Summary of impala-shell Configuration Options</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following table shows the names and allowed arguments for the <span class="keyword cmdname">impala-shell</span>
+ configuration options. You can specify options on the command line, or in a configuration file as described
+ in <a class="xref" href="impala_shell_options.html#shell_config_file">impala-shell Configuration File</a>.
+ </p>
+
+ <table class="table"><caption></caption><colgroup><col style="width:25%"><col style="width:25%"><col style="width:50%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="shell_option_summary__entry__1">
+ Command-Line Option
+ </th>
+ <th class="entry nocellnorowborder" id="shell_option_summary__entry__2">
+ Configuration File Setting
+ </th>
+ <th class="entry nocellnorowborder" id="shell_option_summary__entry__3">
+ Explanation
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -B or --delimited
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ write_delimited=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Causes all query results to be printed in plain format as a delimited text file. Useful for
+ producing data files to be used with other Hadoop components. Also useful for avoiding the
+ performance overhead of pretty-printing all output, especially when running benchmark tests using
+ queries returning large result sets. Specify the delimiter character with the
+ <code class="ph codeph">--output_delimiter</code> option. Store all query results in a file rather than
+ printing to the screen with the <code class="ph codeph">-B</code> option. Added in Impala 1.0.1.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -b or
+ </p>
+ <p class="p">
+ --kerberos_host_fqdn
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ kerberos_host_fqdn=
+ </p>
+ <p class="p">
+ <var class="keyword varname">load-balancer-hostname</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ If set, the setting overrides the expected hostname of the
+ Impala daemon's Kerberos service principal.
+ <span class="keyword cmdname">impala-shell</span> will check that the server's
+ principal matches this hostname. This may be used when
+ <code class="ph codeph">impalad</code> is configured to be accessed via a
+ load-balancer, but it is desired for impala-shell to talk to a
+ specific <code class="ph codeph">impalad</code> directly.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ --print_header
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ print_header=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Prints the column names as a header row when query results are
+ displayed in delimited mode with the <code class="ph codeph">-B</code> option.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -o <var class="keyword varname">filename</var> or --output_file <var class="keyword varname">filename</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ output_file=<var class="keyword varname">filename</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Stores all query results in the specified file. Typically used to store the results of a single
+ query issued from the command line with the <code class="ph codeph">-q</code> option. Also works for
+ interactive sessions; you see the messages such as number of rows fetched, but not the actual
+ result set. To suppress these incidental messages when combining the <code class="ph codeph">-q</code> and
+ <code class="ph codeph">-o</code> options, redirect <code class="ph codeph">stderr</code> to <code class="ph codeph">/dev/null</code>.
+ Added in Impala 1.0.1.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ --output_delimiter=<var class="keyword varname">character</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ output_delimiter=<var class="keyword varname">character</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Specifies the character to use as a delimiter between fields when query results are printed in
+ plain format by the <code class="ph codeph">-B</code> option. Defaults to tab (<code class="ph codeph">'\t'</code>). If an
+ output value contains the delimiter character, that field is quoted, escaped by doubling quotation marks, or both. Added in
+ Impala 1.0.1.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -p or --show_profiles
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ show_profiles=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Displays the query execution plan (same output as the <code class="ph codeph">EXPLAIN</code> statement) and a
+ more detailed low-level breakdown of execution steps, for every query executed by the shell.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -h or --help
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ N/A
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Displays help information.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ N/A
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ history_max=1000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Sets the maximum number of queries to store in the history file.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -i <var class="keyword varname">hostname</var> or
+ --impalad=<var class="keyword varname">hostname</var>[:<var class="keyword varname">portnum</var>]
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ impalad=<var class="keyword varname">hostname</var>[:<var class="keyword varname">portnum</var>]
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Connects to the <code class="ph codeph">impalad</code> daemon on the specified host. The default port of 21000
+ is assumed unless you provide another value. You can connect to any host in your cluster that is
+ running <code class="ph codeph">impalad</code>. If you connect to an instance of <code class="ph codeph">impalad</code> that
+ was started with an alternate port specified by the <code class="ph codeph">--fe_port</code> flag, provide that
+ alternative port.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -q <var class="keyword varname">query</var> or --query=<var class="keyword varname">query</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ query=<var class="keyword varname">query</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Passes a query or other <span class="keyword cmdname">impala-shell</span> command from the command line. The
+ <span class="keyword cmdname">impala-shell</span> interpreter immediately exits after processing the statement. It
+ is limited to a single statement, which could be a <code class="ph codeph">SELECT</code>, <code class="ph codeph">CREATE
+ TABLE</code>, <code class="ph codeph">SHOW TABLES</code>, or any other statement recognized in
+ <code class="ph codeph">impala-shell</code>. Because you cannot pass a <code class="ph codeph">USE</code> statement and
+ another query, fully qualify the names for any tables outside the <code class="ph codeph">default</code>
+ database. (Or use the <code class="ph codeph">-f</code> option to pass a file with a <code class="ph codeph">USE</code>
+ statement followed by other queries.)
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -f <var class="keyword varname">query_file</var> or --query_file=<var class="keyword varname">query_file</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ query_file=<var class="keyword varname">path_to_query_file</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Passes a SQL query from a file. Multiple statements must be semicolon (;) delimited.
+ <span class="ph">In <span class="keyword">Impala 2.3</span> and higher, you can specify a filename of <code class="ph codeph">-</code>
+ to represent standard input. This feature makes it convenient to use <span class="keyword cmdname">impala-shell</span>
+ as part of a Unix pipeline where SQL statements are generated dynamically by other tools.</span>
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ --query_option="<var class="keyword varname">option</var>=<var class="keyword varname">value</var>"
+ or -Q "<var class="keyword varname">option</var>=<var class="keyword varname">value</var>"
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ Header line <code class="ph codeph">[impala.query_options]</code>,
+ followed on subsequent lines by <var class="keyword varname">option</var>=<var class="keyword varname">value</var>, one option per line.
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Sets default query options for an invocation of the <span class="keyword cmdname">impala-shell</span> command.
+ To set multiple query options at once, use more than one instance of this command-line option.
+ The query option names are not case-sensitive.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -k or --kerberos
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ use_kerberos=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Kerberos authentication is used when the shell connects to <code class="ph codeph">impalad</code>. If Kerberos
+ is not enabled on the instance of <code class="ph codeph">impalad</code> to which you are connecting, errors
+ are displayed.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -s <var class="keyword varname">kerberos_service_name</var> or --kerberos_service_name=<var class="keyword varname">name</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ kerberos_service_name=<var class="keyword varname">name</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Instructs <code class="ph codeph">impala-shell</code> to authenticate to a particular <code class="ph codeph">impalad</code>
+ service principal. If a <var class="keyword varname">kerberos_service_name</var> is not specified,
+ <code class="ph codeph">impala</code> is used by default. If this option is used in conjunction with a
+ connection in which Kerberos is not supported, errors are returned.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -V or --verbose
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ verbose=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Enables verbose output.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ --quiet
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ verbose=false
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Disables verbose output.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -v or --version
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ version=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Displays version information.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -c
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ ignore_query_failure=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Continues on query failure.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -d <var class="keyword varname">default_db</var> or --database=<var class="keyword varname">default_db</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ default_db=<var class="keyword varname">default_db</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Specifies the database to be used on startup. Same as running the
+ <code class="ph codeph"><a class="xref" href="impala_use.html#use">USE</a></code> statement after connecting. If not
+ specified, a database named <code class="ph codeph">DEFAULT</code> is used.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ -ssl
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ ssl=true
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ Enables TLS/SSL for <span class="keyword cmdname">impala-shell</span>.
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ --ca_cert=<var class="keyword varname">path_to_certificate</var>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ ca_cert=<var class="keyword varname">path_to_certificate</var>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ The local pathname pointing to the third-party CA certificate, or to a copy of the server
+ certificate for self-signed server certificates. If <code class="ph codeph">--ca_cert</code> is not set,
+ <span class="keyword cmdname">impala-shell</span> enables TLS/SSL, but does not validate the server certificate. This is
+ useful for connecting to a known-good Impala that is only running over TLS/SSL, when a copy of the
+ certificate is not available (such as when debugging customer installations).
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ -l
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ use_ldap=true
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ Enables LDAP authentication.
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ -u
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ user=<var class="keyword varname">user_name</var>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ Supplies the username, when LDAP authentication is enabled by the <code class="ph codeph">-l</code> option.
+ (Specify the short username, not the full LDAP distinguished name.) The shell then prompts
+ interactively for the password.
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ --ldap_password_cmd=<var class="keyword varname">command</var>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ N/A
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ Specifies a command to run to retrieve the LDAP password,
+ when LDAP authentication is enabled by the <code class="ph codeph">-l</code> option.
+ If the command includes space-separated arguments, enclose the command and
+ its arguments in quotation marks.
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ --config_file=<var class="keyword varname">path_to_config_file</var>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ N/A
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ Specifies the path of the file containing <span class="keyword cmdname">impala-shell</span> configuration settings.
+ The default is <span class="ph filepath">$HOME/.impalarc</span>. This setting can only be specified on the
+ command line.
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">--live_progress</td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">N/A</td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">Prints a progress bar showing roughly the percentage complete for each query.
+ The information is updated interactively as the query progresses.
+ See <a class="xref" href="impala_live_progress.html#live_progress">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a>.</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">--live_summary</td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">N/A</td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">Prints a detailed report, similar to the <code class="ph codeph">SUMMARY</code> command, showing progress details for each phase of query execution.
+ The information is updated interactively as the query progresses.
+ See <a class="xref" href="impala_live_summary.html#live_summary">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a>.</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">--var=<var class="keyword varname">variable_name</var>=<var class="keyword varname">value</var></td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">N/A</td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ Defines a substitution variable that can be used within the <span class="keyword cmdname">impala-shell</span> session.
+ The variable can be substituted into statements processed by the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options,
+ or in an interactive shell session.
+ Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>.
+ This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ </td>
+ </tr>
+ </tbody></table>
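+ <p class="p">
+ For example, the following commands (illustrative; the hostname, table
+ names, and file path are placeholders) combine several of these options.
+ The first runs a single query in delimited mode and saves the results to
+ a CSV file; the second supplies a substitution variable that is expanded
+ inside the query text:
+ </p>
+<pre class="pre codeblock"><code>
+impala-shell -i impala-host.example.com -B --output_delimiter=',' \
+  -q 'select id, name from default.sample_table' -o /tmp/results.csv
+
+impala-shell -i impala-host.example.com --var=tbl=web_logs \
+  -q 'select count(*) from ${var:tbl}'
+</code></pre>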
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="shell_options__shell_config_file">
+
+ <h2 class="title topictitle2" id="ariaid-title3">impala-shell Configuration File</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can define a set of default options for your <span class="keyword cmdname">impala-shell</span> environment, stored in the
+ file <span class="ph filepath">$HOME/.impalarc</span>. This file consists of key-value pairs, one option per line.
+ Everything after a <code class="ph codeph">#</code> character on a line is treated as a comment and ignored.
+ </p>
+
+ <p class="p">
+ The configuration file must contain a header label <code class="ph codeph">[impala]</code>, followed by the options
+ specific to <span class="keyword cmdname">impala-shell</span>. (This standard convention for configuration files lets you
+ use a single file to hold configuration options for multiple applications.)
+ </p>
+
+ <p class="p">
+ To specify a different filename or path for the configuration file, specify the argument
+ <code class="ph codeph">--config_file=<var class="keyword varname">path_to_config_file</var></code> on the
+ <span class="keyword cmdname">impala-shell</span> command line.
+ </p>
+
+ <p class="p">
+ The names of the options in the configuration file are similar (although not necessarily identical) to the
+ long-form command-line arguments to the <span class="keyword cmdname">impala-shell</span> command. For the names to use, see
+ <a class="xref" href="impala_shell_options.html#shell_option_summary">Summary of impala-shell Configuration Options</a>.
+ </p>
+
+ <p class="p">
+ Any options you specify on the <span class="keyword cmdname">impala-shell</span> command line override any corresponding
+ options within the configuration file.
+ </p>
+
+ <p class="p">
+ The following example shows a configuration file that you might use during benchmarking tests. It sets
+ verbose mode, so that the output from each SQL query is followed by timing information.
+ <span class="keyword cmdname">impala-shell</span> starts inside the database containing the tables with the benchmark data,
+ avoiding the need to issue a <code class="ph codeph">USE</code> statement or use fully qualified table names.
+ </p>
+
+ <p class="p">
+ In this example, the query output is formatted as delimited text rather than enclosed in ASCII art boxes,
+ and is stored in a file rather than printed to the screen. Those options are appropriate for benchmark
+ situations, so that the overhead of <span class="keyword cmdname">impala-shell</span> formatting and printing the result set
+ does not factor into the timing measurements. It also enables the <code class="ph codeph">show_profiles</code> option.
+ That option prints detailed performance information after each query, which might be valuable in
+ understanding the performance of benchmark queries.
+ </p>
+
+<pre class="pre codeblock"><code>[impala]
+verbose=true
+default_db=tpc_benchmarking
+write_delimited=true
+output_delimiter=,
+output_file=/home/tester1/benchmark_results.csv
+show_profiles=true
+</code></pre>
+
+ <p class="p">
+ The following example shows a configuration file that connects to a specific remote Impala node, runs a
+ single query within a particular database, then exits. Any query options predefined under the
+ <code class="ph codeph">[impala.query_options]</code> section in the configuration file take effect during the session.
+ </p>
+
+ <p class="p">
+ You would typically use this kind of single-purpose
+ configuration setting with the <span class="keyword cmdname">impala-shell</span> command-line option
+ <code class="ph codeph">--config_file=<var class="keyword varname">path_to_config_file</var></code>, to easily select between many
+ predefined queries that could be run against different databases, hosts, or even different clusters. To run
+ a sequence of statements instead of a single query, specify the configuration option
+ <code class="ph codeph">query_file=<var class="keyword varname">path_to_query_file</var></code> instead.
+ </p>
+
+<pre class="pre codeblock"><code>[impala]
+impalad=impala-test-node1.example.com
+default_db=site_stats
+# Issue a predefined query and immediately exit.
+query=select count(*) from web_traffic where event_date = trunc(now(),'dd')
+
+<span class="ph">[impala.query_options]
+mem_limit=32g</span>
+</code></pre>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_shell_running_commands.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_shell_running_commands.html b/docs/build3x/html/topics/impala_shell_running_commands.html
new file mode 100644
index 0000000..98c4d24
--- /dev/null
+++ b/docs/build3x/html/topics/impala_shell_running_commands.html
@@ -0,0 +1,322 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shell_running_commands"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Running Commands and SQL Statements in impala-shell</title></head><body id="shell_running_commands"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Running Commands and SQL Statements in impala-shell</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ For information on available commands, see
+ <a class="xref" href="impala_shell_commands.html#shell_commands">impala-shell Command Reference</a>. You can see the full set of available
+ commands by pressing TAB twice, for example:
+ </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] >
+connect describe explain help history insert quit refresh select set shell show use version
+[impalad-host:21000] ></code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Commands must be terminated by a semicolon. A command can span multiple lines.
+ </div>
+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select *
+ > from t1
+ > limit 5;
++---------+-----------+
+| s1 | s2 |
++---------+-----------+
+| hello | world |
+| goodbye | cleveland |
++---------+-----------+
+</code></pre>
+
+ <p class="p">
+ A comment is considered part of the statement it precedes, so when you enter a <code class="ph codeph">--</code> or
+ <code class="ph codeph">/* */</code> comment, you get a continuation prompt until you finish entering a statement ending
+ with a semicolon:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > -- This is a test comment
+ > show tables like 't*';
++--------+
+| name |
++--------+
+| t1 |
+| t2 |
+| tab1 |
+| tab2 |
+| tab3 |
+| text_t |
++--------+
+</code></pre>
+
+ <p class="p">
+ Use the up-arrow and down-arrow keys to cycle through and edit previous commands.
+ <span class="keyword cmdname">impala-shell</span> uses the <code class="ph codeph">readline</code> library and so supports a standard set of
+ keyboard shortcuts for editing and cursor movement, such as <code class="ph codeph">Ctrl-A</code> for beginning of line and
+ <code class="ph codeph">Ctrl-E</code> for end of line.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can define substitution variables to be used within SQL statements
+ processed by <span class="keyword cmdname">impala-shell</span>. On the command line, you specify the option
+ <code class="ph codeph">--var=<var class="keyword varname">variable_name</var>=<var class="keyword varname">value</var></code>.
+ Within an interactive session or a script file processed by the <code class="ph codeph">-f</code> option, you specify
+ a <code class="ph codeph">SET</code> command using the notation <code class="ph codeph">SET VAR:<var class="keyword varname">variable_name</var>=<var class="keyword varname">value</var></code>.
+ Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>.
+ </p>
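The `${var:variable_name}` notation above can be illustrated with a small sketch. This is not impala-shell's actual implementation; it only shows how variables defined through `--var` or `SET VAR:` could be expanded inside a SQL string before execution. The `expand_vars` helper is hypothetical.

```python
import re

def expand_vars(sql, variables):
    """Replace each ${var:name} token in `sql` with its value from
    `variables`. An undefined variable raises KeyError, analogous to
    the 'Unknown variable' error impala-shell reports."""
    def replace(match):
        return variables[match.group(1)]
    return re.sub(r"\$\{var:(\w+)\}", replace, sql)

sql = 'select x from ${var:tname} where x like "%${var:answer}%"'
print(expand_vars(sql, {"tname": "table1", "answer": "b"}))
# select x from table1 where x like "%b%"
```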
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because this feature is part of <span class="keyword cmdname">impala-shell</span> rather than the <span class="keyword cmdname">impalad</span>
+ backend, make sure the client system you are connecting from has the most recent <span class="keyword cmdname">impala-shell</span>.
+ You can use this feature with a new <span class="keyword cmdname">impala-shell</span> connecting to an older <span class="keyword cmdname">impalad</span>,
+ but not the reverse.
+ </div>
+
+ <p class="p">
+ For example, here are some <span class="keyword cmdname">impala-shell</span> commands that define substitution variables and then
+ use them in SQL statements executed through the <code class="ph codeph">-q</code> and <code class="ph codeph">-f</code> options.
+ Notice how the <code class="ph codeph">-q</code> argument strings are single-quoted to prevent shell expansion of the
+ <code class="ph codeph">${var:value}</code> notation, and any string literals within the queries are enclosed by double quotation marks.
+ </p>
+
+<pre class="pre codeblock"><code>
+$ impala-shell --var=tname=table1 --var=colname=x --var=coltype=string -q 'create table ${var:tname} (${var:colname} ${var:coltype}) stored as parquet'
+Starting Impala Shell without Kerberos authentication
+Connected to <var class="keyword varname">hostname</var>
+Server version: <var class="keyword varname">impalad_version</var>
+Query: create table table1 (x string) stored as parquet
+
+$ NEW_STRING="hello world"
+$ impala-shell --var=tname=table1 --var=insert_val="$NEW_STRING" -q 'insert into ${var:tname} values ("${var:insert_val}")'
+Starting Impala Shell without Kerberos authentication
+Connected to <var class="keyword varname">hostname</var>
+Server version: <var class="keyword varname">impalad_version</var>
+Query: insert into table1 values ("hello world")
+Inserted 1 row(s) in 1.40s
+
+$ for VAL in foo bar bletch
+do
+ impala-shell --var=tname=table1 --var=insert_val="$VAL" -q 'insert into ${var:tname} values ("${var:insert_val}")'
+done
+...
+Query: insert into table1 values ("foo")
+Inserted 1 row(s) in 0.22s
+Query: insert into table1 values ("bar")
+Inserted 1 row(s) in 0.11s
+Query: insert into table1 values ("bletch")
+Inserted 1 row(s) in 0.21s
+
+$ echo "Search for what substring?" ; read answer
+Search for what substring?
+b
+$ impala-shell --var=tname=table1 -q 'select x from ${var:tname} where x like "%${var:answer}%"'
+Starting Impala Shell without Kerberos authentication
+Connected to <var class="keyword varname">hostname</var>
+Server version: <var class="keyword varname">impalad_version</var>
+Query: select x from table1 where x like "%b%"
++--------+
+| x |
++--------+
+| bletch |
+| bar |
++--------+
+Fetched 2 row(s) in 0.83s
+</code></pre>
+
+ <p class="p">
+ The following example passes a substitution variable in through the <code class="ph codeph">--var</code> option,
+ then references it in statements issued interactively. The variable is then
+ cleared with the <code class="ph codeph">UNSET</code> command and defined again with the
+ <code class="ph codeph">SET</code> command.
+ </p>
+
+<pre class="pre codeblock"><code>
+$ impala-shell --quiet --var=tname=table1
+Starting Impala Shell without Kerberos authentication
+***********************************************************************************
+<var class="keyword varname">banner_message</var>
+***********************************************************************************
+[<var class="keyword varname">hostname</var>:21000] > select count(*) from ${var:tname};
++----------+
+| count(*) |
++----------+
+| 4 |
++----------+
+[<var class="keyword varname">hostname</var>:21000] > unset var:tname;
+Unsetting variable TNAME
+[<var class="keyword varname">hostname</var>:21000] > select count(*) from ${var:tname};
+Error: Unknown variable TNAME
+[<var class="keyword varname">hostname</var>:21000] > set var:tname=table1;
+[<var class="keyword varname">hostname</var>:21000] > select count(*) from ${var:tname};
++----------+
+| count(*) |
++----------+
+| 4 |
++----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how the <code class="ph codeph">SOURCE</code> command can execute
+ a series of statements from a file:
+ </p>
+
+<pre class="pre codeblock"><code>
+$ cat commands.sql
+show databases;
+show tables in default;
+show functions in _impala_builtins like '*minute*';
+
+$ impala-shell -i localhost
+...
+[localhost:21000] > source commands.sql;
+Query: show databases
++------------------+----------------------------------------------+
+| name | comment |
++------------------+----------------------------------------------+
+| _impala_builtins | System database for Impala builtin functions |
+| default | Default Hive database |
++------------------+----------------------------------------------+
+Fetched 2 row(s) in 0.06s
+Query: show tables in default
++-----------+
+| name |
++-----------+
+| customers |
+| sample_07 |
+| sample_08 |
+| web_logs |
++-----------+
+Fetched 4 row(s) in 0.02s
+Query: show functions in _impala_builtins like '*minute*'
++-------------+--------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+--------------------------------+-------------+---------------+
+| INT | minute(TIMESTAMP) | BUILTIN | true |
+| TIMESTAMP | minutes_add(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | minutes_add(TIMESTAMP, INT) | BUILTIN | true |
+| TIMESTAMP | minutes_sub(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | minutes_sub(TIMESTAMP, INT) | BUILTIN | true |
++-------------+--------------------------------+-------------+---------------+
+Fetched 5 row(s) in 0.03s
+</code></pre>
+
+ <p class="p">
+ The following example shows how a file that is run by the <code class="ph codeph">SOURCE</code> command,
+ or through the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options of <span class="keyword cmdname">impala-shell</span>,
+ can contain additional <code class="ph codeph">SOURCE</code> commands.
+ The first file, <span class="ph filepath">nested1.sql</span>, runs an <span class="keyword cmdname">impala-shell</span> command
+ and then also runs the commands from <span class="ph filepath">nested2.sql</span>.
+ This ability for scripts to call each other is often useful for code that sets up schemas for applications
+ or test environments.
+ </p>
+
+<pre class="pre codeblock"><code>
+$ cat nested1.sql
+show functions in _impala_builtins like '*minute*';
+source nested2.sql
+$ cat nested2.sql
+show functions in _impala_builtins like '*hour*'
+
+$ impala-shell -i localhost -f nested1.sql
+Starting Impala Shell without Kerberos authentication
+Connected to localhost:21000
+...
+Query: show functions in _impala_builtins like '*minute*'
++-------------+--------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+--------------------------------+-------------+---------------+
+| INT | minute(TIMESTAMP) | BUILTIN | true |
+| TIMESTAMP | minutes_add(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | minutes_add(TIMESTAMP, INT) | BUILTIN | true |
+| TIMESTAMP | minutes_sub(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | minutes_sub(TIMESTAMP, INT) | BUILTIN | true |
++-------------+--------------------------------+-------------+---------------+
+Fetched 5 row(s) in 0.01s
+Query: show functions in _impala_builtins like '*hour*'
++-------------+------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+------------------------------+-------------+---------------+
+| INT | hour(TIMESTAMP) | BUILTIN | true |
+| TIMESTAMP | hours_add(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | hours_add(TIMESTAMP, INT) | BUILTIN | true |
+| TIMESTAMP | hours_sub(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | hours_sub(TIMESTAMP, INT) | BUILTIN | true |
++-------------+------------------------------+-------------+---------------+
+Fetched 5 row(s) in 0.01s
+</code></pre>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="shell_running_commands__rerun">
+ <h2 class="title topictitle2" id="ariaid-title2">Rerunning impala-shell Commands</h2>
+ <div class="body conbody">
+
+ <p class="p">
+ In <span class="keyword">Impala 2.10</span> and higher, you can use the
+ <code class="ph codeph">rerun</code> command, or its abbreviation <code class="ph codeph">@</code>,
+ to re-execute commands from the history list. The argument can be
+ a positive integer (reflecting the number shown in <code class="ph codeph">history</code>
+ output) or a negative integer (reflecting the Nth-from-last command in the
+ <code class="ph codeph">history</code> output). For example:
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > select * from p1 order by t limit 5;
+...
+[localhost:21000] > show table stats p1;
++-----------+--------+--------+------------------------------------------------------------+
+| #Rows | #Files | Size | Location |
++-----------+--------+--------+------------------------------------------------------------+
+| 134217728 | 50 | 4.66MB | hdfs://test.example.com:8020/user/hive/warehouse/jdr.db/p1 |
++-----------+--------+--------+------------------------------------------------------------+
+[localhost:21000] > compute stats p1;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 3 column(s). |
++-----------------------------------------+
+[localhost:21000] > history;
+[1]: use jdr;
+[2]: history;
+[3]: show tables;
+[4]: select * from p1 order by t limit 5;
+[5]: show table stats p1;
+[6]: compute stats p1;
+[7]: history;
+[localhost:21000] > @-2; <- Rerun the 2nd last command in the history list
+Rerunning compute stats p1;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 3 column(s). |
++-----------------------------------------+
+[localhost:21000] > history; <- History list is not updated by rerunning commands
+ or by repeating the last command, in this case 'history'.
+[1]: use jdr;
+[2]: history;
+[3]: show tables;
+[4]: select * from p1 order by t limit 5;
+[5]: show table stats p1;
+[6]: compute stats p1;
+[7]: history;
+[localhost:21000] > @4; <- Rerun command #4 in the history list using short form '@'.
+Rerunning select * from p1 order by t limit 5;
+...
+[localhost:21000] > rerun 4; <- Rerun command #4 using long form 'rerun'.
+Rerunning select * from p1 order by t limit 5;
+...
+
+</code></pre>
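The index rules shown above can be sketched as follows, assuming history entries are numbered from 1 as in the <code>history</code> output: a positive argument N selects entry N, and a negative argument selects counting back from the end. The `resolve_rerun` helper is hypothetical, not impala-shell's own code.

```python
def resolve_rerun(history, index):
    """Return the history entry selected by a rerun/@ argument."""
    if index == 0 or abs(index) > len(history):
        raise IndexError("rerun index out of range for history of length %d"
                         % len(history))
    if index > 0:
        return history[index - 1]   # positive: 1-based from the start
    return history[index]           # negative: counted from the end

# History list matching the session shown above.
history = [
    "use jdr;",
    "history;",
    "show tables;",
    "select * from p1 order by t limit 5;",
    "show table stats p1;",
    "compute stats p1;",
    "history;",
]
print(resolve_rerun(history, 4))    # select * from p1 order by t limit 5;
print(resolve_rerun(history, -2))   # compute stats p1;
```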
+
+ </div>
+ </article>
+
+</article></main></body></html>
[51/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
Posted by mi...@apache.org.
[DOCS] Impala doc site update for 3.0
Add Impala docs from branch master.
Commit hash: f20415755401d56103df7f16348cea8ed12fb3d8
Change-Id: Icf5927efa7baa965095a3ff2fd4ec7411313342d
Reviewed-on: http://gerrit.cloudera.org:8080/10322
Reviewed-by: Michael Brown <mi...@cloudera.com>
Tested-by: Alex Rodoni <ar...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/fae51ec2
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/fae51ec2
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/fae51ec2
Branch: refs/heads/asf-site
Commit: fae51ec244b5005d21a45e434bc3425cfe08e871
Parents: 52b8807
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Tue May 8 10:56:06 2018 -0700
Committer: Alex Rodoni <ar...@cloudera.com>
Committed: Wed May 9 20:55:22 2018 +0000
----------------------------------------------------------------------
docs/build3x/html/commonltr.css | 555 ++
docs/build3x/html/commonrtl.css | 592 ++
docs/build3x/html/images/impala_arch.jpeg | Bin 0 -> 41900 bytes
docs/build3x/html/index.html | 3 +
.../html/topics/impala_abort_on_error.html | 42 +
docs/build3x/html/topics/impala_adls.html | 638 ++
docs/build3x/html/topics/impala_admin.html | 52 +
docs/build3x/html/topics/impala_admission.html | 822 +++
.../html/topics/impala_aggregate_functions.html | 34 +
docs/build3x/html/topics/impala_aliases.html | 148 +
.../impala_allow_unsupported_formats.html | 24 +
.../build3x/html/topics/impala_alter_table.html | 1117 ++++
docs/build3x/html/topics/impala_alter_view.html | 139 +
.../html/topics/impala_analytic_functions.html | 1785 ++++++
.../html/topics/impala_appx_count_distinct.html | 82 +
.../build3x/html/topics/impala_appx_median.html | 132 +
docs/build3x/html/topics/impala_array.html | 321 +
docs/build3x/html/topics/impala_auditing.html | 232 +
.../html/topics/impala_authentication.html | 37 +
.../html/topics/impala_authorization.html | 1176 ++++
docs/build3x/html/topics/impala_avg.html | 318 +
docs/build3x/html/topics/impala_avro.html | 565 ++
docs/build3x/html/topics/impala_batch_size.html | 34 +
docs/build3x/html/topics/impala_bigint.html | 138 +
.../html/topics/impala_bit_functions.html | 848 +++
docs/build3x/html/topics/impala_boolean.html | 170 +
docs/build3x/html/topics/impala_breakpad.html | 239 +
.../html/topics/impala_buffer_pool_limit.html | 71 +
docs/build3x/html/topics/impala_char.html | 305 +
docs/build3x/html/topics/impala_comments.html | 46 +
.../html/topics/impala_complex_types.html | 2606 ++++++++
docs/build3x/html/topics/impala_components.html | 227 +
.../html/topics/impala_compression_codec.html | 92 +
.../html/topics/impala_compute_stats.html | 637 ++
.../impala_compute_stats_min_sample_size.html | 23 +
docs/build3x/html/topics/impala_concepts.html | 48 +
.../topics/impala_conditional_functions.html | 611 ++
docs/build3x/html/topics/impala_config.html | 48 +
.../html/topics/impala_config_options.html | 389 ++
.../html/topics/impala_config_performance.html | 149 +
docs/build3x/html/topics/impala_connecting.html | 187 +
.../topics/impala_conversion_functions.html | 288 +
docs/build3x/html/topics/impala_count.html | 353 ++
.../html/topics/impala_create_database.html | 209 +
.../html/topics/impala_create_function.html | 502 ++
.../build3x/html/topics/impala_create_role.html | 70 +
.../html/topics/impala_create_table.html | 1346 ++++
.../build3x/html/topics/impala_create_view.html | 194 +
docs/build3x/html/topics/impala_databases.html | 62 +
docs/build3x/html/topics/impala_datatypes.html | 33 +
.../html/topics/impala_datetime_functions.html | 3105 +++++++++
docs/build3x/html/topics/impala_ddl.html | 141 +
.../html/topics/impala_debug_action.html | 24 +
docs/build3x/html/topics/impala_decimal.html | 907 +++
docs/build3x/html/topics/impala_decimal_v2.html | 32 +
.../impala_default_join_distribution_mode.html | 113 +
.../impala_default_spillable_buffer_size.html | 87 +
docs/build3x/html/topics/impala_delegation.html | 70 +
docs/build3x/html/topics/impala_delete.html | 177 +
docs/build3x/html/topics/impala_describe.html | 817 +++
.../build3x/html/topics/impala_development.html | 197 +
.../html/topics/impala_disable_codegen.html | 36 +
.../impala_disable_row_runtime_filtering.html | 90 +
...mpala_disable_streaming_preaggregations.html | 50 +
.../topics/impala_disable_unsafe_spills.html | 50 +
docs/build3x/html/topics/impala_disk_space.html | 133 +
docs/build3x/html/topics/impala_distinct.html | 81 +
docs/build3x/html/topics/impala_dml.html | 82 +
docs/build3x/html/topics/impala_double.html | 157 +
.../html/topics/impala_drop_database.html | 193 +
.../html/topics/impala_drop_function.html | 136 +
docs/build3x/html/topics/impala_drop_role.html | 71 +
docs/build3x/html/topics/impala_drop_stats.html | 285 +
docs/build3x/html/topics/impala_drop_table.html | 192 +
docs/build3x/html/topics/impala_drop_view.html | 80 +
.../impala_exec_single_node_rows_threshold.html | 89 +
.../html/topics/impala_exec_time_limit_s.html | 70 +
docs/build3x/html/topics/impala_explain.html | 296 +
.../html/topics/impala_explain_level.html | 342 +
.../html/topics/impala_explain_plan.html | 592 ++
docs/build3x/html/topics/impala_faq.html | 21 +
.../html/topics/impala_file_formats.html | 236 +
.../html/topics/impala_fixed_issues.html | 5961 ++++++++++++++++++
docs/build3x/html/topics/impala_float.html | 153 +
docs/build3x/html/topics/impala_functions.html | 162 +
.../html/topics/impala_functions_overview.html | 109 +
docs/build3x/html/topics/impala_grant.html | 256 +
docs/build3x/html/topics/impala_group_by.html | 140 +
.../html/topics/impala_group_concat.html | 141 +
docs/build3x/html/topics/impala_hadoop.html | 138 +
docs/build3x/html/topics/impala_having.html | 39 +
docs/build3x/html/topics/impala_hbase.html | 772 +++
.../html/topics/impala_hbase_cache_blocks.html | 36 +
.../html/topics/impala_hbase_caching.html | 36 +
docs/build3x/html/topics/impala_hints.html | 488 ++
.../build3x/html/topics/impala_identifiers.html | 110 +
.../html/topics/impala_impala_shell.html | 87 +
.../topics/impala_incompatible_changes.html | 1526 +++++
docs/build3x/html/topics/impala_insert.html | 911 +++
docs/build3x/html/topics/impala_install.html | 126 +
docs/build3x/html/topics/impala_int.html | 121 +
docs/build3x/html/topics/impala_intro.html | 198 +
.../html/topics/impala_invalidate_metadata.html | 286 +
docs/build3x/html/topics/impala_isilon.html | 89 +
docs/build3x/html/topics/impala_jdbc.html | 340 +
docs/build3x/html/topics/impala_joins.html | 531 ++
docs/build3x/html/topics/impala_kerberos.html | 342 +
.../html/topics/impala_known_issues.html | 1012 +++
docs/build3x/html/topics/impala_kudu.html | 1449 +++++
docs/build3x/html/topics/impala_langref.html | 66 +
.../build3x/html/topics/impala_langref_sql.html | 28 +
.../html/topics/impala_langref_unsupported.html | 337 +
docs/build3x/html/topics/impala_ldap.html | 294 +
docs/build3x/html/topics/impala_limit.html | 168 +
docs/build3x/html/topics/impala_lineage.html | 91 +
docs/build3x/html/topics/impala_literals.html | 424 ++
.../html/topics/impala_live_progress.html | 131 +
.../html/topics/impala_live_summary.html | 177 +
docs/build3x/html/topics/impala_load_data.html | 322 +
docs/build3x/html/topics/impala_logging.html | 423 ++
docs/build3x/html/topics/impala_map.html | 331 +
.../html/topics/impala_math_functions.html | 1711 +++++
docs/build3x/html/topics/impala_max.html | 298 +
docs/build3x/html/topics/impala_max_errors.html | 40 +
.../topics/impala_max_num_runtime_filters.html | 75 +
.../html/topics/impala_max_row_size.html | 221 +
.../topics/impala_max_scan_range_length.html | 47 +
docs/build3x/html/topics/impala_mem_limit.html | 206 +
docs/build3x/html/topics/impala_min.html | 297 +
.../impala_min_spillable_buffer_size.html | 87 +
.../html/topics/impala_misc_functions.html | 175 +
.../html/topics/impala_mixed_security.html | 26 +
docs/build3x/html/topics/impala_mt_dop.html | 190 +
docs/build3x/html/topics/impala_ndv.html | 226 +
.../html/topics/impala_new_features.html | 3806 +++++++++++
docs/build3x/html/topics/impala_num_nodes.html | 61 +
.../html/topics/impala_num_scanner_threads.html | 27 +
docs/build3x/html/topics/impala_odbc.html | 24 +
docs/build3x/html/topics/impala_offset.html | 67 +
docs/build3x/html/topics/impala_operators.html | 2042 ++++++
.../impala_optimize_partition_key_scans.html | 188 +
docs/build3x/html/topics/impala_order_by.html | 398 ++
docs/build3x/html/topics/impala_parquet.html | 1421 +++++
.../impala_parquet_annotate_strings_utf8.html | 54 +
.../topics/impala_parquet_array_resolution.html | 180 +
.../impala_parquet_compression_codec.html | 17 +
...pala_parquet_fallback_schema_resolution.html | 55 +
.../html/topics/impala_parquet_file_size.html | 101 +
.../html/topics/impala_partitioning.html | 801 +++
.../html/topics/impala_perf_benchmarking.html | 27 +
.../html/topics/impala_perf_cookbook.html | 256 +
.../html/topics/impala_perf_hdfs_caching.html | 578 ++
docs/build3x/html/topics/impala_perf_joins.html | 508 ++
.../html/topics/impala_perf_resources.html | 47 +
docs/build3x/html/topics/impala_perf_skew.html | 139 +
docs/build3x/html/topics/impala_perf_stats.html | 1192 ++++
.../html/topics/impala_perf_testing.html | 152 +
.../build3x/html/topics/impala_performance.html | 116 +
docs/build3x/html/topics/impala_planning.html | 20 +
docs/build3x/html/topics/impala_porting.html | 603 ++
docs/build3x/html/topics/impala_ports.html | 421 ++
.../html/topics/impala_prefetch_mode.html | 47 +
docs/build3x/html/topics/impala_prereqs.html | 275 +
docs/build3x/html/topics/impala_processes.html | 115 +
docs/build3x/html/topics/impala_proxy.html | 501 ++
.../html/topics/impala_query_options.html | 55 +
.../html/topics/impala_query_timeout_s.html | 62 +
docs/build3x/html/topics/impala_rcfile.html | 246 +
docs/build3x/html/topics/impala_real.html | 39 +
docs/build3x/html/topics/impala_refresh.html | 408 ++
.../html/topics/impala_release_notes.html | 26 +
docs/build3x/html/topics/impala_relnotes.html | 26 +
.../html/topics/impala_replica_preference.html | 68 +
.../html/topics/impala_request_pool.html | 35 +
.../html/topics/impala_reserved_words.html | 3853 +++++++++++
.../html/topics/impala_resource_management.html | 97 +
docs/build3x/html/topics/impala_revoke.html | 151 +
.../impala_runtime_bloom_filter_size.html | 104 +
.../topics/impala_runtime_filter_max_size.html | 65 +
.../topics/impala_runtime_filter_min_size.html | 65 +
.../html/topics/impala_runtime_filter_mode.html | 75 +
.../impala_runtime_filter_wait_time_ms.html | 51 +
.../html/topics/impala_runtime_filtering.html | 533 ++
docs/build3x/html/topics/impala_s3.html | 775 +++
.../topics/impala_s3_skip_insert_staging.html | 78 +
.../build3x/html/topics/impala_scalability.html | 920 +++
.../topics/impala_schedule_random_replica.html | 83 +
.../html/topics/impala_schema_design.html | 184 +
.../html/topics/impala_schema_objects.html | 48 +
.../html/topics/impala_scratch_limit.html | 77 +
docs/build3x/html/topics/impala_security.html | 99 +
.../html/topics/impala_security_files.html | 58 +
.../html/topics/impala_security_guidelines.html | 99 +
.../html/topics/impala_security_install.html | 17 +
.../html/topics/impala_security_metastore.html | 30 +
.../html/topics/impala_security_webui.html | 57 +
docs/build3x/html/topics/impala_select.html | 236 +
docs/build3x/html/topics/impala_seqfile.html | 240 +
docs/build3x/html/topics/impala_set.html | 280 +
.../html/topics/impala_shell_commands.html | 416 ++
.../html/topics/impala_shell_options.html | 618 ++
.../topics/impala_shell_running_commands.html | 322 +
docs/build3x/html/topics/impala_show.html | 1525 +++++
.../topics/impala_shuffle_distinct_exprs.html | 37 +
docs/build3x/html/topics/impala_smallint.html | 127 +
docs/build3x/html/topics/impala_ssl.html | 180 +
docs/build3x/html/topics/impala_stddev.html | 121 +
docs/build3x/html/topics/impala_string.html | 197 +
.../html/topics/impala_string_functions.html | 1719 +++++
docs/build3x/html/topics/impala_struct.html | 500 ++
docs/build3x/html/topics/impala_subqueries.html | 332 +
docs/build3x/html/topics/impala_sum.html | 333 +
.../html/topics/impala_support_start_over.html | 30 +
docs/build3x/html/topics/impala_sync_ddl.html | 55 +
docs/build3x/html/topics/impala_tables.html | 446 ++
.../build3x/html/topics/impala_tablesample.html | 560 ++
docs/build3x/html/topics/impala_timeouts.html | 182 +
docs/build3x/html/topics/impala_timestamp.html | 656 ++
docs/build3x/html/topics/impala_tinyint.html | 133 +
.../html/topics/impala_troubleshooting.html | 370 ++
.../html/topics/impala_truncate_table.html | 200 +
docs/build3x/html/topics/impala_tutorial.html | 2270 +++++++
docs/build3x/html/topics/impala_txtfile.html | 770 +++
docs/build3x/html/topics/impala_udf.html | 1603 +++++
docs/build3x/html/topics/impala_union.html | 146 +
docs/build3x/html/topics/impala_update.html | 169 +
docs/build3x/html/topics/impala_upgrading.html | 280 +
docs/build3x/html/topics/impala_upsert.html | 113 +
docs/build3x/html/topics/impala_use.html | 84 +
docs/build3x/html/topics/impala_varchar.html | 254 +
docs/build3x/html/topics/impala_variance.html | 132 +
docs/build3x/html/topics/impala_views.html | 300 +
docs/build3x/html/topics/impala_webui.html | 311 +
docs/build3x/html/topics/impala_with.html | 63 +
docs/build3x/impala-3.0.pdf | Bin 0 -> 3886205 bytes
impala-docs.html | 9 +-
236 files changed, 88682 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/commonltr.css
----------------------------------------------------------------------
diff --git a/docs/build3x/html/commonltr.css b/docs/build3x/html/commonltr.css
new file mode 100644
index 0000000..f0738c6
--- /dev/null
+++ b/docs/build3x/html/commonltr.css
@@ -0,0 +1,555 @@
+/*!
+ * This file is part of the DITA Open Toolkit project. See the accompanying LICENSE.md file for applicable licenses.
+ */
+/*
+ | (c) Copyright IBM Corp. 2004, 2005 All Rights Reserved.
+ */
+.codeblock {
+ font-family: monospace;
+}
+
+.codeph {
+ font-family: monospace;
+}
+
+.kwd {
+ font-weight: bold;
+}
+
+.parmname {
+ font-weight: bold;
+}
+
+.var {
+ font-style: italic;
+}
+
+.filepath {
+ font-family: monospace;
+}
+
+div.tasklabel {
+ margin-top: 1em;
+ margin-bottom: 1em;
+}
+
+h2.tasklabel,
+h3.tasklabel,
+h4.tasklabel,
+h5.tasklabel,
+h6.tasklabel {
+ font-size: 100%;
+}
+
+.screen {
+ padding: 5px 5px 5px 5px;
+ border: outset;
+ background-color: #CCCCCC;
+ margin-top: 2px;
+ margin-bottom: 2px;
+ white-space: pre;
+}
+
+.wintitle {
+ font-weight: bold;
+}
+
+.numcharref {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.parameterentity {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.textentity {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlatt {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlelement {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlnsname {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlpi {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.frame-top {
+ border-top: solid 1px;
+ border-right: 0;
+ border-bottom: 0;
+ border-left: 0;
+}
+
+.frame-bottom {
+ border-top: 0;
+ border-right: 0;
+ border-bottom: solid 1px;
+ border-left: 0;
+}
+
+.frame-topbot {
+ border-top: solid 1px;
+ border-right: 0;
+ border-bottom: solid 1px;
+ border-left: 0;
+}
+
+.frame-all {
+ border: solid 1px;
+}
+
+.frame-sides {
+ border-top: 0;
+ border-left: solid 1px;
+ border-right: solid 1px;
+ border-bottom: 0;
+}
+
+.frame-none {
+ border: 0;
+}
+
+.scale-50 {
+ font-size: 50%;
+}
+
+.scale-60 {
+ font-size: 60%;
+}
+
+.scale-70 {
+ font-size: 70%;
+}
+
+.scale-80 {
+ font-size: 80%;
+}
+
+.scale-90 {
+ font-size: 90%;
+}
+
+.scale-100 {
+ font-size: 100%;
+}
+
+.scale-110 {
+ font-size: 110%;
+}
+
+.scale-120 {
+ font-size: 120%;
+}
+
+.scale-140 {
+ font-size: 140%;
+}
+
+.scale-160 {
+ font-size: 160%;
+}
+
+.scale-180 {
+ font-size: 180%;
+}
+
+.scale-200 {
+ font-size: 200%;
+}
+
+.expanse-page, .expanse-spread {
+ width: 100%;
+}
+
+.fig {
+ /* Default of italics to set apart figure captions */
+ /* Use @frame to create frames on figures */
+}
+.figcap {
+ font-style: italic;
+}
+.figdesc {
+ font-style: normal;
+}
+.figborder {
+ border-color: Silver;
+ border-style: solid;
+ border-width: 2px;
+ margin-top: 1em;
+ padding-left: 3px;
+ padding-right: 3px;
+}
+.figsides {
+ border-color: Silver;
+ border-left: 2px solid;
+ border-right: 2px solid;
+ margin-top: 1em;
+ padding-left: 3px;
+ padding-right: 3px;
+}
+.figtop {
+ border-color: Silver;
+ border-top: 2px solid;
+ margin-top: 1em;
+}
+.figbottom {
+ border-bottom: 2px solid;
+ border-color: Silver;
+}
+.figtopbot {
+ border-bottom: 2px solid;
+ border-color: Silver;
+ border-top: 2px solid;
+ margin-top: 1em;
+}
+
+/* Align images based on @align on topic/image */
+div.imageleft {
+ text-align: left;
+}
+
+div.imagecenter {
+ text-align: center;
+}
+
+div.imageright {
+ text-align: right;
+}
+
+div.imagejustify {
+ text-align: justify;
+}
+
+/* Set heading sizes, getting smaller for deeper nesting */
+.topictitle1 {
+ font-size: 1.34em;
+ margin-bottom: 0.1em;
+ margin-top: 0;
+}
+
+.topictitle2 {
+ font-size: 1.17em;
+ margin-bottom: 0.45em;
+ margin-top: 1pc;
+}
+
+.topictitle3 {
+ font-size: 1.17em;
+ font-weight: bold;
+ margin-bottom: 0.17em;
+ margin-top: 1pc;
+}
+
+.topictitle4 {
+ font-size: 1.17em;
+ font-weight: bold;
+ margin-top: 0.83em;
+}
+
+.topictitle5 {
+ font-size: 1.17em;
+ font-weight: bold;
+}
+
+.topictitle6 {
+ font-size: 1.17em;
+ font-style: italic;
+}
+
+.sectiontitle {
+ color: #000;
+ font-size: 1.17em;
+ font-weight: bold;
+ margin-bottom: 0;
+ margin-top: 1em;
+}
+
+.section {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.example {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+/* Most link groups are created with <div>. Ensure they have space before and after. */
+.ullinks {
+ list-style-type: none;
+}
+
+.ulchildlink {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.olchildlink {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.linklist {
+ margin-bottom: 1em;
+}
+
+.linklistwithchild {
+ margin-bottom: 1em;
+ margin-left: 1.5em;
+}
+
+.sublinklist {
+ margin-bottom: 1em;
+ margin-left: 1.5em;
+}
+
+.relconcepts {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.reltasks {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.relref {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.relinfo {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.breadcrumb {
+ font-size: smaller;
+ margin-bottom: 1em;
+}
+
+/* Simple lists do not get a bullet */
+ul.simple {
+ list-style-type: none;
+}
+
+/* Default of bold for definition list terms */
+.dlterm {
+ font-weight: bold;
+}
+
+/* Use CSS to expand lists with @compact="no" */
+.dltermexpand {
+ font-weight: bold;
+ margin-top: 1em;
+}
+
+*[compact="yes"] > li {
+ margin-top: 0;
+}
+
+*[compact="no"] > li {
+ margin-top: 0.53em;
+}
+
+.liexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.sliexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.dlexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.ddexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.stepexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.substepexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+dt.prereq {
+ margin-left: 20px;
+}
+
+/* All note formats have the same default presentation */
+.note {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+.note .notetitle, .note .notelisttitle,
+.note .note__title {
+ font-weight: bold;
+}
+
+/* Various basic phrase styles */
+.bold {
+ font-weight: bold;
+}
+
+.bolditalic {
+ font-style: italic;
+ font-weight: bold;
+}
+
+.italic {
+ font-style: italic;
+}
+
+.underlined {
+ text-decoration: underline;
+}
+
+.uicontrol {
+ font-weight: bold;
+}
+
+.defkwd {
+ font-weight: bold;
+ text-decoration: underline;
+}
+
+.shortcut {
+ text-decoration: underline;
+}
+
+table {
+ border-collapse: collapse;
+}
+
+table .desc {
+ display: block;
+ font-style: italic;
+}
+
+.cellrowborder {
+ border-bottom: solid 1px;
+ border-left: 0;
+ border-right: solid 1px;
+ border-top: 0;
+}
+
+.row-nocellborder {
+ border-bottom: solid 1px;
+ border-left: 0;
+ border-top: 0;
+}
+
+.cell-norowborder {
+ border-left: 0;
+ border-right: solid 1px;
+ border-top: 0;
+}
+
+.nocellnorowborder {
+ border: 0;
+}
+
+.firstcol {
+ font-weight: bold;
+}
+
+.table--pgwide-1 {
+ width: 100%;
+}
+
+.align-left {
+ text-align: left;
+}
+
+.align-right {
+ text-align: right;
+}
+
+.align-center {
+ text-align: center;
+}
+
+.align-justify {
+ text-align: justify;
+}
+
+.align-char {
+ text-align: char;
+}
+
+.valign-top {
+ vertical-align: top;
+}
+
+.valign-bottom {
+ vertical-align: bottom;
+}
+
+.valign-middle {
+ vertical-align: middle;
+}
+
+.colsep-0 {
+ border-right: 0;
+}
+
+.colsep-1 {
+ border-right: 1px solid;
+}
+
+.rowsep-0 {
+ border-bottom: 0;
+}
+
+.rowsep-1 {
+ border-bottom: 1px solid;
+}
+
+.stentry {
+ border-right: 1px solid;
+ border-bottom: 1px solid;
+}
+
+.stentry:last-child {
+ border-right: 0;
+}
+
+.strow:last-child .stentry {
+ border-bottom: 0;
+}
+
+/* Add space for top level topics */
+.nested0 {
+ margin-top: 1em;
+}
+
+/* div with class=p is used for paragraphs that contain blocks, to keep the XHTML valid */
+.p {
+ margin-top: 1em;
+}
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/commonrtl.css
----------------------------------------------------------------------
diff --git a/docs/build3x/html/commonrtl.css b/docs/build3x/html/commonrtl.css
new file mode 100644
index 0000000..99acb72
--- /dev/null
+++ b/docs/build3x/html/commonrtl.css
@@ -0,0 +1,592 @@
+/*!
+ * This file is part of the DITA Open Toolkit project. See the accompanying LICENSE.md file for applicable licenses.
+ */
+/*
+ | (c) Copyright IBM Corp. 2004, 2005 All Rights Reserved.
+ */
+.codeblock {
+ font-family: monospace;
+}
+
+.codeph {
+ font-family: monospace;
+}
+
+.kwd {
+ font-weight: bold;
+}
+
+.parmname {
+ font-weight: bold;
+}
+
+.var {
+ font-style: italic;
+}
+
+.filepath {
+ font-family: monospace;
+}
+
+div.tasklabel {
+ margin-top: 1em;
+ margin-bottom: 1em;
+}
+
+h2.tasklabel,
+h3.tasklabel,
+h4.tasklabel,
+h5.tasklabel,
+h6.tasklabel {
+ font-size: 100%;
+}
+
+.screen {
+ padding: 5px 5px 5px 5px;
+ border: outset;
+ background-color: #CCCCCC;
+ margin-top: 2px;
+ margin-bottom: 2px;
+ white-space: pre;
+}
+
+.wintitle {
+ font-weight: bold;
+}
+
+.numcharref {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.parameterentity {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.textentity {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlatt {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlelement {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlnsname {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlpi {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.frame-top {
+ border-top: solid 1px;
+ border-right: 0;
+ border-bottom: 0;
+ border-left: 0;
+}
+
+.frame-bottom {
+ border-top: 0;
+ border-right: 0;
+ border-bottom: solid 1px;
+ border-left: 0;
+}
+
+.frame-topbot {
+ border-top: solid 1px;
+ border-right: 0;
+ border-bottom: solid 1px;
+ border-left: 0;
+}
+
+.frame-all {
+ border: solid 1px;
+}
+
+.frame-sides {
+ border-top: 0;
+ border-left: solid 1px;
+ border-right: solid 1px;
+ border-bottom: 0;
+}
+
+.frame-none {
+ border: 0;
+}
+
+.scale-50 {
+ font-size: 50%;
+}
+
+.scale-60 {
+ font-size: 60%;
+}
+
+.scale-70 {
+ font-size: 70%;
+}
+
+.scale-80 {
+ font-size: 80%;
+}
+
+.scale-90 {
+ font-size: 90%;
+}
+
+.scale-100 {
+ font-size: 100%;
+}
+
+.scale-110 {
+ font-size: 110%;
+}
+
+.scale-120 {
+ font-size: 120%;
+}
+
+.scale-140 {
+ font-size: 140%;
+}
+
+.scale-160 {
+ font-size: 160%;
+}
+
+.scale-180 {
+ font-size: 180%;
+}
+
+.scale-200 {
+ font-size: 200%;
+}
+
+.expanse-page, .expanse-spread {
+ width: 100%;
+}
+
+.fig {
+ /* Default of italics to set apart figure captions */
+ /* Use @frame to create frames on figures */
+}
+.figcap {
+ font-style: italic;
+}
+.figdesc {
+ font-style: normal;
+}
+.figborder {
+ border-color: Silver;
+ border-style: solid;
+ border-width: 2px;
+ margin-top: 1em;
+ padding-left: 3px;
+ padding-right: 3px;
+}
+.figsides {
+ border-color: Silver;
+ border-left: 2px solid;
+ border-right: 2px solid;
+ margin-top: 1em;
+ padding-left: 3px;
+ padding-right: 3px;
+}
+.figtop {
+ border-color: Silver;
+ border-top: 2px solid;
+ margin-top: 1em;
+}
+.figbottom {
+ border-bottom: 2px solid;
+ border-color: Silver;
+}
+.figtopbot {
+ border-bottom: 2px solid;
+ border-color: Silver;
+ border-top: 2px solid;
+ margin-top: 1em;
+}
+
+/* Align images based on @align on topic/image */
+div.imageleft {
+ text-align: left;
+}
+
+div.imagecenter {
+ text-align: center;
+}
+
+div.imageright {
+ text-align: right;
+}
+
+div.imagejustify {
+ text-align: justify;
+}
+
+/* Set heading sizes, getting smaller for deeper nesting */
+.topictitle1 {
+ font-size: 1.34em;
+ margin-bottom: 0.1em;
+ margin-top: 0;
+}
+
+.topictitle2 {
+ font-size: 1.17em;
+ margin-bottom: 0.45em;
+ margin-top: 1pc;
+}
+
+.topictitle3 {
+ font-size: 1.17em;
+ font-weight: bold;
+ margin-bottom: 0.17em;
+ margin-top: 1pc;
+}
+
+.topictitle4 {
+ font-size: 1.17em;
+ font-weight: bold;
+ margin-top: 0.83em;
+}
+
+.topictitle5 {
+ font-size: 1.17em;
+ font-weight: bold;
+}
+
+.topictitle6 {
+ font-size: 1.17em;
+ font-style: italic;
+}
+
+.sectiontitle {
+ color: #000;
+ font-size: 1.17em;
+ font-weight: bold;
+ margin-bottom: 0;
+ margin-top: 1em;
+}
+
+.section {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.example {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+/* Most link groups are created with <div>. Ensure they have space before and after. */
+.ullinks {
+ list-style-type: none;
+}
+
+.ulchildlink {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.olchildlink {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.linklist {
+ margin-bottom: 1em;
+}
+
+.linklistwithchild {
+ margin-bottom: 1em;
+ margin-left: 1.5em;
+}
+
+.sublinklist {
+ margin-bottom: 1em;
+ margin-left: 1.5em;
+}
+
+.relconcepts {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.reltasks {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.relref {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.relinfo {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.breadcrumb {
+ font-size: smaller;
+ margin-bottom: 1em;
+}
+
+/* Simple lists do not get a bullet */
+ul.simple {
+ list-style-type: none;
+}
+
+/* Default of bold for definition list terms */
+.dlterm {
+ font-weight: bold;
+}
+
+/* Use CSS to expand lists with @compact="no" */
+.dltermexpand {
+ font-weight: bold;
+ margin-top: 1em;
+}
+
+*[compact="yes"] > li {
+ margin-top: 0;
+}
+
+*[compact="no"] > li {
+ margin-top: 0.53em;
+}
+
+.liexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.sliexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.dlexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.ddexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.stepexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.substepexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+dt.prereq {
+ margin-left: 20px;
+}
+
+/* All note formats have the same default presentation */
+.note {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+.note .notetitle, .note .notelisttitle,
+.note .note__title {
+ font-weight: bold;
+}
+
+/* Various basic phrase styles */
+.bold {
+ font-weight: bold;
+}
+
+.bolditalic {
+ font-style: italic;
+ font-weight: bold;
+}
+
+.italic {
+ font-style: italic;
+}
+
+.underlined {
+ text-decoration: underline;
+}
+
+.uicontrol {
+ font-weight: bold;
+}
+
+.defkwd {
+ font-weight: bold;
+ text-decoration: underline;
+}
+
+.shortcut {
+ text-decoration: underline;
+}
+
+table {
+ border-collapse: collapse;
+}
+
+table .desc {
+ display: block;
+ font-style: italic;
+}
+
+.cellrowborder {
+ border-bottom: solid 1px;
+ border-left: 0;
+ border-right: solid 1px;
+ border-top: 0;
+}
+
+.row-nocellborder {
+ border-bottom: solid 1px;
+ border-left: 0;
+ border-top: 0;
+}
+
+.cell-norowborder {
+ border-left: 0;
+ border-right: solid 1px;
+ border-top: 0;
+}
+
+.nocellnorowborder {
+ border: 0;
+}
+
+.firstcol {
+ font-weight: bold;
+}
+
+.table--pgwide-1 {
+ width: 100%;
+}
+
+.align-left {
+ text-align: left;
+}
+
+.align-right {
+ text-align: right;
+}
+
+.align-center {
+ text-align: center;
+}
+
+.align-justify {
+ text-align: justify;
+}
+
+.align-char {
+ text-align: char;
+}
+
+.valign-top {
+ vertical-align: top;
+}
+
+.valign-bottom {
+ vertical-align: bottom;
+}
+
+.valign-middle {
+ vertical-align: middle;
+}
+
+.colsep-0 {
+ border-right: 0;
+}
+
+.colsep-1 {
+ border-right: 1px solid;
+}
+
+.rowsep-0 {
+ border-bottom: 0;
+}
+
+.rowsep-1 {
+ border-bottom: 1px solid;
+}
+
+.stentry {
+ border-right: 1px solid;
+ border-bottom: 1px solid;
+}
+
+.stentry:last-child {
+ border-right: 0;
+}
+
+.strow:last-child .stentry {
+ border-bottom: 0;
+}
+
+/* Add space for top level topics */
+.nested0 {
+ margin-top: 1em;
+}
+
+/* div with class=p is used for paragraphs that contain blocks, to keep the XHTML valid */
+.p {
+ margin-top: 1em;
+}
+
+.linklist {
+ margin-bottom: 1em;
+}
+
+.linklistwithchild {
+ margin-right: 1.5em;
+ margin-top: 1em;
+}
+
+.sublinklist {
+ margin-right: 1.5em;
+ margin-top: 1em;
+}
+
+dt.prereq {
+ margin-right: 20px;
+}
+
+.cellrowborder {
+ border-left: solid 1px;
+ border-right: none;
+}
+
+.row-nocellborder {
+ border-left: hidden;
+ border-right: none;
+}
+
+.cell-norowborder {
+ border-left: solid 1px;
+ border-right: none;
+}
+
+.nocellnorowborder {
+ border-left: hidden;
+}
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/images/impala_arch.jpeg
----------------------------------------------------------------------
diff --git a/docs/build3x/html/images/impala_arch.jpeg b/docs/build3x/html/images/impala_arch.jpeg
new file mode 100644
index 0000000..8289469
Binary files /dev/null and b/docs/build3x/html/images/impala_arch.jpeg differ
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/index.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/index.html b/docs/build3x/html/index.html
new file mode 100644
index 0000000..41fc348
--- /dev/null
+++ b/docs/build3x/html/index.html
@@ -0,0 +1,3 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="map"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala"><link rel="stylesheet" type="text/css" href="commonltr.css"><title>Apache Impala Guide</title></head><body id="impala"><h1 class="title topictitle1">Apache Impala Guide</h1><nav><ul class="map"><li class="topicref"><a href="topics/impala_intro.html">Introducing Apache Impala</a></li><li class="topicref"><a href="topics/impala_concepts.html">Concepts and Architecture</a><ul><li class="topicref"><a href="topics/impala_components.html">Components</a></li><li class="topicref"><a href="topics/impala_development.html">Developing Applications</a></li><li class="topicref"><a href="topics/impala_hadoop.html">Role in the Hadoop Ecosystem</a></li></ul></li><li
class="topicref"><a href="topics/impala_planning.html">Deployment Planning</a><ul><li class="topicref"><a href="topics/impala_prereqs.html#prereqs">Requirements</a></li><li class="topicref"><a href="topics/impala_schema_design.html">Designing Schemas</a></li></ul></li><li class="topicref"><a href="topics/impala_install.html#install">Installing Impala</a></li><li class="topicref"><a href="topics/impala_config.html">Managing Impala</a><ul><li class="topicref"><a href="topics/impala_config_performance.html">Post-Installation Configuration for Impala</a></li><li class="topicref"><a href="topics/impala_odbc.html">Configuring Impala to Work with ODBC</a></li><li class="topicref"><a href="topics/impala_jdbc.html">Configuring Impala to Work with JDBC</a></li></ul></li><li class="topicref"><a href="topics/impala_upgrading.html">Upgrading Impala</a></li><li class="topicref"><a href="topics/impala_processes.html">Starting Impala</a><ul><li class="topicref"><a href="topics/impala_config_options
.html">Modifying Impala Startup Options</a></li></ul></li><li class="topicref"><a href="topics/impala_tutorial.html">Tutorials</a></li><li class="topicref"><a href="topics/impala_admin.html">Administration</a><ul><li class="topicref"><a href="topics/impala_admission.html">Admission Control and Query Queuing</a></li><li class="topicref"><a href="topics/impala_resource_management.html">Resource Management for Impala</a></li><li class="topicref"><a href="topics/impala_timeouts.html">Setting Timeouts</a></li><li class="topicref"><a href="topics/impala_proxy.html">Load-Balancing Proxy for HA</a></li><li class="topicref"><a href="topics/impala_disk_space.html">Managing Disk Space</a></li></ul></li><li class="topicref"><a href="topics/impala_security.html">Impala Security</a><ul><li class="topicref"><a href="topics/impala_security_guidelines.html">Security Guidelines for Impala</a></li><li class="topicref"><a href="topics/impala_security_files.html">Securing Impala Data and Log Files</a></
li><li class="topicref"><a href="topics/impala_security_install.html">Installation Considerations for Impala Security</a></li><li class="topicref"><a href="topics/impala_security_metastore.html">Securing the Hive Metastore Database</a></li><li class="topicref"><a href="topics/impala_security_webui.html">Securing the Impala Web User Interface</a></li><li class="topicref"><a href="topics/impala_ssl.html">Configuring TLS/SSL for Impala</a></li><li class="topicref"><a href="topics/impala_authorization.html">Enabling Sentry Authorization for Impala</a></li><li class="topicref"><a href="topics/impala_authentication.html">Impala Authentication</a><ul><li class="topicref"><a href="topics/impala_kerberos.html">Enabling Kerberos Authentication for Impala</a></li><li class="topicref"><a href="topics/impala_ldap.html">Enabling LDAP Authentication for Impala</a></li><li class="topicref"><a href="topics/impala_mixed_security.html">Using Multiple Authentication Methods with Impala</a></li><li clas
s="topicref"><a href="topics/impala_delegation.html">Configuring Impala Delegation for Hue and BI Tools</a></li></ul></li><li class="topicref"><a href="topics/impala_auditing.html">Auditing</a></li><li class="topicref"><a href="topics/impala_lineage.html">Viewing Lineage Info</a></li></ul></li><li class="topicref"><a href="topics/impala_langref.html">SQL Reference</a><ul><li class="topicref"><a href="topics/impala_comments.html">Comments</a></li><li class="topicref"><a href="topics/impala_datatypes.html">Data Types</a><ul><li class="topicref"><a href="topics/impala_array.html">ARRAY Complex Type (Impala 2.3 or higher only)</a></li><li class="topicref"><a href="topics/impala_bigint.html">BIGINT</a></li><li class="topicref"><a href="topics/impala_boolean.html">BOOLEAN</a></li><li class="topicref"><a href="topics/impala_char.html">CHAR</a></li><li class="topicref"><a href="topics/impala_decimal.html">DECIMAL</a></li><li class="topicref"><a href="topics/impala_double.html">DOUBLE</a></l
i><li class="topicref"><a href="topics/impala_float.html">FLOAT</a></li><li class="topicref"><a href="topics/impala_int.html">INT</a></li><li class="topicref"><a href="topics/impala_map.html">MAP Complex Type (Impala 2.3 or higher only)</a></li><li class="topicref"><a href="topics/impala_real.html">REAL</a></li><li class="topicref"><a href="topics/impala_smallint.html">SMALLINT</a></li><li class="topicref"><a href="topics/impala_string.html">STRING</a></li><li class="topicref"><a href="topics/impala_struct.html">STRUCT Complex Type (Impala 2.3 or higher only)</a></li><li class="topicref"><a href="topics/impala_timestamp.html">TIMESTAMP</a></li><li class="topicref"><a href="topics/impala_tinyint.html">TINYINT</a></li><li class="topicref"><a href="topics/impala_varchar.html">VARCHAR</a></li><li class="topicref"><a href="topics/impala_complex_types.html">Complex Types (Impala 2.3 or higher only)</a></li></ul></li><li class="topicref"><a href="topics/impala_literals.html">Literals</a></
li><li class="topicref"><a href="topics/impala_operators.html">SQL Operators</a></li><li class="topicref"><a href="topics/impala_schema_objects.html">Schema Objects and Object Names</a><ul><li class="topicref"><a href="topics/impala_aliases.html">Aliases</a></li><li class="topicref"><a href="topics/impala_databases.html">Databases</a></li><li class="topicref"><a href="topics/impala_functions_overview.html">Functions</a></li><li class="topicref"><a href="topics/impala_identifiers.html">Identifiers</a></li><li class="topicref"><a href="topics/impala_tables.html">Tables</a></li><li class="topicref"><a href="topics/impala_views.html">Views</a></li></ul></li><li class="topicref"><a href="topics/impala_langref_sql.html">SQL Statements</a><ul><li class="topicref"><a href="topics/impala_ddl.html">DDL Statements</a></li><li class="topicref"><a href="topics/impala_dml.html">DML Statements</a></li><li class="topicref"><a href="topics/impala_alter_table.html">ALTER TABLE</a></li><li class="topi
cref"><a href="topics/impala_alter_view.html">ALTER VIEW</a></li><li class="topicref"><a href="topics/impala_compute_stats.html">COMPUTE STATS</a></li><li class="topicref"><a href="topics/impala_create_database.html">CREATE DATABASE</a></li><li class="topicref"><a href="topics/impala_create_function.html">CREATE FUNCTION</a></li><li class="topicref"><a href="topics/impala_create_role.html">CREATE ROLE</a></li><li class="topicref"><a href="topics/impala_create_table.html">CREATE TABLE</a></li><li class="topicref"><a href="topics/impala_create_view.html">CREATE VIEW</a></li><li class="topicref"><a href="topics/impala_delete.html">DELETE</a></li><li class="topicref"><a href="topics/impala_describe.html">DESCRIBE</a></li><li class="topicref"><a href="topics/impala_drop_database.html">DROP DATABASE</a></li><li class="topicref"><a href="topics/impala_drop_function.html">DROP FUNCTION</a></li><li class="topicref"><a href="topics/impala_drop_role.html">DROP ROLE</a></li><li class="topicref"
><a href="topics/impala_drop_stats.html">DROP STATS</a></li><li class="topicref"><a href="topics/impala_drop_table.html">DROP TABLE</a></li><li class="topicref"><a href="topics/impala_drop_view.html">DROP VIEW</a></li><li class="topicref"><a href="topics/impala_explain.html">EXPLAIN</a></li><li class="topicref"><a href="topics/impala_grant.html">GRANT</a></li><li class="topicref"><a href="topics/impala_insert.html">INSERT</a></li><li class="topicref"><a href="topics/impala_invalidate_metadata.html">INVALIDATE METADATA</a></li><li class="topicref"><a href="topics/impala_load_data.html">LOAD DATA</a></li><li class="topicref"><a href="topics/impala_refresh.html">REFRESH</a></li><li class="topicref"><a href="topics/impala_revoke.html">REVOKE</a></li><li class="topicref"><a href="topics/impala_select.html">SELECT</a><ul><li class="topicref"><a href="topics/impala_joins.html">Joins</a></li><li class="topicref"><a href="topics/impala_order_by.html">ORDER BY Clause</a></li><li class="topicr
ef"><a href="topics/impala_group_by.html">GROUP BY Clause</a></li><li class="topicref"><a href="topics/impala_having.html">HAVING Clause</a></li><li class="topicref"><a href="topics/impala_limit.html">LIMIT Clause</a></li><li class="topicref"><a href="topics/impala_offset.html">OFFSET Clause</a></li><li class="topicref"><a href="topics/impala_union.html">UNION Clause</a></li><li class="topicref"><a href="topics/impala_subqueries.html">Subqueries</a></li><li class="topicref"><a href="topics/impala_tablesample.html">TABLESAMPLE Clause</a></li><li class="topicref"><a href="topics/impala_with.html">WITH Clause</a></li><li class="topicref"><a href="topics/impala_distinct.html">DISTINCT Operator</a></li></ul></li><li class="topicref"><a href="topics/impala_set.html">SET</a><ul><li class="topicref"><a href="topics/impala_query_options.html">Query Options for the SET Statement</a><ul><li class="topicref"><a href="topics/impala_abort_on_error.html">ABORT_ON_ERROR</a></li><li class="topicref"
><a href="topics/impala_allow_unsupported_formats.html">ALLOW_UNSUPPORTED_FORMATS</a></li><li class="topicref"><a href="topics/impala_appx_count_distinct.html">APPX_COUNT_DISTINCT</a></li><li class="topicref"><a href="topics/impala_batch_size.html">BATCH_SIZE</a></li><li class="topicref"><a href="topics/impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT</a></li><li class="topicref"><a href="topics/impala_compression_codec.html">COMPRESSION_CODEC</a></li><li class="topicref"><a href="topics/impala_compute_stats_min_sample_size.html">COMPUTE_STATS_MIN_SAMPLE_SIZE</a></li><li class="topicref"><a href="topics/impala_debug_action.html">DEBUG_ACTION</a></li><li class="topicref"><a href="topics/impala_decimal_v2.html">DECIMAL_V2</a></li><li class="topicref"><a href="topics/impala_default_join_distribution_mode.html">DEFAULT_JOIN_DISTRIBUTION_MODE</a></li><li class="topicref"><a href="topics/impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE</a></li><li class="topicref">
<a href="topics/impala_disable_codegen.html">DISABLE_CODEGEN</a></li><li class="topicref"><a href="topics/impala_disable_row_runtime_filtering.html">DISABLE_ROW_RUNTIME_FILTERING</a></li><li class="topicref"><a href="topics/impala_disable_streaming_preaggregations.html">DISABLE_STREAMING_PREAGGREGATIONS</a></li><li class="topicref"><a href="topics/impala_disable_unsafe_spills.html">DISABLE_UNSAFE_SPILLS</a></li><li class="topicref"><a href="topics/impala_exec_single_node_rows_threshold.html">EXEC_SINGLE_NODE_ROWS_THRESHOLD</a></li><li class="topicref"><a href="topics/impala_exec_time_limit_s.html">EXEC_TIME_LIMIT_S</a></li><li class="topicref"><a href="topics/impala_explain_level.html">EXPLAIN_LEVEL</a></li><li class="topicref"><a href="topics/impala_hbase_cache_blocks.html">HBASE_CACHE_BLOCKS</a></li><li class="topicref"><a href="topics/impala_hbase_caching.html">HBASE_CACHING</a></li><li class="topicref"><a href="topics/impala_live_progress.html">LIVE_PROGRESS</a></li><li class="t
opicref"><a href="topics/impala_live_summary.html">LIVE_SUMMARY</a></li><li class="topicref"><a href="topics/impala_max_errors.html">MAX_ERRORS</a></li><li class="topicref"><a href="topics/impala_max_row_size.html">MAX_ROW_SIZE</a></li><li class="topicref"><a href="topics/impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS</a></li><li class="topicref"><a href="topics/impala_max_scan_range_length.html">MAX_SCAN_RANGE_LENGTH</a></li><li class="topicref"><a href="topics/impala_mem_limit.html">MEM_LIMIT</a></li><li class="topicref"><a href="topics/impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE</a></li><li class="topicref"><a href="topics/impala_mt_dop.html">MT_DOP</a></li><li class="topicref"><a href="topics/impala_num_nodes.html">NUM_NODES</a></li><li class="topicref"><a href="topics/impala_num_scanner_threads.html">NUM_SCANNER_THREADS</a></li><li class="topicref"><a href="topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS</a></li><
li class="topicref"><a href="topics/impala_parquet_compression_codec.html">PARQUET_COMPRESSION_CODEC</a></li><li class="topicref"><a href="topics/impala_parquet_annotate_strings_utf8.html">PARQUET_ANNOTATE_STRINGS_UTF8</a></li><li class="topicref"><a href="topics/impala_parquet_array_resolution.html">PARQUET_ARRAY_RESOLUTION</a></li><li class="topicref"><a href="topics/impala_parquet_fallback_schema_resolution.html">PARQUET_FALLBACK_SCHEMA_RESOLUTION</a></li><li class="topicref"><a href="topics/impala_parquet_file_size.html">PARQUET_FILE_SIZE</a></li><li class="topicref"><a href="topics/impala_prefetch_mode.html">PREFETCH_MODE</a></li><li class="topicref"><a href="topics/impala_query_timeout_s.html">QUERY_TIMEOUT_S</a></li><li class="topicref"><a href="topics/impala_request_pool.html">REQUEST_POOL</a></li><li class="topicref"><a href="topics/impala_replica_preference.html">REPLICA_PREFERENCE</a></li><li class="topicref"><a href="topics/impala_runtime_bloom_filter_size.html">RUNTIME_
BLOOM_FILTER_SIZE</a></li><li class="topicref"><a href="topics/impala_runtime_filter_max_size.html">RUNTIME_FILTER_MAX_SIZE</a></li><li class="topicref"><a href="topics/impala_runtime_filter_min_size.html">RUNTIME_FILTER_MIN_SIZE</a></li><li class="topicref"><a href="topics/impala_runtime_filter_mode.html">RUNTIME_FILTER_MODE</a></li><li class="topicref"><a href="topics/impala_runtime_filter_wait_time_ms.html">RUNTIME_FILTER_WAIT_TIME_MS</a></li><li class="topicref"><a href="topics/impala_s3_skip_insert_staging.html">S3_SKIP_INSERT_STAGING</a></li><li class="topicref"><a href="topics/impala_schedule_random_replica.html">SCHEDULE_RANDOM_REPLICA</a></li><li class="topicref"><a href="topics/impala_scratch_limit.html">SCRATCH_LIMIT</a></li><li class="topicref"><a href="topics/impala_shuffle_distinct_exprs.html">SHUFFLE_DISTINCT_EXPRS</a></li><li class="topicref"><a href="topics/impala_support_start_over.html">SUPPORT_START_OVER</a></li><li class="topicref"><a href="topics/impala_sync_dd
l.html">SYNC_DDL</a></li></ul></li></ul></li><li class="topicref"><a href="topics/impala_show.html">SHOW</a></li><li class="topicref"><a href="topics/impala_truncate_table.html">TRUNCATE TABLE</a></li><li class="topicref"><a href="topics/impala_update.html">UPDATE</a></li><li class="topicref"><a href="topics/impala_upsert.html">UPSERT</a></li><li class="topicref"><a href="topics/impala_use.html">USE</a></li><li class="topicref"><a href="topics/impala_hints.html">Optimizer Hints</a></li></ul></li><li class="topicref"><a href="topics/impala_functions.html">Built-In Functions</a><ul><li class="topicref"><a href="topics/impala_math_functions.html">Mathematical Functions</a></li><li class="topicref"><a href="topics/impala_bit_functions.html">Bit Functions</a></li><li class="topicref"><a href="topics/impala_conversion_functions.html">Type Conversion Functions</a></li><li class="topicref"><a href="topics/impala_datetime_functions.html">Date and Time Functions</a></li><li class="topicref"><
a href="topics/impala_conditional_functions.html">Conditional Functions</a></li><li class="topicref"><a href="topics/impala_string_functions.html">String Functions</a></li><li class="topicref"><a href="topics/impala_misc_functions.html">Miscellaneous Functions</a></li><li class="topicref"><a href="topics/impala_aggregate_functions.html">Aggregate Functions</a><ul><li class="topicref"><a href="topics/impala_appx_median.html">APPX_MEDIAN</a></li><li class="topicref"><a href="topics/impala_avg.html">AVG</a></li><li class="topicref"><a href="topics/impala_count.html">COUNT</a></li><li class="topicref"><a href="topics/impala_group_concat.html">GROUP_CONCAT</a></li><li class="topicref"><a href="topics/impala_max.html">MAX</a></li><li class="topicref"><a href="topics/impala_min.html">MIN</a></li><li class="topicref"><a href="topics/impala_ndv.html">NDV</a></li><li class="topicref"><a href="topics/impala_stddev.html">STDDEV, STDDEV_SAMP, STDDEV_POP</a></li><li class="topicref"><a href="topi
cs/impala_sum.html">SUM</a></li><li class="topicref"><a href="topics/impala_variance.html">VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP</a></li></ul></li><li class="topicref"><a href="topics/impala_analytic_functions.html">Analytic Functions</a></li><li class="topicref"><a href="topics/impala_udf.html">Impala User-Defined Functions (UDFs)</a></li></ul></li><li class="topicref"><a href="topics/impala_langref_unsupported.html">SQL Differences Between Impala and Hive</a></li><li class="topicref"><a href="topics/impala_porting.html">Porting SQL</a></li></ul></li><li class="topicref"><a href="topics/impala_impala_shell.html">The Impala Shell</a><ul><li class="topicref"><a href="topics/impala_shell_options.html">Configuration Options</a></li><li class="topicref"><a href="topics/impala_connecting.html">Connecting to impalad</a></li><li class="topicref"><a href="topics/impala_shell_running_commands.html">Running Commands and SQL Statements</a></li><li class="topicref"><a href="t
opics/impala_shell_commands.html">Command Reference</a></li></ul></li><li class="topicref"><a href="topics/impala_performance.html">Performance Tuning</a><ul><li class="topicref"><a href="topics/impala_perf_cookbook.html">Performance Best Practices</a></li><li class="topicref"><a href="topics/impala_perf_joins.html">Join Performance</a></li><li class="topicref"><a href="topics/impala_perf_stats.html">Table and Column Statistics</a></li><li class="topicref"><a href="topics/impala_perf_benchmarking.html">Benchmarking</a></li><li class="topicref"><a href="topics/impala_perf_resources.html">Controlling Resource Usage</a></li><li class="topicref"><a href="topics/impala_runtime_filtering.html">Runtime Filtering</a></li><li class="topicref"><a href="topics/impala_perf_hdfs_caching.html">HDFS Caching</a></li><li class="topicref"><a href="topics/impala_perf_testing.html">Testing Impala Performance</a></li><li class="topicref"><a href="topics/impala_explain_plan.html">EXPLAIN Plans and Query
Profiles</a></li><li class="topicref"><a href="topics/impala_perf_skew.html">HDFS Block Skew</a></li></ul></li><li class="topicref"><a href="topics/impala_scalability.html">Scalability Considerations</a></li><li class="topicref"><a href="topics/impala_partitioning.html">Partitioning</a></li><li class="topicref"><a href="topics/impala_file_formats.html">File Formats</a><ul><li class="topicref"><a href="topics/impala_txtfile.html">Text Data Files</a></li><li class="topicref"><a href="topics/impala_parquet.html">Parquet Data Files</a></li><li class="topicref"><a href="topics/impala_avro.html">Avro Data Files</a></li><li class="topicref"><a href="topics/impala_rcfile.html">RCFile Data Files</a></li><li class="topicref"><a href="topics/impala_seqfile.html">SequenceFile Data Files</a></li></ul></li><li class="topicref"><a href="topics/impala_kudu.html">Using Impala to Query Kudu Tables</a></li><li class="topicref"><a href="topics/impala_hbase.html">HBase Tables</a></li><li class="topicref
"><a href="topics/impala_s3.html">S3 Tables</a></li><li class="topicref"><a href="topics/impala_adls.html">ADLS Tables</a></li><li class="topicref"><a href="topics/impala_isilon.html">Isilon Storage</a></li><li class="topicref"><a href="topics/impala_logging.html">Logging</a></li><li class="topicref"><a href="topics/impala_troubleshooting.html">Troubleshooting Impala</a><ul><li class="topicref"><a href="topics/impala_webui.html">Web User Interface</a></li><li class="topicref"><a href="topics/impala_breakpad.html">Breakpad Minidumps</a></li></ul></li><li class="topicref"><a href="topics/impala_ports.html">Ports Used by Impala</a></li><li class="topicref"><a href="topics/impala_reserved_words.html">Impala Reserved Words</a></li><li class="topicref"><a href="topics/impala_faq.html">Impala Frequently Asked Questions</a></li><li class="topicref"><a href="topics/impala_release_notes.html">Impala Release Notes</a><ul><li class="topicref"><a href="topics/impala_new_features.html">New Featur
es in Apache Impala</a></li><li class="topicref"><a href="topics/impala_incompatible_changes.html">Incompatible Changes and Limitations in Apache Impala</a></li><li class="topicref"><a href="topics/impala_known_issues.html">Known Issues and Workarounds in Impala</a></li><li class="topicref"><a href="topics/impala_fixed_issues.html">Fixed Issues in Apache Impala</a></li></ul></li></ul></nav></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_abort_on_error.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_abort_on_error.html b/docs/build3x/html/topics/impala_abort_on_error.html
new file mode 100644
index 0000000..6887375
--- /dev/null
+++ b/docs/build3x/html/topics/impala_abort_on_error.html
@@ -0,0 +1,42 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="abort_on_error"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ABORT_ON_ERROR Query Option</title></head><body id="abort_on_error"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ABORT_ON_ERROR Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ When this option is enabled, Impala cancels a query immediately when any of the nodes encounters an error,
+ rather than continuing and possibly returning incomplete results. This option is disabled by default, to help
+ gather maximum diagnostic information when an error occurs, for example, whether the same problem occurred on
+ all nodes or only a single node. Currently, the errors that Impala can skip over involve data corruption,
+ such as a column that contains a string value where an integer value is expected.
+ </p>
+
+ <p class="p">
+ To control how much logging Impala does for non-fatal errors when <code class="ph codeph">ABORT_ON_ERROR</code> is turned
+ off, use the <code class="ph codeph">MAX_ERRORS</code> option.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
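+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ For example, to make a session fail fast at the first data conversion error instead of
+ continuing with possibly incomplete results (the table name below is only illustrative):
+ </p>
+
+<pre class="pre codeblock"><code>set abort_on_error=true;
+select count(*) from possibly_corrupt_table;
+-- Restore the default behavior of skipping non-fatal errors.
+set abort_on_error=false;</code></pre>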
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_max_errors.html#max_errors">MAX_ERRORS Query Option</a>,
+ <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
[28/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
Posted by mi...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_insert.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_insert.html b/docs/build3x/html/topics/impala_insert.html
new file mode 100644
index 0000000..61044fb
--- /dev/null
+++ b/docs/build3x/html/topics/impala_insert.html
@@ -0,0 +1,911 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="insert"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>INSERT Statement</title></head><body id="insert"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">INSERT Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports inserting into tables and partitions that you create with the Impala <code class="ph codeph">CREATE
+ TABLE</code> statement, or pre-defined tables and partitions created through Hive.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>[<var class="keyword varname">with_clause</var>]
+ INSERT <span class="ph">[<var class="keyword varname">hint_clause</var>]</span> { INTO | OVERWRITE } [TABLE] <var class="keyword varname">table_name</var>
+ [(<var class="keyword varname">column_list</var>)]
+ [ PARTITION (<var class="keyword varname">partition_clause</var>)]
+{
+ [<var class="keyword varname">hint_clause</var>] <var class="keyword varname">select_statement</var>
+ | VALUES (<var class="keyword varname">value</var> [, <var class="keyword varname">value</var> ...]) [, (<var class="keyword varname">value</var> [, <var class="keyword varname">value</var> ...]) ...]
+}
+
+partition_clause ::= <var class="keyword varname">col_name</var> [= <var class="keyword varname">constant</var>] [, <var class="keyword varname">col_name</var> [= <var class="keyword varname">constant</var>] ...]
+
+hint_clause ::=
+ <var class="keyword varname">hint_with_dashes</var> |
+ <var class="keyword varname">hint_with_cstyle_comments</var> |
+ <var class="keyword varname">hint_with_brackets</var>
+
+hint_with_dashes ::= -- +SHUFFLE | -- +NOSHUFFLE <span class="ph">-- +CLUSTERED</span>
+
+hint_with_cstyle_comments ::= /* +SHUFFLE */ | /* +NOSHUFFLE */ <span class="ph">| /* +CLUSTERED */</span>
+
+hint_with_brackets ::= [SHUFFLE] | [NOSHUFFLE]
+ (With this hint format, the square brackets are part of the syntax.)
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The square bracket style of hint is now deprecated and might be removed in
+ a future release. For that reason, any newly added hints are not available
+ with the square bracket syntax.
+ </div>
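+
+ <p class="p">
+ In the <code class="ph codeph">partition_clause</code>, a partition key column with a constant value
+ specifies a static partition, while a partition key column without a value takes its value
+ from the corresponding trailing column of the <code class="ph codeph">SELECT</code> list (dynamic
+ partitioning). For example, using illustrative table and column names:
+ </p>
+
+<pre class="pre codeblock"><code>-- Static partitioning: the partition key value is a constant.
+insert into sales partition (year=2018) select id, amount from staging_sales;
+
+-- Dynamic partitioning: each row's partition key value comes from the
+-- final column of the select list.
+insert into sales partition (year) select id, amount, year from staging_sales;</code></pre>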
+
+ <p class="p">
+ <strong class="ph b">Appending or replacing (INTO and OVERWRITE clauses):</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">INSERT INTO</code> syntax appends data to a table. The existing data files are left as-is, and
+ the inserted data is put into one or more new data files.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">INSERT OVERWRITE</code> syntax replaces the data in a table.
+
+
+ Currently, the overwritten data files are deleted immediately; they do not go through the HDFS trash
+ mechanism.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement currently does not support writing data files
+ containing complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>).
+ To prepare Parquet data for such tables, you generate the data files outside Impala and then
+ use <code class="ph codeph">LOAD DATA</code> or <code class="ph codeph">CREATE EXTERNAL TABLE</code> to associate those
+ data files with the table. Currently, such tables must use the Parquet file format.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about working with complex types.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ Currently, the <code class="ph codeph">INSERT OVERWRITE</code> syntax cannot be used with Kudu tables.
+ </p>
+
+ <p class="p">
+ Kudu tables require a unique primary key for each row. If an <code class="ph codeph">INSERT</code>
+ statement attempts to insert a row with the same values for the primary key columns
+ as an existing row, that row is discarded and the insert operation continues.
+ When rows are discarded due to duplicate primary keys, the statement finishes
+ with a warning, not an error. (This is a change from early releases of Kudu
+ where the default was to return an error in such cases, and the syntax
+ <code class="ph codeph">INSERT IGNORE</code> was required to make the statement succeed.
+ The <code class="ph codeph">IGNORE</code> clause is no longer part of the <code class="ph codeph">INSERT</code>
+ syntax.)
+ </p>
+
+ <p class="p">
+ For situations where you prefer to replace rows with duplicate primary key values,
+ rather than discarding the new data, you can use the <code class="ph codeph">UPSERT</code>
+ statement instead of <code class="ph codeph">INSERT</code>. <code class="ph codeph">UPSERT</code> inserts
+ rows that are entirely new, and for rows that match an existing primary key in the
+ table, the non-primary-key columns are updated to reflect the values in the
+ <span class="q">"upserted"</span> data.
+ </p>
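+
+ <p class="p">
+ For example, with a hypothetical Kudu table whose primary key is the
+ <code class="ph codeph">id</code> column:
+ </p>
+
+<pre class="pre codeblock"><code>insert into kudu_users values (1, 'alice'), (2, 'bob');
+-- Duplicate primary key 1: the row is discarded and the statement
+-- finishes with a warning rather than an error.
+insert into kudu_users values (1, 'alicia');
+-- UPSERT updates the non-primary-key columns of the existing row instead.
+upsert into kudu_users values (1, 'alicia');</code></pre>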
+
+ <p class="p">
+ If you really want to store new rows, not replace existing ones, but cannot do so
+ because of the primary key uniqueness constraint, consider recreating the table
+ with additional columns included in the primary key.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_kudu.html#impala_kudu">Using Impala to Query Kudu Tables</a> for more details about using Impala with Kudu.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Impala currently supports:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Copying data from another table using a <code class="ph codeph">SELECT</code> query. In Impala 1.2.1 and higher, you can
+ combine <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">INSERT</code> operations into a single step with the
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax, which bypasses the actual <code class="ph codeph">INSERT</code> keyword.
+ </li>
+
+ <li class="li">
+ An optional <a class="xref" href="impala_with.html#with"><code class="ph codeph">WITH</code> clause</a> before the
+ <code class="ph codeph">INSERT</code> keyword, to define a subquery referenced in the <code class="ph codeph">SELECT</code> portion.
+ </li>
+
+ <li class="li">
+ Creating one or more new rows using constant expressions through the <code class="ph codeph">VALUES</code> clause. (The
+ <code class="ph codeph">VALUES</code> clause was added in Impala 1.0.1.)
+ </li>
+
+ <li class="li">
+ <p class="p">
+ By default, the first column of each newly inserted row goes into the first column of the table, the
+ second column into the second column, and so on.
+ </p>
+ <p class="p">
+ You can also specify the columns to be inserted, an arbitrarily ordered subset of the columns in the
+ destination table, by specifying a column list immediately after the name of the destination table. This
+ feature lets you adjust the inserted columns to match the layout of a <code class="ph codeph">SELECT</code> statement,
+ rather than the other way around. (This feature was added in Impala 1.1.)
+ </p>
+ <p class="p">
+ The number of columns mentioned in the column list (known as the <span class="q">"column permutation"</span>) must match
+ the number of columns in the <code class="ph codeph">SELECT</code> list or the <code class="ph codeph">VALUES</code> tuples. The
+ order of columns in the column permutation can be different than in the underlying table, and the columns
+ of each input row are reordered to match. If the number of columns in the column permutation is less than
+ in the destination table, all unmentioned columns are set to <code class="ph codeph">NULL</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ An optional hint clause immediately before the <code class="ph codeph">SELECT</code> keyword or immediately after the
+ <code class="ph codeph">INSERT</code> keyword, to fine-tune the behavior when doing an <code class="ph codeph">INSERT ... SELECT</code>
+ operation into partitioned Parquet tables. The hint clause cannot be specified in multiple places.
+ The hint keywords are <code class="ph codeph">[SHUFFLE]</code> and <code class="ph codeph">[NOSHUFFLE]</code>, including the square brackets.
+ Inserting into partitioned Parquet tables can be a resource-intensive operation because it potentially
+ involves many files being written to HDFS simultaneously, and separate
+ <span class="ph">large</span> memory buffers being allocated to buffer the data for each
+ partition. For usage details, see <a class="xref" href="impala_parquet.html#parquet_etl">Loading Data into Parquet Tables</a>.
+ </li>
+ </ul>
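+
+ <p class="p">
+ For example, the following equivalent statements (with illustrative table names) apply the
+ <code class="ph codeph">SHUFFLE</code> hint to an <code class="ph codeph">INSERT ... SELECT</code> into a
+ partitioned Parquet table. Specify the hint in only one of the two positions:
+ </p>
+
+<pre class="pre codeblock"><code>-- Hint immediately after the INSERT keyword.
+insert /* +SHUFFLE */ into sales partition (year) select id, amount, year from staging_sales;
+
+-- Hint immediately before the SELECT keyword.
+insert into sales partition (year) /* +SHUFFLE */ select id, amount, year from staging_sales;</code></pre>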
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <ul class="ul">
+ <li class="li">
+ Insert commands that partition or add files result in changes to Hive metadata. Because Impala uses Hive
+ metadata, such changes may necessitate a metadata refresh. For more information, see the
+ <a class="xref" href="impala_refresh.html#refresh">REFRESH</a> statement.
+ </li>
+
+ <li class="li">
+ Currently, Impala can only insert data into tables that use the text and Parquet formats. For other file
+ formats, insert the data using Hive and use Impala to query it.
+ </li>
+
+ <li class="li">
+ As an alternative to the <code class="ph codeph">INSERT</code> statement, if you have existing data files elsewhere in
+ HDFS, the <code class="ph codeph">LOAD DATA</code> statement can move those files into a table. This statement works
+ with tables of any file format.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DML (but still affected by
+ <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ When you insert the results of an expression, particularly of a built-in function call, into a small numeric
+ column such as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">TINYINT</code>, or
+ <code class="ph codeph">FLOAT</code>, you might need to use a <code class="ph codeph">CAST()</code> expression to coerce values into the
+ appropriate type. Impala does not automatically convert from a larger type to a smaller one. For example, to
+ insert cosine values into a <code class="ph codeph">FLOAT</code> column, write <code class="ph codeph">CAST(COS(angle) AS FLOAT)</code>
+ in the <code class="ph codeph">INSERT</code> statement to make the conversion explicit.
+ </p>
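+
+ <p class="p">
+ For example, because <code class="ph codeph">COS()</code> returns <code class="ph codeph">DOUBLE</code>, storing its
+ result in a <code class="ph codeph">FLOAT</code> column requires an explicit cast (the source table name
+ is illustrative):
+ </p>
+
+<pre class="pre codeblock"><code>create table angles (angle double, cosine float);
+-- Without the CAST, the DOUBLE-to-FLOAT narrowing is not done implicitly.
+insert into angles select angle, cast(cos(angle) as float) from source_angles;</code></pre>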
+
+ <p class="p">
+ <strong class="ph b">File format considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because Impala can read certain file formats that it cannot write,
+ the <code class="ph codeph">INSERT</code> statement does not work for all kinds of
+ Impala tables. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>
+ for details about what file formats are supported by the
+ <code class="ph codeph">INSERT</code> statement.
+ </p>
+
+ <p class="p">
+ Any <code class="ph codeph">INSERT</code> statement for a Parquet table requires enough free space in the HDFS filesystem
+ to write one block. Because Parquet data files use a block size of 1 GB by default, an
+ <code class="ph codeph">INSERT</code> might fail (even for a very small amount of data) if your HDFS is running low on
+ space.
+ </p>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+ STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+ table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+ are very large, used in join queries, or both.
+ </div>
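+
+ <p class="p">
+ For example, using illustrative table names:
+ </p>
+
+<pre class="pre codeblock"><code>insert into sales select * from staging_sales;
+-- Recompute table and column statistics so the planner accounts for the new data.
+compute stats sales;</code></pre>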
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example sets up new tables with the same definition as the <code class="ph codeph">TAB1</code> table from the
+ <a class="xref" href="impala_tutorial.html#tutorial">Tutorial</a> section, using different file
+ formats, and demonstrates inserting data into the tables created with the <code class="ph codeph">STORED AS TEXTFILE</code>
+ and <code class="ph codeph">STORED AS PARQUET</code> clauses:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE DATABASE IF NOT EXISTS file_formats;
+USE file_formats;
+
+DROP TABLE IF EXISTS text_table;
+CREATE TABLE text_table
+( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
+STORED AS TEXTFILE;
+
+DROP TABLE IF EXISTS parquet_table;
+CREATE TABLE parquet_table
+( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
+STORED AS PARQUET;</code></pre>
+
+ <p class="p">
+ With the <code class="ph codeph">INSERT INTO TABLE</code> syntax, each new set of inserted rows is appended to any existing
+ data in the table. This is how you would record small amounts of data that arrive continuously, or ingest new
+ batches of data alongside the existing data. For example, after running two <code class="ph codeph">INSERT INTO TABLE</code>
+ statements with 5 rows each, the table contains 10 rows total:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > insert into table text_table select * from default.tab1;
+Inserted 5 rows in 0.41s
+
+[localhost:21000] > insert into table text_table select * from default.tab1;
+Inserted 5 rows in 0.46s
+
+[localhost:21000] > select count(*) from text_table;
++----------+
+| count(*) |
++----------+
+| 10 |
++----------+
+Returned 1 row(s) in 0.26s</code></pre>
+
+ <p class="p">
+ With the <code class="ph codeph">INSERT OVERWRITE TABLE</code> syntax, each new set of inserted rows replaces any existing
+ data in the table. This is how you load data to query in a data warehousing scenario where you analyze just
+ the data for a particular day, quarter, and so on, discarding the previous data each time. You might keep the
+ entire set of data in one raw table, and transfer and transform certain rows into a more compact and
+ efficient form to perform intensive analysis on that subset.
+ </p>
+
+ <p class="p">
+ For example, here we insert 5 rows into a table using the <code class="ph codeph">INSERT INTO</code> clause, then replace
+ the data by inserting 3 rows with the <code class="ph codeph">INSERT OVERWRITE</code> clause. Afterward, the table only
+ contains the 3 rows from the final <code class="ph codeph">INSERT</code> statement.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > insert into table parquet_table select * from default.tab1;
+Inserted 5 rows in 0.35s
+
+[localhost:21000] > insert overwrite table parquet_table select * from default.tab1 limit 3;
+Inserted 3 rows in 0.43s
+[localhost:21000] > select count(*) from parquet_table;
++----------+
+| count(*) |
++----------+
+| 3 |
++----------+
+Returned 1 row(s) in 0.43s</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph"><a class="xref" href="impala_insert.html#values">VALUES</a></code> clause lets you insert one or more
+ rows by specifying constant values for all the columns. The number, types, and order of the expressions must
+ match the table definition.
+ </p>
+
+ <div class="note note note_note" id="insert__insert_values_warning"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">INSERT ... VALUES</code> technique is not suitable for loading large quantities of data into
+ HDFS-based tables, because the insert operations cannot be parallelized, and each one produces a separate
+ data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL
+ syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do not
+ run scripts with thousands of <code class="ph codeph">INSERT ... VALUES</code> statements that insert a single row each
+ time. If you do run <code class="ph codeph">INSERT ... VALUES</code> operations to load data into a staging table as one
+ stage in an ETL pipeline, include multiple row values if possible within each <code class="ph codeph">VALUES</code> clause,
+ and use a separate database to make cleanup easier if the operation does produce many tiny files.
+ </div>
+
+ <p class="p">
+ The following example shows how to insert one row or multiple rows, with expressions of different types,
+ using literal values, expressions, and function return values:
+ </p>
+
+<pre class="pre codeblock"><code>create table val_test_1 (c1 int, c2 float, c3 string, c4 boolean, c5 timestamp);
+insert into val_test_1 values (100, 99.9/10, 'abc', true, now());
+create table val_test_2 (id int, token string);
+insert overwrite val_test_2 values (1, 'a'), (2, 'b'), (-1,'xyzzy');</code></pre>
+
+ <p class="p">
+ These examples show the type of <span class="q">"not implemented"</span> error that you see when attempting to insert data into
+ a table with a file format that Impala currently does not write to:
+ </p>
+
+<pre class="pre codeblock"><code>DROP TABLE IF EXISTS sequence_table;
+CREATE TABLE sequence_table
+( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
+STORED AS SEQUENCEFILE;
+
+DROP TABLE IF EXISTS rc_table;
+CREATE TABLE rc_table
+( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
+STORED AS RCFILE;
+
+[localhost:21000] > insert into table rc_table select * from default.tab1;
+Remote error
+Backend 0:RC_FILE not implemented.
+
+[localhost:21000] > insert into table sequence_table select * from default.tab1;
+Remote error
+Backend 0:SEQUENCE_FILE not implemented. </code></pre>
+
+ <p class="p">
+ The following examples show how you can copy the data in all the columns from one table to another, copy the
+ data from only some columns, or specify the columns in the select list in a different order than they
+ actually appear in the table:
+ </p>
+
+<pre class="pre codeblock"><code>-- Start with 2 identical tables.
+create table t1 (c1 int, c2 int);
+create table t2 like t1;
+
+-- If there is no () part after the destination table name,
+-- all columns must be specified, either as * or by name.
+insert into t2 select * from t1;
+insert into t2 select c1, c2 from t1;
+
+-- With the () notation following the destination table name,
+-- you can omit columns (all values for that column are NULL
+-- in the destination table), and/or reorder the values
+-- selected from the source table. This is the "column permutation" feature.
+insert into t2 (c1) select c1 from t1;
+insert into t2 (c2, c1) select c1, c2 from t1;
+
+-- The column names can be entirely different in the source and destination tables.
+-- You can copy any columns, not just the corresponding ones, from the source table.
+-- But the number and type of selected columns must match the columns mentioned in the () part.
+alter table t2 replace columns (x int, y int);
+insert into t2 (y) select c1 from t1;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an
+ <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the
+ results are not necessarily sorted. An <code class="ph codeph">INSERT ... SELECT</code> operation potentially creates
+ many different data files, prepared on different data nodes, and therefore the notion of the data being
+ stored in sorted order is impractical.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Concurrency considerations:</strong> Each <code class="ph codeph">INSERT</code> operation creates new data files with unique
+ names, so you can run multiple <code class="ph codeph">INSERT INTO</code> statements simultaneously without filename
+ conflicts.
+
+ While data is being inserted into an Impala table, the data is staged temporarily in a subdirectory inside
+ the data directory; during this period, you cannot issue queries against that table in Hive. If an
+ <code class="ph codeph">INSERT</code> operation fails, the temporary data file and the subdirectory could be left behind in
+ the data directory. If so, remove the relevant subdirectory and any data files it contains manually, by
+ issuing an <code class="ph codeph">hdfs dfs -rm -r</code> command, specifying the full path of the work subdirectory, whose
+ name ends in <code class="ph codeph">_dir</code>.
+ </p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="insert__values">
+
+ <h2 class="title topictitle2" id="ariaid-title2">VALUES Clause</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">VALUES</code> clause is a general-purpose way to specify the columns of one or more rows,
+ typically within an <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">INSERT ... VALUES</code> technique is not suitable for loading large quantities of data into
+ HDFS-based tables, because the insert operations cannot be parallelized, and each one produces a separate
+ data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL
+ syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do
+ not run scripts with thousands of <code class="ph codeph">INSERT ... VALUES</code> statements that insert a single row
+ each time. If you do run <code class="ph codeph">INSERT ... VALUES</code> operations to load data into a staging table as
+ one stage in an ETL pipeline, include multiple row values if possible within each <code class="ph codeph">VALUES</code>
+ clause, and use a separate database to make cleanup easier if the operation does produce many tiny files.
+ </div>
+
+ <p class="p">
+ The following examples illustrate:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ How to insert a single row using a <code class="ph codeph">VALUES</code> clause.
+ </li>
+
+ <li class="li">
+ How to insert multiple rows using a <code class="ph codeph">VALUES</code> clause.
+ </li>
+
+ <li class="li">
+ How the row or rows from a <code class="ph codeph">VALUES</code> clause can be appended to a table through
+ <code class="ph codeph">INSERT INTO</code>, or replace the contents of the table through <code class="ph codeph">INSERT
+ OVERWRITE</code>.
+ </li>
+
+ <li class="li">
+ How the entries in a <code class="ph codeph">VALUES</code> clause can be literals, function results, or any other kind
+ of expression. See <a class="xref" href="impala_literals.html#literals">Literals</a> for the notation to use for literal
+ values, especially <a class="xref" href="impala_literals.html#string_literals">String Literals</a> for quoting and escaping
+ conventions for strings. See <a class="xref" href="impala_operators.html#operators">SQL Operators</a> and
+ <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a> for other things you can include in expressions with the
+ <code class="ph codeph">VALUES</code> clause.
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>[localhost:21000] > describe val_example;
+Query: describe val_example
+Query finished, fetching results ...
++-------+---------+---------+
+| name | type | comment |
++-------+---------+---------+
+| id | int | |
+| col_1 | boolean | |
+| col_2 | double | |
++-------+---------+---------+
+
+[localhost:21000] > insert into val_example values (1,true,100.0);
+Inserted 1 rows in 0.30s
+[localhost:21000] > select * from val_example;
++----+-------+-------+
+| id | col_1 | col_2 |
++----+-------+-------+
+| 1 | true | 100 |
++----+-------+-------+
+
+[localhost:21000] > insert overwrite val_example values (10,false,pow(2,5)), (50,true,10/3);
+Inserted 2 rows in 0.16s
+[localhost:21000] > select * from val_example;
++----+-------+-------------------+
+| id | col_1 | col_2 |
++----+-------+-------------------+
+| 10 | false | 32 |
+| 50 | true | 3.333333333333333 |
++----+-------+-------------------+</code></pre>
+
+ <p class="p">
+ When used in an <code class="ph codeph">INSERT</code> statement, the Impala <code class="ph codeph">VALUES</code> clause can specify
+ some or all of the columns in the destination table, and the columns can be specified in a different order
+ than they actually appear in the table. To specify a different set or order of columns than in the table,
+ use the syntax:
+ </p>
+
+<pre class="pre codeblock"><code>INSERT INTO <var class="keyword varname">destination</var>
+ (<var class="keyword varname">col_x</var>, <var class="keyword varname">col_y</var>, <var class="keyword varname">col_z</var>)
+ VALUES
+ (<var class="keyword varname">val_x</var>, <var class="keyword varname">val_y</var>, <var class="keyword varname">val_z</var>);
+</code></pre>
+
+ <p class="p">
+ Any columns in the table that are not listed in the <code class="ph codeph">INSERT</code> statement are set to
+ <code class="ph codeph">NULL</code>.
+ </p>
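+
+ <p class="p">
+ For example, with the <code class="ph codeph">val_example</code> table shown earlier, an
+ <code class="ph codeph">INSERT</code> that lists only the <code class="ph codeph">id</code> column leaves
+ <code class="ph codeph">col_1</code> and <code class="ph codeph">col_2</code> set to <code class="ph codeph">NULL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>INSERT INTO val_example (id) VALUES (2);
+-- The new row is stored as (2, NULL, NULL).</code></pre>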
+
+
+
+ <p class="p">
+ To use a <code class="ph codeph">VALUES</code> clause like a table in other statements, wrap it in parentheses and use
+ <code class="ph codeph">AS</code> clauses to specify aliases for the entire object and any columns you need to refer to:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select * from (values(4,5,6),(7,8,9)) as t;
++---+---+---+
+| 4 | 5 | 6 |
++---+---+---+
+| 4 | 5 | 6 |
+| 7 | 8 | 9 |
++---+---+---+
+[localhost:21000] > select * from (values(1 as c1, true as c2, 'abc' as c3),(100,false,'xyz')) as t;
++-----+-------+-----+
+| c1 | c2 | c3 |
++-----+-------+-----+
+| 1 | true | abc |
+| 100 | false | xyz |
++-----+-------+-----+</code></pre>
+
+ <p class="p">
+ For example, you might use a tiny table constructed like this from constant literals or function return
+ values as part of a longer statement involving joins or <code class="ph codeph">UNION ALL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS considerations:</strong>
+ </p>
+
+ <p class="p">
+ Impala physically writes all inserted files under the ownership of its default user, typically
+ <code class="ph codeph">impala</code>. Therefore, this user must have HDFS write permission in the corresponding table
+ directory.
+ </p>
+
+ <p class="p">
+ The permission requirement is independent of the authorization performed by the Sentry framework. (If the
+ connected user is not authorized to insert into a table, Sentry blocks that operation immediately,
+ regardless of the privileges available to the <code class="ph codeph">impala</code> user.) Files created by Impala are
+ not owned by and do not inherit permissions from the connected user.
+ </p>
+
+ <p class="p">
+ The number of data files produced by an <code class="ph codeph">INSERT</code> statement depends on the size of the
+ cluster, the number of data blocks that are processed, the partition key columns in a partitioned table,
+ and the mechanism Impala uses for dividing the work in parallel. Do not assume that an
+ <code class="ph codeph">INSERT</code> statement will produce some particular number of output files. In case of
+ performance issues with data written by Impala, check that the output files do not suffer from issues such
+ as many tiny files or many tiny partitions. (In the Hadoop context, even files or partitions of a few tens
+ of megabytes are considered <span class="q">"tiny"</span>.)
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement has always left behind a hidden work directory inside the data
+ directory of the table. Formerly, this hidden work directory was named
+ <span class="ph filepath">.impala_insert_staging</span>. In Impala 2.0.1 and later, this directory name is changed to
+ <span class="ph filepath">_impala_insert_staging</span>. (While HDFS tools are expected to treat names beginning
+ with either an underscore or a dot as hidden, in practice names beginning with an underscore are more widely
+ supported.) If you have any scripts, cleanup jobs, and so on that rely on the name of this work directory,
+ adjust them to use the new name.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong>
+ </p>
+
+ <p class="p">
+ You can use the <code class="ph codeph">INSERT</code> statement with HBase tables as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You can insert a single row or a small set of rows into an HBase table with the <code class="ph codeph">INSERT ...
+ VALUES</code> syntax. This is a good use case for HBase tables with Impala, because HBase tables are
+ not subject to the same kind of fragmentation from many small insert operations as HDFS tables are.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You can insert any number of rows at once into an HBase table using the <code class="ph codeph">INSERT ...
+ SELECT</code> syntax.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If more than one inserted row has the same value for the HBase key column, only the last inserted row
+ with that value is visible to Impala queries. You can take advantage of this fact with <code class="ph codeph">INSERT
+ ... VALUES</code> statements to effectively update rows one at a time, by inserting new rows with the
+ same key values as existing rows. Be aware that after an <code class="ph codeph">INSERT ... SELECT</code> operation
+ copying from an HDFS table, the HBase table might contain fewer rows than were inserted, if the key
+ column in the source table contained duplicate values.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You cannot <code class="ph codeph">INSERT OVERWRITE</code> into an HBase table. New rows are always appended.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ When you create an Impala or Hive table that maps to an HBase table, the column order you specify with
+ the <code class="ph codeph">INSERT</code> statement might be different than the order you declare with the
+ <code class="ph codeph">CREATE TABLE</code> statement. Behind the scenes, HBase arranges the columns based on how
+ they are divided into column families. This might cause a mismatch during insert operations, especially
+ if you use the syntax <code class="ph codeph">INSERT INTO <var class="keyword varname">hbase_table</var> SELECT * FROM
+ <var class="keyword varname">hdfs_table</var></code>. Before inserting data, verify the column order by issuing a
+ <code class="ph codeph">DESCRIBE</code> statement for the table, and adjust the order of the select list in the
+ <code class="ph codeph">INSERT</code> statement.
+ </p>
+ </li>
+ </ul>
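+
+ <p class="p">
+ As a sketch of the update-one-row-at-a-time technique described above (the table name
+ <code class="ph codeph">hbase_t</code> and its columns are hypothetical), inserting a second row with the
+ same key value effectively replaces the first:
+ </p>
+
+<pre class="pre codeblock"><code>INSERT INTO hbase_t VALUES (1, 'original value');
+-- A later insert with the same key column value supersedes the earlier row.
+INSERT INTO hbase_t VALUES (1, 'updated value');</code></pre>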
+
+ <p class="p">
+ See <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a> for more details about using Impala with HBase.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Amazon Simple Storage Service (S3).
+ The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and
+ partitions is specified by an <code class="ph codeph">s3a://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into S3 using the normal S3 transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the S3 data.
+ </p>
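+ <p class="p">
+ For example (the bucket and table names here are hypothetical), after copying files into the S3
+ location with S3 tools rather than Impala DML, refresh the table before querying:
+ </p>
+<pre class="pre codeblock"><code>CREATE TABLE s3_example (id INT, s STRING)
+ LOCATION 's3a://example-bucket/path/to/table/';
+-- After an out-of-band upload to the same S3 location:
+REFRESH s3_example;</code></pre>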
+ <p class="p">
+ Because of differences between S3 and traditional filesystems, DML operations
+ for S3 tables can take longer than for tables on HDFS. For example, both the
+ <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+ to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+ the files are moved from a temporary staging directory to the final destination directory.)
+ Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+ actually copies the data files from one location to another and then removes the original files.
+ In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+ to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+ that a problem during statement execution could leave data in an inconsistent state.
+ It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+ See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+ </p>
+ <p class="p">See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.</p>
+
+ <p class="p">
+ <strong class="ph b">ADLS considerations:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Azure Data Lake Store (ADLS).
+ The syntax of the DML statements is the same as for any other tables, because the ADLS location for tables and
+ partitions is specified by an <code class="ph codeph">adl://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the ADLS data.
+ </p>
+ <p class="p">See <a class="xref" href="impala_adls.html#adls">Using Impala with the Azure Data Lake Store (ADLS)</a> for details about reading and writing ADLS data with Impala.</p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+ <p class="p">
+ If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+ identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+ other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Can be cancelled. To cancel this statement, use Ctrl-C from the
+ <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
+ <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
+ in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
+ (port 25000).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read
+ permission for the files in the source directory of an <code class="ph codeph">INSERT ... SELECT</code>
+ operation, and write permission for all affected directories in the destination table.
+ (An <code class="ph codeph">INSERT</code> operation could write files to multiple different HDFS directories
+ if the destination table is partitioned.)
+ This user must also have write permission to create a temporary work directory
+ in the top-level HDFS directory of the destination table.
+ An <code class="ph codeph">INSERT OVERWRITE</code> operation does not require write permission on
+ the original data files in the table, only on the table directories themselves.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ For <code class="ph codeph">INSERT</code> operations into <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> columns, you
+ must cast all <code class="ph codeph">STRING</code> literals or expressions returning <code class="ph codeph">STRING</code> to a
+ <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> type with the appropriate length.
+ </p>
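+
+ <p class="p">
+ For example, with a hypothetical <code class="ph codeph">VARCHAR</code> column, the string literal must be
+ cast explicitly:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE vc_example (c VARCHAR(10));
+INSERT INTO vc_example VALUES (CAST('abc' AS VARCHAR(10)));</code></pre>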
+
+ <p class="p">
+ <strong class="ph b">Related startup options:</strong>
+ </p>
+
+ <p class="p">
+ By default, if an <code class="ph codeph">INSERT</code> statement creates any new subdirectories underneath a partitioned
+ table, those subdirectories are assigned default HDFS permissions for the <code class="ph codeph">impala</code> user. To
+ make each subdirectory have the same permissions as its parent directory in HDFS, specify the
+ <code class="ph codeph">--insert_inherit_permissions</code> startup option for the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
+ </div>
+ </article>
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="insert__partition_insert">
+ <h2 class="title topictitle2" id="ariaid-title3">Inserting Into Partitioned Tables with PARTITION Clause</h2>
+ <div class="body conbody">
+ <p class="p">
+ For a partitioned table, the optional <code class="ph codeph">PARTITION</code> clause
+ identifies which partition or partitions the values are inserted
+ into.
+ </p>
+ <p class="p">
+ All examples in this section use the following table:
+ </p>
+<pre class="pre codeblock"><code>CREATE TABLE t1 (w INT) PARTITIONED BY (x INT, y STRING);</code></pre>
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title4" id="partition_insert__static_partition_insert">
+ <h3 class="title topictitle3" id="ariaid-title4">Static Partition Inserts</h3>
+ <div class="body conbody">
+ <p class="p">
+ In a static partition insert where a partition key column is given a
+ constant value, such as <code class="ph codeph">PARTITION</code>
+ <code class="ph codeph">(year=2012, month=2)</code>, the rows are inserted with the
+ same values specified for those partition key columns.
+ </p>
+ <p class="p">
+ The number of columns in the <code class="ph codeph">SELECT</code> list must equal
+ the number of columns in the column permutation.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">PARTITION</code> clause must be used for static
+ partitioning inserts.
+ </p>
+ <p class="p">
+ Example:
+ </p>
+ <div class="p">
+ The following statement inserts the
+ <code class="ph codeph">some_other_table.c1</code> values into the
+ <code class="ph codeph">w</code> column, and every inserted row has the
+ same <code class="ph codeph">x</code> value of <code class="ph codeph">10</code> and the same
+ <code class="ph codeph">y</code> value of
+ <code class="ph codeph">'a'</code>.<pre class="pre codeblock"><code>INSERT INTO t1 PARTITION (x=10, y='a')
+ SELECT c1 FROM some_other_table;</code></pre>
+ </div>
+ </div>
+ </article>
+ <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="partition_insert__dynamic_partition_insert">
+ <h3 class="title topictitle3" id="ariaid-title5">Dynamic Partition Inserts</h3>
+ <div class="body conbody">
+ <p class="p">
+ In a dynamic partition insert where a partition key
+ column is in the <code class="ph codeph">INSERT</code> statement but not assigned a
+ value, such as in <code class="ph codeph">PARTITION (year, region)</code> (both
+ columns unassigned) or <code class="ph codeph">PARTITION(year, region='CA')</code>
+ (<code class="ph codeph">year</code> column unassigned), the unassigned columns
+ are filled in with the final columns of the <code class="ph codeph">SELECT</code> or
+ <code class="ph codeph">VALUES</code> clause. In this case, the number of columns
+ in the <code class="ph codeph">SELECT</code> list must equal the number of columns
+ in the column permutation plus the number of partition key columns not
+ assigned a constant value.
+ </p>
+ <p class="p">
+ See <a class="xref" href="https://www.cloudera.com/documentation/enterprise/latest/topics/impala_partitioning.html#partition_static_dynamic" target="_blank"><u class="ph u">Static and Dynamic Partitioning
+ Clauses</u></a> for examples and performance characteristics
+ of static and dynamic partitioned inserts.
+ </p>
+ <p class="p">
+ The following rules apply to dynamic partition
+ inserts.
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The columns are bound in the order they appear in the
+ <code class="ph codeph">INSERT</code> statement.
+ </p>
+ <p class="p">
+ The table below shows the values inserted with the
+ <code class="ph codeph">INSERT</code> statements of different column
+ orders.
+ </p>
+ </li>
+ </ul>
+ <table class="table frame-all" id="dynamic_partition_insert__table_vyx_dp3_ldb"><caption></caption><colgroup><col><col><col><col></colgroup><tbody class="tbody">
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">Column <code class="ph codeph">w</code> Value</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">Column <code class="ph codeph">x</code> Value</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">Column <code class="ph codeph">y</code> Value</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">INSERT INTO t1 (w, x, y) VALUES (1, 2,
+ 'c');</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">1</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">2</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">'c'</code></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">INSERT INTO t1 (x,w) PARTITION (y) VALUES (1,
+ 2, 'c');</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">2</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">1</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">'c'</code></td>
+ </tr>
+ </tbody></table>
+ <ul class="ul">
+ <li class="li">
+ When a partition clause is specified but the non-partition
+ columns are not specified in the <code class="ph codeph">INSERT</code> statement,
+ as in the first example below, the non-partition columns are treated
+ as though they had been specified before the
+ <code class="ph codeph">PARTITION</code> clause in the SQL.
+ <p class="p">
+ Example: The following
+ three statements are equivalent, inserting <code class="ph codeph">1</code> into
+ <code class="ph codeph">w</code>, <code class="ph codeph">2</code> into <code class="ph codeph">x</code>,
+ and <code class="ph codeph">'c'</code> into <code class="ph codeph">y</code>.
+ </p>
+<pre class="pre codeblock"><code>INSERT INTO t1 PARTITION (x,y) VALUES (1, 2, 'c');
+INSERT INTO t1 (w) PARTITION (x, y) VALUES (1, 2, 'c');
+INSERT INTO t1 PARTITION (x, y='c') VALUES (1, 2);</code></pre>
+ </li>
+ <li class="li">
+ The <code class="ph codeph">PARTITION</code> clause is not required for
+ dynamic partition inserts, but all the partition columns must be explicitly
+ present in the <code class="ph codeph">INSERT</code> statement in the column list
+ or in the <code class="ph codeph">PARTITION</code> clause. The partition columns
+ cannot be defaulted to <code class="ph codeph">NULL</code>.
+ <p class="p">
+ Example:
+ </p>
+ <p class="p">The following statements are valid because the partition
+ columns, <code class="ph codeph">x</code> and <code class="ph codeph">y</code>, are present in
+ the <code class="ph codeph">INSERT</code> statements, either in the
+ <code class="ph codeph">PARTITION</code> clause or in the column
+ list.
+ </p>
+<pre class="pre codeblock"><code>INSERT INTO t1 PARTITION (x,y) VALUES (1, 2, 'c');
+INSERT INTO t1 (w, x) PARTITION (y) VALUES (1, 2, 'c');</code></pre>
+ <p class="p">
+ The following statement is not valid for the partitioned table as
+ defined above because the partition columns, <code class="ph codeph">x</code>
+ and <code class="ph codeph">y</code>, are not present in the
+ <code class="ph codeph">INSERT</code> statement.
+ </p>
+<pre class="pre codeblock"><code>INSERT INTO t1 VALUES (1, 2, 'c');</code></pre>
+ </li>
+ <li class="li">
+ If partition columns do not exist in the source table, you can
+ specify a constant value for those columns in the
+ <code class="ph codeph">PARTITION</code> clause.
+ <p class="p">
+ Example: The <code class="ph codeph">source</code> table contains only the columns
+ <code class="ph codeph">w</code> and <code class="ph codeph">y</code>. The value,
+ <code class="ph codeph">20</code>, specified in the <code class="ph codeph">PARTITION</code>
+ clause, is inserted into the <code class="ph codeph">x</code> column.
+ </p>
+<pre class="pre codeblock"><code>INSERT INTO t1 PARTITION (x=20, y) SELECT * FROM source;</code></pre>
+ </li>
+ </ul>
+ </div>
+ </article>
+ </article>
+ </article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_install.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_install.html b/docs/build3x/html/topics/impala_install.html
new file mode 100644
index 0000000..9071134
--- /dev/null
+++ b/docs/build3x/html/topics/impala_install.html
@@ -0,0 +1,126 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="install"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Installing Impala</title></head><body id="install"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Installing Impala</span></h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+
+
+
+
+ Impala is an open-source analytic database for Apache Hadoop
+ that returns rapid responses to queries.
+ </p>
+
+ <p class="p">
+ Follow these steps to set up Impala on a cluster by building from source:
+ </p>
+
+
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Download the latest release. See
+ <a class="xref" href="http://impala.apache.org/downloads.html" target="_blank">the Impala downloads page</a>
+ for the link to the latest release.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Check the <span class="ph filepath">README.md</span> file for a pointer
+ to the build instructions.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Verify the MD5 and SHA1 checksums and the GPG signature, the latter by using the code signing keys of the release managers.
+ </p>
+ </li>
+ <li class="li">
+ <div class="p">
+ Developers interested in working on Impala can clone the Impala source repository:
+<pre class="pre codeblock"><code>
+git clone https://git-wip-us.apache.org/repos/asf/impala.git
+</code></pre>
+ </div>
+ </li>
+ </ul>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="install__install_details">
+
+ <h2 class="title topictitle2" id="ariaid-title2">What is Included in an Impala Installation</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala is made up of a set of components that can be installed on multiple nodes throughout your cluster.
+ The key installation step for performance is to install the <span class="keyword cmdname">impalad</span> daemon (which does
+ most of the query processing work) on <em class="ph i">all</em> DataNodes in the cluster.
+ </p>
+
+ <p class="p">
+ Impala primarily consists of these executables, which should be available after you build from source:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <span class="keyword cmdname">impalad</span> - The Impala daemon. Plans and executes queries against HDFS, HBase, <span class="ph">and Amazon S3 data</span>.
+ <a class="xref" href="impala_processes.html#processes">Run one impalad process</a> on each node in the cluster
+ that has a DataNode.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <span class="keyword cmdname">statestored</span> - Name service that tracks location and status of all
+ <code class="ph codeph">impalad</code> instances in the cluster. <a class="xref" href="impala_processes.html#processes">Run one
+ instance of this daemon</a> on a node in your cluster. Most production deployments run this daemon
+ on the NameNode.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <span class="keyword cmdname">catalogd</span> - Metadata coordination service that broadcasts changes from Impala DDL and
+ DML statements to all affected Impala nodes, so that new tables, newly loaded data, and so on are
+ immediately visible to queries submitted through any Impala node.
+
+ (Prior to Impala 1.2, you had to run the <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE
+ METADATA</code> statement on each node to synchronize changed metadata. Now those statements are only
+ required if you perform the DDL or DML through an external mechanism such as Hive <span class="ph">or by uploading
+ data to the Amazon S3 filesystem</span>.)
+ <a class="xref" href="impala_processes.html#processes">Run one instance of this daemon</a> on a node in your cluster,
+ preferably on the same host as the <code class="ph codeph">statestored</code> daemon.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <span class="keyword cmdname">impala-shell</span> - <a class="xref" href="impala_impala_shell.html#impala_shell">Command-line
+ interface</a> for issuing queries to the Impala daemon. You install this on one or more hosts
+ anywhere on your network, not necessarily DataNodes or even within the same cluster as Impala. It can
+ connect remotely to any instance of the Impala daemon.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ Before you start working with Impala, ensure that you have all necessary prerequisites. See
+ <a class="xref" href="impala_prereqs.html#prereqs">Impala Requirements</a> for details.
+ </p>
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_int.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_int.html b/docs/build3x/html/topics/impala_int.html
new file mode 100644
index 0000000..44f4ee1
--- /dev/null
+++ b/docs/build3x/html/topics/impala_int.html
@@ -0,0 +1,121 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="int"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>INT Data Type</title></head><body id="int"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">INT Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A 4-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> INT</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> -2147483648 .. 2147483647. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong> Impala automatically converts to a larger integer type (<code class="ph codeph">BIGINT</code>) or a
+ floating-point type (<code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>). Use
+ <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>,
+ <code class="ph codeph">STRING</code>, or <code class="ph codeph">TIMESTAMP</code>.
+ <span class="ph">
+ Casting an integer or floating-point value <code class="ph codeph">N</code> to
+ <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+ date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+ If the setting <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+ the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.
+ </span>
+ </p>
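+
+ <p class="p">
+ For example, these illustrative queries show an explicit narrowing conversion and an
+ integer-to-<code class="ph codeph">TIMESTAMP</code> conversion under the default UTC behavior:
+ </p>
+
+<pre class="pre codeblock"><code>-- Narrowing an INT to a smaller integer type requires an explicit CAST().
+SELECT CAST(1000 AS SMALLINT);
+
+-- 0 seconds past the epoch: 1970-01-01 00:00:00, in UTC by default.
+SELECT CAST(0 AS TIMESTAMP);
+</code></pre>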
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The data type <code class="ph codeph">INTEGER</code> is an alias for <code class="ph codeph">INT</code>.
+ </p>
+
+ <p class="p">
+ For a convenient and automated way to check the bounds of the <code class="ph codeph">INT</code> type, call the functions
+ <code class="ph codeph">MIN_INT()</code> and <code class="ph codeph">MAX_INT()</code>.
+ </p>
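+
+ <p class="p">
+ For example, this query confirms the bounds listed above:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT MIN_INT(), MAX_INT();  -- returns -2147483648 and 2147483647
+</code></pre>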
+
+ <p class="p">
+ If an integer value is too large to be represented as an <code class="ph codeph">INT</code>, use a <code class="ph codeph">BIGINT</code>
+ instead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+ value.
+ </p>
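+
+ <p class="p">
+ For example, this illustrative cast of a non-numeric string yields <code class="ph codeph">NULL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CAST('not_a_number' AS INT);  -- returns NULL
+</code></pre>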
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x INT);
+SELECT CAST(1000 AS INT);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric
+ type more efficiently than a <code class="ph codeph">STRING</code> representation of the value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as a 4-byte value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+ <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 3.0 or higher only)</a>,
+ <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_intro.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_intro.html b/docs/build3x/html/topics/impala_intro.html
new file mode 100644
index 0000000..99c24b3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_intro.html
@@ -0,0 +1,198 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Introducing Apache Impala</title></head><body id="intro"><main role="main"><article role="article" aria-labelledby="intro__impala">
+
+ <h1 class="title topictitle1" id="intro__impala"><span class="ph">Introducing Apache Impala</span></h1>
+
+
+ <div class="body conbody" id="intro__intro_body">
+
+ <p class="p">
+ Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS,
+ HBase, <span class="ph">or the Amazon Simple Storage Service (S3)</span>.
+ In addition to using the same unified storage platform,
+ Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface
+ (Impala query UI in Hue) as Apache Hive. This
+ provides a familiar and unified platform for real-time or batch-oriented queries.
+ </p>
+
+ <p class="p">
+ Impala is an addition to tools available for querying big data. Impala does not replace the batch
+ processing frameworks built on MapReduce such as Hive. Hive and other frameworks built on MapReduce are
+ best suited for long-running batch jobs, such as Extract, Transform,
+ and Load (ETL) workloads.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Impala graduated from the Apache Incubator on November 15, 2017.
+ In places where the documentation formerly referred to <span class="q">"Cloudera Impala"</span>,
+ now the official name is <span class="q">"Apache Impala"</span>.
+ </div>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro__benefits">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Impala Benefits</h2>
+
+ <div class="body conbody">
+
+ <div class="p">
+ Impala provides:
+
+ <ul class="ul">
+ <li class="li">
+ Familiar SQL interface that data scientists and analysts already know.
+ </li>
+
+ <li class="li">
+ Ability to query high volumes of data (<span class="q">"big data"</span>) in Apache Hadoop.
+ </li>
+
+ <li class="li">
+ Distributed queries in a cluster environment, for convenient scaling and to make use of cost-effective
+ commodity hardware.
+ </li>
+
+ <li class="li">
+ Ability to share data files between different components with no copy or export/import step; for example,
+ to write with Pig, transform with Hive, and query with Impala. Impala can read from and write to Hive
+ tables, enabling simple data interchange using Impala for analytics on Hive-produced data.
+ </li>
+
+ <li class="li">
+ Single system for big data processing and analytics, so customers can avoid costly modeling and ETL just
+ for analytics.
+ </li>
+ </ul>
+ </div>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro__impala_hadoop">
+
+ <h2 class="title topictitle2" id="ariaid-title3">How Impala Works with <span class="keyword">Apache Hadoop</span></h2>
+
+
+ <div class="body conbody">
+
+
+
+ <div class="p">
+ The Impala solution is composed of the following components:
+ <ul class="ul">
+ <li class="li">
+ Clients - Entities including Hue, ODBC clients, JDBC clients, and the Impala Shell can all interact
+ with Impala. These interfaces are typically used to issue queries or complete administrative tasks such
+ as connecting to Impala.
+ </li>
+
+ <li class="li">
+ Hive Metastore - Stores information about the data available to Impala. For example, the metastore lets
+ Impala know what databases are available and what the structure of those databases is. As you create,
+ drop, and alter schema objects, load data into tables, and so on through Impala SQL statements, the
+ relevant metadata changes are automatically broadcast to all Impala nodes by the dedicated catalog
+ service introduced in Impala 1.2.
+ </li>
+
+ <li class="li">
+ Impala - This process, which runs on DataNodes, coordinates and executes queries. Each
+ instance of Impala can receive, plan, and coordinate queries from Impala clients. Queries are
+ distributed among Impala nodes, and these nodes then act as workers, executing parallel query
+ fragments.
+ </li>
+
+ <li class="li">
+ HBase and HDFS - Storage for data to be queried.
+ </li>
+ </ul>
+ </div>
+
+ <div class="p">
+ Queries executed using Impala are handled as follows:
+ <ol class="ol">
+ <li class="li">
+ User applications send SQL queries to Impala through ODBC or JDBC, which provide standardized querying
+ interfaces. The user application may connect to any <code class="ph codeph">impalad</code> in the cluster. This
+ <code class="ph codeph">impalad</code> becomes the coordinator for the query.
+ </li>
+
+ <li class="li">
+ Impala parses the query and analyzes it to determine what tasks need to be performed by
+ <code class="ph codeph">impalad</code> instances across the cluster. Execution is planned for optimal efficiency.
+ </li>
+
+ <li class="li">
+ Services such as HDFS and HBase are accessed by local <code class="ph codeph">impalad</code> instances to provide
+ data.
+ </li>
+
+ <li class="li">
+ Each <code class="ph codeph">impalad</code> returns data to the coordinating <code class="ph codeph">impalad</code>, which sends
+ these results to the client.
+ </li>
+ </ol>
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro__features">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Primary Impala Features</h2>
+
+ <div class="body conbody">
+
+ <div class="p">
+ Impala provides support for:
+ <ul class="ul">
+ <li class="li">
+ Most common SQL-92 features of Hive Query Language (HiveQL) including
+ <a class="xref" href="../shared/../topics/impala_select.html#select">SELECT</a>,
+ <a class="xref" href="../shared/../topics/impala_joins.html#joins">joins</a>, and
+ <a class="xref" href="../shared/../topics/impala_aggregate_functions.html#aggregate_functions">aggregate functions</a>.
+ </li>
+
+ <li class="li">
+ HDFS, HBase, <span class="ph">and Amazon Simple Storage Service (S3)</span> storage, including:
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_file_formats.html#file_formats">HDFS file formats</a>: delimited text files, Parquet,
+ Avro, SequenceFile, and RCFile.
+ </li>
+
+ <li class="li">
+ Compression codecs: Snappy, GZIP, Deflate, BZIP.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ Common data access interfaces including:
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_jdbc.html#impala_jdbc">JDBC driver</a>.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_odbc.html#impala_odbc">ODBC driver</a>.
+ </li>
+
+ <li class="li">
+ Hue Beeswax and the Impala Query UI.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_impala_shell.html#impala_shell">impala-shell command-line interface</a>.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_security.html#security">Kerberos authentication</a>.
+ </li>
+ </ul>
+ </div>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_datatypes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_datatypes.html b/docs/build3x/html/topics/impala_datatypes.html
new file mode 100644
index 0000000..45bc6fc
--- /dev/null
+++ b/docs/build3x/html/topics/impala_datatypes.html
@@ -0,0 +1,33 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_array.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_bigint.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_boolean.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_char.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_decimal.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_double.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_float.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_int.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_map.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_real.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_smallint.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_string.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_struct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_timestamp.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_tinyint.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_varchar.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_complex_types.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="datatypes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Data Types</title></head><body id="datatypes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Data Types</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports a set of data types that you can use for table columns, expression values, and function
+ arguments and return values.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Currently, Impala supports only scalar types, not composite or nested types. Accessing a table containing any
+ columns with unsupported types causes an error.
+ </div>
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ For the notation to write literals of each of these data types, see
+ <a class="xref" href="impala_literals.html#literals">Literals</a>.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_langref_unsupported.html#langref_hiveql_delta">SQL Differences Between Impala and Hive</a> for differences between Impala and
+ Hive data types.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_array.html">ARRAY Complex Type (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_bigint.html">BIGINT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_boolean.html">BOOLEAN Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_char.html">CHAR Data Type (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_decimal.html">DECIMAL Data Type (Impala 3.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_double.html">DOUBLE Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_float.html">FLOAT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_int.html">INT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_map.html">MAP Complex Type (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_real.html">REAL Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_smallint.html">SMALLINT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_string.html">STRING Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_struct.html">STRUCT Complex Type (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_timestamp.html">TIMESTAMP Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_tinyint.html">TINYINT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_varchar.html">VARCHAR Data Type (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_complex_types.html">Complex Types (Impala 2.3 or higher only)</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ddl.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ddl.html b/docs/build3x/html/topics/impala_ddl.html
new file mode 100644
index 0000000..e9737bf
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ddl.html
@@ -0,0 +1,141 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ddl"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DDL Statements</title></head><body id="ddl"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DDL Statements</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ DDL refers to <span class="q">"Data Definition Language"</span>, a subset of SQL statements that change the structure of the
+ database schema in some way, typically by creating, deleting, or modifying schema objects such as databases,
+ tables, and views. Most Impala DDL statements start with the keywords <code class="ph codeph">CREATE</code>,
+ <code class="ph codeph">DROP</code>, or <code class="ph codeph">ALTER</code>.
+ </p>
+
+ <p class="p">
+ The Impala DDL statements are:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_create_view.html#create_view">CREATE VIEW Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a>
+ </li>
+ </ul>
+
+ <p class="p">
+ After Impala executes a DDL command, information about available tables, columns, views, partitions, and so
+ on is automatically synchronized between all the Impala nodes in a cluster. (Prior to Impala 1.2, you had to
+ issue a <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE METADATA</code> statement manually on the other
+ nodes to make them aware of the changes.)
+ </p>
+
+ <p class="p">
+ If the timing of metadata updates is significant, for example if you use round-robin scheduling where each
+ query could be issued through a different Impala node, you can enable the
+ <a class="xref" href="impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query option to make the DDL statement wait until
+ all nodes have been notified about the metadata changes.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about how Impala DDL statements interact with
+ tables and partitions stored in the Amazon S3 filesystem.
+ </p>
+
+ <p class="p">
+ Although the <code class="ph codeph">INSERT</code> statement is officially classified as a DML (data manipulation language)
+ statement, it also involves metadata changes that must be broadcast to all Impala nodes, and so is also
+ affected by the <code class="ph codeph">SYNC_DDL</code> query option.
+ </p>
+
+ <p class="p">
+ Because the <code class="ph codeph">SYNC_DDL</code> query option makes each DDL operation take longer than normal, you
+ might only enable it before the last DDL operation in a sequence. For example, if you are running a script
+ that issues multiple DDL operations to set up an entire new schema, add several new partitions, and so on,
+ you might minimize the performance overhead by enabling the query option only before the last
+ <code class="ph codeph">CREATE</code>, <code class="ph codeph">DROP</code>, <code class="ph codeph">ALTER</code>, or <code class="ph codeph">INSERT</code> statement.
+ The script only finishes when all the relevant metadata changes are recognized by all the Impala nodes, so
+ you could connect to any node and issue queries through it.
+ </p>
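+
+ <p class="p">
+ For example, a hypothetical setup script might enable the option only before its final
+ statement:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x INT);
+ALTER TABLE t1 ADD COLUMNS (y STRING);
+-- Enable SYNC_DDL only for the last statement, so that the script returns
+-- after all nodes recognize the full set of metadata changes.
+SET SYNC_DDL=1;
+CREATE VIEW v1 AS SELECT x, y FROM t1;
+</code></pre>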
+
+ <p class="p">
+ The classification of DDL, DML, and other statements is not necessarily the same between Impala and Hive.
+ Impala organizes these statements in a way intended to be familiar to users of relational
+ databases or data warehouse products. Statements that modify the metastore database, such as <code class="ph codeph">COMPUTE
+ STATS</code>, are classified as DDL. Statements that only query the metastore database, such as
+ <code class="ph codeph">SHOW</code> or <code class="ph codeph">DESCRIBE</code>, are put into a separate category of utility statements.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The query types shown in the Impala debug web user interface might not match exactly the categories listed
+ here. For example, currently the <code class="ph codeph">USE</code> statement is shown as DDL in the debug web UI. The
+ query types shown in the debug web UI are subject to change, for improved consistency.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The other major classifications of SQL statements are data manipulation language (see
+ <a class="xref" href="impala_dml.html#dml">DML Statements</a>) and queries (see <a class="xref" href="impala_select.html#select">SELECT Statement</a>).
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_debug_action.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_debug_action.html b/docs/build3x/html/topics/impala_debug_action.html
new file mode 100644
index 0000000..f39c89f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_debug_action.html
@@ -0,0 +1,24 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="debug_action"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DEBUG_ACTION Query Option</title></head><body id="debug_action"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DEBUG_ACTION Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Introduces artificial problem conditions within queries. For internal debugging and troubleshooting.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> empty string
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_decimal.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_decimal.html b/docs/build3x/html/topics/impala_decimal.html
new file mode 100644
index 0000000..3c0a917
--- /dev/null
+++ b/docs/build3x/html/topics/impala_decimal.html
@@ -0,0 +1,907 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="decimal"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DECIMAL Data Type (Impala 3.0 or higher only)</title></head><body id="decimal"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DECIMAL Data Type (<span class="keyword">Impala 3.0</span> or higher only)</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">DECIMAL</code> data type is a numeric data type with fixed scale and
+ precision.
+ </p>
+
+ <p class="p">
+ The data type is useful for storing and doing operations on precise decimal values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DECIMAL[(<var class="keyword varname">precision</var>[, <var class="keyword varname">scale</var>])]</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Precision:</strong>
+ </p>
+
+ <p class="p">
+ <var class="keyword varname">precision</var> represents the total number of digits that can be represented
+ regardless of the location of the decimal point.
+ </p>
+
+ <p class="p">
+ This value must be between 1 and 38, specified as an integer literal.
+ </p>
+
+ <p class="p">
+ The default precision is 9.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Scale:</strong>
+ </p>
+
+ <p class="p">
+ <var class="keyword varname">scale</var> represents the number of fractional digits.
+ </p>
+
+ <p class="p">
+ This value must be less than or equal to the precision, specified as an integer literal.
+ </p>
+
+ <p class="p">
+ The default scale is 0.
+ </p>
+
+ <p class="p">
+ When the precision and the scale are omitted, a <code class="ph codeph">DECIMAL</code> is treated as
+ <code class="ph codeph">DECIMAL(9, 0)</code>.
+ </p>
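+
+ <p class="p">
+ For example, these illustrative column definitions show the precision and scale notation:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE dec_demo
+(
+  amount DECIMAL(10,2),  -- up to 8 digits before the decimal point, 2 after
+  ratio  DECIMAL(38,38), -- fractional values only
+  id     DECIMAL         -- same as DECIMAL(9,0)
+);
+</code></pre>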
+
+ <p class="p">
+ <strong class="ph b">Range:</strong>
+ </p>
+
+ <p class="p">
+ The range of the <code class="ph codeph">DECIMAL</code> type is -10^38 + 1 through 10^38 - 1.
+ </p>
+
+ <p class="p">
+ The largest value is represented by <code class="ph codeph">DECIMAL(38, 0)</code>.
+ </p>
+
+ <p class="p">
+ The most precise fractional value (between 0 and 1, or 0 and -1) is represented by
+ <code class="ph codeph">DECIMAL(38, 38)</code>, with 38 digits to the right of the decimal point. The
+ value closest to 0 would be .0000...1 (37 zeros and the final 1). The value closest to 1
+ would be .999... (9 repeated 38 times).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Memory and disk storage:</strong>
+ </p>
+
+ <p class="p">
+ Only the precision determines the storage size for <code class="ph codeph">DECIMAL</code> values, and
+ the scale setting has no effect on the storage size. The following table describes the
+ in-memory storage once the values are loaded into memory.
+ </p>
+
+ <div class="p">
+ <table class="simpletable frame-all" id="decimal__simpletable_tty_3y2_mdb"><col style="width:50%"><col style="width:50%"><thead><tr class="sthead">
+
+ <th class="stentry" id="decimal__simpletable_tty_3y2_mdb__stentry__1">Precision</th>
+
+ <th class="stentry" id="decimal__simpletable_tty_3y2_mdb__stentry__2">In-memory Storage</th>
+
+ </tr></thead><tbody><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_tty_3y2_mdb__stentry__1">1 - 9</td>
+
+ <td class="stentry" headers="decimal__simpletable_tty_3y2_mdb__stentry__2">4 bytes</td>
+
+ </tr><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_tty_3y2_mdb__stentry__1">10 - 18</td>
+
+ <td class="stentry" headers="decimal__simpletable_tty_3y2_mdb__stentry__2">8 bytes</td>
+
+ </tr><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_tty_3y2_mdb__stentry__1">19 - 38</td>
+
+ <td class="stentry" headers="decimal__simpletable_tty_3y2_mdb__stentry__2">16 bytes</td>
+
+ </tr></tbody></table>
+ </div>
+
+ <p class="p">
+ The on-disk representation varies depending on the file format of the table.
+ </p>
+
+ <p class="p">
+ Text, RCFile, and SequenceFile tables use ASCII-based formats as below:
+ </p>
+
+ <div class="p">
+ <ul class="ul">
+ <li class="li">
+ Leading zeros are not stored.
+ </li>
+
+ <li class="li">
+ Trailing zeros are stored.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Each <code class="ph codeph">DECIMAL</code> value takes up as many bytes as the precision of the
+ value, plus:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ One extra byte if the decimal point is present.
+ </li>
+
+ <li class="li">
+ One extra byte for negative values.
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ Parquet and Avro tables use binary formats and offer more compact storage for
+ <code class="ph codeph">DECIMAL</code> values. In these tables, Impala stores each value in fewer bytes
+ where possible depending on the precision specified for the <code class="ph codeph">DECIMAL</code>
+ column. To conserve space in large tables, use the smallest-precision
+ <code class="ph codeph">DECIMAL</code> type.
+ </p>
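+
+ <p class="p">
+ For example (table names here are illustrative), declaring only the precision the data
+ actually requires reduces both the in-memory and the on-disk footprint:
+ </p>
+
+<pre class="pre codeblock"><code>-- Values use 4 bytes each in memory and a compact binary encoding on disk.
+CREATE TABLE prices_small (amount DECIMAL(9, 2)) STORED AS PARQUET;
+
+-- Values use 16 bytes each in memory, even if the data never needs 38 digits.
+CREATE TABLE prices_large (amount DECIMAL(38, 2)) STORED AS PARQUET;</code></pre>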
+
+ <p class="p">
+ <strong class="ph b">Precision and scale in arithmetic operations:</strong>
+ </p>
+
+ <p class="p">
+ For all arithmetic operations, the resulting precision is at most 38.
+ </p>
+
+ <p class="p">
+ If the resulting precision would be greater than 38, Impala truncates fractional
+ digits from the back of the result, keeping a scale of at least 6, and rounds the
+ last retained digit.
+ </p>
+
+ <p class="p">
+ For example, <code class="ph codeph">DECIMAL(38, 20) * DECIMAL(38, 20)</code> returns
+ <code class="ph codeph">DECIMAL(38, 6)</code>. According to the table below, the resulting precision and
+ scale would be <code class="ph codeph">(77, 40)</code>, but they are higher than the maximum precision
+ and scale for <code class="ph codeph">DECIMAL</code>. So, Impala sets the precision to the maximum
+ allowed 38, and truncates the scale to 6.
+ </p>
+
+ <div class="p">
+ When you use <code class="ph codeph">DECIMAL</code> values in arithmetic operations, the precision and
+ scale of the result value are determined as follows. For better readability, the following
+ terms are used in the table below:
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ P1, P2: Input precisions
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ S1, S2: Input scales
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ L1, L2: Leading digits in input <code class="ph codeph">DECIMAL</code>s, i.e., L1 = P1 - S1 and L2
+ = P2 - S2
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <div class="p">
+ <table class="table frame-all" id="decimal__table_inl_sz2_mdb"><caption></caption><colgroup><col><col><col></colgroup><tbody class="tbody">
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ <strong class="ph b">Operation</strong>
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ <strong class="ph b">Resulting Precision</strong>
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ <strong class="ph b">Resulting Scale</strong>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ Addition and Subtraction
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ <p class="p">
+ max (L1, L2) + max (S1, S2) + 1
+ </p>
+
+
+
+ <p class="p">
+ 1 is for carry-over.
+ </p>
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ max (S1, S2)
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ Multiplication
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ P1 + P2 + 1
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ S1 + S2
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ Division
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ L1 + S2 + max (S1 + P2 + 1, 6)
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ max (S1 + P2 + 1, 6)
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ Modulo
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ min (L1, L2) + max (S1, S2)
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ max (S1, S2)
+ </td>
+ </tr>
+ </tbody></table>
+ </div>
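+
+ <p class="p">
+ You can check these rules with the <code class="ph codeph">typeof()</code> built-in function.
+ For example, adding <code class="ph codeph">DECIMAL(5, 2)</code> and
+ <code class="ph codeph">DECIMAL(6, 3)</code> values gives max(3, 3) + max(2, 3) + 1 = 7 for the
+ precision and max(2, 3) = 3 for the scale:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT TYPEOF(CAST(1 AS DECIMAL(5, 2)) + CAST(1 AS DECIMAL(6, 3)));
+
+Result: DECIMAL(7,3)</code></pre>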
+
+ <p class="p">
+ <strong class="ph b">Precision and scale in functions:</strong>
+ </p>
+
+ <div class="p">
+ When you use <code class="ph codeph">DECIMAL</code> values in built-in functions, the precision and
+ scale of the result value are determined as follows:
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ The result of the <code class="ph codeph">SUM</code> aggregate function on a
+ <code class="ph codeph">DECIMAL</code> value is:
+ <ul class="ul">
+ <li class="li">
+ <p dir="ltr" class="p">
+ Precision: 38
+ </p>
+ </li>
+
+ <li class="li">
+ <p dir="ltr" class="p">
+ Scale: The same scale as the input column
+ </p>
+ </li>
+ </ul>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ The result of <code class="ph codeph">AVG</code> aggregate function on a <code class="ph codeph">DECIMAL</code>
+ value is:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p dir="ltr" class="p">
+ Precision: 38
+ </p>
+ </li>
+
+ <li class="li">
+ <p dir="ltr" class="p">
+ Scale: max(Scale of input column, 6)
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
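+
+ <p class="p">
+ For example, with a hypothetical column <code class="ph codeph">c</code> of type
+ <code class="ph codeph">DECIMAL(10, 2)</code> in a table <code class="ph codeph">t</code>:
+ </p>
+
+<pre class="pre codeblock"><code>-- SUM keeps the input scale: the result type is DECIMAL(38, 2).
+SELECT SUM(c) FROM t;
+
+-- AVG widens the scale to at least 6: the result type is DECIMAL(38, 6).
+SELECT AVG(c) FROM t;</code></pre>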
+
+ <p class="p">
+ <strong class="ph b">Implicit conversions in DECIMAL assignments:</strong>
+ </p>
+
+ <p class="p">
+ Impala enforces strict conversion rules in decimal assignments like in
+ <code class="ph codeph">INSERT</code> and <code class="ph codeph">UNION</code> statements, or in functions like
+ <code class="ph codeph">COALESCE</code>.
+ </p>
+
+ <p class="p">
+ If the destination does not have enough precision or scale, Impala returns an error.
+ </p>
+
+ <div class="p">
+ Impala performs implicit conversions between <code class="ph codeph">DECIMAL</code> and other numeric
+ types as below:
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">DECIMAL</code> is implicitly converted to <code class="ph codeph">DOUBLE</code> or
+ <code class="ph codeph">FLOAT</code> when necessary even with a loss of precision. It can be
+ necessary, for example when inserting a <code class="ph codeph">DECIMAL</code> value into a
+ <code class="ph codeph">DOUBLE</code> column. For example:
+<pre class="pre codeblock"><code>CREATE TABLE flt(c FLOAT);
+INSERT INTO flt SELECT CAST(1e37 AS DECIMAL(38, 0));
+SELECT CAST(c AS DECIMAL(38, 0)) FROM flt;
+
+Result: 9999999933815812510711506376257961984</code></pre>
+ <p dir="ltr" class="p">
+ The result has a loss of information due to implicit casting. This is why we
+ discourage using the <code class="ph codeph">DOUBLE</code> and <code class="ph codeph">FLOAT</code> types in
+ general.
+ </p>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DOUBLE</code> and <code class="ph codeph">FLOAT</code> cannot be implicitly converted to
+ <code class="ph codeph">DECIMAL</code>. An error is returned.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DECIMAL</code> is implicitly converted to <code class="ph codeph">DECIMAL</code> if all
+ digits fit in the resulting <code class="ph codeph">DECIMAL</code>.
+ <div class="p">
+ For example, the following query returns an error because a resulting type that
+ guarantees that all digits fit cannot be determined.
+<pre class="pre codeblock"><code>SELECT GREATEST (CAST(1 AS DECIMAL(38, 0)), CAST(2 AS DECIMAL(38, 37)));</code></pre>
+ </div>
+ </li>
+
+ <li class="li">
+ Integer values can be implicitly converted to <code class="ph codeph">DECIMAL</code> when there is
+ enough room in the <code class="ph codeph">DECIMAL</code> to guarantee that all digits fit. The
+ integer types require the following numbers of digits to the left of the decimal point
+ when converted to <code class="ph codeph">DECIMAL</code>:
+ <ul class="ul">
+ <li class="li">
+ <p dir="ltr" class="p">
+ <code class="ph codeph">BIGINT</code>: 19 digits
+ </p>
+ </li>
+
+ <li class="li">
+ <p dir="ltr" class="p">
+ <code class="ph codeph">INT</code>: 10 digits
+ </p>
+ </li>
+
+ <li class="li">
+ <p dir="ltr" class="p">
+ <code class="ph codeph">SMALLINT</code>: 5 digits
+ </p>
+ </li>
+
+ <li class="li">
+ <p dir="ltr" class="p">
+ <code class="ph codeph">TINYINT</code>: 3 digits
+ </p>
+ </li>
+ </ul>
+ <p class="p">
+ For example:
+ </p>
+
+ <div class="p">
+<pre class="pre codeblock"><code>CREATE TABLE decimals_10_8 (x DECIMAL(10, 8));
+INSERT INTO decimals_10_8 VALUES (CAST(1 AS TINYINT));</code></pre>
+ </div>
+
+ <p class="p">
+ The above <code class="ph codeph">INSERT</code> statement fails because <code class="ph codeph">TINYINT</code>
+ requires room for 3 digits to the left of the decimal point in the
+ <code class="ph codeph">DECIMAL</code>.
+ </p>
+
+ <div class="p">
+<pre class="pre codeblock"><code>CREATE TABLE decimals_11_8(x DECIMAL(11, 8));
+INSERT INTO decimals_11_8 VALUES (CAST(1 AS TINYINT));</code></pre>
+ </div>
+
+ <p class="p">
+ The above <code class="ph codeph">INSERT</code> statement succeeds because there is enough room
+ for 3 digits to the left of the decimal point that <code class="ph codeph">TINYINT</code>
+ requires.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <div class="p">
+ In <code class="ph codeph">UNION</code>, the resulting precision and scales are determined as follows.
+ <ul class="ul">
+ <li class="li">
+ Precision: max (L1, L2) + max (S1, S2)
+ <p class="p">
+ If the resulting type does not fit in the <code class="ph codeph">DECIMAL</code> type, an error is
+ returned. See the first example below.
+ </p>
+ </li>
+
+ <li class="li">
+ Scale: max (S1, S2)
+ </li>
+ </ul>
+ </div>
+
+ <div class="p">
+ Examples for <code class="ph codeph">UNION</code>:
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">DECIMAL(20, 0) UNION DECIMAL(20, 20)</code> would require a
+ <code class="ph codeph">DECIMAL(40, 20)</code> to fit all the digits. Since this is larger than the
+ max precision for <code class="ph codeph">DECIMAL</code>, Impala returns an error. One way to fix
+ the error is to cast both operands to the desired type, for example
+ <code class="ph codeph">DECIMAL(38, 18)</code>.
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ <code class="ph codeph">DECIMAL(20, 2) UNION DECIMAL(8, 6)</code> returns <code class="ph codeph">DECIMAL(24,
+ 6)</code>.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ <code class="ph codeph">INT UNION DECIMAL(9, 4)</code> returns <code class="ph codeph">DECIMAL(14, 4)</code>.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">INT</code> has the precision 10 and the scale 0, so it is treated as
+ <code class="ph codeph">DECIMAL(10, 0) UNION DECIMAL(9, 4)</code>.
+ </p>
+ </li>
+ </ul>
+ </div>
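+
+ <p class="p">
+ To resolve the error in the first example, cast both sides of the
+ <code class="ph codeph">UNION</code> to a type that accommodates the data (table and column
+ names here are illustrative):
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(a AS DECIMAL(38, 18)) FROM t1
+UNION
+SELECT CAST(b AS DECIMAL(38, 18)) FROM t2;</code></pre>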
+
+ <p class="p">
+ <strong class="ph b">Casting between DECIMAL and other data types:</strong>
+ </p>
+
+ <div class="p">
+ To avoid potential conversion errors, use <code class="ph codeph">CAST</code> to explicitly convert
+ between <code class="ph codeph">DECIMAL</code> and other types in decimal assignments like in
+ <code class="ph codeph">INSERT</code> and <code class="ph codeph">UNION</code> statements, or in functions like
+ <code class="ph codeph">COALESCE</code>:
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ You can cast the following types to <code class="ph codeph">DECIMAL</code>:
+ <code class="ph codeph">FLOAT</code>, <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>,
+ <code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>, <code class="ph codeph">STRING</code>
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ You can cast <code class="ph codeph">DECIMAL</code> to the following types:
+ <code class="ph codeph">FLOAT</code>, <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>,
+ <code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>, <code class="ph codeph">STRING</code>,
+ <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">TIMESTAMP</code>
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <div class="p">
+ Impala performs <code class="ph codeph">CAST</code> between <code class="ph codeph">DECIMAL</code> and other numeric
+ types as below:
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Precision: If you cast a value with greater precision than the precision of the
+ destination type, Impala returns an error. For example, <code class="ph codeph">CAST(123456 AS
+ DECIMAL(3,0))</code> returns an error because all the digits do not fit into
+ <code class="ph codeph">DECIMAL(3, 0)</code>.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Scale: If you cast a value with more fractional digits than the scale of the
+ destination type, the fractional digits are rounded. For example, <code class="ph codeph">CAST(1.239
+ AS DECIMAL(3, 2))</code> returns <code class="ph codeph">1.24</code>.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Casting STRING to DECIMAL:</strong>
+ </p>
+
+ <div class="p">
+ You can cast a <code class="ph codeph">STRING</code> of numeric characters in columns, literals, or
+ expressions to <code class="ph codeph">DECIMAL</code> as long as the number fits within the specified
+ target <code class="ph codeph">DECIMAL</code> type without overflow.
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ If the scale of the <code class="ph codeph">STRING</code> value is greater than the scale of the
+ <code class="ph codeph">DECIMAL</code>, the fractional digits are rounded to the
+ <code class="ph codeph">DECIMAL</code> scale.
+ </p>
+
+ <p dir="ltr" class="p">
+ For example, <code class="ph codeph">CAST('98.678912' AS DECIMAL(15, 1))</code> returns
+ <code class="ph codeph">98.7</code>.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ If the number of leading digits in the <code class="ph codeph">STRING</code> value is greater
+ than the number of leading digits allowed by the <code class="ph codeph">DECIMAL</code>, an
+ error is returned.
+ </p>
+
+ <p dir="ltr" class="p">
+ For example, <code class="ph codeph">CAST('123.45' AS DECIMAL(2, 2))</code> returns an error.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ Exponential notation is supported when casting from <code class="ph codeph">STRING</code>.
+ </p>
+
+ <p class="p">
+ For example, <code class="ph codeph">CAST('1.0e6' AS DECIMAL(32, 0))</code> returns
+ <code class="ph codeph">1000000</code>.
+ </p>
+
+ <p class="p">
+ Casting any non-numeric value, such as <code class="ph codeph">'ABC'</code> to the
+ <code class="ph codeph">DECIMAL</code> type returns an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Casting DECIMAL to TIMESTAMP:</strong>
+ </p>
+
+ <p class="p">
+ Casting a <code class="ph codeph">DECIMAL</code> value N to <code class="ph codeph">TIMESTAMP</code> produces a value
+ that is N seconds past the start of the epoch date (January 1, 1970).
+ </p>
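+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>-- Produces a TIMESTAMP 10.5 seconds past 1970-01-01 00:00:00 (UTC).
+SELECT CAST(CAST(10.5 AS DECIMAL(10, 1)) AS TIMESTAMP);</code></pre>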
+
+ <p class="p">
+ <strong class="ph b">DECIMAL vs FLOAT consideration:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> types can cause problems or
+ unexpected behavior due to inability to precisely represent certain fractional values, for
+ example dollar and cents values for currency. You might find output values slightly
+ different than you inserted, equality tests that do not match precisely, or unexpected
+ values for <code class="ph codeph">GROUP BY</code> columns. The <code class="ph codeph">DECIMAL</code> type can help
+ reduce unexpected behavior and rounding errors, but at the expense of some performance
+ overhead for assignments and comparisons.
+ </p>
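+
+ <p class="p">
+ For example, the fraction 0.1 has no exact binary representation, so the
+ <code class="ph codeph">FLOAT</code> version of the following comparison can fail to match,
+ while the <code class="ph codeph">DECIMAL</code> version is exact:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(0.1 AS FLOAT) = 0.1;
+SELECT CAST(0.1 AS DECIMAL(9, 1)) = 0.1;</code></pre>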
+
+ <p class="p">
+ <strong class="ph b">Literals and expressions:</strong>
+ </p>
+
+ <div class="p">
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Numeric literals without a decimal point
+ </p>
+ <ul class="ul">
+ <li class="li">
+ A literal is treated as the smallest integer type that fits it. For
+ example, <code class="ph codeph">111</code> is a <code class="ph codeph">TINYINT</code>, and
+ <code class="ph codeph">1111</code> is a <code class="ph codeph">SMALLINT</code>.
+ </li>
+
+ <li class="li">
+ Large literals that do not fit into any integer type are treated as
+ <code class="ph codeph">DECIMAL</code>.
+ </li>
+
+ <li class="li">
+ Literals too large to fit into a <code class="ph codeph">DECIMAL(38, 0)</code> are treated
+ as <code class="ph codeph">DOUBLE</code>.
+ </li>
+ </ul>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Numeric literals with a decimal point
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Literals with fewer than 38 digits are treated as <code class="ph codeph">DECIMAL</code>.
+ </li>
+
+ <li class="li">
+ Literals with 38 or more digits are treated as <code class="ph codeph">DOUBLE</code>.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ Exponential notation is supported in <code class="ph codeph">DECIMAL</code> literals.
+ </li>
+
+ <li dir="ltr" class="li">
+ <p class="p">
+ To represent a very large or precise <code class="ph codeph">DECIMAL</code> value as a literal,
+ for example one that contains more digits than can be represented by a
+ <code class="ph codeph">BIGINT</code> literal, use a quoted string or a floating-point value for
+ the number and <code class="ph codeph">CAST</code> the string to the desired
+ <code class="ph codeph">DECIMAL</code> type.
+ </p>
+
+ <p class="p">
+ For example: <code class="ph codeph">CAST('999999999999999999999999999999' AS DECIMAL(38,
+ 5))</code>
+ </p>
+ </li>
+ </ul>
+ </div>
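+
+ <p class="p">
+ The <code class="ph codeph">typeof()</code> built-in function shows how Impala types a literal:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT TYPEOF(111);   -- TINYINT
+SELECT TYPEOF(1111);  -- SMALLINT
+SELECT TYPEOF(3.14);  -- DECIMAL(3,2)</code></pre>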
+
+ <p class="p">
+ <strong class="ph b">File format considerations:</strong>
+ </p>
+
+ <div class="p" dir="ltr">
+ The <code class="ph codeph">DECIMAL</code> data type can be stored in any of the file formats supported
+ by Impala.
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Impala can query Avro, RCFile, or SequenceFile tables that contain
+ <code class="ph codeph">DECIMAL</code> columns, created by other Hadoop components.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Impala can query and insert into Kudu tables that contain <code class="ph codeph">DECIMAL</code>
+ columns.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ The <code class="ph codeph">DECIMAL</code> data type is fully compatible with HBase tables.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ The <code class="ph codeph">DECIMAL</code> data type is fully compatible with Parquet tables.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Values of the <code class="ph codeph">DECIMAL</code> data type are potentially larger in text
+ tables than in tables using Parquet or other binary formats.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">UDF consideration:</strong>
+ </p>
+
+ <p class="p">
+ When writing a C++ UDF, use the <code class="ph codeph">DecimalVal</code> data type defined in
+ <span class="ph filepath">/usr/include/impala_udf/udf.h</span>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Changing precision and scale:</strong>
+ </p>
+
+ <div class="p">
+ You can issue an <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> statement to change the
+ precision and scale of an existing <code class="ph codeph">DECIMAL</code> column.
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ For text-based formats (text, RCFile, and SequenceFile tables)
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p dir="ltr" class="p">
+ If the values in the column fit within the new precision and scale, they are
+ returned correctly by a query.
+ </p>
+ </li>
+
+ <li class="li">
+ <div class="p" dir="ltr">
+ If any values do not fit within the new precision and scale:
+ <ul class="ul">
+ <li class="li">
+ Impala returns an error if the query option <code class="ph codeph">ABORT_ON_ERROR</code>
+ is set to <code class="ph codeph">true</code>.
+ </li>
+
+ <li class="li">
+ Impala returns <code class="ph codeph">NULL</code> with a warning that the conversion
+ failed if the query option <code class="ph codeph">ABORT_ON_ERROR</code> is set to
+ <code class="ph codeph">false</code>.
+ </li>
+ </ul>
+ </div>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Leading zeros do not count against the precision value, but trailing zeros after
+ the decimal point do.
+ </p>
+ </li>
+ </ul>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ For binary formats (Parquet and Avro tables)
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p dir="ltr" class="p">
+ Although an <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> statement that
+ changes the precision or scale of a <code class="ph codeph">DECIMAL</code> column succeeds,
+ any subsequent attempt to query the changed column results in a fatal error.
+ This is because the metadata about the columns is stored in the data files
+ themselves, and <code class="ph codeph">ALTER TABLE</code> does not actually make any updates
+ to the data files. The other unaltered columns can still be queried
+ successfully.
+ </p>
+ </li>
+
+ <li class="li">
+ <p dir="ltr" class="p">
+ If the metadata in the data files disagrees with the metadata in the metastore
+ database, Impala cancels the query.
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
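+
+ <p class="p">
+ For example, for a text-based table (the table name here is illustrative), the
+ following statement widens a <code class="ph codeph">DECIMAL</code> column; note that
+ <code class="ph codeph">REPLACE COLUMNS</code> replaces the entire column list of the table:
+ </p>
+
+<pre class="pre codeblock"><code>-- Existing values that fit within the new precision and scale
+-- are returned correctly by subsequent queries.
+ALTER TABLE text_decimals REPLACE COLUMNS (x DECIMAL(20, 4));</code></pre>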
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong>
+ </p>
+
+ <p class="p">
+ Using a <code class="ph codeph">DECIMAL</code> column as a partition key provides a better match
+ between the partition key values and the HDFS directory names than using a
+ <code class="ph codeph">DOUBLE</code> or <code class="ph codeph">FLOAT</code> partitioning column.
+ </p>
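+
+ <p class="p">
+ For example (names here are illustrative):
+ </p>
+
+<pre class="pre codeblock"><code>-- Each DECIMAL partition value maps to a predictable
+-- HDFS directory name, e.g. rate=6.50.
+CREATE TABLE tax_rates (id BIGINT) PARTITIONED BY (rate DECIMAL(5, 2));</code></pre>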
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because the <code class="ph codeph">DECIMAL</code> type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE
+ STATS</code> statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility with older version of DECIMAL:</strong>
+ </p>
+
+ <p class="p">
+ This version of <code class="ph codeph">DECIMAL</code> type is the default in
+ <span class="keyword">Impala 3.0</span> and higher. The key differences between this
+ version of <code class="ph codeph">DECIMAL</code> and the previous <code class="ph codeph">DECIMAL</code> V1 in Impala
+ 2.x include the following.
+ </p>
+
+ <div class="p">
+ <table class="simpletable frame-all" id="decimal__simpletable_bwl_khm_rdb"><col style="width:33.33333333333333%"><col style="width:33.33333333333333%"><col style="width:33.33333333333333%"><thead><tr class="sthead">
+
+ <th class="stentry" id="decimal__simpletable_bwl_khm_rdb__stentry__1"></th>
+
+ <th class="stentry" id="decimal__simpletable_bwl_khm_rdb__stentry__2">DECIMAL in <span class="keyword">Impala 3.0</span> or
+ higher</th>
+
+ <th class="stentry" id="decimal__simpletable_bwl_khm_rdb__stentry__3">DECIMAL in <span class="keyword">Impala 2.12</span> or lower
+ </th>
+
+ </tr></thead><tbody><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__1">Overall behavior</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__2">Returns either the result or an error.</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__3">Returns either the result or <code class="ph codeph">NULL</code> with a
+ warning.</td>
+
+ </tr><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__1">Overflow behavior</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__2">Aborts with an error.</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__3">Issues a warning and returns <code class="ph codeph">NULL</code>.</td>
+
+ </tr><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__1">Truncation / rounding behavior in arithmetic</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__2">Truncates and rounds digits from the back.</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__3">Truncates digits from the front.</td>
+
+ </tr><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__1">String cast</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__2">Truncates from the back and rounds.</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__3">Truncates from the back.</td>
+
+ </tr></tbody></table>
+ </div>
+
+ <div class="p">
+ If you need to continue using the first version of the <code class="ph codeph">DECIMAL</code> type for
+ the backward compatibility of your queries, set the <code class="ph codeph">DECIMAL_V2</code> query
+ option to <code class="ph codeph">FALSE</code>:
+<pre class="pre codeblock"><code>SET DECIMAL_V2=FALSE;</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Compatibility with other databases:</strong>
+ </p>
+
+ <p dir="ltr" class="p">
+ Use the <code class="ph codeph">DECIMAL</code> data type in Impala for applications where you used the
+ <code class="ph codeph">NUMBER</code> data type in Oracle.
+ </p>
+
+ <p dir="ltr" class="p">
+ The Impala <code class="ph codeph">DECIMAL</code> type does not support the Oracle idioms of
+ <code class="ph codeph">*</code> for scale.
+ </p>
+
+ <p dir="ltr" class="p">
+ The Impala <code class="ph codeph">DECIMAL</code> type does not support negative values for precision.
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_decimal_v2.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_decimal_v2.html b/docs/build3x/html/topics/impala_decimal_v2.html
new file mode 100644
index 0000000..b26c3e7
--- /dev/null
+++ b/docs/build3x/html/topics/impala_decimal_v2.html
@@ -0,0 +1,32 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="decimal_v2"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DECIMAL_V2 Query Option</title></head><body id="decimal_v2"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DECIMAL_V2 Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A query option that changes behavior related to the <code class="ph codeph">DECIMAL</code>
+ data type.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ This query option is currently unsupported.
+ Its precise behavior is currently undefined and might change
+ in the future.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_default_join_distribution_mode.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_default_join_distribution_mode.html b/docs/build3x/html/topics/impala_default_join_distribution_mode.html
new file mode 100644
index 0000000..95ae29b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_default_join_distribution_mode.html
@@ -0,0 +1,113 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="default_join_distribution_mode"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DEFAULT_JOIN_DISTRIBUTION_MODE Query Option</title></head><body id="default_join_distribution_mode"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DEFAULT_JOIN_DISTRIBUTION_MODE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ This option determines the join distribution that Impala uses when any of the tables
+ involved in a join query is missing statistics.
+ </p>
+
+ <p class="p">
+ Impala optimizes join queries based on the presence of table statistics,
+ which are produced by the Impala <code class="ph codeph">COMPUTE STATS</code> statement.
+ By default, when a table involved in the join query does not have statistics,
+ Impala uses the <span class="q">"broadcast"</span> technique that transmits the entire contents
+ of the table to all executor nodes participating in the query. If one table
+ involved in a join has statistics and the other does not, the table without
+ statistics is broadcast. If both tables are missing statistics, the table
+ that is referenced second in the join order is broadcast. This behavior
+ is appropriate when the table involved is relatively small, but can lead to
+ excessive network, memory, and CPU overhead if the table being broadcast is
+ large.
+ </p>
+
+ <p class="p">
+ Because Impala queries frequently involve very large tables, and suboptimal
+ joins for such tables could result in spilling or out-of-memory errors,
+ the setting <code class="ph codeph">DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE</code> lets you
+ override the default behavior. The shuffle join mechanism divides the rows
+ of each table involved in the join by applying a hashing algorithm to the join
+ columns, and transmits each subset of rows to the appropriate node for processing.
+ Typically, this kind of join is more efficient for joins between large tables of
+ similar size.
+ </p>
+
+ <p class="p">
+ The setting <code class="ph codeph">DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE</code> is
+ recommended when setting up and deploying new clusters, because it is less likely
+ to result in serious consequences such as spilling or out-of-memory errors if
+ the query plan is based on incomplete information. This setting is not the default,
+ to avoid changing the performance characteristics of join queries for clusters that
+ are already tuned for their existing workloads.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+ <p class="p">
+ The allowed values are <code class="ph codeph">BROADCAST</code> (equivalent to 0)
+ or <code class="ph codeph">SHUFFLE</code> (equivalent to 1).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples demonstrate appropriate scenarios for each
+ setting of this query option.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Create a billion-row table.
+create table big_table stored as parquet
+ as select * from huge_table limit 1e9;
+
+-- For a big table with no statistics, the
+-- shuffle join mechanism is appropriate.
+set default_join_distribution_mode=shuffle;
+
+...join queries involving the big table...
+</code></pre>
+
+<pre class="pre codeblock"><code>
+-- Create a hundred-row table.
+create table tiny_table stored as parquet
+ as select * from huge_table limit 100;
+
+-- For a tiny table with no statistics, the
+-- broadcast join mechanism is appropriate.
+set default_join_distribution_mode=broadcast;
+
+...join queries involving the tiny table...
+</code></pre>
+
+<pre class="pre codeblock"><code>
+compute stats tiny_table;
+compute stats big_table;
+
+-- Once the stats are computed, the query option has
+-- no effect on join queries involving these tables.
+-- Impala can determine the absolute and relative sizes
+-- of each side of the join query by examining the
+-- row size, cardinality, and so on of each table.
+
+...join queries involving both of these tables...
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_compute_stats.html">COMPUTE STATS Statement</a>,
+ <a class="xref" href="impala_joins.html">Joins in Impala SELECT Statements</a>,
+ <a class="xref" href="impala_perf_joins.html">Performance Considerations for Join Queries</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_default_spillable_buffer_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_default_spillable_buffer_size.html b/docs/build3x/html/topics/impala_default_spillable_buffer_size.html
new file mode 100644
index 0000000..3eb3689
--- /dev/null
+++ b/docs/build3x/html/topics/impala_default_spillable_buffer_size.html
@@ -0,0 +1,87 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="default_spillable_buffer_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</title></head><body id="default_spillable_buffer_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+      Specifies the default size for a memory buffer used when the
+      spill-to-disk mechanism is activated, for example during queries against
+      a large table with no statistics, or during large join operations.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ </p>
+ <p class="p">
+ <code class="ph codeph">2097152</code> (2 MB)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Units:</strong> A numeric argument represents a size in bytes; you can also use a suffix of <code class="ph codeph">m</code>
+ or <code class="ph codeph">mb</code> for megabytes, or <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you
+ specify a value with unrecognized formats, subsequent queries fail with an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.10.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+      This query option sets an upper bound on the size of the internal
+      buffer that can be used during spill-to-disk operations. The
+ actual size of the buffer is chosen by the query planner.
+ </p>
+ <p class="p">
+ If overall query performance is limited by the time needed for spilling,
+ consider increasing the <code class="ph codeph">DEFAULT_SPILLABLE_BUFFER_SIZE</code> setting.
+ Larger buffer sizes result in Impala issuing larger I/O requests to storage
+ devices, which might result in higher throughput, particularly on rotational
+ disks.
+ </p>
+ <p class="p">
+ The tradeoff with a large value for this setting is increased memory usage during
+ spill-to-disk operations. Reducing this value may reduce memory consumption.
+ </p>
+ <p class="p">
+      To determine whether this setting is capping the spillable buffer size,
+      check the buffer size chosen by the query planner for a particular query:
+      run <code class="ph codeph">EXPLAIN</code> on the query while the setting
+      <code class="ph codeph">EXPLAIN_LEVEL=2</code> is in effect.
+ </p>
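+    <p class="p">
+      For example, the following hypothetical session (the table name and the
+      exact plan output are illustrative, not taken from a real cluster) shows
+      where the planner-chosen buffer size appears in the extended
+      <code class="ph codeph">EXPLAIN</code> output:
+    </p>
+
+<pre class="pre codeblock"><code>
+set explain_level=2;
+set default_spillable_buffer_size=4mb;
+
+-- In the detailed plan output, memory-intensive operators such as
+-- aggregations and joins report the buffer size chosen by the planner,
+-- for example in a "spillable-buffer-size=..." annotation.
+explain select c1, count(*) from big_table group by c1;
+</code></pre>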
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>
+set default_spillable_buffer_size=4MB;
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a>,
+ <a class="xref" href="impala_max_row_size.html">MAX_ROW_SIZE Query Option</a>,
+ <a class="xref" href="impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_delegation.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_delegation.html b/docs/build3x/html/topics/impala_delegation.html
new file mode 100644
index 0000000..696af37
--- /dev/null
+++ b/docs/build3x/html/topics/impala_delegation.html
@@ -0,0 +1,70 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="delegation"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring Impala Delegation for Hue and BI Tools</title></head><body id="delegation"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Configuring Impala Delegation for Hue and BI Tools</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ When users submit Impala queries through a separate application, such as Hue or a business intelligence tool,
+ typically all requests are treated as coming from the same user. In Impala 1.2 and higher, authentication is
+ extended by a new feature that allows applications to pass along credentials for the users that connect to
+ them (known as <span class="q">"delegation"</span>), and issue Impala queries with the privileges for those users. Currently,
+ the delegation feature is available only for Impala queries submitted through application interfaces such as
+ Hue and BI tools; for example, Impala cannot issue queries using the privileges of the HDFS user.
+ </p>
+
+ <p class="p">
+ The delegation feature is enabled by a startup option for <span class="keyword cmdname">impalad</span>:
+ <code class="ph codeph">--authorized_proxy_user_config</code>. When you specify this option, users whose names you specify
+ (such as <code class="ph codeph">hue</code>) can delegate the execution of a query to another user. The query runs with the
+ privileges of the delegated user, not the original user such as <code class="ph codeph">hue</code>. The name of the
+ delegated user is passed using the HiveServer2 configuration property <code class="ph codeph">impala.doas.user</code>.
+ </p>
+
+ <p class="p">
+ You can specify a list of users that the application user can delegate to, or <code class="ph codeph">*</code> to allow a
+ superuser to delegate to any other user. For example:
+ </p>
+
+<pre class="pre codeblock"><code>impalad --authorized_proxy_user_config 'hue=user1,user2;admin=*' ...</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Make sure to use single quotes or escape characters to ensure that any <code class="ph codeph">*</code> characters do not
+ undergo wildcard expansion when specified in command-line arguments.
+ </div>
+
+ <p class="p">
+ See <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details about adding or changing
+ <span class="keyword cmdname">impalad</span> startup options. See
+ <a class="xref" href="http://blog.cloudera.com/blog/2013/07/how-hiveserver2-brings-security-and-concurrency-to-apache-hive/" target="_blank">this
+ blog post</a> for background information about the delegation capability in HiveServer2.
+ </p>
+ <p class="p">
+ To set up authentication for the delegated users:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ On the server side, configure either user/password authentication through LDAP, or Kerberos
+ authentication, for all the delegated users. See <a class="xref" href="impala_ldap.html#ldap">Enabling LDAP Authentication for Impala</a> or
+ <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ On the client side, to learn how to enable delegation, consult the documentation
+ for the ODBC driver you are using.
+ </p>
+ </li>
+ </ul>
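+    <p class="p">
+      As an illustration, a server-side setup that combines delegation with
+      Kerberos authentication might resemble the following (the user names,
+      principal, and keytab path are placeholders):
+    </p>
+
+<pre class="pre codeblock"><code>impalad --authorized_proxy_user_config 'hue=user1,user2' \
+  --principal=impala/host.example.com@EXAMPLE.COM \
+  --keytab_file=/etc/impala/conf/impala.keytab ...</code></pre>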
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_delete.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_delete.html b/docs/build3x/html/topics/impala_delete.html
new file mode 100644
index 0000000..668970e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_delete.html
@@ -0,0 +1,177 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="delete"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DELETE Statement (Impala 2.8 or higher only)</title></head><body id="delete"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DELETE Statement (<span class="keyword">Impala 2.8</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Deletes an arbitrary number of rows from a Kudu table.
+ This statement only works for Impala tables that use the Kudu storage engine.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>
+DELETE [FROM] [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var> [ WHERE <var class="keyword varname">where_conditions</var> ]
+
+DELETE <var class="keyword varname">table_ref</var> FROM [<var class="keyword varname">joined_table_refs</var>] [ WHERE <var class="keyword varname">where_conditions</var> ]
+</code></pre>
+
+ <p class="p">
+ The first form evaluates rows from one table against an optional
+ <code class="ph codeph">WHERE</code> clause, and deletes all the rows that
+ match the <code class="ph codeph">WHERE</code> conditions, or all rows if
+ <code class="ph codeph">WHERE</code> is omitted.
+ </p>
+
+ <p class="p">
+ The second form evaluates one or more join clauses, and deletes
+ all matching rows from one of the tables. The join clauses can
+ include non-Kudu tables, but the table from which the rows
+ are deleted must be a Kudu table. The <code class="ph codeph">FROM</code>
+ keyword is required in this case, to separate the name of
+ the table whose rows are being deleted from the table names
+ of the join clauses.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The conditions in the <code class="ph codeph">WHERE</code> clause are the same ones allowed
+ for the <code class="ph codeph">SELECT</code> statement. See <a class="xref" href="impala_select.html#select">SELECT Statement</a>
+ for details.
+ </p>
+
+ <p class="p">
+ The conditions in the <code class="ph codeph">WHERE</code> clause can refer to
+ any combination of primary key columns or other columns. Referring to
+ primary key columns in the <code class="ph codeph">WHERE</code> clause is more efficient
+ than referring to non-primary key columns.
+ </p>
+
+ <p class="p">
+ If the <code class="ph codeph">WHERE</code> clause is omitted, all rows are removed from the table.
+ </p>
+
+ <p class="p">
+ Because Kudu currently does not enforce strong consistency during concurrent DML operations,
+ be aware that the results after this statement finishes might be different than you
+ intuitively expect:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+          If some rows cannot be deleted because their primary key
+          values are not found, due to those rows being removed by a
+          concurrent <code class="ph codeph">DELETE</code> operation,
+          the statement succeeds but returns a warning.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A <code class="ph codeph">DELETE</code> statement might also overlap with
+ <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>,
+ or <code class="ph codeph">UPSERT</code> statements running concurrently on the same table.
+ After the statement finishes, there might be more or fewer rows than expected in the table
+ because it is undefined whether the <code class="ph codeph">DELETE</code> applies to rows that are
+ inserted or updated while the <code class="ph codeph">DELETE</code> is in progress.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ The number of affected rows is reported in an <span class="keyword cmdname">impala-shell</span> message
+ and in the query profile.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DML
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+ STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+ table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+ are very large, used in join queries, or both.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show how to delete rows from a specified
+ table, either all rows or rows that match a <code class="ph codeph">WHERE</code>
+ clause:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Deletes all rows. The FROM keyword is optional.
+DELETE FROM kudu_table;
+DELETE kudu_table;
+
+-- Deletes 0, 1, or more rows.
+-- (If c1 is a single-column primary key, the statement could only
+-- delete 0 or 1 rows.)
+DELETE FROM kudu_table WHERE c1 = 100;
+
+-- Deletes all rows that match all the WHERE conditions.
+DELETE FROM kudu_table WHERE
+ (c1 > c2 OR c3 IN ('hello','world')) AND c4 IS NOT NULL;
+DELETE FROM t1 WHERE
+ (c1 IN (1,2,3) AND c2 > c3) OR c4 IS NOT NULL;
+DELETE FROM time_series WHERE
+ year = 2016 AND month IN (11,12) AND day > 15;
+
+-- WHERE condition with a subquery.
+DELETE FROM t1 WHERE
+ c5 IN (SELECT DISTINCT other_col FROM other_table);
+
+-- Does not delete any rows, because the WHERE condition is always false.
+DELETE FROM kudu_table WHERE 1 = 0;
+</code></pre>
+
+ <p class="p">
+ The following examples show how to delete rows that are part
+ of the result set from a join:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Remove _all_ rows from t1 that have a matching X value in t2.
+DELETE t1 FROM t1 JOIN t2 ON t1.x = t2.x;
+
+-- Remove _some_ rows from t1 that have a matching X value in t2.
+DELETE t1 FROM t1 JOIN t2 ON t1.x = t2.x
+ WHERE t1.y = FALSE and t2.z > 100;
+
+-- Delete from a Kudu table based on a join with a non-Kudu table.
+DELETE t1 FROM kudu_table t1 JOIN non_kudu_table t2 ON t1.x = t2.x;
+
+-- The tables can be joined in any order as long as the Kudu table
+-- is specified as the deletion target.
+DELETE t2 FROM non_kudu_table t1 JOIN kudu_table t2 ON t1.x = t2.x;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_kudu.html#impala_kudu">Using Impala to Query Kudu Tables</a>, <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>,
+ <a class="xref" href="impala_update.html#update">UPDATE Statement (Impala 2.8 or higher only)</a>, <a class="xref" href="impala_upsert.html#upsert">UPSERT Statement (Impala 2.8 or higher only)</a>
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_max_row_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_max_row_size.html b/docs/build3x/html/topics/impala_max_row_size.html
new file mode 100644
index 0000000..76c6d69
--- /dev/null
+++ b/docs/build3x/html/topics/impala_max_row_size.html
@@ -0,0 +1,221 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_row_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_ROW_SIZE Query Option</title></head><body id="max_row_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAX_ROW_SIZE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Ensures that Impala can process rows of at least the specified size. (Larger
+ rows might be successfully processed, but that is not guaranteed.) Applies when
+ constructing intermediate or final rows in the result set. This setting prevents
+ out-of-control memory use when accessing columns containing huge strings.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ </p>
+ <p class="p">
+ <code class="ph codeph">524288</code> (512 KB)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Units:</strong> A numeric argument represents a size in bytes; you can also use a suffix of <code class="ph codeph">m</code>
+ or <code class="ph codeph">mb</code> for megabytes, or <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you
+ specify a value with unrecognized formats, subsequent queries fail with an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.10.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ If a query fails because it involves rows with long strings and/or
+ many columns, causing the total row size to exceed <code class="ph codeph">MAX_ROW_SIZE</code>
+ bytes, increase the <code class="ph codeph">MAX_ROW_SIZE</code> setting to accommodate
+ the total bytes stored in the largest row. Examine the error messages for any
+ failed queries to see the size of the row that caused the problem.
+ </p>
+ <p class="p">
+ Impala attempts to handle rows that exceed the <code class="ph codeph">MAX_ROW_SIZE</code>
+ value where practical, so in many cases, queries succeed despite having rows
+ that are larger than this setting.
+ </p>
+ <p class="p">
+ Specifying a value that is substantially higher than actually needed can cause
+ Impala to reserve more memory than is necessary to execute the query.
+ </p>
+ <p class="p">
+ In a Hadoop cluster with highly concurrent workloads and queries that process
+ high volumes of data, traditional SQL tuning advice about minimizing wasted memory
+ is worth remembering. For example, if a table has <code class="ph codeph">STRING</code> columns
+ where a single value might be multiple megabytes, make sure that the
+ <code class="ph codeph">SELECT</code> lists in queries only refer to columns that are actually
+ needed in the result set, instead of using the <code class="ph codeph">SELECT *</code> shorthand.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show the kinds of situations where it is necessary to
+ adjust the <code class="ph codeph">MAX_ROW_SIZE</code> setting. First, we create a table
+ containing some very long values in <code class="ph codeph">STRING</code> columns:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table big_strings (s1 string, s2 string, s3 string) stored as parquet;
+
+-- Turn off compression to more easily reason about data volume by doing SHOW TABLE STATS.
+-- Does not actually affect query success or failure, because MAX_ROW_SIZE applies when
+-- column values are materialized in memory.
+set compression_codec=none;
+set;
+...
+ MAX_ROW_SIZE: [524288]
+...
+
+-- A very small row.
+insert into big_strings values ('one', 'two', 'three');
+-- A row right around the default MAX_ROW_SIZE limit: a 500 KiB string and a 30 KiB string.
+insert into big_strings values (repeat('12345',100000), 'short', repeat('123',10000));
+-- A row that is too big if the query has to materialize both S1 and S3.
+insert into big_strings values (repeat('12345',100000), 'short', repeat('12345',100000));
+
+</code></pre>
+
+ <p class="p">
+ With the default <code class="ph codeph">MAX_ROW_SIZE</code> setting, different queries succeed
+ or fail based on which column values have to be materialized during query processing:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- All the S1 values can be materialized within the 512 KB MAX_ROW_SIZE buffer.
+select count(distinct s1) from big_strings;
++--------------------+
+| count(distinct s1) |
++--------------------+
+| 2 |
++--------------------+
+
+-- A row where even the S1 value is too large to materialize within MAX_ROW_SIZE.
+insert into big_strings values (repeat('12345',1000000), 'short', repeat('12345',1000000));
+
+-- The 5 MiB string is too large to materialize. The message explains the size of the result
+-- set row the query is attempting to materialize.
+select count(distinct(s1)) from big_strings;
+WARNINGS: Row of size 4.77 MB could not be materialized in plan node with id 1.
+ Increase the max_row_size query option (currently 512.00 KB) to process larger rows.
+
+-- If more columns are involved, the result set row being materialized is bigger.
+select count(distinct s1, s2, s3) from big_strings;
+WARNINGS: Row of size 9.54 MB could not be materialized in plan node with id 1.
+ Increase the max_row_size query option (currently 512.00 KB) to process larger rows.
+
+-- Column S2, containing only short strings, can still be examined.
+select count(distinct(s2)) from big_strings;
++----------------------+
+| count(distinct (s2)) |
++----------------------+
+| 2 |
++----------------------+
+
+-- Queries that do not materialize the big column values are OK.
+select count(*) from big_strings;
++----------+
+| count(*) |
++----------+
+| 4 |
++----------+
+
+</code></pre>
+
+ <p class="p">
+ The following examples show how adjusting <code class="ph codeph">MAX_ROW_SIZE</code> upward
+ allows queries involving the long string columns to succeed:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Boosting MAX_ROW_SIZE moderately allows all S1 values to be materialized.
+set max_row_size=7mb;
+
+select count(distinct s1) from big_strings;
++--------------------+
+| count(distinct s1) |
++--------------------+
+| 3 |
++--------------------+
+
+-- But the combination of S1 + S3 strings is still too large.
+select count(distinct s1, s2, s3) from big_strings;
+WARNINGS: Row of size 9.54 MB could not be materialized in plan node with id 1. Increase the max_row_size query option (currently 7.00 MB) to process larger rows.
+
+-- Boosting MAX_ROW_SIZE to larger than the largest row in the table allows
+-- all queries to complete successfully.
+set max_row_size=12mb;
+
+select count(distinct s1, s2, s3) from big_strings;
++----------------------------+
+| count(distinct s1, s2, s3) |
++----------------------------+
+| 4 |
++----------------------------+
+
+</code></pre>
+
+ <p class="p">
+ The following examples show how to reason about appropriate values for
+ <code class="ph codeph">MAX_ROW_SIZE</code>, based on the characteristics of the
+ columns containing the long values:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- With a large MAX_ROW_SIZE in place, we can examine the columns to
+-- understand the practical lower limit for MAX_ROW_SIZE based on the
+-- table structure and column values.
+select max(length(s1) + length(s2) + length(s3)) / 1e6 as megabytes from big_strings;
++-----------+
+| megabytes |
++-----------+
+| 10.000005 |
++-----------+
+
+-- We can also examine the 'Max Size' for each column after computing stats.
+compute stats big_strings;
+show column stats big_strings;
++--------+--------+------------------+--------+----------+-----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+-----------+
+| s1 | STRING | 2 | -1 | 5000000 | 2500002.5 |
+| s2 | STRING | 2 | -1 | 10 | 7.5 |
+| s3 | STRING | 2 | -1 | 5000000 | 2500005 |
++--------+--------+------------------+--------+----------+-----------+
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a>,
+ <a class="xref" href="impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_max_scan_range_length.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_max_scan_range_length.html b/docs/build3x/html/topics/impala_max_scan_range_length.html
new file mode 100644
index 0000000..0eaf110
--- /dev/null
+++ b/docs/build3x/html/topics/impala_max_scan_range_length.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_scan_range_length"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_SCAN_RANGE_LENGTH Query Option</title></head><body id="max_scan_range_length"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAX_SCAN_RANGE_LENGTH Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Maximum length of the scan range. Interacts with the number of HDFS blocks in the table to determine how many
+ CPU cores across the cluster are involved with the processing for a query. (Each core processes one scan
+ range.)
+ </p>
+
+ <p class="p">
+ Lowering the value can sometimes increase parallelism if you have unused CPU capacity, but a too-small value
+ can limit query performance because each scan range involves extra overhead.
+ </p>
+
+ <p class="p">
+ Only applicable to HDFS tables. Has no effect on Parquet tables. Leaving the option unspecified, or setting it to 0, uses the backend default,
+ which is the same as the HDFS block size for each table.
+ </p>
+
+ <p class="p">
+ Although the scan range can be arbitrarily long, Impala internally uses an 8 MB read buffer so that it can
+ query tables with huge block sizes without allocating equivalent blocks of memory.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.7</span> and higher, the argument value can include unit specifiers,
+ such as <code class="ph codeph">100m</code> or <code class="ph codeph">100mb</code>. In previous versions,
+ Impala interpreted such formatted values as 0, leading to query failures.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0
+ </p>
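+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following sketch is illustrative only; the table name and block layout are
+ hypothetical. For a table stored in large HDFS blocks on a cluster with idle CPU
+ cores, lowering the scan range length splits each block across several cores:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Split each 256 MB HDFS block into four 64 MB scan ranges.
+set max_scan_range_length=64mb;
+select count(*) from sample_table;
+
+-- Revert to the backend default of one scan range per HDFS block.
+set max_scan_range_length=0;
+</code></pre>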
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_mem_limit.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_mem_limit.html b/docs/build3x/html/topics/impala_mem_limit.html
new file mode 100644
index 0000000..46e1cd3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_mem_limit.html
@@ -0,0 +1,206 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mem_limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MEM_LIMIT Query Option</title></head><body id="mem_limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MEM_LIMIT Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ When resource management is not enabled, defines the maximum amount of memory a query can allocate on each node.
+ Therefore, the total memory that can be used by a query is the <code class="ph codeph">MEM_LIMIT</code> times the number of nodes.
+ </p>
+
+ <p class="p">
+ There are two levels of memory limit for Impala.
+ The <code class="ph codeph">-mem_limit</code> startup option sets an overall limit for the <span class="keyword cmdname">impalad</span> process
+ (which handles multiple queries concurrently).
+ That limit is typically expressed in terms of a percentage of the RAM available on the host, such as <code class="ph codeph">-mem_limit=70%</code>.
+ The <code class="ph codeph">MEM_LIMIT</code> query option, which you set through <span class="keyword cmdname">impala-shell</span>
+ or the <code class="ph codeph">SET</code> statement in a JDBC or ODBC application, applies to each individual query.
+ The <code class="ph codeph">MEM_LIMIT</code> query option is usually expressed as a fixed size such as <code class="ph codeph">10gb</code>,
+ and must always be less than the <span class="keyword cmdname">impalad</span> memory limit.
+ </p>
+
+ <p class="p">
+ If query processing exceeds the specified memory limit on any node, either the per-query limit or the
+ <span class="keyword cmdname">impalad</span> limit, Impala cancels the query automatically.
+ Memory limits are checked periodically during query processing, so the actual memory in use
+ might briefly exceed the limit without the query being cancelled.
+ </p>
+
+ <p class="p">
+ When resource management is enabled, the mechanism for this option changes. If set, it overrides the
+ automatic memory estimate from Impala. Impala requests this amount of memory from YARN on each node, and the
+ query does not proceed until that much memory is available. The actual memory used by the query could be
+ lower, since some queries use much less memory than others. With resource management, the
+ <code class="ph codeph">MEM_LIMIT</code> setting acts both as a hard limit on the amount of memory a query can use on any
+ node (enforced by YARN) and a guarantee that that much memory will be available on each node while the query
+ is being executed. When resource management is enabled but no <code class="ph codeph">MEM_LIMIT</code> setting is
+ specified, Impala estimates the amount of memory needed on each node for each query, requests that much
+ memory from YARN before starting the query, and then internally sets the <code class="ph codeph">MEM_LIMIT</code> on each
+ node to the requested amount of memory during the query. Thus, if the query takes more memory than was
+ originally estimated, Impala detects that the <code class="ph codeph">MEM_LIMIT</code> is exceeded and cancels the query
+ itself.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Units:</strong> A numeric argument represents memory size in bytes; you can also use a suffix of <code class="ph codeph">m</code> or <code class="ph codeph">mb</code>
+ for megabytes, or more commonly <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you specify a value in an unrecognized
+ format, subsequent queries fail with an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (unlimited)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">MEM_LIMIT</code> setting is primarily useful in a high-concurrency setting,
+ or on a cluster with a workload shared between Impala and other data processing components.
+ You can prevent any query from accidentally using much more memory than expected,
+ which could negatively impact other Impala queries.
+ </p>
+
+ <p class="p">
+ Use the output of the <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span>
+ to get a report of memory used for each phase of your most heavyweight queries on each node,
+ and then set a <code class="ph codeph">MEM_LIMIT</code> somewhat higher than that.
+ See <a class="xref" href="impala_explain_plan.html#perf_summary">Using the SUMMARY Report for Performance Tuning</a> for usage information about
+ the <code class="ph codeph">SUMMARY</code> command.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show how to set the <code class="ph codeph">MEM_LIMIT</code> query option
+ using a fixed number of bytes, or suffixes representing gigabytes or megabytes.
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > set mem_limit=3000000000;
+MEM_LIMIT set to 3000000000
+[localhost:21000] > select 5;
+Query: select 5
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] > set mem_limit=3g;
+MEM_LIMIT set to 3g
+[localhost:21000] > select 5;
+Query: select 5
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] > set mem_limit=3gb;
+MEM_LIMIT set to 3gb
+[localhost:21000] > select 5;
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] > set mem_limit=3m;
+MEM_LIMIT set to 3m
+[localhost:21000] > select 5;
++---+
+| 5 |
++---+
+| 5 |
++---+
+[localhost:21000] > set mem_limit=3mb;
+MEM_LIMIT set to 3mb
+[localhost:21000] > select 5;
++---+
+| 5 |
++---+
+</code></pre>
+
+ <p class="p">
+ The following examples show how unrecognized <code class="ph codeph">MEM_LIMIT</code>
+ values lead to errors for subsequent queries.
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > set mem_limit=3tb;
+MEM_LIMIT set to 3tb
+[localhost:21000] > select 5;
+ERROR: Failed to parse query memory limit from '3tb'.
+
+[localhost:21000] > set mem_limit=xyz;
+MEM_LIMIT set to xyz
+[localhost:21000] > select 5;
+Query: select 5
+ERROR: Failed to parse query memory limit from 'xyz'.
+</code></pre>
+
+ <p class="p">
+ The following example shows the automatic query cancellation
+ when the <code class="ph codeph">MEM_LIMIT</code> value is exceeded
+ on any host involved in the Impala query. First it runs a
+ successful query and checks the largest amount of memory
+ used on any node for any stage of the query.
+ Then it sets an artificially low <code class="ph codeph">MEM_LIMIT</code>
+ setting so that the same query cannot run.
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > select count(*) from customer;
+Query: select count(*) from customer
++----------+
+| count(*) |
++----------+
+| 150000 |
++----------+
+
+[localhost:21000] > select count(distinct c_name) from customer;
+Query: select count(distinct c_name) from customer
++------------------------+
+| count(distinct c_name) |
++------------------------+
+| 150000 |
++------------------------+
+
+[localhost:21000] > summary;
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+| 06:AGGREGATE | 1 | 230.00ms | 230.00ms | 1 | 1 | 16.00 KB | -1 B | FINALIZE |
+| 05:EXCHANGE | 1 | 43.44us | 43.44us | 1 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 02:AGGREGATE | 1 | 227.14ms | 227.14ms | 1 | 1 | 12.00 KB | 10.00 MB | |
+| 04:AGGREGATE | 1 | 126.27ms | 126.27ms | 150.00K | 150.00K | 15.17 MB | 10.00 MB | |
+| 03:EXCHANGE | 1 | 44.07ms | 44.07ms | 150.00K | 150.00K | 0 B | 0 B | HASH(c_name) |
+<strong class="ph b">| 01:AGGREGATE | 1 | 361.94ms | 361.94ms | 150.00K | 150.00K | 23.04 MB | 10.00 MB | |</strong>
+| 00:SCAN HDFS | 1 | 43.64ms | 43.64ms | 150.00K | 150.00K | 24.19 MB | 64.00 MB | tpch.customer |
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+
+[localhost:21000] > set mem_limit=15mb;
+MEM_LIMIT set to 15mb
+[localhost:21000] > select count(distinct c_name) from customer;
+Query: select count(distinct c_name) from customer
+ERROR:
+Memory limit exceeded
+Query did not have enough memory to get the minimum required buffers in the block manager.
+</code></pre>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_min.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_min.html b/docs/build3x/html/topics/impala_min.html
new file mode 100644
index 0000000..bfdfd0f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_min.html
@@ -0,0 +1,297 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="min"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MIN Function</title></head><body id="min"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MIN Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns the minimum value from a set of numbers. Opposite of the
+ <code class="ph codeph">MAX</code> function. Its single argument can be a numeric column, or the numeric result of a function
+ or expression applied to the column value. Rows with a <code class="ph codeph">NULL</code> value for the specified column
+ are ignored. If the table is empty, or all the values supplied to <code class="ph codeph">MIN</code> are
+ <code class="ph codeph">NULL</code>, <code class="ph codeph">MIN</code> returns <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>MIN([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre>
+
+ <p class="p">
+ When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of
+ grouping values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong> In Impala 2.0 and higher, this function can be used as an analytic function, but with restrictions on any window clause.
+ For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause is only allowed if the start
+ bound is <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+ arguments which produce a <code class="ph codeph">STRING</code> result
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+ <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+ query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+ for the kinds of queries that this option applies to, and slight differences in how partitions are
+ evaluated when this query option is enabled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name | count | sum | avg | minimum | maximum | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 |
+| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 |
+| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 |
+| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 |
+| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- Find the smallest value for this column in the table.
+select min(c1) from t1;
+-- Find the smallest value for this column from a subset of the table.
+select min(c1) from t1 where month = 'January' and year = '2013';
+-- Find the smallest value from a set of numeric function results.
+select min(length(s)) from t1;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, min(purchase_price) from store_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select min(distinct x) from t1;
+</code></pre>
+
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">MIN()</code> in an analytic context. They use a table
+ containing integers from 1 to 10. Notice how the <code class="ph codeph">MIN()</code> is reported for each input value, as
+ opposed to the <code class="ph codeph">GROUP BY</code> clause which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, min(x) over (partition by property) as min from int_t where property in ('odd','even');
++----+----------+-----+
+| x | property | min |
++----+----------+-----+
+| 2 | even | 2 |
+| 4 | even | 2 |
+| 6 | even | 2 |
+| 8 | even | 2 |
+| 10 | even | 2 |
+| 1 | odd | 1 |
+| 3 | odd | 1 |
+| 5 | odd | 1 |
+| 7 | odd | 1 |
+| 9 | odd | 1 |
++----+----------+-----+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">MIN()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to display the smallest value of <code class="ph codeph">X</code>
+encountered up to each row in the result set. The examples use two columns in the <code class="ph codeph">ORDER BY</code>
+clause to produce a sequence of values that rises and falls, to illustrate how the <code class="ph codeph">MIN()</code>
+result only decreases or stays the same throughout each partition within the result set.
+The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+therefore all of these examples produce the same results:
+
+<pre class="pre codeblock"><code>select x, property, min(x) <strong class="ph b">over (order by property, x desc)</strong> as 'minimum to this point'
+ from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime | 7 |
+| 5 | prime | 5 |
+| 3 | prime | 3 |
+| 2 | prime | 2 |
+| 9 | square | 2 |
+| 4 | square | 2 |
+| 1 | square | 1 |
++---+----------+-----------------------+
+
+select x, property,
+ min(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">range between unbounded preceding and current row</strong>
+ ) as 'minimum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime | 7 |
+| 5 | prime | 5 |
+| 3 | prime | 3 |
+| 2 | prime | 2 |
+| 9 | square | 2 |
+| 4 | square | 2 |
+| 1 | square | 1 |
++---+----------+-----------------------+
+
+select x, property,
+ min(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">rows between unbounded preceding and current row</strong>
+ ) as 'minimum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime | 7 |
+| 5 | prime | 5 |
+| 3 | prime | 3 |
+| 2 | prime | 2 |
+| 9 | square | 2 |
+| 4 | square | 2 |
+| 1 | square | 1 |
++---+----------+-----------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running minimum taking into account all rows before
+and 1 row after the current row.
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code> clause.
+Because of an extra Impala restriction on the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> functions in an
+analytic context, the lower bound must be <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+<pre class="pre codeblock"><code>select x, property,
+ min(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">rows between unbounded preceding and 1 following</strong>
+ ) as 'local minimum'
+from int_t where property in ('prime','square');
++---+----------+---------------+
+| x | property | local minimum |
++---+----------+---------------+
+| 7 | prime | 5 |
+| 5 | prime | 3 |
+| 3 | prime | 2 |
+| 2 | prime | 2 |
+| 9 | square | 2 |
+| 4 | square | 1 |
+| 1 | square | 1 |
++---+----------+---------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+ min(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">range between unbounded preceding and 1 following</strong>
+ ) as 'local minimum'
+from int_t where property in ('prime','square');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>, <a class="xref" href="impala_max.html#max">MAX Function</a>,
+ <a class="xref" href="impala_avg.html#avg">AVG Function</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_min_spillable_buffer_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_min_spillable_buffer_size.html b/docs/build3x/html/topics/impala_min_spillable_buffer_size.html
new file mode 100644
index 0000000..9f3c84e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_min_spillable_buffer_size.html
@@ -0,0 +1,87 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="min_spillable_buffer_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MIN_SPILLABLE_BUFFER_SIZE Query Option</title></head><body id="min_spillable_buffer_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MIN_SPILLABLE_BUFFER_SIZE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Specifies the minimum size for a memory buffer used when the
+ spill-to-disk mechanism is activated, for example for queries against
+ a large table with no statistics, or large join operations.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ </p>
+ <p class="p">
+ <code class="ph codeph">65536</code> (64 KB)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Units:</strong> A numeric argument represents a size in bytes; you can also use a suffix of <code class="ph codeph">m</code>
+ or <code class="ph codeph">mb</code> for megabytes, or <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you
+ specify a value in an unrecognized format, subsequent queries fail with an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.10.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ This query option sets a lower bound on the size of the internal
+ buffer used during spill-to-disk operations. The
+ actual size of the buffer is chosen by the query planner.
+ </p>
+ <p class="p">
+ If overall query performance is limited by the time needed for spilling,
+ consider increasing the <code class="ph codeph">MIN_SPILLABLE_BUFFER_SIZE</code> setting.
+ Larger buffer sizes cause Impala to issue larger I/O requests to storage
+ devices, which can yield higher throughput, particularly on rotational
+ disks.
+ </p>
+ <p class="p">
+ The tradeoff with a large value for this setting is increased memory usage during
+ spill-to-disk operations. Lowering this value can reduce memory consumption.
+ </p>
+ <p class="p">
+ To determine whether this setting is capping the spillable buffer size,
+ check the buffer size chosen by the query planner for a particular query:
+ run <code class="ph codeph">EXPLAIN</code> on the query while the setting
+ <code class="ph codeph">EXPLAIN_LEVEL=2</code> is in effect.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>
+set min_spillable_buffer_size=128KB;
+
+</code></pre>
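+
+ <p class="p">
+ As a further sketch (the query and table names are hypothetical), you can verify
+ the planner's chosen buffer size by raising the explain verbosity and inspecting
+ the plan output:
+ </p>
+
+<pre class="pre codeblock"><code>
+set explain_level=2;
+set min_spillable_buffer_size=128kb;
+explain select c_name, count(*) from customer group by c_name;
+</code></pre>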
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a>,
+ <a class="xref" href="impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_max_row_size.html">MAX_ROW_SIZE Query Option</a>,
+ <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_misc_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_misc_functions.html b/docs/build3x/html/topics/impala_misc_functions.html
new file mode 100644
index 0000000..4210a99
--- /dev/null
+++ b/docs/build3x/html/topics/impala_misc_functions.html
@@ -0,0 +1,175 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="misc_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Miscellaneous Functions</title></head><body id="misc_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Miscellaneous Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala supports the following utility functions that do not operate on a particular column or data type:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="misc_functions__current_database">
+ <code class="ph codeph">current_database()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the database that the session is currently using, either <code class="ph codeph">default</code>
+ if no database has been selected, or whatever database the session switched to through a
+ <code class="ph codeph">USE</code> statement or the <span class="keyword cmdname">impalad</span> <code class="ph codeph">-d</code> option.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
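+        <p class="p">
+          <strong class="ph b">Examples:</strong>
+        </p>
+        <p class="p">
+          The following is an illustrative sketch of how the return value changes after a
+          <code class="ph codeph">USE</code> statement; the database name <code class="ph codeph">tpch</code> is a
+          hypothetical example.
+        </p>
+<pre class="pre codeblock"><code>
+select current_database();
++--------------------+
+| current_database() |
++--------------------+
+| default            |
++--------------------+
+
+use tpch;
+select current_database();
++--------------------+
+| current_database() |
++--------------------+
+| tpch               |
++--------------------+
+</code></pre>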
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__effective_user">
+ <code class="ph codeph">effective_user()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Typically returns the same value as <code class="ph codeph">user()</code>,
+ except if delegation is enabled, in which case it returns the ID of the delegated user.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.5</span>
+ </p>
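+        <p class="p">
+          <strong class="ph b">Examples:</strong>
+        </p>
+        <p class="p">
+          A sketch comparing the two functions; the scenario is hypothetical, and the values
+          differ only when delegation is enabled.
+        </p>
+<pre class="pre codeblock"><code>
+-- Without delegation, both functions return the same value.
+select user(), effective_user();
+
+-- With delegation enabled, user() returns the connecting user
+-- (for example, a proxy application's service account), while
+-- effective_user() returns the ID of the delegated end user.
+</code></pre>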
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__pid">
+ <code class="ph codeph">pid()</code>
+ </dt>
+
+ <dd class="dd">
+
+        <strong class="ph b">Purpose:</strong> Returns the process ID of the <span class="keyword cmdname">impalad</span> daemon that the session is
+        connected to. You can use it during low-level debugging, for example to issue Linux commands
+        that trace the <span class="keyword cmdname">impalad</span> process or show its arguments.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
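+        <p class="p">
+          <strong class="ph b">Examples:</strong>
+        </p>
+        <p class="p">
+          A sketch of combining <code class="ph codeph">pid()</code> with Linux commands; the process ID
+          shown is hypothetical.
+        </p>
+<pre class="pre codeblock"><code>
+select pid();   -- Suppose this returns 12345.
+
+-- Then, in a Linux shell on the same host, you could examine
+-- that impalad process, for example:
+--   ps -fp 12345
+--   cat /proc/12345/cmdline
+</code></pre>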
+ </dd>
+
+
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__user">
+ <code class="ph codeph">user()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the username of the Linux user who is connected to the <span class="keyword cmdname">impalad</span>
+ daemon. Typically called a single time, in a query without any <code class="ph codeph">FROM</code> clause, to
+ understand how authorization settings apply in a security context; once you know the logged-in username,
+ you can check which groups that user belongs to, and from the list of groups you can check which roles
+ are available to those groups through the authorization policy file.
+ <p class="p">
+ In Impala 2.0 and later, <code class="ph codeph">user()</code> returns the full Kerberos principal string, such as
+ <code class="ph codeph">user@example.com</code>, in a Kerberized environment.
+ </p>
+ <p class="p">
+ When delegation is enabled, consider calling the <code class="ph codeph">effective_user()</code> function instead.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
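+        <p class="p">
+          <strong class="ph b">Examples:</strong>
+        </p>
+        <p class="p">
+          A sketch of a typical call; the user names shown are hypothetical.
+        </p>
+<pre class="pre codeblock"><code>
+select user();
+-- In a non-Kerberized environment, returns an OS user name such as 'jdoe'.
+-- In a Kerberized environment (Impala 2.0 and later), returns the full
+-- principal string, such as 'user@example.com'.
+</code></pre>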
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__uuid">
+ <code class="ph codeph">uuid()</code>
+ </dt>
+
+ <dd class="dd">
+
+        <strong class="ph b">Purpose:</strong> Returns a <a class="xref" href="https://en.wikipedia.org/wiki/Universally_unique_identifier" target="_blank">universally unique identifier</a>, a 128-bit value encoded as a string with groups of hexadecimal digits separated by dashes.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Ascending numeric sequences of type <code class="ph codeph">BIGINT</code> are often used
+ as identifiers within a table, and as join keys across multiple tables.
+ The <code class="ph codeph">uuid()</code> value is a convenient alternative that does not
+ require storing or querying the highest sequence number. For example, you
+ can use it to quickly construct new unique identifiers during a data import job,
+ or to combine data from different tables without the likelihood of ID collisions.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+-- Each call to uuid() produces a new arbitrary value.
+select uuid();
++--------------------------------------+
+| uuid() |
++--------------------------------------+
+| c7013e25-1455-457f-bf74-a2046e58caea |
++--------------------------------------+
+
+-- If you get a UUID for each row of a result set, you can use it as a
+-- unique identifier within a table, or even a unique ID across tables.
+select uuid() from four_row_table;
++--------------------------------------+
+| uuid() |
++--------------------------------------+
+| 51d3c540-85e5-4cb9-9110-604e53999e2e |
+| 0bb40071-92f6-4a59-a6a4-60d46e9703e2 |
+| 5e9d7c36-9842-4a96-862d-c13cd0457c02 |
+| cae29095-0cc0-4053-a5ea-7fcd3c780861 |
++--------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__version">
+ <code class="ph codeph">version()</code>
+ </dt>
+
+ <dd class="dd">
+
+        <strong class="ph b">Purpose:</strong> Returns information such as the precise version number and build date for the
+        <code class="ph codeph">impalad</code> daemon that you are currently connected to. Typically used to confirm that you
+        are connected to the expected level of Impala before using a particular feature, or to connect to several nodes
+        and confirm they are all running the same level of <span class="keyword cmdname">impalad</span>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> (with one or more embedded newlines)
+ </p>
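+        <p class="p">
+          <strong class="ph b">Examples:</strong>
+        </p>
+        <p class="p">
+          A sketch of a typical call; the exact contents of the returned string vary by
+          release and build.
+        </p>
+<pre class="pre codeblock"><code>
+select version();
+-- Returns a multi-line string containing the version number,
+-- build hash, and build date of the connected impalad daemon.
+</code></pre>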
+ </dd>
+
+
+ </dl>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_mixed_security.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_mixed_security.html b/docs/build3x/html/topics/impala_mixed_security.html
new file mode 100644
index 0000000..9cadbf7
--- /dev/null
+++ b/docs/build3x/html/topics/impala_mixed_security.html
@@ -0,0 +1,26 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mixed_security"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Multiple Authentication Methods with Impala</title></head><body id="mixed_security"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Multiple Authentication Methods with Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala 2.0 and later automatically handles both Kerberos and LDAP authentication. Each
+ <span class="keyword cmdname">impalad</span> daemon can accept both Kerberos and LDAP requests through the same port. No
+ special actions need to be taken if some users authenticate through Kerberos and some through LDAP.
+ </p>
+
+ <p class="p">
+ Prior to Impala 2.0, you had to configure each <span class="keyword cmdname">impalad</span> to listen on a specific port
+ depending on the kind of authentication, then configure your network load balancer to forward each kind of
+ request to a DataNode that was set up with the appropriate authentication type. Once the initial request was
+ made using either Kerberos or LDAP authentication, Impala automatically handled the process of coordinating
+ the work across multiple nodes and transmitting intermediate results back to the coordinator node.
+ </p>
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_mt_dop.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_mt_dop.html b/docs/build3x/html/topics/impala_mt_dop.html
new file mode 100644
index 0000000..42d9591
--- /dev/null
+++ b/docs/build3x/html/topics/impala_mt_dop.html
@@ -0,0 +1,190 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mt_dop"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MT_DOP Query Option</title></head><body id="mt_dop"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MT_DOP Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Sets the degree of intra-node parallelism used for certain operations that
+ can benefit from multithreaded execution. You can specify values
+ higher than zero to find the ideal balance of response time,
+ memory usage, and CPU usage during statement processing.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The Impala execution engine is being revamped incrementally to add
+ additional parallelism within a single host for certain statements and
+ kinds of operations. The setting <code class="ph codeph">MT_DOP=0</code> uses the
+ <span class="q">"old"</span> code path with limited intra-node parallelism.
+ </p>
+
+ <p class="p">
+ Currently, the operations affected by the <code class="ph codeph">MT_DOP</code>
+ query option are:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">COMPUTE [INCREMENTAL] STATS</code>. Impala automatically sets
+ <code class="ph codeph">MT_DOP=4</code> for <code class="ph codeph">COMPUTE STATS</code> and
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statements on Parquet tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Queries with execution plans containing only scan and aggregation operators,
+ or local joins that do not need data exchanges (such as for nested types).
+ Other queries produce an error if <code class="ph codeph">MT_DOP</code> is set to a non-zero
+ value. Therefore, this query option is typically only set for the duration of
+ specific long-running, CPU-intensive queries.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">0</code>
+ </p>
+ <p class="p">
+ Because <code class="ph codeph">COMPUTE STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ statements for Parquet tables benefit substantially from extra intra-node
+ parallelism, Impala automatically sets <code class="ph codeph">MT_DOP=4</code> when computing stats
+ for Parquet tables.
+ </p>
+ <p class="p">
+ <strong class="ph b">Range:</strong> 0 to 64
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Any timing figures in the following examples are on a small, lightly loaded development cluster.
+ Your mileage may vary. Speedups depend on many factors, including the number of rows, columns, and
+ partitions within each table.
+ </p>
+ </div>
+
+ <p class="p">
+ The following example shows how to run a <code class="ph codeph">COMPUTE STATS</code>
+ statement against a Parquet table with or without an explicit <code class="ph codeph">MT_DOP</code>
+ setting:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Explicitly setting MT_DOP to 0 selects the old code path.
+set mt_dop = 0;
+MT_DOP set to 0
+
+-- The analysis for the billion rows is distributed among hosts,
+-- but uses only a single core on each host.
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+drop stats billion_rows_parquet;
+
+-- Using 4 logical processors per host is faster.
+set mt_dop = 4;
+MT_DOP set to 4
+
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+drop stats billion_rows_parquet;
+
+-- Unsetting the option reverts it to its default,
+-- which for COMPUTE STATS on a Parquet table is 4,
+-- so again it uses the fast path.
+unset MT_DOP;
+Unsetting option MT_DOP
+
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+</code></pre>
+
+ <p class="p">
+ The following example shows the effects of setting <code class="ph codeph">MT_DOP</code>
+ for a query involving only scan and aggregation operations for a Parquet table:
+ </p>
+
+<pre class="pre codeblock"><code>
+set mt_dop = 0;
+MT_DOP set to 0
+
+-- COUNT(DISTINCT) for a unique column is CPU-intensive.
+select count(distinct id) from billion_rows_parquet;
++--------------------+
+| count(distinct id) |
++--------------------+
+| 1000000000 |
++--------------------+
+Fetched 1 row(s) in 67.20s
+
+set mt_dop = 16;
+MT_DOP set to 16
+
+-- Introducing more intra-node parallelism for the aggregation
+-- speeds things up, and potentially reduces memory overhead by
+-- reducing the number of scanner threads.
+select count(distinct id) from billion_rows_parquet;
++--------------------+
+| count(distinct id) |
++--------------------+
+| 1000000000 |
++--------------------+
+Fetched 1 row(s) in 17.19s
+
+</code></pre>
+
+ <p class="p">
+ The following example shows how queries that are not compatible with non-zero
+ <code class="ph codeph">MT_DOP</code> settings produce an error when <code class="ph codeph">MT_DOP</code>
+ is set:
+ </p>
+
+<pre class="pre codeblock"><code>
+set mt_dop=1;
+MT_DOP set to 1
+
+select * from a1 inner join a2
+ on a1.id = a2.id limit 4;
+ERROR: NotImplementedException: MT_DOP not supported for plans with
+ base table joins or table sinks.
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_compute_stats.html">COMPUTE STATS Statement</a>,
+ <a class="xref" href="impala_aggregate_functions.html">Impala Aggregate Functions</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ndv.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ndv.html b/docs/build3x/html/topics/impala_ndv.html
new file mode 100644
index 0000000..a3f7e2c
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ndv.html
@@ -0,0 +1,226 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ndv"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>NDV Function</title></head><body id="ndv"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">NDV Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns an approximate value similar to the result of <code class="ph codeph">COUNT(DISTINCT
+ <var class="keyword varname">col</var>)</code>, the <span class="q">"number of distinct values"</span>. It is much faster than the
+ combination of <code class="ph codeph">COUNT</code> and <code class="ph codeph">DISTINCT</code>, and uses a constant amount of memory and
+ thus is less memory-intensive for columns with high cardinality.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>NDV([DISTINCT | ALL] <var class="keyword varname">expression</var>)</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This is the mechanism used internally by the <code class="ph codeph">COMPUTE STATS</code> statement for computing the
+ number of distinct values in a column.
+ </p>
+
+ <p class="p">
+ Because this number is an estimate, it might not reflect the precise number of different values in the
+ column, especially if the cardinality is very low or very high. If the estimated number is higher than the
+ number of rows in the table, Impala adjusts the value internally during query planning.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> in Impala 2.0 and higher; <code class="ph codeph">STRING</code> in earlier
+ releases
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric aggregates such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">N_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name | count | sum | avg | minimum | maximum | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 |
+| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 |
+| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 |
+| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 |
+| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example queries a billion-row table to illustrate the relative performance of
+ <code class="ph codeph">COUNT(DISTINCT)</code> and <code class="ph codeph">NDV()</code>. It shows how <code class="ph codeph">COUNT(DISTINCT)</code>
+ gives a precise answer, but is inefficient for large-scale data where an approximate result is sufficient.
+ The <code class="ph codeph">NDV()</code> function gives an approximate result but is much faster.
+ </p>
+
+<pre class="pre codeblock"><code>select count(distinct col1) from sample_data;
++---------------------+
+| count(distinct col1)|
++---------------------+
+| 100000 |
++---------------------+
+Fetched 1 row(s) in 20.13s
+
+select cast(ndv(col1) as bigint) as col1 from sample_data;
++----------+
+| col1 |
++----------+
+| 139017 |
++----------+
+Fetched 1 row(s) in 8.91s
+</code></pre>
+
+ <p class="p">
+ The following example shows how you can code multiple <code class="ph codeph">NDV()</code> calls in a single query, to
+ easily learn which columns have substantially more or fewer distinct values. This technique is faster than
+ running a sequence of queries with <code class="ph codeph">COUNT(DISTINCT)</code> calls.
+ </p>
+
+<pre class="pre codeblock"><code>select cast(ndv(col1) as bigint) as col1, cast(ndv(col2) as bigint) as col2,
+ cast(ndv(col3) as bigint) as col3, cast(ndv(col4) as bigint) as col4
+ from sample_data;
++----------+-----------+------------+-----------+
+| col1 | col2 | col3 | col4 |
++----------+-----------+------------+-----------+
+| 139017 | 282 | 46 | 145636240 |
++----------+-----------+------------+-----------+
+Fetched 1 row(s) in 34.97s
+
+select count(distinct col1) from sample_data;
++---------------------+
+| count(distinct col1)|
++---------------------+
+| 100000 |
++---------------------+
+Fetched 1 row(s) in 20.13s
+
+select count(distinct col2) from sample_data;
++----------------------+
+| count(distinct col2) |
++----------------------+
+| 278 |
++----------------------+
+Fetched 1 row(s) in 20.09s
+
+select count(distinct col3) from sample_data;
++-----------------------+
+| count(distinct col3) |
++-----------------------+
+| 46 |
++-----------------------+
+Fetched 1 row(s) in 19.12s
+
+select count(distinct col4) from sample_data;
++----------------------+
+| count(distinct col4) |
++----------------------+
+| 147135880 |
++----------------------+
+Fetched 1 row(s) in 266.95s
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_replica_preference.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_replica_preference.html b/docs/build3x/html/topics/impala_replica_preference.html
new file mode 100644
index 0000000..38b8698
--- /dev/null
+++ b/docs/build3x/html/topics/impala_replica_preference.html
@@ -0,0 +1,68 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="replica_preference"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</title></head><body id="replica_preference"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">REPLICA_PREFERENCE Query Option (<span class="keyword">Impala 2.7</span> or higher only)</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ </p>
+
+ <p class="p">
+      The <code class="ph codeph">REPLICA_PREFERENCE</code> query option lets you distribute the work more
+      evenly when hotspots or bottlenecks persist. It causes the access cost of every replica of a
+      data block to be treated as equal to or worse than the configured value. This allows
+      Impala to schedule reads to suboptimal replicas (for example, local replicas when cached
+      replicas exist) in order to distribute the work across more executor nodes.
+ </p>
+
+ <p class="p">
+ Allowed values are: <code class="ph codeph">CACHE_LOCAL</code> (<code class="ph codeph">0</code>),
+ <code class="ph codeph">DISK_LOCAL</code> (<code class="ph codeph">2</code>), <code class="ph codeph">REMOTE</code>
+ (<code class="ph codeph">4</code>)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Enum
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">CACHE_LOCAL (0)</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.7.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage Notes:</strong>
+ </p>
+
+ <p class="p">
+      By default, Impala selects the best replica it can find in terms of access cost. The
+      preferred order is cached, local, then remote. With <code class="ph codeph">REPLICA_PREFERENCE</code>,
+      the preference of all replicas is capped at the selected value. For example, when
+      <code class="ph codeph">REPLICA_PREFERENCE</code> is set to <code class="ph codeph">DISK_LOCAL</code>, cached and
+      local replicas are treated with equal preference. When set to
+      <code class="ph codeph">REMOTE</code>, all three types of replicas (cached, local, and remote) are treated
+      with equal preference.
+ </p>
+
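+    <p class="p">
+      The following sketch shows one way to set the option for a session; the table name
+      <code class="ph codeph">sample_table</code> is hypothetical.
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Treat cached and local replicas as equally preferable, so that
+-- reads can be spread across more executor hosts.
+set replica_preference=disk_local;
+
+select count(*) from sample_table;
+</code></pre>
+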
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a>,
+ <a class="xref" href="impala_schedule_random_replica.html#schedule_random_replica">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a>
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_request_pool.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_request_pool.html b/docs/build3x/html/topics/impala_request_pool.html
new file mode 100644
index 0000000..39be2da
--- /dev/null
+++ b/docs/build3x/html/topics/impala_request_pool.html
@@ -0,0 +1,35 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="request_pool"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REQUEST_POOL Query Option</title></head><body id="request_pool"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">REQUEST_POOL Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+      Specifies the pool or queue that queries are submitted to, that is, the name of the
+      resource pool used for requests from Impala. This option applies only when the Impala
+      admission control feature is enabled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> empty (use the user-to-pool mapping defined by an <span class="keyword cmdname">impalad</span> startup option
+ in the Impala configuration file)
+ </p>
+
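+    <p class="p">
+      The following sketch shows how you might direct a session's queries to a particular
+      pool; the pool name <code class="ph codeph">high_priority</code> is hypothetical and must match a
+      pool defined in your admission control configuration.
+    </p>
+
+<pre class="pre codeblock"><code>
+set request_pool=high_priority;
+
+-- Subsequent queries in this session are submitted to that pool.
+select count(*) from sample_table;
+</code></pre>
+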
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_admission.html">Admission Control and Query Queuing</a>
+ </p>
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_num_nodes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_num_nodes.html b/docs/build3x/html/topics/impala_num_nodes.html
new file mode 100644
index 0000000..691bab9
--- /dev/null
+++ b/docs/build3x/html/topics/impala_num_nodes.html
@@ -0,0 +1,61 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="num_nodes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>NUM_NODES Query Option</title></head><body id="num_nodes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">NUM_NODES Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Limit the number of nodes that process a query, typically during debugging.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+<p class="p">
+ <strong class="ph b">Allowed values:</strong> Only accepts the values 0
+ (meaning all nodes) or 1 (meaning all work is done on the coordinator node).
+</p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If you are diagnosing a problem that you suspect is due to a timing issue due to
+ distributed query processing, you can set <code class="ph codeph">NUM_NODES=1</code> to verify
+ if the problem still occurs when all the work is done on a single node.
+ </p>
+
+ <p class="p">
+ You might set the <code class="ph codeph">NUM_NODES</code> option to 1 briefly, during <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> statements. Normally, those statements produce one or more data
+ files per data node. If the write operation involves small amounts of data, a Parquet table, and/or a
+ partitioned table, the default behavior could produce many small files when intuitively you might expect
+ only a single output file. <code class="ph codeph">SET NUM_NODES=1</code> turns off the <span class="q">"distributed"</span> aspect of the
+ write operation, making it more likely to produce only one or a few data files.
+ </p>
+
+ <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span>
+ <p class="p">
+ Because this option results in increased resource utilization on a single host,
+ it could cause problems due to contention with other Impala statements or
+ high resource usage. Symptoms could include queries running slowly, exceeding the memory limit,
+ or appearing to hang. Use it only in a single-user development/test environment;
+ <strong class="ph b">do not</strong> use it in a production environment or in a cluster with a high-concurrency
+ or high-volume or performance-critical workload.
+ </p>
+ </div>
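+ <p class="p">
+ For example, the following sketch of an <span class="keyword cmdname">impala-shell</span> session
+ (the table names are placeholders) temporarily restricts a write operation to the coordinator node
+ so that it produces a single output file, then restores the default distributed behavior:
+ </p>
+
+<pre class="pre codeblock"><code>set num_nodes=1;
+insert overwrite table parquet_table select * from text_table;
+set num_nodes=0;
+</code></pre>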
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_num_scanner_threads.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_num_scanner_threads.html b/docs/build3x/html/topics/impala_num_scanner_threads.html
new file mode 100644
index 0000000..0617048
--- /dev/null
+++ b/docs/build3x/html/topics/impala_num_scanner_threads.html
@@ -0,0 +1,27 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="num_scanner_threads"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>NUM_SCANNER_THREADS Query Option</title></head><body id="num_scanner_threads"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">NUM_SCANNER_THREADS Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Maximum number of scanner threads (on each node) used for each query. By default, Impala uses as many cores
+ as are available (one thread per core). You might lower this value if queries are using excessive resources
+ on a busy cluster. Impala imposes a maximum value automatically, so a high value has no practical effect.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0
+ </p>
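+ <p class="p">
+ For example, on a busy cluster you might cap the number of scanner threads per node for the
+ current session (the value 2 here is arbitrary, and <code class="ph codeph">large_table</code>
+ is a placeholder name):
+ </p>
+
+<pre class="pre codeblock"><code>set num_scanner_threads=2;
+select count(*) from large_table;
+</code></pre>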
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_odbc.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_odbc.html b/docs/build3x/html/topics/impala_odbc.html
new file mode 100644
index 0000000..9d73173
--- /dev/null
+++ b/docs/build3x/html/topics/impala_odbc.html
@@ -0,0 +1,24 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_odbc"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring Impala to Work with ODBC</title></head><body id="impala_odbc"><main role="main"><article role="article" aria-labelledby="impala_odbc__odbc">
+
+ <h1 class="title topictitle1" id="impala_odbc__odbc">Configuring Impala to Work with ODBC</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Third-party products, especially business intelligence and reporting tools, can access Impala
+ using the ODBC protocol. For the best experience, ensure any third-party product you intend to use is supported.
+ Verifying support includes checking that the versions of Impala, ODBC, the operating system, the
+ Apache Hadoop distribution, and the third-party product have all been approved by the appropriate suppliers
+ for use together. To configure your systems to use ODBC, download and install a connector, typically from
+ the supplier of the third-party product or the Hadoop distribution.
+ You may need to sign in and accept license agreements before accessing the pages required for downloading
+ ODBC connectors.
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_config.html">Managing Impala</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_offset.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_offset.html b/docs/build3x/html/topics/impala_offset.html
new file mode 100644
index 0000000..b96e7af
--- /dev/null
+++ b/docs/build3x/html/topics/impala_offset.html
@@ -0,0 +1,67 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="offset"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>OFFSET Clause</title></head><body id="offset"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">OFFSET Clause</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">OFFSET</code> clause in a <code class="ph codeph">SELECT</code> query causes the result set to start some
+ number of rows after the logical first item. The result set is numbered starting from zero, so <code class="ph codeph">OFFSET
+ 0</code> produces the same result as leaving out the <code class="ph codeph">OFFSET</code> clause. Always use this clause
+ in combination with <code class="ph codeph">ORDER BY</code> (so that it is clear which item should be first, second, and so
+ on) and <code class="ph codeph">LIMIT</code> (so that the result set covers a bounded range, such as items 0-9, 100-199,
+ and so on).
+ </p>
+
+ <p class="p">
+ In Impala 1.2.1 and higher, you can combine a <code class="ph codeph">LIMIT</code> clause with an <code class="ph codeph">OFFSET</code>
+ clause to produce a small result set that is different from a top-N query, for example, to return items 11
+ through 20. This technique can be used to simulate <span class="q">"paged"</span> results. Because Impala queries typically
+ involve substantial amounts of I/O, use this technique only for compatibility in cases where you cannot
+ rewrite the application logic. For best performance and scalability, wherever practical, query as many
+ items as you expect to need, cache them on the application side, and display small groups of results to
+ users using application logic.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how you could run a <span class="q">"paging"</span> query originally written for a traditional
+ database application. Because typical Impala queries process megabytes or gigabytes of data and read large
+ data files from disk each time, it is inefficient to run a separate query to retrieve each small group of
+ items. Use this technique only for compatibility while porting older applications, then rewrite the
+ application code to use a single query with a large result set, and display pages of results from the cached
+ result set.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table numbers (x int);
+[localhost:21000] > insert into numbers select x from very_long_sequence;
+Inserted 1000000 rows in 1.34s
+[localhost:21000] > select x from numbers order by x limit 5 offset 0;
++----+
+| x |
++----+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
++----+
+[localhost:21000] > select x from numbers order by x limit 5 offset 5;
++----+
+| x |
++----+
+| 6 |
+| 7 |
+| 8 |
+| 9 |
+| 10 |
++----+
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_shuffle_distinct_exprs.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_shuffle_distinct_exprs.html b/docs/build3x/html/topics/impala_shuffle_distinct_exprs.html
new file mode 100644
index 0000000..2799c1f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_shuffle_distinct_exprs.html
@@ -0,0 +1,37 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shuffle_distinct_exprs"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SHUFFLE_DISTINCT_EXPRS Query Option</title></head><body id="shuffle_distinct_exprs"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SHUFFLE_DISTINCT_EXPRS Query Option</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHUFFLE_DISTINCT_EXPRS</code> query option controls the
+ shuffling behavior when a query has both grouping and distinct expressions.
+ Impala can optionally include the distinct expressions in the hash exchange
+ to spread the data among more nodes. However, this plan requires one more
+ hash exchange phase.
+ </p>
+
+ <p class="p">
+ It is recommended that you turn off this option if the number of distinct values (NDV) of the
+ grouping expressions is high.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code>
+ </p>
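+ <p class="p">
+ For example, to enable the extra hash exchange phase for the current session before running a
+ query that combines grouping and distinct expressions (the table and column names below are
+ placeholders):
+ </p>
+
+<pre class="pre codeblock"><code>set shuffle_distinct_exprs=true;
+select region, count(distinct user_id) from events group by region;
+</code></pre>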
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_smallint.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_smallint.html b/docs/build3x/html/topics/impala_smallint.html
new file mode 100644
index 0000000..86d5089
--- /dev/null
+++ b/docs/build3x/html/topics/impala_smallint.html
@@ -0,0 +1,127 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="smallint"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SMALLINT Data Type</title></head><body id="smallint"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SMALLINT Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A 2-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> SMALLINT</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> -32768 .. 32767. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong> Impala automatically converts to a larger integer type (<code class="ph codeph">INT</code> or
+ <code class="ph codeph">BIGINT</code>) or a floating-point type (<code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>)
+ automatically. Use <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>, <code class="ph codeph">STRING</code>,
+ or <code class="ph codeph">TIMESTAMP</code>.
+ <span class="ph">
+ Casting an integer or floating-point value <code class="ph codeph">N</code> to
+ <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+ date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+ If the setting <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+ the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.
+ </span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ For a convenient and automated way to check the bounds of the <code class="ph codeph">SMALLINT</code> type, call the
+ functions <code class="ph codeph">MIN_SMALLINT()</code> and <code class="ph codeph">MAX_SMALLINT()</code>.
+ </p>
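+ <p class="p">
+ For example, you can confirm the range shown above directly:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT MIN_SMALLINT(), MAX_SMALLINT();  -- returns -32768 and 32767
+</code></pre>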
+
+ <p class="p">
+ If an integer value is too large to be represented as a <code class="ph codeph">SMALLINT</code>, use an
+ <code class="ph codeph">INT</code> instead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+ value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x SMALLINT);
+SELECT CAST(1000 AS SMALLINT);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong>
+ </p>
+
+
+
+ <p class="p">
+ Physically, Parquet files represent <code class="ph codeph">TINYINT</code> and <code class="ph codeph">SMALLINT</code> values as 32-bit
+ integers. Although Impala rejects attempts to insert out-of-range values into such columns, if you create a
+ new table with the <code class="ph codeph">CREATE TABLE ... LIKE PARQUET</code> syntax, any <code class="ph codeph">TINYINT</code> or
+ <code class="ph codeph">SMALLINT</code> columns in the original table turn into <code class="ph codeph">INT</code> columns in the new
+ table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric
+ type more efficiently than a <code class="ph codeph">STRING</code> representation of the value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as a 2-byte value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+ <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 3.0 or higher only)</a>,
+ <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ssl.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ssl.html b/docs/build3x/html/topics/impala_ssl.html
new file mode 100644
index 0000000..a91b69b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ssl.html
@@ -0,0 +1,180 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ssl"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring TLS/SSL for Impala</title></head><body id="ssl"><main role="main"><article role="article" aria-labelledby="ssl__tls">
+
+ <h1 class="title topictitle1" id="ssl__tls">Configuring TLS/SSL for Impala</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports TLS/SSL network encryption, between Impala and client
+ programs, and between the Impala-related daemons running on different nodes
+ in the cluster. This feature is important when you also use other features such as Kerberos
+ authentication or Sentry authorization, where credentials are being
+ transmitted back and forth.
+ </p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="ssl__concept_q1p_j2d_rp">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Using the Command Line</h2>
+
+ <div class="body conbody">
+ <p class="p">
+ To enable SSL when client applications connect to Impala, add both of the following flags to the <span class="keyword cmdname">impalad</span> startup options:
+ </p>
+
+ <ul class="ul" id="concept_q1p_j2d_rp__ul_i2p_m2d_rp">
+ <li class="li">
+ <code class="ph codeph">--ssl_server_certificate</code>: the full path to the server certificate, on the local filesystem.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ssl_private_key</code>: the full path to the server private key, on the local filesystem.
+ </li>
+ </ul>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, Impala can also use SSL for its own internal communication between the
+ <span class="keyword cmdname">impalad</span>, <code class="ph codeph">statestored</code>, and <code class="ph codeph">catalogd</code> daemons.
+ To enable this additional SSL encryption, set the <code class="ph codeph">--ssl_server_certificate</code>
+ and <code class="ph codeph">--ssl_private_key</code> flags in the startup options for
+ <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">catalogd</span>, and <span class="keyword cmdname">statestored</span>,
+ and also add the <code class="ph codeph">--ssl_client_ca_certificate</code> flag for all three of those daemons.
+ </p>
+
+ <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span>
+ Prior to <span class="keyword">Impala 2.3.2</span>, you could enable Kerberos authentication between Impala internal components,
+ or SSL encryption between Impala internal components, but not both at the same time.
+ This restriction has now been lifted.
+ See <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2598" target="_blank">IMPALA-2598</a>
+ to see the maintenance releases for different levels of Impala where the fix has been published.
+ </div>
+
+ <p class="p">
+ If either of these flags is set, both must be set. In that case, Impala starts listening for Beeswax and HiveServer2 requests on
+ SSL-secured ports only. (The port numbers stay the same; see <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> for details.)
+ </p>
+
+ <p class="p">
+ Since Impala uses passphrase-less certificates in PEM format, you can reuse a host's existing Java keystore
+ by using the <code class="ph codeph">openssl</code> toolkit to convert it to the PEM format.
+ </p>
+
+ <section class="section" id="concept_q1p_j2d_rp__secref"><h3 class="title sectiontitle">Configuring TLS/SSL Communication for the Impala Shell</h3>
+
+
+
+ <p class="p">
+ With SSL enabled for Impala, use the following options when starting the
+ <span class="keyword cmdname">impala-shell</span> interpreter:
+ </p>
+
+ <ul class="ul" id="concept_q1p_j2d_rp__ul_kgp_m2d_rp">
+ <li class="li">
+ <code class="ph codeph">--ssl</code>: enables TLS/SSL for <span class="keyword cmdname">impala-shell</span>.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ca_cert</code>: the local pathname pointing to the third-party CA certificate, or to a copy of the server
+ certificate for self-signed server certificates.
+ </li>
+ </ul>
+
+ <p class="p">
+ If <code class="ph codeph">--ca_cert</code> is not set, <span class="keyword cmdname">impala-shell</span> enables TLS/SSL, but does not validate the server
+ certificate. This is useful for connecting to a known-good Impala that is only running over TLS/SSL, when a copy of the
+ certificate is not available (such as when debugging customer installations).
+ </p>
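+ <p class="p">
+ For example, a connection to a TLS/SSL-enabled cluster might look like the following
+ (the host name and certificate path are placeholders):
+ </p>
+
+<pre class="pre codeblock"><code>impala-shell --ssl --ca_cert=/path/to/ca_cert.pem -i secure-host.example.com
+</code></pre>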
+
+ </section>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="ssl__ssl_jdbc_odbc">
+ <h2 class="title topictitle2" id="ariaid-title3">Using TLS/SSL with Business Intelligence Tools</h2>
+ <div class="body conbody">
+ <p class="p">
+ You can use Kerberos authentication, TLS/SSL encryption, or both to secure
+ connections from JDBC and ODBC applications to Impala.
+ See <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a> and <a class="xref" href="impala_odbc.html#impala_odbc">Configuring Impala to Work with ODBC</a>
+ for details.
+ </p>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.5</span>, the Hive JDBC driver did not support connections that use both Kerberos authentication
+ and SSL encryption. If your cluster is running an older release that has this restriction,
+ use an alternative JDBC driver that supports
+ both of these security features.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="ssl__tls_min_version">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Specifying TLS/SSL Minimum Allowed Version and Ciphers</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Depending on your cluster configuration and the security practices in your
+ organization, you might need to restrict the allowed versions of TLS/SSL
+ used by Impala. Older TLS/SSL versions might have vulnerabilities or lack
+ certain features. In <span class="keyword">Impala 2.10</span>, you can use startup
+ options for the <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">catalogd</span>,
+ and <span class="keyword cmdname">statestored</span> daemons to specify a minimum allowed
+ version of TLS/SSL.
+ </p>
+
+ <p class="p">
+ Specify one of the following values for the <code class="ph codeph">--ssl_minimum_version</code>
+ configuration setting:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">tlsv1</code>: Allow any TLS version of 1.0 or higher.
+ This setting is the default when TLS/SSL is enabled.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">tlsv1.1</code>: Allow any TLS version of 1.1 or higher.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">tlsv1.2</code>: Allow any TLS version of 1.2 or higher.
+ </p>
+ </li>
+ </ul>
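+ <p class="p">
+ For example, to reject connections that use anything older than TLS 1.2, you might include the
+ following in the startup options for each of the three daemons:
+ </p>
+
+<pre class="pre codeblock"><code>
+--ssl_minimum_version=tlsv1.2
+</code></pre>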
+
+ <p class="p">
+ Along with specifying the version, you can also specify the allowed set of TLS ciphers
+ by using the <code class="ph codeph">--ssl_cipher_list</code> configuration setting. The argument to
+ this option is a list of keywords, separated by colons, commas, or spaces, and
+ optionally including other notation. For example:
+ </p>
+
+<pre class="pre codeblock"><code>
+--ssl_cipher_list="RC4-SHA,RC4-MD5"
+</code></pre>
+
+ <p class="p">
+ By default, the cipher list is empty, and Impala uses the default cipher list for
+ the underlying platform. See the output of <span class="keyword cmdname">man ciphers</span> for the full
+ set of keywords and notation allowed in the argument string.
+ </p>
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_stddev.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_stddev.html b/docs/build3x/html/topics/impala_stddev.html
new file mode 100644
index 0000000..e775089
--- /dev/null
+++ b/docs/build3x/html/topics/impala_stddev.html
@@ -0,0 +1,121 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="stddev"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>STDDEV, STDDEV_SAMP, STDDEV_POP Functions</title></head><body id="stddev"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">STDDEV, STDDEV_SAMP, STDDEV_POP Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ An aggregate function that returns the
+ <a class="xref" href="http://en.wikipedia.org/wiki/Standard_deviation" target="_blank">standard
+ deviation</a> of a set of numbers.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>{ STDDEV | STDDEV_SAMP | STDDEV_POP } ([DISTINCT | ALL] <var class="keyword varname">expression</var>)</code></pre>
+
+ <p class="p">
+ This function works with any numeric data type.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> in Impala 2.0 and higher; <code class="ph codeph">STRING</code> in earlier
+ releases
+ </p>
+
+ <p class="p">
+ This function is typically used in mathematical formulas related to probability distributions.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">STDDEV_POP()</code> and <code class="ph codeph">STDDEV_SAMP()</code> functions compute the population
+ standard deviation and sample standard deviation, respectively, of the input values.
+ (<code class="ph codeph">STDDEV()</code> is an alias for <code class="ph codeph">STDDEV_SAMP()</code>.) Both functions evaluate all input
+ rows matched by the query. The difference is that <code class="ph codeph">STDDEV_SAMP()</code> is scaled by
+ <code class="ph codeph">1/(N-1)</code> while <code class="ph codeph">STDDEV_POP()</code> is scaled by <code class="ph codeph">1/N</code>.
+ </p>
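The 1/(N-1) versus 1/N scaling can be checked outside Impala. The following is a small Python sketch (not Impala code) that computes both forms by hand and confirms them against the standard library's statistics module; the score values are made up for illustration:

```python
# Sketch (not Impala code): contrast sample vs. population standard deviation,
# mirroring STDDEV_SAMP (1/(N-1) scaling) and STDDEV_POP (1/N scaling).
import math
import statistics

scores = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical test scores
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations

stddev_samp = math.sqrt(ss / (n - 1))  # sample: divide by N-1
stddev_pop = math.sqrt(ss / n)         # population: divide by N

# The standard library agrees with the hand-rolled formulas.
assert math.isclose(stddev_samp, statistics.stdev(scores))
assert math.isclose(stddev_pop, statistics.pstdev(scores))
```

Because the sample form divides by the smaller N-1, STDDEV_SAMP() is always slightly larger than STDDEV_POP() for the same input, matching the 28.5 versus 28.4858 results in the example below.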
+
+    <p class="p">
+      If no input rows match the query, the result of any of these functions is <code class="ph codeph">NULL</code>. If a single
+      input row matches the query, the result of any of these functions is <code class="ph codeph">0.0</code>.
+    </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+    <p class="p">
+      This example demonstrates how <code class="ph codeph">STDDEV()</code> and <code class="ph codeph">STDDEV_SAMP()</code> return the same
+      result, while <code class="ph codeph">STDDEV_POP()</code> uses a slightly different calculation to reflect that the input
+      data represents the entire <span class="q">"population"</span>, rather than a sample of it.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select stddev(score) from test_scores;
++---------------+
+| stddev(score) |
++---------------+
+| 28.5 |
++---------------+
+[localhost:21000] > select stddev_samp(score) from test_scores;
++--------------------+
+| stddev_samp(score) |
++--------------------+
+| 28.5 |
++--------------------+
+[localhost:21000] > select stddev_pop(score) from test_scores;
++-------------------+
+| stddev_pop(score) |
++-------------------+
+| 28.4858 |
++-------------------+
+</code></pre>
+
+    <p class="p">
+      This example demonstrates using <code class="ph codeph">CAST</code> to convert the
+      <code class="ph codeph">DOUBLE</code> result to a <code class="ph codeph">DECIMAL</code> value with a
+      fixed precision and scale, for example when storing the result in a table column.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table score_stats as select cast(stddev(score) as decimal(7,4)) `standard_deviation`, cast(variance(score) as decimal(7,4)) `variance` from test_scores;
++-------------------+
+| summary |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc score_stats;
++--------------------+--------------+---------+
+| name | type | comment |
++--------------------+--------------+---------+
+| standard_deviation | decimal(7,4) | |
+| variance | decimal(7,4) | |
++--------------------+--------------+---------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">STDDEV()</code>, <code class="ph codeph">STDDEV_POP()</code>, and <code class="ph codeph">STDDEV_SAMP()</code> functions
+ compute the standard deviation (square root of the variance) based on the results of
+ <code class="ph codeph">VARIANCE()</code>, <code class="ph codeph">VARIANCE_POP()</code>, and <code class="ph codeph">VARIANCE_SAMP()</code>
+ respectively. See <a class="xref" href="impala_variance.html#variance">VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions</a> for details about the variance property.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_string.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_string.html b/docs/build3x/html/topics/impala_string.html
new file mode 100644
index 0000000..6f594ca
--- /dev/null
+++ b/docs/build3x/html/topics/impala_string.html
@@ -0,0 +1,197 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="string"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>STRING Data Type</title></head><body id="string"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">STRING Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> STRING</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Length:</strong> Maximum of 32,767 bytes. Do not use any length constraint when declaring
+ <code class="ph codeph">STRING</code> columns, as you might be familiar with from <code class="ph codeph">VARCHAR</code>,
+ <code class="ph codeph">CHAR</code>, or similar column types from relational database systems. <span class="ph">If you do
+ need to manipulate string values with precise or maximum lengths, in Impala 2.0 and higher you can declare
+ columns as <code class="ph codeph">VARCHAR(<var class="keyword varname">max_length</var>)</code> or
+ <code class="ph codeph">CHAR(<var class="keyword varname">length</var>)</code>, but for best performance use <code class="ph codeph">STRING</code>
+ where practical.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Character sets:</strong> For full support in all Impala subsystems, restrict string values to the ASCII
+ character set. Although some UTF-8 character data can be stored in Impala and retrieved through queries, UTF-8 strings
+ containing non-ASCII characters are not guaranteed to work properly in combination with many SQL aspects,
+ including but not limited to:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ String manipulation functions.
+ </li>
+ <li class="li">
+ Comparison operators.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">ORDER BY</code> clause.
+ </li>
+ <li class="li">
+ Values in partition key columns.
+ </li>
+ </ul>
+
+ <p class="p">
+ For any national language aspects such as
+ collation order or interpreting extended ASCII variants such as ISO-8859-1 or ISO-8859-2 encodings, Impala
+ does not include such metadata with the table definition. If you need to sort, manipulate, or display data
+ depending on those national language characteristics of string data, use logic on the application side.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala does not automatically convert <code class="ph codeph">STRING</code> to any numeric type. Impala does
+ automatically convert <code class="ph codeph">STRING</code> to <code class="ph codeph">TIMESTAMP</code> if the value matches one of
+ the accepted <code class="ph codeph">TIMESTAMP</code> formats; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for
+ details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You can use <code class="ph codeph">CAST()</code> to convert <code class="ph codeph">STRING</code> values to
+ <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>,
+ <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, or <code class="ph codeph">TIMESTAMP</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You cannot directly cast a <code class="ph codeph">STRING</code> value to <code class="ph codeph">BOOLEAN</code>. You can use a
+ <code class="ph codeph">CASE</code> expression to evaluate string values such as <code class="ph codeph">'T'</code>,
+ <code class="ph codeph">'true'</code>, and so on and return Boolean <code class="ph codeph">true</code> and <code class="ph codeph">false</code>
+ values as appropriate.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You can cast a <code class="ph codeph">BOOLEAN</code> value to <code class="ph codeph">STRING</code>, returning <code class="ph codeph">'1'</code>
+ for <code class="ph codeph">true</code> values and <code class="ph codeph">'0'</code> for <code class="ph codeph">false</code> values.
+ </p>
+ </li>
+ </ul>
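The CASE-style mapping described above can be sketched outside SQL. The following Python sketch (not Impala code; the function name and accepted spellings are illustrative assumptions) shows the kind of explicit string-to-Boolean mapping you would express with a CASE expression:

```python
# Sketch (not Impala code): emulate a CASE expression that maps common
# truthy/falsy string spellings to Boolean values, since STRING cannot be
# cast directly to BOOLEAN. The accepted spellings here are hypothetical.
def string_to_boolean(s):
    if s is None:
        return None  # NULL input stays NULL
    lowered = s.strip().lower()
    if lowered in ("t", "true", "1", "yes"):
        return True
    if lowered in ("f", "false", "0", "no"):
        return False
    return None  # unrecognized spellings map to NULL

assert string_to_boolean("T") is True
assert string_to_boolean("false") is False
assert string_to_boolean("maybe") is None
```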
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong>
+ </p>
+
+ <p class="p">
+ Although it might be convenient to use <code class="ph codeph">STRING</code> columns for partition keys, even when those
+ columns contain numbers, for performance and scalability it is much better to use numeric columns as
+ partition keys whenever practical. Although the underlying HDFS directory name might be the same in either
+ case, the in-memory storage for the partition key columns is more compact, and computations are faster, if
+ partition key columns such as <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, <code class="ph codeph">DAY</code> and so on
+ are declared as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>, and so on.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Zero-length strings:</strong> For purposes of clauses such as <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">GROUP
+ BY</code>, Impala considers zero-length strings (<code class="ph codeph">""</code>), <code class="ph codeph">NULL</code>, and space
+ to all be different values.
+ </p>
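By analogy, the following Python sketch (not Impala code) shows the three values acting as distinct grouping keys, the way GROUP BY or DISTINCT treats them:

```python
# Sketch (not Impala code): the empty string, NULL (modeled as None), and a
# single space act as three distinct keys when grouping, as in GROUP BY.
from collections import Counter

values = ["", None, " ", "", None]
groups = Counter(values)

assert len(groups) == 3   # three distinct values
assert groups[""] == 2
assert groups[None] == 2
assert groups[" "] == 1
```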
+
+
+
+
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+ <p class="p"><strong class="ph b">Avro considerations:</strong></p>
+ <p class="p">
+ The Avro specification allows string values up to 2**64 bytes in length.
+ Impala queries for Avro tables use 32-bit integers to hold string lengths.
+ In <span class="keyword">Impala 2.5</span> and higher, Impala truncates <code class="ph codeph">CHAR</code>
+ and <code class="ph codeph">VARCHAR</code> values in Avro tables to (2**31)-1 bytes.
+ If a query encounters a <code class="ph codeph">STRING</code> value longer than (2**31)-1
+ bytes in an Avro table, the query fails. In earlier releases,
+ encountering such long values in an Avro table could cause a crash.
+ </p>
+
+
+
+
+
+
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because the values of this type have variable size, none of the
+ column statistics fields are filled in until you run the <code class="ph codeph">COMPUTE STATS</code> statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples demonstrate double-quoted and single-quoted string literals, and required escaping for
+ quotation marks within string literals:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT 'I am a single-quoted string';
+SELECT "I am a double-quoted string";
+SELECT 'I\'m a single-quoted string with an apostrophe';
+SELECT "I\'m a double-quoted string with an apostrophe";
+SELECT 'I am a "short" single-quoted string containing quotes';
+SELECT "I am a \"short\" double-quoted string containing quotes";
+</code></pre>
+
+ <p class="p">
+ The following examples demonstrate calls to string manipulation functions to concatenate strings, convert
+ numbers to strings, or pull out substrings:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CONCAT("Once upon a time, there were ", CAST(3 AS STRING), ' little pigs.');
+SELECT SUBSTR("hello world",7,5);
+</code></pre>
+
+ <p class="p">
+ The following examples show how to perform operations on <code class="ph codeph">STRING</code> columns within a table:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (s1 STRING, s2 STRING);
+INSERT INTO t1 VALUES ("hello", 'world'), (CAST(7 AS STRING), "wonders");
+SELECT s1, s2, length(s1) FROM t1 WHERE s2 LIKE 'w%';
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#string_literals">String Literals</a>, <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a>,
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_create_table.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_create_table.html b/docs/build3x/html/topics/impala_create_table.html
new file mode 100644
index 0000000..e2c3528
--- /dev/null
+++ b/docs/build3x/html/topics/impala_create_table.html
@@ -0,0 +1,1346 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_table"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE TABLE Statement</title></head><body class="impala sql_statement" id="create_table"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1 impala_title sql_statement_title" id="ariaid-title1">CREATE TABLE Statement</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Creates a new table and specifies its characteristics. While creating a table, you
+ optionally specify aspects such as:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Whether the table is internal or external.
+ </li>
+
+ <li class="li">
+ The columns and associated data types.
+ </li>
+
+ <li class="li">
+ The columns used for physically partitioning the data.
+ </li>
+
+ <li class="li">
+ The file format for data files.
+ </li>
+
+ <li class="li">
+ The HDFS directory where the data files are located.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ The general syntax for creating a table and specifying its columns is as follows:
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Explicit column definitions:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ (<var class="keyword varname">col_name</var> <var class="keyword varname">data_type</var>
+ [COMMENT '<var class="keyword varname">col_comment</var>']
+ [, ...]
+ )
+ [PARTITIONED BY (<var class="keyword varname">col_name</var> <var class="keyword varname">data_type</var> [COMMENT '<var class="keyword varname">col_comment</var>'], ...)]
+ <span class="ph">[SORT BY ([<var class="keyword varname">column</var> [, <var class="keyword varname">column</var> ...]])]</span>
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ [WITH SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+ [
+ [ROW FORMAT <var class="keyword varname">row_format</var>] [STORED AS <var class="keyword varname">file_format</var>]
+ ]
+ [LOCATION '<var class="keyword varname">hdfs_path</var>']
+ [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+<span class="ph"> [CACHED IN '<var class="keyword varname">pool_name</var>'</span> <span class="ph">[WITH REPLICATION = <var class="keyword varname">integer</var>]</span> | UNCACHED]
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">CREATE TABLE AS SELECT:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] <var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ <span class="ph">[PARTITIONED BY (<var class="keyword varname">col_name</var>[, ...])]</span>
+ <span class="ph">[SORT BY ([<var class="keyword varname">column</var> [, <var class="keyword varname">column</var> ...]])]</span>
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ [WITH SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+ [
+ [ROW FORMAT <var class="keyword varname">row_format</var>] <span class="ph">[STORED AS <var class="keyword varname">ctas_file_format</var>]</span>
+ ]
+ [LOCATION '<var class="keyword varname">hdfs_path</var>']
+ [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+<span class="ph"> [CACHED IN '<var class="keyword varname">pool_name</var>'</span> <span class="ph">[WITH REPLICATION = <var class="keyword varname">integer</var>]</span> | UNCACHED]
+AS
+ <var class="keyword varname">select_statement</var></code></pre>
+
+<pre class="pre codeblock"><code>primitive_type:
+ TINYINT
+ | SMALLINT
+ | INT
+ | BIGINT
+ | BOOLEAN
+ | FLOAT
+ | DOUBLE
+ <span class="ph">| DECIMAL</span>
+ | STRING
+ <span class="ph">| CHAR</span>
+ <span class="ph">| VARCHAR</span>
+ | TIMESTAMP
+
+<span class="ph">complex_type:
+ struct_type
+ | array_type
+ | map_type
+
+struct_type: STRUCT < <var class="keyword varname">name</var> : <var class="keyword varname">primitive_or_complex_type</var> [COMMENT '<var class="keyword varname">comment_string</var>'], ... >
+
+array_type: ARRAY < <var class="keyword varname">primitive_or_complex_type</var> >
+
+map_type: MAP < <var class="keyword varname">primitive_type</var>, <var class="keyword varname">primitive_or_complex_type</var> >
+</span>
+row_format:
+ DELIMITED [FIELDS TERMINATED BY '<var class="keyword varname">char</var>' [ESCAPED BY '<var class="keyword varname">char</var>']]
+ [LINES TERMINATED BY '<var class="keyword varname">char</var>']
+
+file_format:
+ PARQUET
+ | TEXTFILE
+ | AVRO
+ | SEQUENCEFILE
+ | RCFILE
+
+<span class="ph">ctas_file_format:
+ PARQUET
+ | TEXTFILE</span>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Column definitions inferred from data file:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ LIKE PARQUET '<var class="keyword varname">hdfs_path_of_parquet_file</var>'
+ <span class="ph">[SORT BY ([<var class="keyword varname">column</var> [, <var class="keyword varname">column</var> ...]])]</span>
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ [PARTITIONED BY (<var class="keyword varname">col_name</var> <var class="keyword varname">data_type</var> [COMMENT '<var class="keyword varname">col_comment</var>'], ...)]
+ [WITH SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+ [
+ [ROW FORMAT <var class="keyword varname">row_format</var>] [STORED AS <var class="keyword varname">file_format</var>]
+ ]
+ [LOCATION '<var class="keyword varname">hdfs_path</var>']
+ [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+<span class="ph"> [CACHED IN '<var class="keyword varname">pool_name</var>'</span> <span class="ph">[WITH REPLICATION = <var class="keyword varname">integer</var>]</span> | UNCACHED]
+data_type:
+ <var class="keyword varname">primitive_type</var>
+ | array_type
+ | map_type
+ | struct_type
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Kudu tables:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ (<var class="keyword varname">col_name</var> <var class="keyword varname">data_type</var>
+ <span class="ph">[<var class="keyword varname">kudu_column_attribute</var> ...]</span>
+ [COMMENT '<var class="keyword varname">col_comment</var>']
+ [, ...]
+ [PRIMARY KEY (<var class="keyword varname">col_name</var>[, ...])]
+ )
+ <span class="ph">[PARTITION BY <var class="keyword varname">kudu_partition_clause</var>]</span>
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ STORED AS KUDU
+ [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+</code></pre>
+
+ <div class="p">
+ <strong class="ph b">Kudu column attributes:</strong>
+<pre class="pre codeblock"><code>
+ PRIMARY KEY
+| [NOT] NULL
+| ENCODING <var class="keyword varname">codec</var>
+| COMPRESSION <var class="keyword varname">algorithm</var>
+| DEFAULT <var class="keyword varname">constant</var>
+| BLOCK_SIZE <var class="keyword varname">number</var>
+</code></pre>
+ </div>
+
+ <div class="p">
+ <strong class="ph b">kudu_partition_clause:</strong>
+<pre class="pre codeblock"><code>
+kudu_partition_clause ::= [<var class="keyword varname">hash_clause</var>] [, <var class="keyword varname">range_clause</var> [ , <var class="keyword varname">range_clause</var> ] ]
+
+hash_clause ::=
+ HASH [ (<var class="keyword varname">pk_col</var> [, ...]) ]
+ PARTITIONS <var class="keyword varname">n</var>
+
+range_clause ::=
+ RANGE [ (<var class="keyword varname">pk_col</var> [, ...]) ]
+ (
+ {
+ PARTITION <var class="keyword varname">constant_expression</var> <var class="keyword varname">range_comparison_operator</var> VALUES <var class="keyword varname">range_comparison_operator</var> <var class="keyword varname">constant_expression</var>
+ | PARTITION VALUE = <var class="keyword varname">constant_expression_or_tuple</var>
+ }
+ [, ...]
+ )
+
+range_comparison_operator ::= { < | <= }
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">External Kudu tables:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE EXTERNAL TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ STORED AS KUDU
+ [TBLPROPERTIES ('kudu.table_name'='<var class="keyword varname">internal_kudu_name</var>')]
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">CREATE TABLE AS SELECT for Kudu tables:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE [IF NOT EXISTS] <var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ [PRIMARY KEY (<var class="keyword varname">col_name</var>[, ...])]
+ [PARTITION BY <var class="keyword varname">kudu_partition_clause</var>]
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ STORED AS KUDU
+ [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+AS
+ <var class="keyword varname">select_statement</var></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Column definitions:</strong>
+ </p>
+
+ <p class="p">
+ Depending on the form of the <code class="ph codeph">CREATE TABLE</code> statement, the column
+ definitions are required or not allowed.
+ </p>
+
+ <p class="p">
+ With the <code class="ph codeph">CREATE TABLE AS SELECT</code> and <code class="ph codeph">CREATE TABLE LIKE</code>
+ syntax, you do not specify the columns at all; the column names and types are derived from
+ the source table, query, or data file.
+ </p>
+
+    <p class="p">
+      With the basic <code class="ph codeph">CREATE TABLE</code> syntax, you must list one or more columns,
+      specifying for each column its name, type, and optionally a comment, in addition to any columns
+      used as partitioning keys. There is one exception where the column list is not required: when
+      creating an Avro table with the <code class="ph codeph">STORED AS AVRO</code> clause, you can omit
+      the list of columns and specify the same metadata as part of the <code class="ph codeph">TBLPROPERTIES</code> clause.
+    </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ The Impala complex types (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or
+ <code class="ph codeph">MAP</code>) are available in <span class="keyword">Impala 2.3</span> and higher.
+ Because you can nest these types (for example, to make an array of maps or a struct with
+ an array field), these types are also sometimes referred to as nested types. See
+ <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for usage details.
+ </p>
+
+
+
+ <p class="p">
+ Impala can create tables containing complex type columns, with any supported file format.
+ Because currently Impala can only query complex type columns in Parquet tables, creating
+ tables with complex type columns and other file formats such as text is of limited use.
+ For example, you might create a text table including some columns with complex types with
+      Impala, and use Hive as part of your ETL pipeline to ingest the nested type data and copy it to an
+ identical Parquet table. Or you might create a partitioned table containing complex type
+ columns using one file format, and use <code class="ph codeph">ALTER TABLE</code> to change the file
+ format of individual partitions to Parquet; Impala can then query only the Parquet-format
+ partitions in that table.
+ </p>
+
+ <p class="p">
+ Partitioned tables can contain complex type columns.
+ All the partition key columns must be scalar types.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Internal and external tables (EXTERNAL and LOCATION clauses):</strong>
+ </p>
+
+ <p class="p">
+ By default, Impala creates an <span class="q">"internal"</span> table, where Impala manages the underlying
+ data files for the table, and physically deletes the data files when you drop the table.
+ If you specify the <code class="ph codeph">EXTERNAL</code> clause, Impala treats the table as an
+ <span class="q">"external"</span> table, where the data files are typically produced outside Impala and
+ queried from their original locations in HDFS, and Impala leaves the data files in place
+ when you drop the table. For details about internal and external tables, see
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>.
+ </p>
+
+ <p class="p">
+ Typically, for an external table you include a <code class="ph codeph">LOCATION</code> clause to specify
+ the path to the HDFS directory where Impala reads and writes files for the table. For
+ example, if your data pipeline produces Parquet files in the HDFS directory
+ <span class="ph filepath">/user/etl/destination</span>, you might create an external table as follows:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE EXTERNAL TABLE external_parquet (c1 INT, c2 STRING, c3 TIMESTAMP)
+ STORED AS PARQUET LOCATION '/user/etl/destination';
+</code></pre>
+
+ <p class="p">
+ Although the <code class="ph codeph">EXTERNAL</code> and <code class="ph codeph">LOCATION</code> clauses are often
+ specified together, <code class="ph codeph">LOCATION</code> is optional for external tables, and you can
+ also specify <code class="ph codeph">LOCATION</code> for internal tables. The difference is all about
+ whether Impala <span class="q">"takes control"</span> of the underlying data files and moves them when you
+ rename the table, or deletes them when you drop the table. For more about internal and
+ external tables and how they interact with the <code class="ph codeph">LOCATION</code> attribute, see
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Partitioned tables (PARTITIONED BY clause):</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITIONED BY</code> clause divides the data files based on the values from
+ one or more specified columns. Impala queries can use the partition metadata to minimize
+ the amount of data that is read from disk or transmitted across the network, particularly
+ during join queries. For details about partitioning, see
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ All Kudu tables require partitioning, which involves different syntax than non-Kudu
+ tables. See the <code class="ph codeph">PARTITION BY</code> clause, rather than <code class="ph codeph">PARTITIONED
+ BY</code>, for Kudu tables.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.10</span> and higher, the <code class="ph codeph">PARTITION BY</code>
+ clause is optional for Kudu tables. If the clause is omitted, Impala automatically
+ constructs a single partition that is not connected to any column. Because such a
+ table cannot take advantage of Kudu features for parallelized queries and
+ query optimizations, omitting the <code class="ph codeph">PARTITION BY</code> clause is only
+ appropriate for small lookup tables.
+ </p>
+ </div>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.5</span>, you could use a partitioned table as the
+ source and copy data from it, but could not specify any partitioning clauses for the new
+ table. In <span class="keyword">Impala 2.5</span> and higher, you can now use the
+ <code class="ph codeph">PARTITIONED BY</code> clause with a <code class="ph codeph">CREATE TABLE AS SELECT</code>
+ statement. See the examples under the following discussion of the <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> syntax variation.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Sorted tables (SORT BY clause):</strong>
+ </p>
+
+ <p class="p">
+ The optional <code class="ph codeph">SORT BY</code> clause lets you specify zero or more columns
+ that are sorted in the data files created by each Impala <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> operation. Creating data files that are
+ sorted is most useful for Parquet tables, where the metadata stored inside each file includes
+ the minimum and maximum values for each column in the file. (The statistics apply to each row group
+ within the file; for simplicity, Impala writes a single row group in each file.) Grouping
+ data values together in relatively narrow ranges within each data file makes it possible
+ for Impala to quickly skip over data files that do not contain value ranges indicated in
+ the <code class="ph codeph">WHERE</code> clause of a query, and can improve the effectiveness
+ of Parquet encoding and compression.
+ </p>
+
+ <p class="p">
+ This clause is not applicable for Kudu tables or HBase tables. Although it works
+ for other HDFS file formats besides Parquet, the more efficient layout is most
+ evident with Parquet tables, because each Parquet data file includes statistics
+ about the data values in that file.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">SORT BY</code> columns cannot include any partition key columns
+ for a partitioned table, because those column values are not represented in
+ the underlying data files.
+ </p>
+
+ <p class="p">
+ Because data files can arrive in Impala tables by mechanisms that do not respect
+ the <code class="ph codeph">SORT BY</code> clause, such as <code class="ph codeph">LOAD DATA</code> or ETL
+ tools that create HDFS files, Impala does not guarantee or rely on the data being
+ sorted. The sorting aspect is only used to create a more efficient layout for
+ Parquet files generated by Impala, which helps to optimize the processing of
+ those Parquet files during Impala queries. During an <code class="ph codeph">INSERT</code>
+ or <code class="ph codeph">CREATE TABLE AS SELECT</code> operation, the sorting occurs
+ when the <code class="ph codeph">SORT BY</code> clause applies to the destination table
+ for the data, regardless of whether the source table has a <code class="ph codeph">SORT BY</code>
+ clause.
+ </p>
+
+ <p class="p">
+ For example, when creating a table intended to contain census data, you might define
+ sort columns such as last name and state. If a data file in this table contains a
+ narrow range of last names, for example from <code class="ph codeph">Smith</code> to <code class="ph codeph">Smythe</code>,
+ Impala can quickly detect that this data file contains no matches for a <code class="ph codeph">WHERE</code>
+ clause such as <code class="ph codeph">WHERE last_name = 'Jones'</code> and avoid reading the entire file.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE census_data (last_name STRING, first_name STRING, state STRING, address STRING)
+ SORT BY (last_name, state)
+ STORED AS PARQUET;
+</code></pre>
+
+ <p class="p">
+ Likewise, if an existing table contains data without any sort order, you can reorganize
+ the data in a more efficient way by using <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> to copy that data into a new table with a
+ <code class="ph codeph">SORT BY</code> clause:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE sorted_census_data
+ SORT BY (last_name, state)
+ STORED AS PARQUET
+ AS SELECT last_name, first_name, state, address
+ FROM unsorted_census_data;
+</code></pre>
+
+ <p class="p">
+ The metadata for the <code class="ph codeph">SORT BY</code> clause is stored in the <code class="ph codeph">TBLPROPERTIES</code>
+ fields for the table. Other SQL engines that can interoperate with Impala tables, such as Hive
+ and Spark SQL, do not recognize this property when inserting into a table that has a <code class="ph codeph">SORT BY</code>
+ clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because Kudu tables do not support clauses related to HDFS and S3 data files and
+ partitioning mechanisms, the syntax associated with the <code class="ph codeph">STORED AS KUDU</code>
+ clause is shown separately in the above syntax descriptions. Kudu tables have their own
+ syntax for <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">CREATE EXTERNAL TABLE</code>, and
+ <code class="ph codeph">CREATE TABLE AS SELECT</code>. <span class="ph">Prior to <span class="keyword">Impala 2.10</span>,
+ all internal Kudu tables require a <code class="ph codeph">PARTITION BY</code> clause, different than
+ the <code class="ph codeph">PARTITIONED BY</code> clause for HDFS-backed tables.</span>
+ </p>
+
+ <p class="p">
+ Here are some examples of creating empty Kudu tables:
+ </p>
+
+<pre class="pre codeblock"><code>
+<span class="ph">-- Single partition. Only for <span class="keyword">Impala 2.10</span> and higher.
+-- Only suitable for small lookup tables.
+CREATE TABLE kudu_no_partition_by_clause
+ (
+ id bigint PRIMARY KEY, s STRING, b BOOLEAN
+ )
+ STORED AS KUDU;</span>
+
+-- Single-column primary key.
+CREATE TABLE kudu_t1 (id BIGINT PRIMARY key, s STRING, b BOOLEAN)
+ PARTITION BY HASH (id) PARTITIONS 20 STORED AS KUDU;
+
+-- Multi-column primary key.
+CREATE TABLE kudu_t2 (id BIGINT, s STRING, b BOOLEAN, PRIMARY KEY (id,s))
+ PARTITION BY HASH (s) PARTITIONS 30 STORED AS KUDU;
+
+-- Meaningful primary key column is good for range partitioning.
+CREATE TABLE kudu_t3 (id BIGINT, year INT, s STRING,
+ b BOOLEAN, PRIMARY KEY (id,year))
+ PARTITION BY HASH (id) PARTITIONS 20,
+ RANGE (year) (PARTITION 1980 <= VALUES < 1990,
+ PARTITION 1990 <= VALUES < 2000,
+ PARTITION VALUE = 2001,
+ PARTITION 2001 < VALUES)
+ STORED AS KUDU;
+
+</code></pre>
+
+ <p class="p">
+ Here is an example of creating an external Kudu table:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Inherits column definitions from original table.
+-- For tables created through Impala, the kudu.table_name property
+-- comes from DESCRIBE FORMATTED output from the original table.
+CREATE EXTERNAL TABLE external_t1 STORED AS KUDU
+ TBLPROPERTIES ('kudu.table_name'='kudu_tbl_created_via_api');
+
+</code></pre>
+
+ <p class="p">
+ Here is an example of <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax for a Kudu table:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The CTAS statement defines the primary key and partitioning scheme.
+-- The rest of the column definitions are derived from the select list.
+CREATE TABLE ctas_t1
+ PRIMARY KEY (id) PARTITION BY HASH (id) PARTITIONS 10
+ STORED AS KUDU
+ AS SELECT id, s FROM kudu_t1;
+
+</code></pre>
+
+ <p class="p">
+ The following <code class="ph codeph">CREATE TABLE</code> clauses are not supported for Kudu tables:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">PARTITIONED BY</code> (Kudu tables use the clause <code class="ph codeph">PARTITION
+ BY</code> instead)
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">LOCATION</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">ROWFORMAT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">CACHED IN | UNCACHED</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">WITH SERDEPROPERTIES</code>
+ </li>
+ </ul>
+
+ <p class="p">
+ For more on the <code class="ph codeph">PRIMARY KEY</code> clause, see
+ <a class="xref" href="impala_kudu.html#kudu_primary_key">Primary Key Columns for Kudu Tables</a> and
+ <a class="xref" href="impala_kudu.html#kudu_primary_key_attribute">PRIMARY KEY Attribute</a>.
+ </p>
+
+ <p class="p">
+ For more on the <code class="ph codeph">NULL</code> and <code class="ph codeph">NOT NULL</code> attributes, see
+ <a class="xref" href="impala_kudu.html#kudu_not_null_attribute">NULL | NOT NULL Attribute</a>.
+ </p>
+
+ <p class="p">
+ For more on the <code class="ph codeph">ENCODING</code> attribute, see
+ <a class="xref" href="impala_kudu.html#kudu_encoding_attribute">ENCODING Attribute</a>.
+ </p>
+
+ <p class="p">
+ For more on the <code class="ph codeph">COMPRESSION</code> attribute, see
+ <a class="xref" href="impala_kudu.html#kudu_compression_attribute">COMPRESSION Attribute</a>.
+ </p>
+
+ <p class="p">
+ For more on the <code class="ph codeph">DEFAULT</code> attribute, see
+ <a class="xref" href="impala_kudu.html#kudu_default_attribute">DEFAULT Attribute</a>.
+ </p>
+
+ <p class="p">
+ For more on the <code class="ph codeph">BLOCK_SIZE</code> attribute, see
+ <a class="xref" href="impala_kudu.html#kudu_block_size_attribute">BLOCK_SIZE Attribute</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Partitioning for Kudu tables (PARTITION BY clause)</strong>
+ </p>
+
+ <p class="p">
+ For Kudu tables, you specify logical partitioning across one or more columns using the
+ <code class="ph codeph">PARTITION BY</code> clause. In contrast to partitioning for HDFS-based tables,
+ multiple values for a partition key column can be located in the same partition. The
+ optional <code class="ph codeph">HASH</code> clause lets you divide one or a set of partition key
+ columns into a specified number of buckets. You can use more than one
+ <code class="ph codeph">HASH</code> clause, specifying a distinct set of partition key columns for each.
+ The optional <code class="ph codeph">RANGE</code> clause further subdivides the partitions, based on a
+ set of comparison operations for the partition key columns.
+ </p>
+
+ <p class="p">
+ Here are some examples of the <code class="ph codeph">PARTITION BY HASH</code> syntax:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Apply hash function to 1 primary key column.
+create table hash_t1 (x bigint, y bigint, s string, primary key (x,y))
+ partition by hash (x) partitions 10
+ stored as kudu;
+
+-- Apply hash function to a different primary key column.
+create table hash_t2 (x bigint, y bigint, s string, primary key (x,y))
+ partition by hash (y) partitions 10
+ stored as kudu;
+
+-- Apply hash function to both primary key columns.
+-- In this case, the total number of partitions is 10.
+create table hash_t3 (x bigint, y bigint, s string, primary key (x,y))
+ partition by hash (x,y) partitions 10
+ stored as kudu;
+
+-- When the column list is omitted, apply hash function to all primary key columns.
+create table hash_t4 (x bigint, y bigint, s string, primary key (x,y))
+ partition by hash partitions 10
+ stored as kudu;
+
+-- Hash the X values independently from the Y values.
+-- In this case, the total number of partitions is 10 x 20.
+create table hash_t5 (x bigint, y bigint, s string, primary key (x,y))
+ partition by hash (x) partitions 10, hash (y) partitions 20
+ stored as kudu;
+
+</code></pre>
+
+ <p class="p">
+ Here are some examples of the <code class="ph codeph">PARTITION BY RANGE</code> syntax:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Create partitions that cover every possible value of X.
+-- Ranges that span multiple values use the keyword VALUES between
+-- a pair of < and <= comparisons.
+create table range_t1 (x bigint, s string, s2 string, primary key (x, s))
+ partition by range (x)
+ (
+ partition 0 <= values <= 49, partition 50 <= values <= 100,
+ partition values < 0, partition 100 < values
+ )
+ stored as kudu;
+
+-- Create partitions that cover some possible values of X.
+-- Values outside the covered range(s) are rejected.
+-- New range partitions can be added through ALTER TABLE.
+create table range_t2 (x bigint, s string, s2 string, primary key (x, s))
+ partition by range (x)
+ (
+ partition 0 <= values <= 49, partition 50 <= values <= 100
+ )
+ stored as kudu;
+
+-- A range can also specify a single specific value, using the keyword VALUE
+-- with an = comparison.
+create table range_t3 (x bigint, s string, s2 string, primary key (x, s))
+ partition by range (s)
+ (
+ partition value = 'Yes', partition value = 'No', partition value = 'Maybe'
+ )
+ stored as kudu;
+
+-- Using multiple columns in the RANGE clause and tuples inside the partition spec
+-- only works for partitions specified with the VALUE= syntax.
+create table range_t4 (x bigint, s string, s2 string, primary key (x, s))
+ partition by range (x,s)
+ (
+ partition value = (0,'zero'), partition value = (1,'one'), partition value = (2,'two')
+ )
+ stored as kudu;
+
+</code></pre>
+
+ <p class="p">
+ Here are some examples combining both <code class="ph codeph">HASH</code> and <code class="ph codeph">RANGE</code>
+ syntax for the <code class="ph codeph">PARTITION BY</code> clause:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Values from each range partition are hashed into 10 associated buckets.
+-- Total number of partitions in this case is 10 x 2.
+create table combined_t1 (x bigint, s string, s2 string, primary key (x, s))
+ partition by hash (x) partitions 10, range (x)
+ (
+ partition 0 <= values <= 49, partition 50 <= values <= 100
+ )
+ stored as kudu;
+
+-- The hash partitioning and range partitioning can apply to different columns.
+-- But all the columns used in either partitioning scheme must be from the primary key.
+create table combined_t2 (x bigint, s string, s2 string, primary key (x, s))
+ partition by hash (s) partitions 10, range (x)
+ (
+ partition 0 <= values <= 49, partition 50 <= values <= 100
+ )
+ stored as kudu;
+
+</code></pre>
+
+ <p class="p">
+ For more usage details and examples of the Kudu partitioning syntax, see
+ <a class="xref" href="impala_kudu.html">Using Impala to Query Kudu Tables</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Specifying file format (STORED AS and ROW FORMAT clauses):</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">STORED AS</code> clause identifies the format of the underlying data files.
+ Currently, Impala can query more types of file formats than it can create or insert into.
+ Use Hive to perform any create or data load operations that are not currently available in
+ Impala. For example, Impala can create an Avro, SequenceFile, or RCFile table but cannot
+ insert data into it. There are also Impala-specific procedures for using compression with
+ each kind of file format. For details about working with data files of various formats,
+ see <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In Impala 1.4.0 and higher, Impala can create Avro tables, which formerly required doing
+ the <code class="ph codeph">CREATE TABLE</code> statement in Hive. See
+ <a class="xref" href="impala_avro.html#avro">Using the Avro File Format with Impala Tables</a> for details and examples.
+ </div>
+
+ <p class="p">
+ By default (when no <code class="ph codeph">STORED AS</code> clause is specified), data files in Impala
+ tables are created as text files with Ctrl-A (hex 01) characters as the delimiter.
+
+ Specify the <code class="ph codeph">ROW FORMAT DELIMITED</code> clause to produce or ingest data files
+ that use a different delimiter character such as tab or <code class="ph codeph">|</code>, or a different
+ line end character such as carriage return or newline. When specifying delimiter and line
+ end characters with the <code class="ph codeph">FIELDS TERMINATED BY</code> and <code class="ph codeph">LINES TERMINATED
+ BY</code> clauses, use <code class="ph codeph">'\t'</code> for tab, <code class="ph codeph">'\n'</code> for newline
+ or linefeed, <code class="ph codeph">'\r'</code> for carriage return, and
+ <code class="ph codeph">\</code><code class="ph codeph">0</code> for ASCII <code class="ph codeph">nul</code> (hex 00). For more
+ examples of text tables, see <a class="xref" href="impala_txtfile.html#txtfile">Using Text Data Files with Impala Tables</a>.
+ </p>
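+
+      <p class="p">
+        For example, a tab-delimited text table might be declared like this. (The table and
+        column names are hypothetical.)
+      </p>
+
+<pre class="pre codeblock"><code>-- Text table with tab-separated fields and newline-terminated lines.
+CREATE TABLE tsv_example (id BIGINT, name STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
+  LINES TERMINATED BY '\n'
+  STORED AS TEXTFILE;
+</code></pre>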
+
+ <p class="p">
+ The <code class="ph codeph">ESCAPED BY</code> clause applies both to text files that you create through
+ an <code class="ph codeph">INSERT</code> statement to an Impala <code class="ph codeph">TEXTFILE</code> table, and to
+ existing data files that you put into an Impala table directory. (You can ingest existing
+ data files either by creating the table with <code class="ph codeph">CREATE EXTERNAL TABLE ...
+ LOCATION</code>, the <code class="ph codeph">LOAD DATA</code> statement, or through an HDFS operation
+ such as <code class="ph codeph">hdfs dfs -put <var class="keyword varname">file</var>
+ <var class="keyword varname">hdfs_path</var></code>.) Choose an escape character that is not used
+ anywhere else in the file, and put it in front of each instance of the delimiter character
+ that occurs within a field value. Surrounding field values with quotation marks does not
+ help Impala to parse fields with embedded delimiter characters; the quotation marks are
+ considered to be part of the column value. If you want to use <code class="ph codeph">\</code> as the
+ escape character, specify the clause in <span class="keyword cmdname">impala-shell</span> as <code class="ph codeph">ESCAPED
+ BY '\\'</code>.
+ </p>
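+
+      <p class="p">
+        For example, the following hypothetical table uses a comma as the field delimiter and
+        backslash as the escape character, so that commas embedded in field values can be
+        preserved by prefixing them with a backslash in the data files:
+      </p>
+
+<pre class="pre codeblock"><code>-- Comma-delimited fields, with backslash escaping any commas
+-- that occur inside field values.
+CREATE TABLE csv_escaped (id BIGINT, description STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+  ESCAPED BY '\\'
+  STORED AS TEXTFILE;
+</code></pre>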
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">CREATE TABLE</code> clauses <code class="ph codeph">FIELDS TERMINATED BY</code>, <code class="ph codeph">ESCAPED
+ BY</code>, and <code class="ph codeph">LINES TERMINATED BY</code> have special rules for the string literal used for
+ their argument, because they all require a single character. You can use a regular character surrounded by
+ single or double quotation marks, an octal sequence such as <code class="ph codeph">'\054'</code> (representing a comma),
+ or an integer in the range '-127'..'128' (with quotation marks but no backslash), which is interpreted as a
+ single-byte ASCII character. Negative values are subtracted from 256; for example, <code class="ph codeph">FIELDS
+ TERMINATED BY '-2'</code> sets the field delimiter to ASCII code 254, the <span class="q">"Icelandic Thorn"</span>
+ character used as a delimiter by some data formats.
+ </div>
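+
+      <p class="p">
+        For example, the following hypothetical statements both specify a comma delimiter,
+        once as an octal sequence and once as a decimal ASCII code:
+      </p>
+
+<pre class="pre codeblock"><code>-- '\054' is the octal sequence for a comma; '44' is its decimal ASCII code.
+CREATE TABLE delim_octal (c1 STRING, c2 STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054';
+CREATE TABLE delim_int (c1 STRING, c2 STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY '44';
+</code></pre>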
+
+ <p class="p">
+ <strong class="ph b">Cloning tables (LIKE clause):</strong>
+ </p>
+
+ <p class="p">
+ To create an empty table with the same columns, comments, and other attributes as another
+ table, use the following variation. The <code class="ph codeph">CREATE TABLE ... LIKE</code> form allows
+ a restricted set of clauses, currently only the <code class="ph codeph">LOCATION</code>,
+ <code class="ph codeph">COMMENT</code>, and <code class="ph codeph">STORED AS</code> clauses.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ <span class="ph">LIKE { [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> | PARQUET '<var class="keyword varname">hdfs_path_of_parquet_file</var>' }</span>
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ [STORED AS <var class="keyword varname">file_format</var>]
+ [LOCATION '<var class="keyword varname">hdfs_path</var>']</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ To clone the structure of a table and transfer data into it in a single operation, use
+ the <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax described in the next subsection.
+ </p>
+ </div>
+
+ <p class="p">
+ When you clone the structure of an existing table using the <code class="ph codeph">CREATE TABLE ...
+ LIKE</code> syntax, the new table keeps the same file format as the original one, so you
+ only need to specify the <code class="ph codeph">STORED AS</code> clause if you want to use a different
+ file format, or when specifying a view as the original table. (Creating a table
+ <span class="q">"like"</span> a view produces a text table by default.)
+ </p>
+
+ <p class="p">
+ Although normally Impala cannot create an HBase table directly, Impala can clone the
+ structure of an existing HBase table with the <code class="ph codeph">CREATE TABLE ... LIKE</code>
+ syntax, preserving the file format and metadata from the original table.
+ </p>
+
+ <p class="p">
+ There are some exceptions to the ability to use <code class="ph codeph">CREATE TABLE ... LIKE</code>
+ with an Avro table. For example, you cannot use this technique for an Avro table that is
+ specified with an Avro schema but no columns. When in doubt, check if a <code class="ph codeph">CREATE
+ TABLE ... LIKE</code> operation works in Hive; if not, it typically will not work in
+ Impala either.
+ </p>
+
+ <p class="p">
+ If the original table is partitioned, the new table inherits the same partition key
+ columns. Because the new table is initially empty, it does not inherit the actual
+ partitions that exist in the original one. To create partitions in the new table, insert
+ data or issue <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code> statements.
+ </p>
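+
+      <p class="p">
+        For example, with hypothetical table names:
+      </p>
+
+<pre class="pre codeblock"><code>-- Clone the layout of a partitioned table; the clone starts with no partitions.
+CREATE TABLE monthly_copy LIKE monthly_data;
+
+-- Recreate a partition in the clone before loading data into it.
+ALTER TABLE monthly_copy ADD PARTITION (year=2016, month=1);
+</code></pre>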
+
+ <p class="p">
+ Prior to Impala 1.4.0, it was not possible to use the <code class="ph codeph">CREATE TABLE LIKE
+ <var class="keyword varname">view_name</var></code> syntax. In Impala 1.4.0 and higher, you can create a table with the
+ same column definitions as a view using the <code class="ph codeph">CREATE TABLE LIKE</code> technique. Although
+ <code class="ph codeph">CREATE TABLE LIKE</code> normally inherits the file format of the original table, a view has no
+ underlying file format, so <code class="ph codeph">CREATE TABLE LIKE <var class="keyword varname">view_name</var></code> produces a text
+ table by default. To specify a different file format, include a <code class="ph codeph">STORED AS
+ <var class="keyword varname">file_format</var></code> clause at the end of the <code class="ph codeph">CREATE TABLE LIKE</code>
+ statement.
+ </p>
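+
+      <p class="p">
+        For example, with a hypothetical view <code class="ph codeph">v1</code>:
+      </p>
+
+<pre class="pre codeblock"><code>-- By default, a table created "like" a view is a text table.
+CREATE TABLE view_columns_text LIKE v1;
+
+-- Add STORED AS to choose a different file format for the new table.
+CREATE TABLE view_columns_parquet LIKE v1 STORED AS PARQUET;
+</code></pre>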
+
+ <p class="p">
+ Because <code class="ph codeph">CREATE TABLE ... LIKE</code> only manipulates table metadata, not the
+ physical data of the table, issue <code class="ph codeph">INSERT INTO TABLE</code> statements afterward
+ to copy any data from the original table into the new one, optionally converting the data
+ to a new file format. (For some file formats, Impala can do a <code class="ph codeph">CREATE TABLE ...
+ LIKE</code> to create the table, but Impala cannot insert data in that file format; in
+ these cases, you must load the data in Hive. See
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details.)
+ </p>
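+
+      <p class="p">
+        For example, the following hypothetical sequence clones the column definitions of a
+        table into a Parquet table, then copies the data, converting it to Parquet in the
+        process:
+      </p>
+
+<pre class="pre codeblock"><code>-- CREATE TABLE ... LIKE sets up the metadata only.
+CREATE TABLE t1_parquet LIKE t1 STORED AS PARQUET;
+
+-- A separate INSERT copies the data and converts the file format.
+INSERT INTO TABLE t1_parquet SELECT * FROM t1;
+</code></pre>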
+
+ <p class="p" id="create_table__ctas">
+ <strong class="ph b">CREATE TABLE AS SELECT:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax is a shorthand notation to create a
+ table based on column definitions from another table, and copy data from the source table
+ to the destination table without issuing any separate <code class="ph codeph">INSERT</code> statement.
+ This idiom is so popular that it has its own acronym, <span class="q">"CTAS"</span>.
+ </p>
+
+ <p class="p">
+ The following examples show how to copy data from a source table <code class="ph codeph">T1</code> to a
+      variety of destination tables, applying various transformations to the table properties,
+ table layout, or the data itself as part of the operation:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Sample table to be the source of CTAS operations.
+CREATE TABLE t1 (x INT, y STRING);
+INSERT INTO t1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
+
+-- Clone all the columns and data from one table to another.
+CREATE TABLE clone_of_t1 AS SELECT * FROM t1;
++-------------------+
+| summary |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+
+-- Clone the columns and data, and convert the data to a different file format.
+CREATE TABLE parquet_version_of_t1 STORED AS PARQUET AS SELECT * FROM t1;
++-------------------+
+| summary |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+
+-- Copy only some rows to the new table.
+CREATE TABLE subset_of_t1 AS SELECT * FROM t1 WHERE x >= 2;
++-------------------+
+| summary |
++-------------------+
+| Inserted 2 row(s) |
++-------------------+
+
+-- Same idea as CREATE TABLE LIKE: clone table layout but do not copy any data.
+CREATE TABLE empty_clone_of_t1 AS SELECT * FROM t1 WHERE 1=0;
++-------------------+
+| summary |
++-------------------+
+| Inserted 0 row(s) |
++-------------------+
+
+-- Reorder and rename columns and transform the data.
+CREATE TABLE t5 AS SELECT upper(y) AS s, x+1 AS a, 'Entirely new column' AS n FROM t1;
++-------------------+
+| summary |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+SELECT * FROM t5;
++-------+---+---------------------+
+| s | a | n |
++-------+---+---------------------+
+| ONE | 2 | Entirely new column |
+| TWO | 3 | Entirely new column |
+| THREE | 4 | Entirely new column |
++-------+---+---------------------+
+</code></pre>
+
+
+
+
+
+ <p class="p">
+ See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for details about query syntax for the
+ <code class="ph codeph">SELECT</code> portion of a <code class="ph codeph">CREATE TABLE AS SELECT</code> statement.
+ </p>
+
+ <p class="p">
+ The newly created table inherits the column names that you select from the original table,
+ which you can override by specifying column aliases in the query. Any column or table
+ comments from the original table are not carried over to the new table.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ When using the <code class="ph codeph">STORED AS</code> clause with a <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement, the destination table must be a file format that Impala can
+ write to: currently, text or Parquet. You cannot specify an Avro, SequenceFile, or RCFile
+ table as the destination table for a CTAS operation.
+ </div>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.5</span> you could use a partitioned table as the source
+ and copy data from it, but could not specify any partitioning clauses for the new table.
+ In <span class="keyword">Impala 2.5</span> and higher, you can now use the <code class="ph codeph">PARTITIONED
+ BY</code> clause with a <code class="ph codeph">CREATE TABLE AS SELECT</code> statement. The following
+ example demonstrates how you can copy data from an unpartitioned table in a <code class="ph codeph">CREATE
+ TABLE AS SELECT</code> operation, creating a new partitioned table in the process. The
+ main syntax consideration is the column order in the <code class="ph codeph">PARTITIONED BY</code>
+ clause and the select list: the partition key columns must be listed last in the select
+ list, in the same order as in the <code class="ph codeph">PARTITIONED BY</code> clause. Therefore, in
+ this case, the column order in the destination table is different from the source table.
+ You also only specify the column names in the <code class="ph codeph">PARTITIONED BY</code> clause, not
+ the data types or column comments.
+ </p>
+
+<pre class="pre codeblock"><code>
+create table partitions_no (year smallint, month tinyint, s string);
+insert into partitions_no values (2016, 1, 'January 2016'),
+ (2016, 2, 'February 2016'), (2016, 3, 'March 2016');
+
+-- Prove that the source table is not partitioned.
+show partitions partitions_no;
+ERROR: AnalysisException: Table is not partitioned: ctas_partition_by.partitions_no
+
+-- Create new table with partitions based on column values from source table.
+<strong class="ph b">create table partitions_yes partitioned by (year, month)
+ as select s, year, month from partitions_no;</strong>
++-------------------+
+| summary |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+
+-- Prove that the destination table is partitioned.
+show partitions partitions_yes;
++-------+-------+-------+--------+------+...
+| year | month | #Rows | #Files | Size |...
++-------+-------+-------+--------+------+...
+| 2016 | 1 | -1 | 1 | 13B |...
+| 2016 | 2 | -1 | 1 | 14B |...
+| 2016 | 3 | -1 | 1 | 11B |...
+| Total | | -1 | 3 | 38B |...
++-------+-------+-------+--------+------+...
+</code></pre>
+
+ <p class="p">
+ The most convenient layout for partitioned tables is with all the partition key columns at
+      the end. The CTAS <code class="ph codeph">PARTITIONED BY</code> syntax requires the partition
+      key columns to come last in the select list, which produces that same column order in the
+      destination table.
+ </p>
+
+<pre class="pre codeblock"><code>
+describe partitions_no;
++-------+----------+---------+
+| name | type | comment |
++-------+----------+---------+
+| year | smallint | |
+| month | tinyint | |
+| s | string | |
++-------+----------+---------+
+
+-- The CTAS operation forced us to put the partition key columns last.
+-- Having those columns last works better with idioms such as SELECT *
+-- for partitioned tables.
+describe partitions_yes;
++-------+----------+---------+
+| name | type | comment |
++-------+----------+---------+
+| s | string | |
+| year | smallint | |
+| month | tinyint | |
++-------+----------+---------+
+</code></pre>
+
+ <p class="p">
+ Attempting to use a select list with the partition key columns not at the end results in
+ an error due to a column name mismatch:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- We expect this CTAS to fail because non-key column S
+-- comes after key columns YEAR and MONTH in the select list.
+create table partitions_maybe partitioned by (year, month)
+ as select year, month, s from partitions_no;
+ERROR: AnalysisException: Partition column name mismatch: year != month
+</code></pre>
+
+ <p class="p">
+ For example, the following statements show how you can clone all the data in a table, or a
+ subset of the columns and/or rows, or reorder columns, rename them, or construct them out
+ of expressions:
+ </p>
+
+ <p class="p">
+ As part of a CTAS operation, you can convert the data to any file format that Impala can
+ write (currently, <code class="ph codeph">TEXTFILE</code> and <code class="ph codeph">PARQUET</code>). You cannot
+ specify the lower-level properties of a text table, such as the delimiter.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an
+ <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the
+ results are not necessarily sorted. An <code class="ph codeph">INSERT ... SELECT</code> operation potentially creates
+ many different data files, prepared on different data nodes, and therefore the notion of the data being
+ stored in sorted order is impractical.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">CREATE TABLE LIKE PARQUET:</strong>
+ </p>
+
+ <p class="p">
+ The variation <code class="ph codeph">CREATE TABLE ... LIKE PARQUET
+ '<var class="keyword varname">hdfs_path_of_parquet_file</var>'</code> lets you skip the column
+ definitions of the <code class="ph codeph">CREATE TABLE</code> statement. The column names and data
+ types are automatically configured based on the organization of the specified Parquet data
+ file, which must already reside in HDFS. You can use a data file located outside the
+ Impala database directories, or a file from an existing Impala Parquet table; either way,
+ Impala only uses the column definitions from the file and does not use the HDFS location
+      for the <code class="ph codeph">LOCATION</code> attribute of the new table. (You can, however, also
+      specify the enclosing directory with the <code class="ph codeph">LOCATION</code> attribute, to both use
+      the same schema as the data file and point the Impala table at the associated directory
+      for querying.)
+ </p>
+
+ <p class="p">
+ The following considerations apply when you use the <code class="ph codeph">CREATE TABLE LIKE
+ PARQUET</code> technique:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Any column comments from the original table are not preserved in the new table. Each
+ column in the new table has a comment stating the low-level Parquet field type used to
+ deduce the appropriate SQL column type.
+ </li>
+
+ <li class="li">
+ If you use a data file from a partitioned Impala table, any partition key columns from
+ the original table are left out of the new table, because they are represented in HDFS
+ directory names rather than stored in the data file. To preserve the partition
+ information, repeat the same <code class="ph codeph">PARTITION</code> clause as in the original
+ <code class="ph codeph">CREATE TABLE</code> statement.
+ </li>
+
+ <li class="li">
+ The file format of the new table defaults to text, as with other kinds of <code class="ph codeph">CREATE
+ TABLE</code> statements. To make the new table also use Parquet format, include the
+ clause <code class="ph codeph">STORED AS PARQUET</code> in the <code class="ph codeph">CREATE TABLE LIKE
+ PARQUET</code> statement.
+ </li>
+
+ <li class="li">
+ If the Parquet data file comes from an existing Impala table, currently, any
+ <code class="ph codeph">TINYINT</code> or <code class="ph codeph">SMALLINT</code> columns are turned into
+ <code class="ph codeph">INT</code> columns in the new table. Internally, Parquet stores such values as
+ 32-bit integers.
+ </li>
+
+ <li class="li">
+ When the destination table uses the Parquet file format, the <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> and <code class="ph codeph">INSERT ... SELECT</code> statements always create at least
+ one data file, even if the <code class="ph codeph">SELECT</code> part of the statement does not match
+ any rows. You can use such an empty Parquet data file as a template for subsequent
+ <code class="ph codeph">CREATE TABLE LIKE PARQUET</code> statements.
+ </li>
+ </ul>
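+
+  <p class="p">
+    Putting these considerations together, a typical statement might look like the
+    following sketch; the HDFS path and table name are illustrative placeholders:
+  </p>
+
+<pre class="pre codeblock"><code>
+-- Derive column definitions from an existing Parquet data file,
+-- and make the new table use Parquet format as well.
+CREATE TABLE new_parquet_table
+  LIKE PARQUET '/user/impala/sample_data/datafile.parq'
+  STORED AS PARQUET;
+</code></pre>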
+
+ <p class="p">
+ For more details about creating Parquet tables, and examples of the <code class="ph codeph">CREATE TABLE
+ LIKE PARQUET</code> syntax, see <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Visibility and Metadata (TBLPROPERTIES and WITH SERDEPROPERTIES clauses):</strong>
+ </p>
+
+ <p class="p">
+ You can associate arbitrary items of metadata with a table by specifying the
+ <code class="ph codeph">TBLPROPERTIES</code> clause. This clause takes a comma-separated list of
+ key-value pairs and stores those items in the metastore database. You can also change the
+ table properties later with an <code class="ph codeph">ALTER TABLE</code> statement. You can observe the
+ table properties for different delimiter and escape characters using the <code class="ph codeph">DESCRIBE
+ FORMATTED</code> command, and change those settings for an existing table with
+ <code class="ph codeph">ALTER TABLE ... SET TBLPROPERTIES</code>.
+ </p>
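+
+  <p class="p">
+    For example, the following illustrative statements (the property name and values are
+    placeholders) attach a key-value pair at creation time, change it later, and display it:
+  </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE t1 (x INT) TBLPROPERTIES ('owner_team' = 'analytics');
+ALTER TABLE t1 SET TBLPROPERTIES ('owner_team' = 'reporting');
+DESCRIBE FORMATTED t1;
+</code></pre>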
+
+ <p class="p">
+ You can also associate SerDes properties with the table by specifying key-value pairs
+ through the <code class="ph codeph">WITH SERDEPROPERTIES</code> clause. This metadata is not used by
+ Impala, which has its own built-in serializer and deserializer for the file formats it
+ supports. Particular property values might be needed for Hive compatibility with certain
+ variations of file formats, particularly Avro.
+ </p>
+
+ <p class="p">
+ Some DDL operations that interact with other Hadoop components require specifying
+ particular values in the <code class="ph codeph">SERDEPROPERTIES</code> or
+ <code class="ph codeph">TBLPROPERTIES</code> fields, such as creating an Avro table or an HBase table.
+ (You typically create HBase tables in Hive, because they require additional clauses not
+ currently available in Impala.)
+
+ </p>
+
+ <p class="p">
+ To see the column definitions and column comments for an existing table, for example
+ before issuing a <code class="ph codeph">CREATE TABLE ... LIKE</code> or a <code class="ph codeph">CREATE TABLE ... AS
+ SELECT</code> statement, issue the statement <code class="ph codeph">DESCRIBE
+ <var class="keyword varname">table_name</var></code>. To see even more detail, such as the location of
+ data files and the values for clauses such as <code class="ph codeph">ROW FORMAT</code> and
+ <code class="ph codeph">STORED AS</code>, issue the statement <code class="ph codeph">DESCRIBE FORMATTED
+ <var class="keyword varname">table_name</var></code>. <code class="ph codeph">DESCRIBE FORMATTED</code> is also needed
+ to see any overall table comment (as opposed to individual column comments).
+ </p>
+
+ <p class="p">
+ After creating a table, your <span class="keyword cmdname">impala-shell</span> session or another
+ <span class="keyword cmdname">impala-shell</span> connected to the same node can immediately query that
+ table. There might be a brief interval (one statestore heartbeat) before the table can be
+ queried through a different Impala node. To make the <code class="ph codeph">CREATE TABLE</code>
+ statement return only when the table is recognized by all Impala nodes in the cluster,
+ enable the <code class="ph codeph">SYNC_DDL</code> query option.
+ </p>
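+
+  <p class="p">
+    For example, the following sketch makes the <code class="ph codeph">CREATE TABLE</code> statement
+    wait until all Impala nodes recognize the new table:
+  </p>
+
+<pre class="pre codeblock"><code>
+SET SYNC_DDL=1;
+CREATE TABLE t1 (x INT);
+</code></pre>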
+
+ <p class="p">
+ <strong class="ph b">HDFS caching (CACHED IN clause):</strong>
+ </p>
+
+ <p class="p">
+ If you specify the <code class="ph codeph">CACHED IN</code> clause, any existing or future data files in
+ the table directory or the partition subdirectories are designated to be loaded into
+ memory with the HDFS caching mechanism. See
+ <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a> for details about using the HDFS
+ caching feature.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.2</span> and higher, the optional <code class="ph codeph">WITH REPLICATION</code> clause
+ for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> lets you specify
+ a <dfn class="term">replication factor</dfn>, the number of hosts on which to cache the same data blocks.
+ When Impala processes a cached data block, where the cache replication factor is greater than 1, Impala randomly
+ selects a host that has a cached copy of that data block. This optimization avoids excessive CPU
+ usage on a single host when the same cached data block is processed multiple times.
+ Where practical, specify a value greater than or equal to the HDFS block replication factor.
+ </p>
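+
+  <p class="p">
+    For example, the following sketch caches a table's data blocks on three hosts;
+    <code class="ph codeph">'four_gig_pool'</code> is a placeholder for an HDFS cache pool that
+    already exists on your cluster:
+  </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE cached_t1 (x INT)
+  CACHED IN 'four_gig_pool' WITH REPLICATION = 3;
+</code></pre>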
+
+
+
+ <p class="p">
+ <strong class="ph b">Column order</strong>:
+ </p>
+
+ <p class="p">
+ If you intend to use the table to hold data files produced by some external source,
+ specify the columns in the same order as they appear in the data files.
+ </p>
+
+ <p class="p">
+ If you intend to insert or copy data into the table through Impala, or if you have control
+ over the way externally produced data files are arranged, use your judgment to specify
+ columns in the most convenient order:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+          If certain columns are often <code class="ph codeph">NULL</code>, specify those columns last. You
+          can then produce data files that omit these trailing columns entirely, and Impala
+          automatically fills in <code class="ph codeph">NULL</code> values for the omitted columns.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If an unpartitioned table will be used as the source for an <code class="ph codeph">INSERT ...
+ SELECT</code> operation into a partitioned table, specify last in the unpartitioned
+ table any columns that correspond to partition key columns in the partitioned table,
+ and in the same order as the partition key columns are declared in the partitioned
+ table. This technique lets you use <code class="ph codeph">INSERT ... SELECT *</code> when copying
+ data to the partitioned table, rather than specifying each column name individually.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you specify columns in an order that you later discover is suboptimal, you can
+ sometimes work around the problem without recreating the table. You can create a view
+ that selects columns from the original table in a permuted order, then do a
+ <code class="ph codeph">SELECT *</code> from the view. When inserting data into a table, you can
+ specify a permuted order for the inserted columns to match the order in the
+ destination table.
+ </p>
+ </li>
+ </ul>
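+
+  <p class="p">
+    For example, the following sketch (with placeholder names) shows an unpartitioned
+    staging table whose trailing columns match the partition key columns of the
+    destination table, so the copy can use <code class="ph codeph">INSERT ... SELECT *</code>:
+  </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE staging (s STRING, year INT, month INT);
+CREATE TABLE dest (s STRING) PARTITIONED BY (year INT, month INT);
+INSERT INTO dest PARTITION (year, month) SELECT * FROM staging;
+</code></pre>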
+
+ <p class="p">
+ <strong class="ph b">Hive considerations:</strong>
+ </p>
+
+ <p class="p">
+ Impala queries can make use of metadata about the table and columns, such as the number of
+ rows in a table or the number of different values in a column. Prior to Impala 1.2.2, to
+ create this metadata, you issued the <code class="ph codeph">ANALYZE TABLE</code> statement in Hive to
+ gather this information, after creating the table and loading representative data into it.
+ In Impala 1.2.2 and higher, the <code class="ph codeph">COMPUTE STATS</code> statement produces these
+ statistics within Impala, without needing to use Hive at all.
+ </p>
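+
+  <p class="p">
+    For example, after loading data into an illustrative table <code class="ph codeph">t1</code>:
+  </p>
+
+<pre class="pre codeblock"><code>
+COMPUTE STATS t1;
+SHOW TABLE STATS t1;
+SHOW COLUMN STATS t1;
+</code></pre>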
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The Impala <code class="ph codeph">CREATE TABLE</code> statement cannot create an HBase table, because
+ it currently does not support the <code class="ph codeph">STORED BY</code> clause needed for HBase
+ tables. Create such tables in Hive, then query them through Impala. For information on
+ using Impala with HBase tables, see <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+
+ <p class="p">
+      To create a table where the data resides in the Amazon Simple Storage Service (S3),
+      specify a <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">s3a://</code> prefix
+      pointing to the data files in S3.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, you can use this special
+ <code class="ph codeph">LOCATION</code> syntax as part of a <code class="ph codeph">CREATE TABLE AS SELECT</code>
+ statement.
+ </p>
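+
+  <p class="p">
+    For example, in the following sketch the bucket name and path are illustrative
+    placeholders:
+  </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE s3_table (x INT, s STRING)
+  LOCATION 's3a://example-bucket/path/to/data/';
+</code></pre>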
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+ <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+ <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+ as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+ Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">CREATE TABLE</code> statement for an internal table creates a directory in
+ HDFS. The <code class="ph codeph">CREATE EXTERNAL TABLE</code> statement associates the table with an
+ existing HDFS directory, and does not create any new directory in HDFS. To locate the HDFS
+ data directory for a table, issue a <code class="ph codeph">DESCRIBE FORMATTED
+ <var class="keyword varname">table</var></code> statement. To examine the contents of that HDFS
+ directory, use an OS command such as <code class="ph codeph">hdfs dfs -ls
+ hdfs://<var class="keyword varname">path</var></code>, either from the OS command line or through the
+ <code class="ph codeph">shell</code> or <code class="ph codeph">!</code> commands in <span class="keyword cmdname">impala-shell</span>.
+ </p>
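+
+  <p class="p">
+    For example, in <span class="keyword cmdname">impala-shell</span> (the table name and HDFS path are
+    illustrative placeholders):
+  </p>
+
+<pre class="pre codeblock"><code>
+DESCRIBE FORMATTED t1;
+-- Copy the Location: value from the output into the shell command.
+! hdfs dfs -ls hdfs://nameservice1/user/hive/warehouse/db1.db/t1;
+</code></pre>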
+
+ <p class="p">
+ The <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax creates data files under the table data
+ directory to hold any data copied by the <code class="ph codeph">INSERT</code> portion of the statement.
+ (Even if no data is copied, Impala might create one or more empty data files.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under, typically the
+ <code class="ph codeph">impala</code> user, must have both execute and write permission for the database
+ directory where the table is being created.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+ identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+ other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Certain multi-stage statements (<code class="ph codeph">CREATE TABLE AS SELECT</code> and
+ <code class="ph codeph">COMPUTE STATS</code>) can be cancelled during some stages, when running <code class="ph codeph">INSERT</code>
+ or <code class="ph codeph">SELECT</code> operations internally. To cancel this statement, use Ctrl-C from the
+ <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
+ <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
+ in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
+ (port 25000).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>,
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>,
+ <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>,
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>,
+ <a class="xref" href="impala_tables.html#internal_tables">Internal Tables</a>,
+ <a class="xref" href="impala_tables.html#external_tables">External Tables</a>,
+ <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>,
+ <a class="xref" href="impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a>, <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>,
+ <a class="xref" href="impala_show.html#show_create_table">SHOW CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_create_view.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_create_view.html b/docs/build3x/html/topics/impala_create_view.html
new file mode 100644
index 0000000..f25d810
--- /dev/null
+++ b/docs/build3x/html/topics/impala_create_view.html
@@ -0,0 +1,194 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_view"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE VIEW Statement</title></head><body id="create_view"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">CREATE VIEW Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">CREATE VIEW</code> statement lets you create a shorthand abbreviation for a more complicated
+ query. The base query can involve joins, expressions, reordered columns, column aliases, and other SQL
+ features that can make a query hard to understand or maintain.
+ </p>
+
+ <p class="p">
+ Because a view is purely a logical construct (an alias for a query) with no physical data behind it,
+ <code class="ph codeph">ALTER VIEW</code> only involves changes to metadata in the metastore database, not any data files
+ in HDFS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE VIEW [IF NOT EXISTS] <var class="keyword varname">view_name</var> [(<var class="keyword varname">column_list</var>)]
+ AS <var class="keyword varname">select_statement</var></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">CREATE VIEW</code> statement can be useful in scenarios such as the following:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ To turn even the most lengthy and complicated SQL query into a one-liner. You can issue simple queries
+ against the view from applications, scripts, or interactive queries in <span class="keyword cmdname">impala-shell</span>.
+ For example:
+<pre class="pre codeblock"><code>select * from <var class="keyword varname">view_name</var>;
+select * from <var class="keyword varname">view_name</var> order by c1 desc limit 10;</code></pre>
+ The more complicated and hard-to-read the original query, the more benefit there is to simplifying the
+ query using a view.
+ </li>
+
+ <li class="li">
+ To hide the underlying table and column names, to minimize maintenance problems if those names change. In
+ that case, you re-create the view using the new names, and all queries that use the view rather than the
+ underlying tables keep running with no changes.
+ </li>
+
+ <li class="li">
+ To experiment with optimization techniques and make the optimized queries available to all applications.
+ For example, if you find a combination of <code class="ph codeph">WHERE</code> conditions, join order, join hints, and so
+ on that works the best for a class of queries, you can establish a view that incorporates the
+ best-performing techniques. Applications can then make relatively simple queries against the view, without
+ repeating the complicated and optimized logic over and over. If you later find a better way to optimize the
+ original query, when you re-create the view, all the applications immediately take advantage of the
+ optimized base query.
+ </li>
+
+ <li class="li">
+ To simplify a whole class of related queries, especially complicated queries involving joins between
+ multiple tables, complicated expressions in the column list, and other SQL syntax that makes the query
+ difficult to understand and debug. For example, you might create a view that joins several tables, filters
+ using several <code class="ph codeph">WHERE</code> conditions, and selects several columns from the result set.
+ Applications might issue queries against this view that only vary in their <code class="ph codeph">LIMIT</code>,
+ <code class="ph codeph">ORDER BY</code>, and similar simple clauses.
+ </li>
+ </ul>
+
+ <p class="p">
+ For queries that require repeating complicated clauses over and over again, for example in the select list,
+ <code class="ph codeph">ORDER BY</code>, and <code class="ph codeph">GROUP BY</code> clauses, you can use the <code class="ph codeph">WITH</code>
+ clause as an alternative to creating a view.
+ </p>
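+
+  <p class="p">
+    For example, the following sketch (with placeholder table and column names) factors
+    out a subquery with <code class="ph codeph">WITH</code> instead of defining a permanent view:
+  </p>
+
+<pre class="pre codeblock"><code>
+WITH totals AS
+  (SELECT seller_id, SUM(amount) AS total FROM sales GROUP BY seller_id)
+SELECT seller_id, total FROM totals ORDER BY total DESC LIMIT 10;
+</code></pre>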
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+ <p class="p">
+ For tables containing complex type columns (<code class="ph codeph">ARRAY</code>,
+ <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>), you typically use
+ join queries to refer to the complex values. You can use views to
+ hide the join notation, making such tables seem like traditional denormalized
+ tables, and making those tables queryable by business intelligence tools
+ that do not have built-in support for those complex types.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_views">Accessing Complex Type Data in Flattened Form Using Views</a> for details.
+ </p>
+ <p class="p">
+ Because you cannot directly issue <code class="ph codeph">SELECT <var class="keyword varname">col_name</var></code>
+ against a column of complex type, you cannot use a view or a <code class="ph codeph">WITH</code>
+ clause to <span class="q">"rename"</span> a column by selecting it with a column alias.
+ </p>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+ <p class="p">
+ If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+ identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+ other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+
+
+<pre class="pre codeblock"><code>-- Create a view that is exactly the same as the underlying table.
+create view v1 as select * from t1;
+
+-- Create a view that includes only certain columns from the underlying table.
+create view v2 as select c1, c3, c7 from t1;
+
+-- Create a view that filters the values from the underlying table.
+create view v3 as select distinct c1, c3, c7 from t1 where c1 is not null and c5 > 0;
+
+-- Create a view that reorders and renames columns from the underlying table.
+create view v4 as select c4 as last_name, c6 as address, c2 as birth_date from t1;
+
+-- Create a view that runs functions to convert or transform certain columns.
+create view v5 as select c1, cast(c3 as string) c3, concat(c4,c5) c5, trim(c6) c6, "Constant" c8 from t1;
+
+-- Create a view that hides the complexity of a join query.
+create view v6 as select t1.c1, t2.c2 from t1 join t2 on t1.id = t2.id;
+</code></pre>
+
+
+
+ <div class="p">
+    The following example creates a series of views and then drops them. These examples illustrate how views
+    are associated with a particular database, and how the view names in
+    <code class="ph codeph">CREATE VIEW</code> and <code class="ph codeph">DROP VIEW</code> statements can be either
+    unqualified, referring to a view in the current database, or fully qualified with a database name.
+<pre class="pre codeblock"><code>
+-- Create and drop a view in the current database.
+CREATE VIEW few_rows_from_t1 AS SELECT * FROM t1 LIMIT 10;
+DROP VIEW few_rows_from_t1;
+
+-- Create and drop a view referencing a table in a different database.
+CREATE VIEW table_from_other_db AS SELECT x FROM db1.foo WHERE x IS NOT NULL;
+DROP VIEW table_from_other_db;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Switch into the other database and drop the view.
+USE db2;
+DROP VIEW v1;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Drop a view in the other database.
+DROP VIEW db2.v1;
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_views.html#views">Overview of Impala Views</a>, <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a>,
+ <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_databases.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_databases.html b/docs/build3x/html/topics/impala_databases.html
new file mode 100644
index 0000000..550d744
--- /dev/null
+++ b/docs/build3x/html/topics/impala_databases.html
@@ -0,0 +1,62 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="databases"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Databases</title></head><body id="databases"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Databases</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ In Impala, a database is a logical container for a group of tables. Each database defines a separate
+ namespace. Within a database, you can refer to the tables inside it using their unqualified names. Different
+ databases can contain tables with identical names.
+ </p>
+
+ <p class="p">
+ Creating a database is a lightweight operation. There are minimal database-specific properties to configure,
+ only <code class="ph codeph">LOCATION</code> and <code class="ph codeph">COMMENT</code>. There is no <code class="ph codeph">ALTER DATABASE</code> statement.
+ </p>
+
+ <p class="p">
+ Typically, you create a separate database for each project or application, to avoid naming conflicts between
+ tables and to make clear which tables are related to each other. The <code class="ph codeph">USE</code> statement lets
+ you switch between databases. Unqualified references to tables, views, and functions refer to objects
+ within the current database. You can also refer to objects in other databases by using qualified names
+ of the form <code class="ph codeph"><var class="keyword varname">dbname</var>.<var class="keyword varname">object_name</var></code>.
+ </p>
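+
+    <p class="p">
+      For example (with placeholder database and table names):
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Illustrative sketch of unqualified and qualified references.
+USE db1;
+SELECT * FROM t1;       -- Refers to db1.t1.
+SELECT * FROM db2.t1;   -- Refers to a table of the same name in db2.
+</code></pre>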
+
+ <p class="p">
+ Each database is physically represented by a directory in HDFS. When you do not specify a <code class="ph codeph">LOCATION</code>
+ attribute, the directory is located in the Impala data directory with the associated tables managed by Impala.
+ When you do specify a <code class="ph codeph">LOCATION</code> attribute, any read and write operations for tables in that
+ database are relative to the specified HDFS directory.
+ </p>
+
+ <p class="p">
+ There is a special database, named <code class="ph codeph">default</code>, where you begin when you connect to Impala.
+ Tables created in <code class="ph codeph">default</code> are physically located one level higher in HDFS than all the
+ user-created databases.
+ </p>
+
+ <div class="p">
+ Impala includes another predefined database, <code class="ph codeph">_impala_builtins</code>, that serves as the location
+ for the <a class="xref" href="../shared/../topics/impala_functions.html#builtins">built-in functions</a>. To see the built-in
+ functions, use a statement like the following:
+<pre class="pre codeblock"><code>show functions in _impala_builtins;
+show functions in _impala_builtins like '*<var class="keyword varname">substring</var>*';
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related statements:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a>,
+ <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>, <a class="xref" href="impala_use.html#use">USE Statement</a>,
+ <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_hdfs_caching.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_hdfs_caching.html b/docs/build3x/html/topics/impala_perf_hdfs_caching.html
new file mode 100644
index 0000000..596675d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_hdfs_caching.html
@@ -0,0 +1,578 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hdfs_caching"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using HDFS Caching with Impala (Impala 2.1 or higher only)</title></head><body id="hdfs_caching"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using HDFS Caching with Impala (<span class="keyword">Impala 2.1</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ HDFS caching provides performance and scalability benefits in production environments where Impala queries
+ and other Hadoop jobs operate on quantities of data much larger than the physical RAM on the DataNodes,
+ making it impractical to rely on the Linux OS cache, which only keeps the most recently used data in memory.
+ Data read from the HDFS cache avoids the overhead of checksumming and memory-to-memory copying involved when
+ using data from the Linux OS cache.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ On a small or lightly loaded cluster, HDFS caching might not produce any speedup. It might even lead to
+ slower queries, if I/O read operations that were performed in parallel across the entire cluster are replaced by in-memory
+ operations running on a smaller number of hosts. The hosts where the HDFS blocks are cached can become
+ bottlenecks because they experience high CPU load while processing the cached data blocks, while other hosts remain idle.
+ Therefore, always compare performance with and without this feature enabled, using a realistic workload.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.2</span> and higher, you can spread the CPU load more evenly by specifying the <code class="ph codeph">WITH REPLICATION</code>
+ clause of the <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ This clause lets you control the replication factor for
+ HDFS caching for a specific table or partition. By default, each cached block is
+ only present on a single host, which can lead to CPU contention if the same host
+ processes each cached block. Increasing the replication factor lets Impala choose
+ different hosts to process different cached blocks, to better distribute the CPU load.
+ Always use a <code class="ph codeph">WITH REPLICATION</code> setting of at least 3, and adjust upward
+ if necessary to match the replication factor for the underlying HDFS data files.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, Impala automatically randomizes which host processes
+ a cached HDFS block, to avoid CPU hotspots. For tables where HDFS caching is not applied,
+ Impala designates which host to process a data block using an algorithm that estimates
+ the load on each host. If CPU hotspots still arise during queries,
+ you can enable additional randomization for the scheduling algorithm for non-HDFS cached data
+ by setting the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option.
+ </p>
+ </div>
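+ <p class="p">
+ As an illustration, the two adjustments described in this note might be applied as follows;
+ the table name is a placeholder:
+ </p>
+<pre class="pre codeblock"><code>-- Spread the CPU load for cached blocks across up to 3 hosts.
+alter table t1 set cached in '<var class="keyword varname">pool_name</var>' with replication = 3;
+
+-- If hotspots persist for non-cached data, add scheduling randomization.
+set SCHEDULE_RANDOM_REPLICA=true;
+</code></pre>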
+
+ <p class="p toc inpage"></p>
+
+
+
+ <p class="p">
+ For background information about how to set up and manage HDFS caching for a cluster, see
+ <span class="xref">the documentation for your Apache Hadoop distribution</span>.
+ </p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="hdfs_caching__hdfs_caching_overview">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of HDFS Caching for Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ In <span class="keyword">Impala 1.4</span> and higher, Impala can use the HDFS caching feature to make more effective use of RAM, so that
+ repeated queries can take advantage of data <span class="q">"pinned"</span> in memory regardless of how much data is
+ processed overall. The HDFS caching feature lets you designate a subset of frequently accessed data to be
+ pinned permanently in memory, remaining in the cache across multiple queries and never being evicted. This
+ technique is suitable for tables or partitions that are frequently accessed and are small enough to fit
+ entirely within the HDFS memory cache. For example, you might designate several dimension tables to be
+ pinned in the cache, to speed up many different join queries that reference them. Or in a partitioned
+ table, you might pin a partition holding data from the most recent time period because that data will be
+ queried intensively; then when the next set of data arrives, you could unpin the previous partition and pin
+ the partition holding the new data.
+ </p>
+
+ <p class="p">
+ Because this Impala performance feature relies on HDFS infrastructure, it only applies to Impala tables
+ that use HDFS data files. HDFS caching for Impala does not apply to HBase tables, S3 tables,
+ Kudu tables,
+ or Isilon tables.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="hdfs_caching__hdfs_caching_prereqs">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Setting Up HDFS Caching for Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To use HDFS caching with Impala, first set up that feature for your cluster:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Decide how much memory to devote to the HDFS cache on each host. Remember that the total memory available
+ for cached data is the sum of the cache sizes on all the hosts. By default, any data block is only cached on one
+ host, although you can cache a block across multiple hosts by increasing the replication factor.
+
+ </p>
+ </li>
+
+ <li class="li">
+ <div class="p">
+ Issue <span class="keyword cmdname">hdfs cacheadmin</span> commands to set up one or more cache pools, owned by the same
+ user as the <span class="keyword cmdname">impalad</span> daemon (typically <code class="ph codeph">impala</code>). For example:
+<pre class="pre codeblock"><code>hdfs cacheadmin -addPool four_gig_pool -owner impala -limit 4000000000
+</code></pre>
+ For details about the <span class="keyword cmdname">hdfs cacheadmin</span> command, see
+ <span class="xref">the documentation for your Apache Hadoop distribution</span>.
+ </div>
+ </li>
+ </ul>
+
+ <p class="p">
+ Once HDFS caching is enabled and one or more pools are available, see
+ <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching_ddl">Enabling HDFS Caching for Impala Tables and Partitions</a> for how to choose which Impala data to load
+ into the HDFS cache. On the Impala side, you specify the cache pool name defined by the <code class="ph codeph">hdfs
+ cacheadmin</code> command in the Impala DDL statements that enable HDFS caching for a table or partition,
+ such as <code class="ph codeph">CREATE TABLE ... CACHED IN <var class="keyword varname">pool</var></code> or <code class="ph codeph">ALTER TABLE ... SET
+ CACHED IN <var class="keyword varname">pool</var></code>.
+ </p>
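+ <p class="p">
+ For example, reusing the hypothetical <code class="ph codeph">four_gig_pool</code> created
+ in the earlier <span class="keyword cmdname">hdfs cacheadmin</span> example:
+ </p>
+<pre class="pre codeblock"><code>-- Cache a new table from the moment it is created.
+create table lookup_table (id int, val string) cached in 'four_gig_pool';
+
+-- Start caching an existing table.
+alter table census set cached in 'four_gig_pool';
+</code></pre>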
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="hdfs_caching__hdfs_caching_ddl">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Enabling HDFS Caching for Impala Tables and Partitions</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Begin by choosing which tables or partitions to cache. For example, these might be lookup tables that are
+ accessed by many different join queries, or partitions corresponding to the most recent time period that
+ are analyzed by different reports or ad hoc queries.
+ </p>
+
+ <p class="p">
+ In your SQL statements, you specify logical divisions such as tables and partitions to be cached. Impala
+ translates these requests into HDFS-level directives that apply to particular directories and files. For
+ example, given a partitioned table <code class="ph codeph">CENSUS</code> with a partition key column
+ <code class="ph codeph">YEAR</code>, you could choose to cache all or part of the data as follows:
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.2</span> and higher, the optional <code class="ph codeph">WITH REPLICATION</code> clause
+ for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> lets you specify
+ a <dfn class="term">replication factor</dfn>, the number of hosts on which to cache the same data blocks.
+ When Impala processes a cached data block, where the cache replication factor is greater than 1, Impala randomly
+ selects a host that has a cached copy of that data block. This optimization avoids excessive CPU
+ usage on a single host when the same cached data block is processed multiple times.
+ Where practical, specify a value greater than or equal to the HDFS block replication factor.
+ </p>
+
+<pre class="pre codeblock"><code>-- Cache the entire table (all partitions).
+alter table census set cached in '<var class="keyword varname">pool_name</var>';
+
+-- Remove the entire table from the cache.
+alter table census set uncached;
+
+-- Cache a portion of the table (a single partition).
+-- If the table is partitioned by multiple columns (such as year, month, day),
+-- the ALTER TABLE command must specify values for all those columns.
+alter table census partition (year=1960) set cached in '<var class="keyword varname">pool_name</var>';
+
+<span class="ph">-- Cache the data from one partition on up to 4 hosts, to minimize CPU load on any
+-- single host when the same data block is processed multiple times.
+alter table census partition (year=1970)
+ set cached in '<var class="keyword varname">pool_name</var>' with replication = 4;</span>
+
+-- At each stage, check the volume of cached data.
+-- For large tables or partitions, the background loading might take some time,
+-- so you might have to wait and reissue the statement until all the data
+-- has finished being loaded into the cache.
+show table stats census;
++-------+-------+--------+------+--------------+--------+
+| year | #Rows | #Files | Size | Bytes Cached | Format |
++-------+-------+--------+------+--------------+--------+
+| 1900 | -1 | 1 | 11B | NOT CACHED | TEXT |
+| 1940 | -1 | 1 | 11B | NOT CACHED | TEXT |
+| 1960 | -1 | 1 | 11B | 11B | TEXT |
+| 1970 | -1 | 1 | 11B | NOT CACHED | TEXT |
+| Total | -1 | 4 | 44B | 11B | |
++-------+-------+--------+------+--------------+--------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">CREATE TABLE considerations:</strong>
+ </p>
+
+ <p class="p">
+ The HDFS caching feature affects the Impala <code class="ph codeph">CREATE TABLE</code> statement as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You can put a <code class="ph codeph">CACHED IN '<var class="keyword varname">pool_name</var>'</code> clause
+ <span class="ph">and optionally a <code class="ph codeph">WITH REPLICATION = <var class="keyword varname">number_of_hosts</var></code> clause</span>
+ at the end of a
+ <code class="ph codeph">CREATE TABLE</code> statement to automatically cache the entire contents of the table,
+ including any partitions added later. The <var class="keyword varname">pool_name</var> is a pool that you previously set
+ up with the <span class="keyword cmdname">hdfs cacheadmin</span> command.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Once a table is designated for HDFS caching through the <code class="ph codeph">CREATE TABLE</code> statement, if new
+ partitions are added later through <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code> statements, the data in
+ those new partitions is automatically cached in the same pool.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you want to perform repetitive queries on a subset of data from a large table, and it is not practical
+ to designate the entire table or specific partitions for HDFS caching, you can create a new cached table
+ with just a subset of the data by using <code class="ph codeph">CREATE TABLE ... CACHED IN '<var class="keyword varname">pool_name</var>'
+ AS SELECT ... WHERE ...</code>. When you are finished with generating reports from this subset of data,
+ drop the table and both the data files and the data cached in RAM are automatically deleted.
+ </p>
+ </li>
+ </ul>
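+ <p class="p">
+ The last technique can be sketched as follows; the table names and filter condition are
+ hypothetical:
+ </p>
+<pre class="pre codeblock"><code>-- Cache only a subset of a large table for a series of reporting queries.
+create table recent_sales cached in '<var class="keyword varname">pool_name</var>' as
+ select * from sales where year = 2018;
+
+-- Afterward, this removes both the data files and the data cached in RAM.
+drop table recent_sales;
+</code></pre>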
+
+ <p class="p">
+ See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for the full syntax.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Other memory considerations:</strong>
+ </p>
+
+ <p class="p">
+ Certain DDL operations, such as <code class="ph codeph">ALTER TABLE ... SET LOCATION</code>, are blocked while the
+ underlying HDFS directories contain cached files. You must uncache the files first, before changing the
+ location, dropping the table, and so on.
+ </p>
+
+ <p class="p">
+ When data is requested to be pinned in memory, that process happens in the background without blocking
+ access to the data while the caching is in progress. Loading the data from disk could take some time.
+ Impala reads each HDFS data block from memory if it has been pinned already, or from disk if it has not
+ been pinned yet. When files are added to a table or partition whose contents are cached, Impala
+ automatically detects those changes and performs a <code class="ph codeph">REFRESH</code> once the relevant
+ data is cached.
+ </p>
+
+ <p class="p">
+ The amount of data that you can pin on each node through the HDFS caching mechanism is subject to a quota
+ that is enforced by the underlying HDFS service. Before requesting to pin an Impala table or partition in
+ memory, check that its size does not exceed this quota.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because the HDFS cache consists of combined memory from all the DataNodes in the cluster, cached tables or
+ partitions can be bigger than the amount of HDFS cache memory on any single host.
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="hdfs_caching__hdfs_caching_etl">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Loading and Removing Data with HDFS Caching Enabled</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ When HDFS caching is enabled, extra processing happens in the background when you add or remove data
+ through statements such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">DROP TABLE</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Inserting or loading data:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ When Impala performs an <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> or
+ <code class="ph codeph"><a class="xref" href="impala_load_data.html#load_data">LOAD DATA</a></code> statement for a table or
+ partition that is cached, the new data files are automatically cached and Impala recognizes that fact
+ automatically.
+ </li>
+
+ <li class="li">
+ If you perform an <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> through Hive, as always, Impala
+ only recognizes the new data files after a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+ statement in Impala.
+ </li>
+
+ <li class="li">
+ If the cache pool is entirely full, or becomes full before all the requested data can be cached, the
+ Impala DDL statement returns an error. This is to avoid situations where only some of the requested data
+ could be cached.
+ </li>
+
+ <li class="li">
+ When HDFS caching is enabled for a table or partition, new data files are cached automatically when they
+ are added to the appropriate directory in HDFS, without the need for a <code class="ph codeph">REFRESH</code> statement
+ in Impala. Impala automatically performs a <code class="ph codeph">REFRESH</code> once the new data is loaded into the
+ HDFS cache.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Dropping tables, partitions, or cache pools:</strong>
+ </p>
+
+ <p class="p">
+ The HDFS caching feature interacts with the Impala
+ <code class="ph codeph"><a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE</a></code> and
+ <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE ... DROP PARTITION</a></code>
+ statements as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ When you issue a <code class="ph codeph">DROP TABLE</code> for a table that is entirely cached, or has some partitions
+ cached, the <code class="ph codeph">DROP TABLE</code> succeeds and all the cache directives Impala submitted for that
+ table are removed from the HDFS cache system.
+ </li>
+
+ <li class="li">
+ The same applies to <code class="ph codeph">ALTER TABLE ... DROP PARTITION</code>. The operation succeeds and any cache
+ directives are removed.
+ </li>
+
+ <li class="li">
+ As always, the underlying data files are removed if the dropped table is an internal table, or the
+ dropped partition is in its default location underneath an internal table. The data files are left alone
+ if the dropped table is an external table, or if the dropped partition is in a non-default location.
+ </li>
+
+ <li class="li">
+ If you designated the data files as cached through the <span class="keyword cmdname">hdfs cacheadmin</span> command, and
+ the data files are left behind as described in the previous item, the data files remain cached. Impala
+ only removes the cache directives submitted by Impala through the <code class="ph codeph">CREATE TABLE</code> or
+ <code class="ph codeph">ALTER TABLE</code> statements. It is OK to have multiple redundant cache directives pertaining
+ to the same files; the directives all have unique IDs and owners so that the system can tell them apart.
+ </li>
+
+ <li class="li">
+ If you drop an HDFS cache pool through the <span class="keyword cmdname">hdfs cacheadmin</span> command, all the Impala
+ data files are preserved, just no longer cached. After a subsequent <code class="ph codeph">REFRESH</code>,
+ <code class="ph codeph">SHOW TABLE STATS</code> reports 0 bytes cached for each associated Impala table or partition.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Relocating a table or partition:</strong>
+ </p>
+
+ <p class="p">
+ The HDFS caching feature interacts with the Impala
+ <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE ... SET LOCATION</a></code>
+ statement as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If you have designated a table or partition as cached through the <code class="ph codeph">CREATE TABLE</code> or
+ <code class="ph codeph">ALTER TABLE</code> statements, subsequent attempts to relocate the table or partition through
+ an <code class="ph codeph">ALTER TABLE ... SET LOCATION</code> statement will fail. You must issue an <code class="ph codeph">ALTER
+ TABLE ... SET UNCACHED</code> statement for the table or partition first. Otherwise, Impala would lose
+ track of some cached data files and have no way to uncache them later.
+ </li>
+ </ul>
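+ <p class="p">
+ For example, relocating a cached table is a two-step operation; the table name and path
+ shown here are placeholders:
+ </p>
+<pre class="pre codeblock"><code>-- ALTER TABLE ... SET LOCATION fails while the table is cached,
+-- so uncache the table first.
+alter table t1 set uncached;
+alter table t1 set location '/user/impala/new_location';
+</code></pre>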
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="hdfs_caching__hdfs_caching_admin">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Administration for HDFS Caching with Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here are the guidelines and steps to check or change the status of HDFS caching for Impala data:
+ </p>
+
+ <p class="p">
+ <strong class="ph b">hdfs cacheadmin command:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If you drop a cache pool with the <span class="keyword cmdname">hdfs cacheadmin</span> command, Impala queries against the
+ associated data files will still work, by falling back to reading the files from disk. After performing a
+ <code class="ph codeph">REFRESH</code> on the table, Impala reports the number of bytes cached as 0 for all associated
+ tables and partitions.
+ </li>
+
+ <li class="li">
+ You might use <span class="keyword cmdname">hdfs cacheadmin</span> to get a list of existing cache pools, or detailed
+ information about the pools, as follows:
+<pre class="pre codeblock"><code>hdfs cacheadmin -listDirectives # Basic info
+Found 122 entries
+ ID POOL REPL EXPIRY PATH
+ 123 testPool 1 never /user/hive/warehouse/tpcds.store_sales
+ 124 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-01-15
+ 125 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-02-01
+...
+
+hdfs cacheadmin -listDirectives -stats # More details
+Found 122 entries
+ ID POOL REPL EXPIRY PATH BYTES_NEEDED BYTES_CACHED FILES_NEEDED FILES_CACHED
+ 123 testPool 1 never /user/hive/warehouse/tpcds.store_sales 0 0 0 0
+ 124 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-01-15 143169 143169 1 1
+ 125 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-02-01 112447 112447 1 1
+...
+</code></pre>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Impala SHOW statement:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ For each table or partition, the <code class="ph codeph">SHOW TABLE STATS</code> or <code class="ph codeph">SHOW PARTITIONS</code>
+ statement displays the number of bytes currently cached by the HDFS caching feature. If there are no
+ cache directives in place for that table or partition, the result set displays <code class="ph codeph">NOT
+ CACHED</code>. A value of 0, or a number smaller than the overall size of the table or partition,
+ indicates that the cache request has been submitted but the data has not been entirely loaded into memory
+ yet. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Impala memory limits:</strong>
+ </p>
+
+ <p class="p">
+ The Impala HDFS caching feature interacts with the Impala memory limits as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The maximum size of each HDFS cache pool is specified externally to Impala, through the <span class="keyword cmdname">hdfs
+ cacheadmin</span> command.
+ </li>
+
+ <li class="li">
+ All the memory used for HDFS caching is separate from the <span class="keyword cmdname">impalad</span> daemon address space
+ and does not count towards the limits of the <code class="ph codeph">--mem_limit</code> startup option,
+ <code class="ph codeph">MEM_LIMIT</code> query option, or further limits imposed through YARN resource management or
+ the Linux <code class="ph codeph">cgroups</code> mechanism.
+ </li>
+
+ <li class="li">
+ Because accessing HDFS cached data avoids a memory-to-memory copy operation, queries involving cached
+ data require less memory on the Impala side than the equivalent queries on uncached data. In addition to
+ any performance benefits in a single-user environment, the reduced memory helps to improve scalability
+ under high-concurrency workloads.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="hdfs_caching__hdfs_caching_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Performance Considerations for HDFS Caching with Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In Impala 1.4.0 and higher, Impala supports efficient reads from data that is pinned in memory through HDFS
+ caching. Impala takes advantage of the HDFS API and reads the data from memory rather than from disk
+ whether the data files are pinned using Impala DDL statements or through the command-line mechanism where
+ you specify HDFS paths.
+ </p>
+
+ <p class="p">
+ When you examine the output of the <span class="keyword cmdname">impala-shell</span> <span class="keyword cmdname">SUMMARY</span> command, or
+ look in the metrics report for the <span class="keyword cmdname">impalad</span> daemon, you see how many bytes are read from
+ the HDFS cache. For example, this excerpt from a query profile illustrates that all the data read during a
+ particular phase of the query came from the HDFS cache, because the <code class="ph codeph">BytesRead</code> and
+ <code class="ph codeph">BytesReadDataNodeCache</code> values are identical.
+ </p>
+
+<pre class="pre codeblock"><code>HDFS_SCAN_NODE (id=0):(Total: 11s114ms, non-child: 11s114ms, % non-child: 100.00%)
+ - AverageHdfsReadThreadConcurrency: 0.00
+ - AverageScannerThreadConcurrency: 32.75
+<strong class="ph b"> - BytesRead: 10.47 GB (11240756479)
+ - BytesReadDataNodeCache: 10.47 GB (11240756479)</strong>
+ - BytesReadLocal: 10.47 GB (11240756479)
+ - BytesReadShortCircuit: 10.47 GB (11240756479)
+ - DecompressionTime: 27s572ms
+</code></pre>
+
+ <p class="p">
+ For queries involving smaller amounts of data, or in single-user workloads, you might not notice a
+ significant difference in query response time with or without HDFS caching. Even with HDFS caching turned
+ off, the data for the query might still be in the Linux OS buffer cache. The benefits become clearer as
+ data volume increases, and especially as the system processes more concurrent queries. HDFS caching
+ improves the scalability of the overall system. That is, it prevents query performance from declining when
+ the workload outstrips the capacity of the Linux OS cache.
+ </p>
+
+ <p class="p">
+ Due to a limitation of HDFS, zero-copy reads are not supported with
+ encryption. Where practical, avoid HDFS caching for Impala data
+ files in encryption zones. The queries fall back to the normal read
+ path during query execution, which might cause some performance overhead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">SELECT considerations:</strong>
+ </p>
+
+ <p class="p">
+ The Impala HDFS caching feature interacts with the
+ <code class="ph codeph"><a class="xref" href="impala_select.html#select">SELECT</a></code> statement and query performance as
+ follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Impala automatically reads from memory any data that has been designated as cached and actually loaded
+ into the HDFS cache. (It could take some time after the initial request to fully populate the cache for a
+ table with large size or many partitions.) The speedup comes from two aspects: reading from RAM instead
+ of disk, and accessing the data straight from the cache area instead of copying from one RAM area to
+ another. This second aspect yields further performance improvement over the standard OS caching
+ mechanism, which still results in memory-to-memory copying of cached data.
+ </li>
+
+ <li class="li">
+ For small amounts of data, the query speedup might not be noticeable in terms of wall clock time. The
+ performance might be roughly the same with HDFS caching turned on or off, due to recently used data being
+ held in the Linux OS cache. The difference is more pronounced with:
+ <ul class="ul">
+ <li class="li">
+ Data volumes (for all queries running concurrently) that exceed the size of the Linux OS cache.
+ </li>
+
+ <li class="li">
+ A busy cluster running many concurrent queries, where the reduction in memory-to-memory copying and
+ overall memory usage during queries results in greater scalability and throughput.
+ </li>
+
+ <li class="li">
+ Thus, to really exercise and benchmark this feature in a development environment, you might need to
+ simulate realistic workloads and concurrent queries that match your production environment.
+ </li>
+
+ <li class="li">
+ One way to simulate a heavy workload on a lightly loaded system is to flush the OS buffer cache (on
+ each DataNode) between iterations of queries against the same tables or partitions:
+<pre class="pre codeblock"><code>$ sync
+$ echo 1 > /proc/sys/vm/drop_caches
+</code></pre>
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ Impala queries take advantage of HDFS cached data regardless of whether the cache directive was issued by
+ Impala or externally through the <span class="keyword cmdname">hdfs cacheadmin</span> command, for example for an external
+ table where the cached data files might be accessed by several different Hadoop components.
+ </li>
+
+ <li class="li">
+ If your query returns a large result set, the time reported for the query could be dominated by the time
+ needed to print the results on the screen. To measure the time for the underlying query processing, query
+ the <code class="ph codeph">COUNT()</code> of the big result set, which does all the same processing but only prints a
+ single line to the screen.
+ </li>
+ </ul>
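+ <p class="p">
+ For example, to time the underlying processing of a query that would otherwise return a
+ large result set (the inner query shown is a placeholder):
+ </p>
+<pre class="pre codeblock"><code>select count(*) from
+ (select c1, c2 from big_table where c3 between 0 and 100) t;
+</code></pre>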
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_joins.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_joins.html b/docs/build3x/html/topics/impala_perf_joins.html
new file mode 100644
index 0000000..7def5b4
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_joins.html
@@ -0,0 +1,508 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_joins"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Performance Considerations for Join Queries</title></head><body id="perf_joins"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Performance Considerations for Join Queries</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Queries involving join operations often require more tuning than queries that refer to only one table. The
+ maximum size of the result set from a join query is the product of the number of rows in all the joined
+ tables. When joining several tables with millions or billions of rows, any missed opportunity to filter the
+ result set, or other inefficiency in the query, could lead to an operation that does not finish in a
+ practical time and has to be cancelled.
+ </p>
+
+ <p class="p">
+ The simplest technique for tuning an Impala join query is to collect statistics on each table involved in the
+ join using the <code class="ph codeph"><a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS</a></code>
+ statement, and then let Impala automatically optimize the query based on the size of each table, number of
+ distinct values of each column, and so on. The <code class="ph codeph">COMPUTE STATS</code> statement and the join
+ optimization are new features introduced in Impala 1.2.2. For accurate statistics about each table, issue the
+ <code class="ph codeph">COMPUTE STATS</code> statement after loading the data into that table, and again if the amount of
+ data changes substantially due to an <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, adding a partition,
+ and so on.
+ </p>
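+ <p class="p">
+ For example, assuming two hypothetical tables <code class="ph codeph">sales</code> and
+ <code class="ph codeph">customers</code> with the column names shown:
+ </p>
+<pre class="pre codeblock"><code>compute stats sales;
+compute stats customers;
+
+-- Impala can now estimate table sizes and choose an efficient join order.
+select c.name, sum(s.amount)
+ from sales s join customers c on (s.cust_id = c.id)
+ group by c.name;
+</code></pre>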
+
+ <p class="p">
+ If statistics are not available for all the tables in the join query, or if Impala chooses a join order that
+ is not the most efficient, you can override the automatic join order optimization by specifying the
+ <code class="ph codeph">STRAIGHT_JOIN</code> keyword immediately after the <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code>
+ or <code class="ph codeph">ALL</code> keywords. In this case, Impala uses the order the tables appear in the query to guide how the
+ joins are processed.
+ </p>
+
+ <p class="p">
+ When you use the <code class="ph codeph">STRAIGHT_JOIN</code> technique, you must order the tables in the join query
+ manually instead of relying on the Impala optimizer. The optimizer uses sophisticated techniques to estimate
+ the size of the result set at each stage of the join. For manual ordering, use this heuristic approach to
+ start with, and then experiment to fine-tune the order:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Specify the largest table first. This table is read from disk by each Impala node and so its size is not
+ significant in terms of memory usage during the query.
+ </li>
+
+ <li class="li">
+ Next, specify the smallest table. The contents of the second, third, and so on tables are all transmitted
+ across the network. You want to minimize the size of the result set from each subsequent stage of the join
+ query. The most likely approach involves joining a small table first, so that the result set remains small
+ even as subsequent larger tables are processed.
+ </li>
+
+ <li class="li">
+ Join the next smallest table, then the next smallest, and so on.
+ </li>
+
+ <li class="li">
+ For example, if you had tables <code class="ph codeph">BIG</code>, <code class="ph codeph">MEDIUM</code>, <code class="ph codeph">SMALL</code>, and
+ <code class="ph codeph">TINY</code>, the logical join order to try would be <code class="ph codeph">BIG</code>, <code class="ph codeph">TINY</code>,
+ <code class="ph codeph">SMALL</code>, <code class="ph codeph">MEDIUM</code>.
+ </li>
+ </ul>
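+
+      <p class="p">
+        Applied to the hypothetical tables above, a manually ordered query might look like the
+        following sketch (the join column <code class="ph codeph">id</code> is an assumption):
+      </p>
+
+<pre class="pre codeblock"><code>-- Illustrative only: largest table first, then smallest to largest.
+select straight_join count(*)
+  from big
+  join tiny   on big.id = tiny.id
+  join small  on big.id = small.id
+  join medium on big.id = medium.id;</code></pre>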
+
+ <p class="p">
+      The terms <span class="q">"largest"</span> and <span class="q">"smallest"</span> refer to the size of the intermediate result set based on the
+ number of rows and columns from each table that are part of the result set. For example, if you join one
+ table <code class="ph codeph">sales</code> with another table <code class="ph codeph">customers</code>, a query might find results from
+ 100 different customers who made a total of 5000 purchases. In that case, you would specify <code class="ph codeph">SELECT
+ ... FROM sales JOIN customers ...</code>, putting <code class="ph codeph">customers</code> on the right side because it
+ is smaller in the context of this query.
+ </p>
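+
+      <p class="p">
+        A sketch of such a query follows; the join column <code class="ph codeph">customer_id</code>
+        and the <code class="ph codeph">name</code> column are hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>select sales.*, customers.name
+  from sales join customers
+  on sales.customer_id = customers.customer_id;</code></pre>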
+
+ <p class="p">
+ The Impala query planner chooses between different techniques for performing join queries, depending on the
+ absolute and relative sizes of the tables. <strong class="ph b">Broadcast joins</strong> are the default, where the right-hand table
+ is considered to be smaller than the left-hand table, and its contents are sent to all the other nodes
+ involved in the query. The alternative technique is known as a <strong class="ph b">partitioned join</strong> (not related to a
+ partitioned table), which is more suitable for large tables of roughly equal size. With this technique,
+ portions of each table are sent to appropriate other nodes where those subsets of rows can be processed in
+ parallel. The choice of broadcast or partitioned join also depends on statistics being available for all
+ tables in the join, gathered by the <code class="ph codeph">COMPUTE STATS</code> statement.
+ </p>
+
+ <p class="p">
+ To see which join strategy is used for a particular query, issue an <code class="ph codeph">EXPLAIN</code> statement for
+ the query. If you find that a query uses a broadcast join when you know through benchmarking that a
+ partitioned join would be more efficient, or vice versa, add a hint to the query to specify the precise join
+ mechanism to use. See <a class="xref" href="impala_hints.html#hints">Optimizer Hints</a> for details.
+ </p>
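+
+      <p class="p">
+        For example, to force a partitioned (shuffle) join when benchmarking shows it is faster,
+        you could add a hint immediately after the <code class="ph codeph">JOIN</code> keyword.
+        The table and column names here are hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>explain select count(*)
+  from huge1 join [shuffle] huge2 on huge1.id = huge2.id;
+
+-- Or force a broadcast join instead:
+explain select count(*)
+  from huge1 join [broadcast] huge2 on huge1.id = huge2.id;</code></pre>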
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="perf_joins__joins_no_stats">
+
+ <h2 class="title topictitle2" id="ariaid-title2">How Joins Are Processed when Statistics Are Unavailable</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If table or column statistics are not available for some tables in a join, Impala still reorders the tables
+ using the information that is available. Tables with statistics are placed on the left side of the join
+ order, in descending order of cost based on overall size and cardinality. Tables without statistics are
+ treated as zero-size, that is, they are always placed on the right side of the join order.
+ </p>
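+
+      <p class="p">
+        Before tuning a join, you can check whether statistics are present with
+        <code class="ph codeph">SHOW TABLE STATS</code>; a value of -1 in the
+        <code class="ph codeph">#Rows</code> column indicates that table statistics have not been
+        computed. The table name here is hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>show table stats t1;
+show column stats t1;</code></pre>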
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="perf_joins__straight_join">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Overriding Join Reordering with STRAIGHT_JOIN</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If an Impala join query is inefficient because of outdated statistics or unexpected data distribution, you
+ can keep Impala from reordering the joined tables by using the <code class="ph codeph">STRAIGHT_JOIN</code> keyword
+ immediately after the <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">ALL</code>
+ keywords. The <code class="ph codeph">STRAIGHT_JOIN</code> keyword turns off
+ the reordering of join clauses that Impala does internally, and produces a plan that relies on the join
+ clauses being ordered optimally in the query text. In this case, rewrite the query so that the largest
+ table is on the left, followed by the next largest, and so on until the smallest table is on the right.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The <code class="ph codeph">STRAIGHT_JOIN</code> hint affects the join order of table references in the query
+ block containing the hint. It does not affect the join order of nested queries, such as views,
+ inline views, or <code class="ph codeph">WHERE</code>-clause subqueries. To use this hint for performance
+ tuning of complex queries, apply the hint to all query blocks that need a fixed join order.
+ </p>
+ </div>
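+
+    <p class="p">
+      For example, if a query selects from an inline view that itself contains joins, the hint
+      must appear in both the outer and the inner <code class="ph codeph">SELECT</code> lists.
+      The table and column names in this sketch are hypothetical:
+    </p>
+
+<pre class="pre codeblock"><code>select straight_join o.id, v.total
+  from orders o
+  join (select straight_join cust_id, sum(amount) as total
+          from line_items join shipments on line_items.id = shipments.item_id
+        group by cust_id) v
+  on o.cust_id = v.cust_id;</code></pre>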
+
+ <p class="p">
+ In this example, the subselect from the <code class="ph codeph">BIG</code> table produces a very small result set, but
+ the table might still be treated as if it were the biggest and placed first in the join order. Using
+ <code class="ph codeph">STRAIGHT_JOIN</code> for the last join clause prevents the final table from being reordered,
+ keeping it as the rightmost table in the join order.
+ </p>
+
+<pre class="pre codeblock"><code>select straight_join x from medium join small join (select * from big where c1 < 10) as big
+ where medium.id = small.id and small.id = big.id;
+
+-- If the query contains [DISTINCT | ALL], the hint goes after those keywords.
+select distinct straight_join x from medium join small join (select * from big where c1 < 10) as big
+ where medium.id = small.id and small.id = big.id;</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="perf_joins__perf_joins_examples">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Examples of Join Order Optimization</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here are examples showing joins between tables with 1 billion, 200 million, and 1 million rows. (In this
+ case, the tables are unpartitioned and using Parquet format.) The smaller tables contain subsets of data
+ from the largest one, for convenience of joining on the unique <code class="ph codeph">ID</code> column. The smallest
+ table only contains a subset of columns from the others.
+ </p>
+
+ <p class="p"></p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table big stored as parquet as select * from raw_data;
++----------------------------+
+| summary |
++----------------------------+
+| Inserted 1000000000 row(s) |
++----------------------------+
+Returned 1 row(s) in 671.56s
+[localhost:21000] > desc big;
++-----------+---------+---------+
+| name | type | comment |
++-----------+---------+---------+
+| id | int | |
+| val | int | |
+| zfill | string | |
+| name | string | |
+| assertion | boolean | |
++-----------+---------+---------+
+Returned 5 row(s) in 0.01s
+[localhost:21000] > create table medium stored as parquet as select * from big limit 200 * floor(1e6);
++---------------------------+
+| summary |
++---------------------------+
+| Inserted 200000000 row(s) |
++---------------------------+
+Returned 1 row(s) in 138.31s
+[localhost:21000] > create table small stored as parquet as select id,val,name from big where assertion = true limit 1 * floor(1e6);
++-------------------------+
+| summary |
++-------------------------+
+| Inserted 1000000 row(s) |
++-------------------------+
+Returned 1 row(s) in 6.32s</code></pre>
+
+ <p class="p">
+ For any kind of performance experimentation, use the <code class="ph codeph">EXPLAIN</code> statement to see how any
+ expensive query will be performed without actually running it, and enable verbose <code class="ph codeph">EXPLAIN</code>
+      plans containing more performance-oriented detail. In the following example, the most interesting plan lines are highlighted in bold,
+ showing that without statistics for the joined tables, Impala cannot make a good estimate of the number of
+ rows involved at each stage of processing, and is likely to stick with the <code class="ph codeph">BROADCAST</code> join
+ mechanism that sends a complete copy of one of the tables to each node.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set explain_level=verbose;
+EXPLAIN_LEVEL set to verbose
+[localhost:21000] > explain select count(*) from big join medium where big.id = medium.id;
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=2.10GB VCores=2 |
+| |
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 6:AGGREGATE (merge finalize) |
+| | output: SUM(COUNT(*)) |
+| | cardinality: 1 |
+| | per-host memory: unavailable |
+| | tuple ids: 2 |
+| | |
+| 5:EXCHANGE |
+| cardinality: 1 |
+| per-host memory: unavailable |
+| tuple ids: 2 |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 5 |
+| UNPARTITIONED |
+| |
+| 3:AGGREGATE |
+| | output: COUNT(*) |
+| | cardinality: 1 |
+| | per-host memory: 10.00MB |
+| | tuple ids: 2 |
+| | |
+| 2:HASH JOIN |
+<strong class="ph b">| | join op: INNER JOIN (BROADCAST) |</strong>
+| | hash predicates: |
+| | big.id = medium.id |
+<strong class="ph b">| | cardinality: unavailable |</strong>
+| | per-host memory: 2.00GB |
+| | tuple ids: 0 1 |
+| | |
+| |----4:EXCHANGE |
+| | cardinality: unavailable |
+| | per-host memory: 0B |
+| | tuple ids: 1 |
+| | |
+| 0:SCAN HDFS |
+<strong class="ph b">| table=join_order.big #partitions=1/1 size=23.12GB |
+| table stats: unavailable |
+| column stats: unavailable |
+| cardinality: unavailable |</strong>
+| per-host memory: 88.00MB |
+| tuple ids: 0 |
+| |
+| PLAN FRAGMENT 2 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 4 |
+| UNPARTITIONED |
+| |
+| 1:SCAN HDFS |
+<strong class="ph b">| table=join_order.medium #partitions=1/1 size=4.62GB |
+| table stats: unavailable |
+| column stats: unavailable |
+| cardinality: unavailable |</strong>
+| per-host memory: 88.00MB |
+| tuple ids: 1 |
++----------------------------------------------------------+
+Returned 64 row(s) in 0.04s</code></pre>
+
+ <p class="p">
+ Gathering statistics for all the tables is straightforward, one <code class="ph codeph">COMPUTE STATS</code> statement
+ per table:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > compute stats small;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 3 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 4.26s
+[localhost:21000] > compute stats medium;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 5 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 42.11s
+[localhost:21000] > compute stats big;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 5 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 165.44s</code></pre>
+
+ <p class="p">
+ With statistics in place, Impala can choose a more effective join order rather than following the
+ left-to-right sequence of tables in the query, and can choose <code class="ph codeph">BROADCAST</code> or
+ <code class="ph codeph">PARTITIONED</code> join strategies based on the overall sizes and number of rows in the table:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > explain select count(*) from medium join big where big.id = medium.id;
+Query: explain select count(*) from medium join big where big.id = medium.id
++-----------------------------------------------------------+
+| Explain String |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=937.23MB VCores=2 |
+| |
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 6:AGGREGATE (merge finalize) |
+| | output: SUM(COUNT(*)) |
+| | cardinality: 1 |
+| | per-host memory: unavailable |
+| | tuple ids: 2 |
+| | |
+| 5:EXCHANGE |
+| cardinality: 1 |
+| per-host memory: unavailable |
+| tuple ids: 2 |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 5 |
+| UNPARTITIONED |
+| |
+| 3:AGGREGATE |
+| | output: COUNT(*) |
+| | cardinality: 1 |
+| | per-host memory: 10.00MB |
+| | tuple ids: 2 |
+| | |
+| 2:HASH JOIN |
+| | join op: INNER JOIN (BROADCAST) |
+| | hash predicates: |
+| | big.id = medium.id |
+| | cardinality: 1443004441 |
+| | per-host memory: 839.23MB |
+| | tuple ids: 1 0 |
+| | |
+| |----4:EXCHANGE |
+| | cardinality: 200000000 |
+| | per-host memory: 0B |
+| | tuple ids: 0 |
+| | |
+| 1:SCAN HDFS |
+| table=join_order.big #partitions=1/1 size=23.12GB |
+| table stats: 1000000000 rows total |
+| column stats: all |
+| cardinality: 1000000000 |
+| per-host memory: 88.00MB |
+| tuple ids: 1 |
+| |
+| PLAN FRAGMENT 2 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 4 |
+| UNPARTITIONED |
+| |
+| 0:SCAN HDFS |
+| table=join_order.medium #partitions=1/1 size=4.62GB |
+| table stats: 200000000 rows total |
+| column stats: all |
+| cardinality: 200000000 |
+| per-host memory: 88.00MB |
+| tuple ids: 0 |
++-----------------------------------------------------------+
+Returned 64 row(s) in 0.04s
+
+[localhost:21000] > explain select count(*) from small join big where big.id = small.id;
+Query: explain select count(*) from small join big where big.id = small.id
++-----------------------------------------------------------+
+| Explain String |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=101.15MB VCores=2 |
+| |
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 6:AGGREGATE (merge finalize) |
+| | output: SUM(COUNT(*)) |
+| | cardinality: 1 |
+| | per-host memory: unavailable |
+| | tuple ids: 2 |
+| | |
+| 5:EXCHANGE |
+| cardinality: 1 |
+| per-host memory: unavailable |
+| tuple ids: 2 |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 5 |
+| UNPARTITIONED |
+| |
+| 3:AGGREGATE |
+| | output: COUNT(*) |
+| | cardinality: 1 |
+| | per-host memory: 10.00MB |
+| | tuple ids: 2 |
+| | |
+| 2:HASH JOIN |
+| | join op: INNER JOIN (BROADCAST) |
+| | hash predicates: |
+| | big.id = small.id |
+| | cardinality: 1000000000 |
+| | per-host memory: 3.15MB |
+| | tuple ids: 1 0 |
+| | |
+| |----4:EXCHANGE |
+| | cardinality: 1000000 |
+| | per-host memory: 0B |
+| | tuple ids: 0 |
+| | |
+| 1:SCAN HDFS |
+| table=join_order.big #partitions=1/1 size=23.12GB |
+| table stats: 1000000000 rows total |
+| column stats: all |
+| cardinality: 1000000000 |
+| per-host memory: 88.00MB |
+| tuple ids: 1 |
+| |
+| PLAN FRAGMENT 2 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 4 |
+| UNPARTITIONED |
+| |
+| 0:SCAN HDFS |
+| table=join_order.small #partitions=1/1 size=17.93MB |
+| table stats: 1000000 rows total |
+| column stats: all |
+| cardinality: 1000000 |
+| per-host memory: 32.00MB |
+| tuple ids: 0 |
++-----------------------------------------------------------+
+Returned 64 row(s) in 0.03s</code></pre>
+
+ <p class="p">
+ When queries like these are actually run, the execution times are relatively consistent regardless of the
+ table order in the query text. Here are examples using both the unique <code class="ph codeph">ID</code> column and the
+ <code class="ph codeph">VAL</code> column containing duplicate values:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select count(*) from big join small on (big.id = small.id);
+Query: select count(*) from big join small on (big.id = small.id)
++----------+
+| count(*) |
++----------+
+| 1000000 |
++----------+
+Returned 1 row(s) in 21.68s
+[localhost:21000] > select count(*) from small join big on (big.id = small.id);
+Query: select count(*) from small join big on (big.id = small.id)
++----------+
+| count(*) |
++----------+
+| 1000000 |
++----------+
+Returned 1 row(s) in 20.45s
+
+[localhost:21000] > select count(*) from big join small on (big.val = small.val);
++------------+
+| count(*) |
++------------+
+| 2000948962 |
++------------+
+Returned 1 row(s) in 108.85s
+[localhost:21000] > select count(*) from small join big on (big.val = small.val);
++------------+
+| count(*) |
++------------+
+| 2000948962 |
++------------+
+Returned 1 row(s) in 100.76s</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ When examining the performance of join queries and the effectiveness of the join order optimization, make
+ sure the query involves enough data and cluster resources to see a difference depending on the query plan.
+ For example, a single data file of just a few megabytes will reside in a single HDFS block and be processed
+ on a single node. Likewise, if you use a single-node or two-node cluster, there might not be much
+ difference in efficiency for the broadcast or partitioned join strategies.
+ </div>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_resources.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_resources.html b/docs/build3x/html/topics/impala_perf_resources.html
new file mode 100644
index 0000000..2bd7503
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_resources.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mem_limits"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Controlling Impala Resource Usage</title></head><body id="mem_limits"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Controlling Impala Resource Usage</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Sometimes, balancing raw query performance against scalability requires limiting the amount of resources,
+ such as memory or CPU, used by a single query or group of queries. Impala can use several mechanisms that
+ help to smooth out the load during heavy concurrent usage, resulting in faster overall query times and
+      sharing of resources across Impala queries, MapReduce jobs, and other kinds of workloads across a
+ cluster:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The Impala admission control feature uses a fast, distributed mechanism to hold back queries that exceed
+ limits on the number of concurrent queries or the amount of memory used. The queries are queued, and
+ executed as other queries finish and resources become available. You can control the concurrency limits,
+ and specify different limits for different groups of users to divide cluster resources according to the
+ priorities of different classes of users. This feature is new in Impala 1.3.
+ See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You can restrict the amount of memory Impala reserves during query execution by specifying the
+ <code class="ph codeph">-mem_limit</code> option for the <code class="ph codeph">impalad</code> daemon. See
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details. This limit applies only to the
+ memory that is directly consumed by queries; Impala reserves additional memory at startup, for example to
+ hold cached metadata.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For production deployments, implement resource isolation using your cluster management
+ tool.
+ </p>
+ </li>
+ </ul>
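+
+    <p class="p">
+      As a sketch of the second approach, the memory limit can be supplied when the
+      <code class="ph codeph">impalad</code> daemon is started. The exact service management
+      commands depend on your deployment, and the flag values here are only examples:
+    </p>
+
+<pre class="pre codeblock"><code>impalad --mem_limit=70% ...   # cap query memory at a percentage of physical RAM
+impalad --mem_limit=16G ...   # or specify an absolute amount</code></pre>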
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_skew.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_skew.html b/docs/build3x/html/topics/impala_perf_skew.html
new file mode 100644
index 0000000..20e5bfc
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_skew.html
@@ -0,0 +1,139 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_skew"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Detecting and Correcting HDFS Block Skew Conditions</title></head><body id="perf_skew"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Detecting and Correcting HDFS Block Skew Conditions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ For best performance of Impala parallel queries, the work is divided equally across hosts in the cluster, and
+ all hosts take approximately equal time to finish their work. If one host takes substantially longer than
+ others, the extra time needed for the slow host can become the dominant factor in query performance.
+ Therefore, one of the first steps in performance tuning for Impala is to detect and correct such conditions.
+ </p>
+
+ <p class="p">
+ The main cause of uneven performance that you can correct within Impala is <dfn class="term">skew</dfn> in the number of
+ HDFS data blocks processed by each host, where some hosts process substantially more data blocks than others.
+ This condition can occur because of uneven distribution of the data values themselves, for example causing
+ certain data files or partitions to be large while others are very small. (Although it is possible to have
+ unevenly distributed data without any problems with the distribution of HDFS blocks.) Block skew could also
+ be due to the underlying block allocation policies within HDFS, the replication factor of the data files, and
+ the way that Impala chooses the host to process each data block.
+ </p>
+
+ <p class="p">
+ The most convenient way to detect block skew, or slow-host issues in general, is to examine the <span class="q">"executive
+ summary"</span> information from the query profile after running a query:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In <span class="keyword cmdname">impala-shell</span>, issue the <code class="ph codeph">SUMMARY</code> command immediately after the
+ query is complete, to see just the summary information. If you detect issues involving skew, you might
+ switch to issuing the <code class="ph codeph">PROFILE</code> command, which displays the summary information followed
+ by a detailed performance analysis.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In the Impala debug web UI, click on the <span class="ph uicontrol">Profile</span> link associated with the query after it is
+ complete. The executive summary information is displayed early in the profile output.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ For each phase of the query, you see an <span class="ph uicontrol">Avg Time</span> and a <span class="ph uicontrol">Max Time</span>
+ value, along with <span class="ph uicontrol">#Hosts</span> indicating how many hosts are involved in that query phase.
+ For all the phases with <span class="ph uicontrol">#Hosts</span> greater than one, look for cases where the maximum time
+ is substantially greater than the average time. Focus on the phases that took the longest, for example, those
+ taking multiple seconds rather than milliseconds or microseconds.
+ </p>
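+
+    <p class="p">
+      For example, in <span class="keyword cmdname">impala-shell</span> (the query, timings, and
+      operator names below are illustrative only, and the output is abbreviated):
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select count(*) from big_table join other_table using (id);
+...
+[localhost:21000] &gt; summary;
++--------------+--------+----------+----------+ ...
+| Operator     | #Hosts | Avg Time | Max Time | ...
++--------------+--------+----------+----------+ ...
+| 02:HASH JOIN | 10     | 2s150ms  | 8s900ms  | ...  -- Max Time far above Avg Time: possible skew
+...</code></pre>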
+
+ <p class="p">
+ If you detect that some hosts take longer than others, first rule out non-Impala causes. One reason that some
+ hosts could be slower than others is if those hosts have less capacity than the others, or if they are
+ substantially busier due to unevenly distributed non-Impala workloads:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ For clusters running Impala, keep the relative capacities of all hosts roughly equal. Any cost savings
+ from including some underpowered hosts in the cluster will likely be outweighed by poor or uneven
+ performance, and the time spent diagnosing performance issues.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If non-Impala workloads cause slowdowns on some hosts but not others, use the appropriate load-balancing
+ techniques for the non-Impala components to smooth out the load across the cluster.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ If the hosts on your cluster are evenly powered and evenly loaded, examine the detailed profile output to
+ determine which host is taking longer than others for the query phase in question. Examine how many bytes are
+ processed during that phase on that host, how much memory is used, and how many bytes are transmitted across
+ the network.
+ </p>
+
+ <p class="p">
+ The most common symptom is a higher number of bytes read on one host than others, due to one host being
+ requested to process a higher number of HDFS data blocks. This condition is more likely to occur when the
+ number of blocks accessed by the query is relatively small. For example, if you have a 10-node cluster and
+ the query processes 10 HDFS blocks, each node might not process exactly one block. If one node sits idle
+ while another node processes two blocks, the query could take twice as long as if the data was perfectly
+ distributed.
+ </p>
+
+ <p class="p">
+ Possible solutions in this case include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ If the query is artificially small, perhaps for benchmarking purposes, scale it up to process a larger
+ data set. For example, if some nodes read 10 HDFS data blocks while others read 11, the overall effect of
+ the uneven distribution is much lower than when some nodes did twice as much work as others. As a
+ guideline, aim for a <span class="q">"sweet spot"</span> where each node reads 2 GB or more from HDFS per query. Queries
+ that process lower volumes than that could experience inconsistent performance that smooths out as
+ queries become more data-intensive.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If the query processes only a few large blocks, so that many nodes sit idle and cannot help to
+ parallelize the query, consider reducing the overall block size. For example, you might adjust the
+ <code class="ph codeph">PARQUET_FILE_SIZE</code> query option before copying or converting data into a Parquet table.
+ Or you might adjust the granularity of data files produced earlier in the ETL pipeline by non-Impala
+ components. In Impala 2.0 and later, the default Parquet block size is 256 MB, reduced from 1 GB, to
+ improve parallelism for common cluster sizes and data volumes.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Reduce the amount of compression applied to the data. For text data files, the highest degree of
+ compression (gzip) produces unsplittable files that are more difficult for Impala to process in parallel,
+ and require extra memory during processing to hold the compressed and uncompressed data simultaneously.
+ For binary formats such as Parquet and Avro, compression can result in fewer data blocks overall, but
+ remember that when queries process relatively few blocks, there is less opportunity for parallel
+ execution and many nodes in the cluster might sit idle. Note that when Impala writes Parquet data with
+ the query option <code class="ph codeph">COMPRESSION_CODEC=NONE</code> enabled, the data is still typically compact due
+ to the encoding schemes used by Parquet, independent of the final compression step.
+ </p>
+ </li>
+ </ul>
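+
+    <p class="p">
+      As a sketch of the block-size adjustment, you might set the query option before rewriting
+      the data into a new Parquet table. The table names here are hypothetical:
+    </p>
+
+<pre class="pre codeblock"><code>set parquet_file_size=128m;
+create table more_blocks stored as parquet as select * from fewer_blocks;</code></pre>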
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_kudu.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_kudu.html b/docs/build3x/html/topics/impala_kudu.html
new file mode 100644
index 0000000..1f10f44
--- /dev/null
+++ b/docs/build3x/html/topics/impala_kudu.html
@@ -0,0 +1,1449 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_kudu"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala to Query Kudu Tables</title></head><body id="impala_kudu"><main role="main"><article role="article" aria-labelledby="impala_kudu__kudu">
+
+ <h1 class="title topictitle1" id="impala_kudu__kudu">Using Impala to Query Kudu Tables</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You can use Impala to query tables stored by Apache Kudu. This capability
+ allows convenient access to a storage system that is tuned for different kinds of
+ workloads than the default HDFS-backed Impala tables.
+ </p>
+
+ <p class="p">
+ By default, Impala tables are stored on HDFS using data files with various file formats.
+ HDFS files are ideal for bulk loads (append operations) and queries using full-table scans,
+ but do not support in-place updates or deletes. Kudu is an alternative storage engine used
+ by Impala that supports both in-place updates (for mixed read/write workloads) and fast scans
+ (for data-warehouse/analytic operations). Using Kudu tables with Impala can simplify the
+ ETL pipeline by avoiding extra steps to segregate and reorganize newly arrived data.
+ </p>
+
+ <p class="p">
+ Certain Impala SQL statements and clauses, such as <code class="ph codeph">DELETE</code>,
+ <code class="ph codeph">UPDATE</code>, <code class="ph codeph">UPSERT</code>, and <code class="ph codeph">PRIMARY KEY</code> work
+ only with Kudu tables. Other statements and clauses, such as <code class="ph codeph">LOAD DATA</code>,
+ <code class="ph codeph">TRUNCATE TABLE</code>, and <code class="ph codeph">INSERT OVERWRITE</code>, are not applicable
+ to Kudu tables.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="impala_kudu__kudu_benefits">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Benefits of Using Kudu Tables with Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The combination of Kudu and Impala works best for tables where scan performance is
+ important, but data arrives continuously, in small batches, or needs to be updated
+ without being completely replaced. HDFS-backed tables can require substantial overhead
+ to replace or reorganize data files as new data arrives. Impala can perform efficient
+ lookups and scans within Kudu tables, and Impala can also perform update or
+ delete operations efficiently. You can also use the Kudu Java, C++, and Python APIs to
+ do ingestion or transformation operations outside of Impala, and Impala can query the
+ current data at any time.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="impala_kudu__kudu_config">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Configuring Impala for Use with Kudu</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">-kudu_master_hosts</code> configuration property must be set correctly
+ for the <span class="keyword cmdname">impalad</span> daemon, for <code class="ph codeph">CREATE TABLE ... STORED AS
+ KUDU</code> statements to connect to the appropriate Kudu server. Typically, the
+ required value for this setting is <code class="ph codeph"><var class="keyword varname">kudu_host</var>:7051</code>.
+ In a high-availability Kudu deployment, specify the names of multiple Kudu hosts separated by commas.
+ </p>
+
+ <p class="p">
+ If the <code class="ph codeph">-kudu_master_hosts</code> configuration property is not set, you can
+ still associate the appropriate value for each table by specifying a
+ <code class="ph codeph">TBLPROPERTIES('kudu.master_addresses')</code> clause in the <code class="ph codeph">CREATE TABLE</code> statement or
+ changing the <code class="ph codeph">TBLPROPERTIES('kudu.master_addresses')</code> value with an <code class="ph codeph">ALTER TABLE</code>
+ statement.
+ </p>
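+
+ <p class="p">
+ For example, the following statements show both ways of associating a table with a
+ specific Kudu cluster. (The host names are placeholders; substitute the names of your
+ own Kudu master hosts.)
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Specify the Kudu master address when the table is created.
+CREATE TABLE per_table_master (id BIGINT PRIMARY KEY, s STRING)
+  PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU
+  TBLPROPERTIES ('kudu.master_addresses'='kudu-master.example.com:7051');
+
+-- Point an existing table at a different Kudu master.
+ALTER TABLE per_table_master
+  SET TBLPROPERTIES ('kudu.master_addresses'='new-master.example.com:7051');
+</code></pre>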
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title4" id="kudu_config__kudu_topology">
+
+ <h3 class="title topictitle3" id="ariaid-title4">Cluster Topology for Kudu Tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ With HDFS-backed tables, you are typically concerned with the number of DataNodes in
+ the cluster, how many and how large HDFS data files are read during a query, and
+ therefore the amount of work performed by each DataNode and the network communication
+ to combine intermediate results and produce the final result set.
+ </p>
+
+ <p class="p">
+ With Kudu tables, the topology considerations are different, because:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The underlying storage is managed and organized by Kudu, not represented as HDFS
+ data files.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Kudu handles some of the underlying mechanics of partitioning the data. You can specify
+ the partitioning scheme with combinations of hash and range partitioning, so that you can
+ decide how much effort to expend to manage the partitions as new data arrives. For example,
+ you can construct partitions that apply to date ranges rather than a separate partition for each
+ day or each hour.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Data is physically divided based on units of storage called <dfn class="term">tablets</dfn>. Tablets are
+ stored by <dfn class="term">tablet servers</dfn>. Each tablet server can store multiple tablets,
+ and each tablet is replicated across multiple tablet servers, managed automatically by Kudu.
+ Where practical, colocate the tablet servers on the same hosts as the DataNodes, although that is not required.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ One consideration for the cluster topology is that the number of replicas for a Kudu table
+ must be odd.
+ </p>
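+
+ <p class="p">
+ For example, the replication factor can be set when a table is created through the
+ <code class="ph codeph">kudu.num_replicas</code> table property, and the value must be odd:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- 3 replicas (the default) or another odd value is accepted.
+CREATE TABLE odd_replicas (id BIGINT PRIMARY KEY, s STRING)
+  PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU
+  TBLPROPERTIES ('kudu.num_replicas'='3');
+
+-- Specifying an even number of replicas, such as 2, results in an error.
+</code></pre>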
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="impala_kudu__kudu_ddl">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Impala DDL Enhancements for Kudu Tables (CREATE TABLE and ALTER TABLE)</h2>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can use the Impala <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+ statements to create and fine-tune the characteristics of Kudu tables. Because Kudu
+ tables have features and properties that do not apply to other kinds of Impala tables,
+ familiarize yourself with Kudu-related concepts and syntax first.
+ For the general syntax of the <code class="ph codeph">CREATE TABLE</code>
+ statement for Kudu tables, see <a class="xref" href="impala_create_table.html">CREATE TABLE Statement</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="kudu_ddl__kudu_primary_key">
+
+ <h3 class="title topictitle3" id="ariaid-title6">Primary Key Columns for Kudu Tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Kudu tables introduce the notion of primary keys to Impala for the first time. The
+ primary key is made up of one or more columns, whose values are combined and used as a
+ lookup key during queries. The tuple represented by these columns must be unique,
+ cannot contain any <code class="ph codeph">NULL</code> values, and can never be updated once inserted. For a
+ Kudu table, all the partition key columns must come from the set of
+ primary key columns.
+ </p>
+
+ <p class="p">
+ The primary key has both physical and logical aspects:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ On the physical side, it is used to map the data values to particular tablets for fast retrieval.
+ Because the tuples formed by the primary key values are unique, the primary key columns are typically
+ highly selective.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ On the logical side, the uniqueness constraint allows you to avoid duplicate data in a table.
+ For example, if an <code class="ph codeph">INSERT</code> operation fails partway through, only some of the
+ new rows might be present in the table. You can re-run the same <code class="ph codeph">INSERT</code>, and
+ only the missing rows will be added. Or if data in the table is stale, you can run an
+ <code class="ph codeph">UPSERT</code> statement that brings the data up to date, without the possibility
+ of creating duplicate copies of existing rows.
+ </p>
+ </li>
+ </ul>
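+
+ <p class="p">
+ The following hypothetical example (the table names are placeholders) sketches the
+ scenario described above, where re-running a statement does not create duplicate rows:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Suppose this INSERT failed partway through, leaving only some new rows in place.
+INSERT INTO kudu_events SELECT * FROM staged_events;
+
+-- Re-running the same statement adds only the missing rows; rows whose primary
+-- key values already exist in the table are not duplicated.
+INSERT INTO kudu_events SELECT * FROM staged_events;
+
+-- UPSERT updates existing rows in place and inserts new ones, with no
+-- possibility of duplicate primary key values.
+UPSERT INTO kudu_events SELECT * FROM staged_events;
+</code></pre>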
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Impala only allows <code class="ph codeph">PRIMARY KEY</code> clauses and <code class="ph codeph">NOT NULL</code>
+ constraints on columns for Kudu tables. These constraints are enforced on the Kudu side.
+ </p>
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="kudu_ddl__kudu_column_attributes">
+
+ <h3 class="title topictitle3" id="ariaid-title7">Kudu-Specific Column Attributes for CREATE TABLE</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the general syntax of the <code class="ph codeph">CREATE TABLE</code>
+ statement for Kudu tables, see <a class="xref" href="impala_create_table.html">CREATE TABLE Statement</a>.
+ The following sections provide more detail for some of the
+ Kudu-specific keywords you can use in column definitions.
+ </p>
+
+ <p class="p">
+ The column list in a <code class="ph codeph">CREATE TABLE</code> statement can include the following
+ attributes, which only apply to Kudu tables:
+ </p>
+
+<pre class="pre codeblock"><code>
+ PRIMARY KEY
+| [NOT] NULL
+| ENCODING <var class="keyword varname">codec</var>
+| COMPRESSION <var class="keyword varname">algorithm</var>
+| DEFAULT <var class="keyword varname">constant_expression</var>
+| BLOCK_SIZE <var class="keyword varname">number</var>
+</code></pre>
+
+ <p class="p toc inpage">
+ See the following sections for details about each column attribute.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title8" id="kudu_column_attributes__kudu_primary_key_attribute">
+
+ <h4 class="title topictitle4" id="ariaid-title8">PRIMARY KEY Attribute</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The primary key for a Kudu table is a column, or set of columns, that uniquely
+ identifies every row. The primary key value also is used as the natural sort order
+ for the values from the table. The primary key value for each row is based on the
+ combination of values for the columns.
+ </p>
+
+ <p class="p">
+ Because all of the primary key columns must have non-null values, specifying a column
+ in the <code class="ph codeph">PRIMARY KEY</code> clause implicitly adds the <code class="ph codeph">NOT
+ NULL</code> attribute to that column.
+ </p>
+
+ <p class="p">
+ The primary key columns must be the first ones specified in the <code class="ph codeph">CREATE
+ TABLE</code> statement. For a single-column primary key, you can include a
+ <code class="ph codeph">PRIMARY KEY</code> attribute inline with the column definition. For a
+ multi-column primary key, you include a <code class="ph codeph">PRIMARY KEY (<var class="keyword varname">c1</var>,
+ <var class="keyword varname">c2</var>, ...)</code> clause as a separate entry at the end of the
+ column list.
+ </p>
+
+ <p class="p">
+ You can specify the <code class="ph codeph">PRIMARY KEY</code> attribute either inline in a single
+ column definition, or as a separate clause at the end of the column list:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE pk_inline
+(
+ col1 BIGINT PRIMARY KEY,
+ col2 STRING,
+ col3 BOOLEAN
+) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;
+
+CREATE TABLE pk_at_end
+(
+ col1 BIGINT,
+ col2 STRING,
+ col3 BOOLEAN,
+ PRIMARY KEY (col1)
+) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+ <p class="p">
+ When the primary key is a single column, these two forms are equivalent. If the
+ primary key consists of more than one column, you must specify the primary key using
+ a separate entry in the column list:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE pk_multiple_columns
+(
+ col1 BIGINT,
+ col2 STRING,
+ col3 BOOLEAN,
+ <strong class="ph b">PRIMARY KEY (col1, col2)</strong>
+) PARTITION BY HASH(col2) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">SHOW CREATE TABLE</code> statement always represents the
+ <code class="ph codeph">PRIMARY KEY</code> specification as a separate item in the column list:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE inline_pk_rewritten (id BIGINT <strong class="ph b">PRIMARY KEY</strong>, s STRING)
+ PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+
+SHOW CREATE TABLE inline_pk_rewritten;
++------------------------------------------------------------------------------+
+| result |
++------------------------------------------------------------------------------+
+| CREATE TABLE user.inline_pk_rewritten ( |
+| id BIGINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
+| s STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
+| <strong class="ph b">PRIMARY KEY (id)</strong> |
+| ) |
+| PARTITION BY HASH (id) PARTITIONS 2 |
+| STORED AS KUDU |
+| TBLPROPERTIES ('kudu.master_addresses'='host.example.com') |
++------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The notion of primary key only applies to Kudu tables. Every Kudu table requires a
+ primary key. The primary key consists of one or more columns. You must specify any
+ primary key columns first in the column list.
+ </p>
+
+ <p class="p">
+ The contents of the primary key columns cannot be changed by an
+ <code class="ph codeph">UPDATE</code> or <code class="ph codeph">UPSERT</code> statement. Including too many
+ columns in the primary key (more than 5 or 6) can also reduce the performance of
+ write operations. Therefore, pick the most selective and most frequently
+ tested non-null columns for the primary key specification.
+ If a column must always have a value, but that value
+ might change later, leave it out of the primary key and use a <code class="ph codeph">NOT
+ NULL</code> clause for that column instead. If an existing row has an
+ incorrect or outdated key column value, delete the old row and insert an entirely
+ new row with the correct primary key.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title9" id="kudu_column_attributes__kudu_not_null_attribute">
+
+ <h4 class="title topictitle4" id="ariaid-title9">NULL | NOT NULL Attribute</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For Kudu tables, you can specify which columns can contain nulls or not. This
+ constraint offers an extra level of consistency enforcement for Kudu tables. If an
+ application requires a field to always be specified, include a <code class="ph codeph">NOT
+ NULL</code> clause in the corresponding column definition, and Kudu prevents rows
+ from being inserted with a <code class="ph codeph">NULL</code> in that column.
+ </p>
+
+ <p class="p">
+ For example, a table containing geographic information might require the latitude
+ and longitude coordinates to always be specified. Other attributes might be allowed
+ to be <code class="ph codeph">NULL</code>. For example, a location might not have a designated
+ place name, its altitude might be unimportant, and its population might be initially
+ unknown, to be filled in later.
+ </p>
+
+ <p class="p">
+ Because all of the primary key columns must have non-null values, specifying a column
+ in the <code class="ph codeph">PRIMARY KEY</code> clause implicitly adds the <code class="ph codeph">NOT
+ NULL</code> attribute to that column.
+ </p>
+
+ <p class="p">
+ For non-Kudu tables, Impala allows any column to contain <code class="ph codeph">NULL</code>
+ values, because it is not practical to enforce a <span class="q">"not null"</span> constraint on HDFS
+ data files that could be prepared using external tools and ETL processes.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE required_columns
+(
+ id BIGINT PRIMARY KEY,
+ latitude DOUBLE NOT NULL,
+ longitude DOUBLE NOT NULL,
+ place_name STRING,
+ altitude DOUBLE,
+ population BIGINT
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+ <p class="p">
+ During performance optimization, Kudu can use the knowledge that nulls are not
+ allowed to skip certain checks on each input row, speeding up queries and join
+ operations. Therefore, specify <code class="ph codeph">NOT NULL</code> constraints when
+ appropriate.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">NULL</code> clause is the default condition for all columns that are not
+ part of the primary key. You can omit it, or specify it to clarify that you have made a
+ conscious design decision to allow nulls in a column.
+ </p>
+
+ <p class="p">
+ Because primary key columns cannot contain any <code class="ph codeph">NULL</code> values, the
+ <code class="ph codeph">NOT NULL</code> clause is not required for the primary key columns,
+ but you might still specify it to make your code self-describing.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title10" id="kudu_column_attributes__kudu_default_attribute">
+
+ <h4 class="title topictitle4" id="ariaid-title10">DEFAULT Attribute</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can specify a default value for columns in Kudu tables. The default value can be
+ any constant expression, for example, a combination of literal values, arithmetic
+ and string operations. It cannot contain references to columns or non-deterministic
+ function calls.
+ </p>
+
+ <p class="p">
+ The following example shows different kinds of expressions for the
+ <code class="ph codeph">DEFAULT</code> clause. The requirement to use a constant value means that
+ you can fill in a placeholder value such as <code class="ph codeph">NULL</code>, empty string,
+ 0, -1, <code class="ph codeph">'N/A'</code>, and so on, but you cannot reference functions or
+ column names. Therefore, you cannot use <code class="ph codeph">DEFAULT</code> to do things such as
+ automatically making an uppercase copy of a string value, storing Boolean values based
+ on tests of other columns, or adding or subtracting one from another column representing a sequence number.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE default_vals
+(
+ id BIGINT PRIMARY KEY,
+ name STRING NOT NULL DEFAULT 'unknown',
+ address STRING DEFAULT upper('no fixed address'),
+ age INT DEFAULT -1,
+ earthling BOOLEAN DEFAULT TRUE,
+ planet_of_origin STRING DEFAULT 'Earth',
+ optional_col STRING DEFAULT NULL
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ When designing an entirely new schema, prefer to use <code class="ph codeph">NULL</code> as the
+ placeholder for any unknown or missing values, because that is the universal convention
+ among database systems. Null values can be stored efficiently, and easily checked with the
+ <code class="ph codeph">IS NULL</code> or <code class="ph codeph">IS NOT NULL</code> operators. The <code class="ph codeph">DEFAULT</code>
+ attribute is appropriate when ingesting data that already has an established convention for
+ representing unknown or missing values, or where the vast majority of rows have some common
+ non-null value.
+ </p>
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title11" id="kudu_column_attributes__kudu_encoding_attribute">
+
+ <h4 class="title topictitle4" id="ariaid-title11">ENCODING Attribute</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Each column in a Kudu table can optionally use an encoding, a low-overhead form of
+ compression that reduces the size on disk, then requires additional CPU cycles to
+ reconstruct the original values during queries. Typically, highly compressible data
+ benefits from the reduced I/O to read the data back from disk.
+ </p>
+
+ <div class="p">
+ The encoding keywords that Impala recognizes are:
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">AUTO_ENCODING</code>: use the default encoding based
+ on the column type, which is bitshuffle for numeric columns
+ and dictionary for string columns.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">PLAIN_ENCODING</code>: leave the value in its original binary format.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">RLE</code>: compress repeated values (when sorted in primary key
+ order) by including a count.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">DICT_ENCODING</code>: when the number of different string values is
+ low, replace the original string with a numeric ID.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">BIT_SHUFFLE</code>: rearrange the bits of the values to efficiently
+ compress sequences of values that are identical or vary only slightly based
+ on primary key order. The resulting encoded data is also compressed with LZ4.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">PREFIX_ENCODING</code>: compress common prefixes in string values; mainly for use internally within Kudu.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+
+
+ <p class="p">
+ The following example shows the Impala keywords representing the encoding types.
+ (The Impala keywords match the symbolic names used within Kudu.)
+ For usage guidelines on the different kinds of encoding, see
+ <a class="xref" href="https://kudu.apache.org/docs/schema_design.html" target="_blank">the Kudu documentation</a>.
+ The <code class="ph codeph">DESCRIBE</code> output shows how the encoding is reported after
+ the table is created, and that omitting the encoding (in this case, for the
+ <code class="ph codeph">ID</code> column) is the same as specifying <code class="ph codeph">AUTO_ENCODING</code>.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE various_encodings
+(
+ id BIGINT PRIMARY KEY,
+ c1 BIGINT ENCODING PLAIN_ENCODING,
+ c2 BIGINT ENCODING AUTO_ENCODING,
+ c3 TINYINT ENCODING BIT_SHUFFLE,
+ c4 DOUBLE ENCODING BIT_SHUFFLE,
+ c5 BOOLEAN ENCODING RLE,
+ c6 STRING ENCODING DICT_ENCODING,
+ c7 STRING ENCODING PREFIX_ENCODING
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+
+-- Some columns are omitted from the output for readability.
+describe various_encodings;
++------+---------+-------------+----------+-----------------+
+| name | type | primary_key | nullable | encoding |
++------+---------+-------------+----------+-----------------+
+| id | bigint | true | false | AUTO_ENCODING |
+| c1 | bigint | false | true | PLAIN_ENCODING |
+| c2 | bigint | false | true | AUTO_ENCODING |
+| c3 | tinyint | false | true | BIT_SHUFFLE |
+| c4 | double | false | true | BIT_SHUFFLE |
+| c5 | boolean | false | true | RLE |
+| c6 | string | false | true | DICT_ENCODING |
+| c7 | string | false | true | PREFIX_ENCODING |
++------+---------+-------------+----------+-----------------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title12" id="kudu_column_attributes__kudu_compression_attribute">
+
+ <h4 class="title topictitle4" id="ariaid-title12">COMPRESSION Attribute</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can specify a compression algorithm to use for each column in a Kudu table. This
+ attribute imposes more CPU overhead when retrieving the values than the
+ <code class="ph codeph">ENCODING</code> attribute does. Therefore, use it primarily for columns with
+ long strings that do not benefit much from the less-expensive <code class="ph codeph">ENCODING</code>
+ attribute.
+ </p>
+
+ <p class="p">
+ The choices for <code class="ph codeph">COMPRESSION</code> are <code class="ph codeph">LZ4</code>,
+ <code class="ph codeph">SNAPPY</code>, and <code class="ph codeph">ZLIB</code>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Columns that use the <code class="ph codeph">BIT_SHUFFLE</code> encoding are already compressed
+ using <code class="ph codeph">LZ4</code>, and so typically do not need any additional
+ <code class="ph codeph">COMPRESSION</code> attribute.
+ </p>
+ </div>
+
+ <p class="p">
+ The following example shows design considerations for several
+ <code class="ph codeph">STRING</code> columns with different distribution characteristics, leading
+ to choices for both the <code class="ph codeph">ENCODING</code> and <code class="ph codeph">COMPRESSION</code>
+ attributes. The <code class="ph codeph">country</code> values come from a specific set of strings,
+ therefore this column is a good candidate for dictionary encoding. The
+ <code class="ph codeph">post_id</code> column contains an ascending sequence of integers, where
+ several leading bits are likely to be all zeroes, therefore this column is a good
+ candidate for bitshuffle encoding. The <code class="ph codeph">body</code>
+ column and the corresponding columns for translated versions tend to be long unique
+ strings that are not practical to use with any of the encoding schemes, therefore
+ they employ the <code class="ph codeph">COMPRESSION</code> attribute instead. The ideal compression
+ codec in each case would require some experimentation to determine how much space
+ savings it provided and how much CPU overhead it added, based on real-world data.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE blog_posts
+(
+ user_id STRING ENCODING DICT_ENCODING,
+ post_id BIGINT ENCODING BIT_SHUFFLE,
+ subject STRING ENCODING PLAIN_ENCODING,
+ body STRING COMPRESSION LZ4,
+ spanish_translation STRING COMPRESSION SNAPPY,
+ esperanto_translation STRING COMPRESSION ZLIB,
+ PRIMARY KEY (user_id, post_id)
+) PARTITION BY HASH(user_id, post_id) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title13" id="kudu_column_attributes__kudu_block_size_attribute">
+
+ <h4 class="title topictitle4" id="ariaid-title13">BLOCK_SIZE Attribute</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Although Kudu does not use HDFS files internally, and thus is not affected by
+ the HDFS block size, it does have an underlying unit of I/O called the
+ <dfn class="term">block size</dfn>. The <code class="ph codeph">BLOCK_SIZE</code> attribute lets you set the
+ block size for any column.
+ </p>
+
+ <p class="p">
+ The block size attribute is a relatively advanced feature. Refer to
+ <a class="xref" href="https://kudu.apache.org/docs/index.html" target="_blank">the Kudu documentation</a>
+ for usage details.
+ </p>
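+
+ <p class="p">
+ The following statement illustrates the syntax only: it sets an explicit block size,
+ in bytes, for a single column, leaving the other column at the Kudu default:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE custom_block_size
+(
+  id BIGINT PRIMARY KEY,
+  s STRING BLOCK_SIZE 1048576
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</code></pre>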
+
+
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="kudu_ddl__kudu_partitioning">
+
+ <h3 class="title topictitle3" id="ariaid-title14">Partitioning for Kudu Tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Kudu tables use special mechanisms to distribute data among the underlying
+ tablet servers. Although we refer to such tables as partitioned tables, they are
+ distinguished from traditional Impala partitioned tables by use of different clauses
+ on the <code class="ph codeph">CREATE TABLE</code> statement. Kudu tables use
+ <code class="ph codeph">PARTITION BY</code>, <code class="ph codeph">HASH</code>, <code class="ph codeph">RANGE</code>, and
+ range specification clauses rather than the <code class="ph codeph">PARTITIONED BY</code> clause
+ for HDFS-backed tables, which specifies only a column name and creates a new partition for each
+ different value.
+ </p>
+
+ <p class="p">
+ For background information and architectural details about the Kudu partitioning
+ mechanism, see
+ <a class="xref" href="https://kudu.apache.org/kudu.pdf" target="_blank">the Kudu white paper, section 3.2</a>.
+ </p>
+
+
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The Impala DDL syntax for Kudu tables is different than in early Kudu versions,
+ which used an experimental fork of the Impala code. For example, the
+ <code class="ph codeph">DISTRIBUTE BY</code> clause is now <code class="ph codeph">PARTITION BY</code>, the
+ <code class="ph codeph">INTO <var class="keyword varname">n</var> BUCKETS</code> clause is now
+ <code class="ph codeph">PARTITIONS <var class="keyword varname">n</var></code>, and the range partitioning syntax
+ is reworked to replace the <code class="ph codeph">SPLIT ROWS</code> clause with more expressive
+ syntax involving comparison operators.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title15" id="kudu_partitioning__kudu_hash_partitioning">
+ <h4 class="title topictitle4" id="ariaid-title15">Hash Partitioning</h4>
+ <div class="body conbody">
+
+ <p class="p">
+ Hash partitioning is the simplest type of partitioning for Kudu tables.
+ For hash-partitioned Kudu tables, inserted rows are divided up between a fixed number
+ of <span class="q">"buckets"</span> by applying a hash function to the values of the columns specified
+ in the <code class="ph codeph">HASH</code> clause.
+ Hashing ensures that rows with similar values are evenly distributed, instead of
+ all clumping together in the same bucket.
+ way lets insertion operations work in parallel across multiple tablet servers.
+ Separating the hashed values can impose additional overhead on queries,
+ because queries with range-based predicates might have to read multiple tablets
+ to retrieve all the relevant values.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- 1M rows with 50 hash partitions = approximately 20,000 rows per partition.
+-- The values in each partition are not sequential, but rather based on a hash function.
+-- Rows 1, 99999, and 123456 might be in the same partition.
+CREATE TABLE million_rows (id string primary key, s string)
+ PARTITION BY HASH(id) PARTITIONS 50
+ STORED AS KUDU;
+
+-- Because the ID values are unique, we expect the rows to be roughly
+-- evenly distributed between the buckets in the destination table.
+INSERT INTO million_rows SELECT * FROM billion_rows ORDER BY id LIMIT 1e6;
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The largest number of buckets that you can create with a <code class="ph codeph">PARTITIONS</code>
+ clause varies depending on the number of tablet servers in the cluster, while the smallest is 2.
+ For simplicity, some of the simple <code class="ph codeph">CREATE TABLE</code> statements throughout this section
+ use <code class="ph codeph">PARTITIONS 2</code> to illustrate the minimum requirements for a Kudu table.
+ For large tables, prefer to use roughly 10 partitions per server in the cluster.
+ </p>
+ </div>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title16" id="kudu_partitioning__kudu_range_partitioning">
+ <h4 class="title topictitle4" id="ariaid-title16">Range Partitioning</h4>
+ <div class="body conbody">
+
+ <p class="p">
+ Range partitioning lets you specify partitioning precisely, based on single values or ranges
+ of values within one or more columns. You add one or more <code class="ph codeph">RANGE</code> clauses to the
+ <code class="ph codeph">CREATE TABLE</code> statement, following the <code class="ph codeph">PARTITION BY</code>
+ clause.
+ </p>
+
+ <p class="p">
+ Range-partitioned Kudu tables use one or more range clauses, which include a
+ combination of constant expressions, <code class="ph codeph">VALUE</code> or <code class="ph codeph">VALUES</code>
+ keywords, and comparison operators. (This syntax replaces the <code class="ph codeph">SPLIT
+ ROWS</code> clause used with early Kudu versions.)
+ For the full syntax, see <a class="xref" href="impala_create_table.html">CREATE TABLE Statement</a>.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- 50 buckets, all for IDs beginning with a lowercase letter.
+-- Having only a single range enforces the allowed range of values
+-- but does not add any extra parallelism.
+create table million_rows_one_range (id string primary key, s string)
+ partition by hash(id) partitions 50,
+ range (partition 'a' <= values < '{')
+ stored as kudu;
+
+-- 50 buckets for IDs beginning with a lowercase letter
+-- plus 50 buckets for IDs beginning with an uppercase letter.
+-- Total number of buckets = number in the PARTITIONS clause x number of ranges.
+-- We are still enforcing constraints on the primary key values
+-- allowed in the table, and the 2 ranges provide better parallelism
+-- as rows are inserted or the table is scanned.
+create table million_rows_two_ranges (id string primary key, s string)
+ partition by hash(id) partitions 50,
+ range (partition 'a' <= values < '{', partition 'A' <= values < '[')
+ stored as kudu;
+
+-- Same as previous table, with an extra range covering the single key value '00000'.
+create table million_rows_three_ranges (id string primary key, s string)
+ partition by hash(id) partitions 50,
+ range (partition 'a' <= values < '{', partition 'A' <= values < '[', partition value = '00000')
+ stored as kudu;
+
+-- The range partitioning can be displayed with a SHOW command in impala-shell.
+show range partitions million_rows_three_ranges;
++---------------------+
+| RANGE (id) |
++---------------------+
+| VALUE = "00000" |
+| "A" <= VALUES < "[" |
+| "a" <= VALUES < "{" |
++---------------------+
+
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ When defining ranges, be careful to avoid <span class="q">"fencepost errors"</span> where values at the
+ extreme ends might be included or omitted by accident. For example, in the tables defined
+ in the preceding code listings, the range <code class="ph codeph">"a" <= VALUES < "{"</code> ensures that
+ any values starting with <code class="ph codeph">z</code>, such as <code class="ph codeph">za</code> or <code class="ph codeph">zzz</code>
+ or <code class="ph codeph">zzz-ZZZ</code>, are all included, by using a less-than comparison against
+ <code class="ph codeph">"{"</code>, the character immediately following <code class="ph codeph">z</code> in ASCII order.
+ </p>
+ </div>
+
+ <p class="p">
+ For range-partitioned Kudu tables, an appropriate range must exist before a data value can be created in the table.
+ Any <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, or <code class="ph codeph">UPSERT</code> statements fail if they try to
+ create column values that fall outside the specified ranges. The error checking for ranges is performed on the
+ Kudu side; Impala passes the specified range information to Kudu, and passes back any error or warning if the
+ ranges are not valid. (A nonsensical range specification causes an error for a DDL statement, but only a warning
+ for a DML statement.)
+ </p>
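+
+ <p class="p">
+ As an illustration, using the hypothetical tables defined earlier in this section,
+ a row whose key falls outside every declared range is rejected by Kudu:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- 'zebra' starts with a lowercase letter, so it falls inside the
+-- 'a' <= values < '{' range and is accepted.
+insert into million_rows_one_range values ('zebra', 'accepted');
+
+-- '9abc' does not fall inside any declared range, so Kudu rejects
+-- the row and the DML statement reports a warning.
+insert into million_rows_one_range values ('9abc', 'rejected');
+</code></pre>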
+
+ <p class="p">
+ Ranges can be non-contiguous:
+ </p>
+
+<pre class="pre codeblock"><code>
+partition by range (year) (partition 1885 <= values <= 1889, partition 1893 <= values <= 1897)
+
+partition by range (letter_grade) (partition value = 'A', partition value = 'B',
+ partition value = 'C', partition value = 'D', partition value = 'F')
+
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ALTER TABLE</code> statement with the <code class="ph codeph">ADD RANGE PARTITION</code> or
+ <code class="ph codeph">DROP RANGE PARTITION</code> clauses can be used to add or remove ranges from an
+ existing Kudu table.
+ </p>
+
+<pre class="pre codeblock"><code>
+ALTER TABLE foo ADD RANGE PARTITION 30 <= VALUES < 50;
+ALTER TABLE foo DROP RANGE PARTITION 1 <= VALUES < 5;
+
+</code></pre>
+
+ <p class="p">
+ When a range is added, the new range must not overlap with any of the previous ranges;
+ that is, it can only fill in gaps within the previous ranges.
+ </p>
+
+<pre class="pre codeblock"><code>
+alter table test_scores add range partition value = 'E';
+
+alter table year_ranges add range partition 1890 <= values < 1893;
+
+</code></pre>
+
+ <p class="p">
+ When a range is removed, all the associated rows in the table are deleted. (This
+ is true whether the table is internal or external.)
+ </p>
+
+<pre class="pre codeblock"><code>
+alter table test_scores drop range partition value = 'E';
+
+alter table year_ranges drop range partition 1890 <= values < 1893;
+
+</code></pre>
+
+ <p class="p">
+ Kudu tables can also use a combination of hash and range partitioning.
+ </p>
+
+<pre class="pre codeblock"><code>
+partition by hash (school) partitions 10,
+ range (letter_grade) (partition value = 'A', partition value = 'B',
+ partition value = 'C', partition value = 'D', partition value = 'F')
+
+</code></pre>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title17" id="kudu_partitioning__kudu_partitioning_misc">
+ <h4 class="title topictitle4" id="ariaid-title17">Working with Partitioning in Kudu Tables</h4>
+ <div class="body conbody">
+
+ <p class="p">
+ To see the current partitioning scheme for a Kudu table, you can use the <code class="ph codeph">SHOW
+ CREATE TABLE</code> statement or the <code class="ph codeph">SHOW PARTITIONS</code> statement. The
+ <code class="ph codeph">CREATE TABLE</code> syntax displayed by this statement includes all the
+ hash, range, or both clauses that reflect the original table structure plus any
+ subsequent <code class="ph codeph">ALTER TABLE</code> statements that changed the table structure.
+ </p>
+
+ <p class="p">
+ To see the underlying buckets and partitions for a Kudu table, use the
+ <code class="ph codeph">SHOW TABLE STATS</code> or <code class="ph codeph">SHOW PARTITIONS</code> statement.
+ </p>
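+
+ <p class="p">
+ For example, substituting the name of your own Kudu table:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Reconstructed CREATE TABLE statement, including all partitioning clauses.
+show create table million_rows_three_ranges;
+
+-- Range partitions only.
+show range partitions million_rows_three_ranges;
+
+-- Underlying buckets and partitions.
+show partitions million_rows_three_ranges;
+show table stats million_rows_three_ranges;
+</code></pre>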
+
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title18" id="kudu_ddl__kudu_timestamps">
+
+ <h3 class="title topictitle3" id="ariaid-title18">Handling Date, Time, or Timestamp Data with Kudu</h3>
+
+ <div class="body conbody">
+
+ <div class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, you can include <code class="ph codeph">TIMESTAMP</code>
+ columns in Kudu tables, instead of representing the date and time as a <code class="ph codeph">BIGINT</code>
+ value. The behavior of <code class="ph codeph">TIMESTAMP</code> for Kudu tables has some special considerations:
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Any nanoseconds in the original 96-bit value produced by Impala are not stored, because
+ Kudu represents date/time columns using 64-bit values. The nanosecond portion of the value
+ is rounded, not truncated. Therefore, a <code class="ph codeph">TIMESTAMP</code> value
+ that you store in a Kudu table might not be bit-for-bit identical to the value returned by a query.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The conversion between the Impala 96-bit representation and the Kudu 64-bit representation
+ introduces some performance overhead when reading or writing <code class="ph codeph">TIMESTAMP</code>
+ columns. You can minimize the overhead during writes by performing inserts through the
+ Kudu API. Because the overhead during reads applies to each query, you might continue to
+ use a <code class="ph codeph">BIGINT</code> column to represent date/time values in performance-critical
+ applications.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The Impala <code class="ph codeph">TIMESTAMP</code> type has a narrower range for years than the underlying
+ Kudu data type. Impala can represent years 1400-9999. If year values outside this range
+ are written to a Kudu table by a non-Impala client, Impala returns <code class="ph codeph">NULL</code>
+ by default when reading those <code class="ph codeph">TIMESTAMP</code> values during a query. Or, if the
+ <code class="ph codeph">ABORT_ON_ERROR</code> query option is enabled, the query fails when it encounters
+ a value with an out-of-range year.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+<pre class="pre codeblock"><code>-- Make a table representing a date/time value as TIMESTAMP.
+-- The strings representing the partition bounds are automatically
+-- cast to TIMESTAMP values.
+create table native_timestamp(id bigint, when_exactly timestamp, event string, primary key (id, when_exactly))
+ partition by hash (id) partitions 20,
+ range (when_exactly)
+ (
+ partition '2015-01-01' <= values < '2016-01-01',
+ partition '2016-01-01' <= values < '2017-01-01',
+ partition '2017-01-01' <= values < '2018-01-01'
+ )
+ stored as kudu;
+
+insert into native_timestamp values (12345, now(), 'Working on doc examples');
+
+select * from native_timestamp;
++-------+-------------------------------+-------------------------+
+| id | when_exactly | event |
++-------+-------------------------------+-------------------------+
+| 12345 | 2017-05-31 16:27:42.667542000 | Working on doc examples |
++-------+-------------------------------+-------------------------+
+
+</code></pre>
+
+ <p class="p">
+ Because converting <code class="ph codeph">TIMESTAMP</code> columns to and from the Impala
+ 96-bit internal representation incurs some overhead for Kudu tables, in performance-critical
+ applications you might store date/time information as the number
+ of seconds, milliseconds, or microseconds since the Unix epoch date of January 1,
+ 1970. Specify the column as <code class="ph codeph">BIGINT</code> in the Impala <code class="ph codeph">CREATE
+ TABLE</code> statement, corresponding to an 8-byte integer (an
+ <code class="ph codeph">int64</code>) in the underlying Kudu table. Then use Impala date/time
+ conversion functions as necessary to produce a numeric, <code class="ph codeph">TIMESTAMP</code>,
+ or <code class="ph codeph">STRING</code> value depending on the context.
+ </p>
+
+ <p class="p">
+ For example, the <code class="ph codeph">unix_timestamp()</code> function returns an integer result
+ representing the number of seconds past the epoch. The <code class="ph codeph">now()</code> function
+ produces a <code class="ph codeph">TIMESTAMP</code> representing the current date and time, which can
+ be passed as an argument to <code class="ph codeph">unix_timestamp()</code>. And string literals
+ representing dates and date/times can be cast to <code class="ph codeph">TIMESTAMP</code>, and from there
+ converted to numeric values. The following examples show how you might store a date/time
+ column as <code class="ph codeph">BIGINT</code> in a Kudu table, but still use string literals and
+ <code class="ph codeph">TIMESTAMP</code> values for convenience.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- now() returns a TIMESTAMP and shows the format for string literals you can cast to TIMESTAMP.
+select now();
++-------------------------------+
+| now() |
++-------------------------------+
+| 2017-01-25 23:50:10.132385000 |
++-------------------------------+
+
+-- unix_timestamp() accepts either a TIMESTAMP or an equivalent string literal.
+select unix_timestamp(now());
++------------------+
+| unix_timestamp() |
++------------------+
+| 1485386670 |
++------------------+
+
+select unix_timestamp('2017-01-01');
++------------------------------+
+| unix_timestamp('2017-01-01') |
++------------------------------+
+| 1483228800 |
++------------------------------+
+
+-- Make a table representing a date/time value as BIGINT.
+-- Construct 1 range partition and 20 associated hash partitions for each year.
+-- Use date/time conversion functions to express the ranges as human-readable dates.
+create table time_series(id bigint, when_exactly bigint, event string, primary key (id, when_exactly))
+ partition by hash (id) partitions 20,
+ range (when_exactly)
+ (
+ partition unix_timestamp('2015-01-01') <= values < unix_timestamp('2016-01-01'),
+ partition unix_timestamp('2016-01-01') <= values < unix_timestamp('2017-01-01'),
+ partition unix_timestamp('2017-01-01') <= values < unix_timestamp('2018-01-01')
+ )
+ stored as kudu;
+
+-- On insert, we can transform a human-readable date/time into a numeric value.
+insert into time_series values (12345, unix_timestamp('2017-01-25 23:24:56'), 'Working on doc examples');
+
+-- On retrieval, we can examine the numeric date/time value or turn it back into a string for readability.
+select id, when_exactly, from_unixtime(when_exactly) as 'human-readable date/time', event
+ from time_series order by when_exactly limit 100;
++-------+--------------+--------------------------+-------------------------+
+| id | when_exactly | human-readable date/time | event |
++-------+--------------+--------------------------+-------------------------+
+| 12345 | 1485386696 | 2017-01-25 23:24:56 | Working on doc examples |
++-------+--------------+--------------------------+-------------------------+
+
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ If you do high-precision arithmetic involving numeric date/time values,
+ when dividing millisecond values by 1000, or microsecond values by 1 million, always
+ cast the integer numerator to a <code class="ph codeph">DECIMAL</code> with sufficient precision
+ and scale to avoid any rounding or loss of precision.
+ </p>
+ </div>
+
+<pre class="pre codeblock"><code>
+-- 1 million and 1 microseconds = 1.000001 seconds.
+select microseconds,
+ cast (microseconds as decimal(20,7)) / 1e6 as fractional_seconds
+ from table_with_microsecond_column;
++--------------+----------------------+
+| microseconds | fractional_seconds |
++--------------+----------------------+
+| 1000001 | 1.000001000000000000 |
++--------------+----------------------+
+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title19" id="kudu_ddl__kudu_metadata">
+
+ <h3 class="title topictitle3" id="ariaid-title19">How Impala Handles Kudu Metadata</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Much of the metadata for Kudu tables is handled by the underlying
+ storage layer. Kudu tables have less reliance on the metastore
+ database, and require less metadata caching on the Impala side.
+ For example, information about partitions in Kudu tables is managed
+ by Kudu, and Impala does not cache any block locality metadata
+ for Kudu tables.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code>
+ statements are needed less frequently for Kudu tables than for
+ HDFS-backed tables. Neither statement is needed when data is
+ added to, removed from, or updated in a Kudu table, even if the changes
+ are made directly to Kudu through a client program using the Kudu API.
+ Run <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+ for a Kudu table only after making a change to the Kudu table schema,
+ such as adding or dropping a column, by a mechanism other than
+ Impala.
+ </p>
+
+ <p class="p">
+ Because Kudu manages the metadata for its own tables separately from the metastore
+ database, there is a table name stored in the metastore database for Impala to use,
+ and a table name on the Kudu side, and these names can be modified independently
+ through <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ To avoid potential name conflicts, the prefix <code class="ph codeph">impala::</code>
+ and the Impala database name are encoded into the underlying Kudu
+ table name:
+ </p>
+
+<pre class="pre codeblock"><code>
+create database some_database;
+use some_database;
+
+create table table_name_demo (x int primary key, y int)
+ partition by hash (x) partitions 2 stored as kudu;
+
+describe formatted table_name_demo;
+...
+kudu.table_name | impala::some_database.table_name_demo
+
+</code></pre>
+
+ <p class="p">
+ See <a class="xref" href="impala_tables.html">Overview of Impala Tables</a> for examples of how to change the name of
+ the Impala table in the metastore database, the name of the underlying Kudu
+ table, or both.
+ </p>
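+
+ <p class="p">
+ As a sketch, for an external Kudu table (the property value shown here is hypothetical),
+ the underlying Kudu table name can be changed independently of the Impala name:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Point the Impala table at a different underlying Kudu table name.
+alter table table_name_demo set tblproperties ('kudu.table_name' = 'some_other_kudu_table');
+</code></pre>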
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title20" id="impala_kudu__kudu_etl">
+
+ <h2 class="title topictitle2" id="ariaid-title20">Loading Data into Kudu Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Kudu tables are well-suited to use cases where data arrives continuously, in small or
+ moderate volumes. To bring data into Kudu tables, use the Impala <code class="ph codeph">INSERT</code>
+ and <code class="ph codeph">UPSERT</code> statements. The <code class="ph codeph">LOAD DATA</code> statement does
+ not apply to Kudu tables.
+ </p>
+
+ <p class="p">
+ Because Kudu manages its own storage layer that is optimized for smaller block sizes than
+ HDFS, and performs its own housekeeping to keep data evenly distributed, it is not
+ subject to the <span class="q">"many small files"</span> issue and does not need explicit reorganization
+ and compaction as the data grows over time. The partitions within a Kudu table can be
+ specified to cover a variety of possible data distributions, instead of hardcoding a new
+ partition for each new day, hour, and so on, which can lead to inefficient,
+ hard-to-scale, and hard-to-manage partition schemes with HDFS tables.
+ </p>
+
+ <p class="p">
+ Your strategy for performing ETL or bulk updates on Kudu tables should take into account
+ the limitations on consistency for DML operations.
+ </p>
+
+ <p class="p">
+ Make <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, and <code class="ph codeph">UPSERT</code>
+ operations <dfn class="term">idempotent</dfn>: that is, able to be applied multiple times and still
+ produce an identical result.
+ </p>
+
+ <p class="p">
+ If a bulk operation is in danger of exceeding capacity limits due to timeouts or high
+ memory usage, split it into a series of smaller operations.
+ </p>
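+
+ <p class="p">
+ For example, a very large copy operation might be divided into per-range statements
+ (the table and column names here are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Instead of one INSERT covering all rows, issue several smaller ones.
+insert into kudu_target select * from hdfs_source where id between 0 and 999999;
+insert into kudu_target select * from hdfs_source where id between 1000000 and 1999999;
+-- ...and so on for the remaining ranges.
+</code></pre>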
+
+ <p class="p">
+ Avoid running concurrent ETL operations where the end results depend on precise
+ ordering. In particular, do not rely on an <code class="ph codeph">INSERT ... SELECT</code> statement
+ that selects from the same table into which it is inserting, unless you include extra
+ conditions in the <code class="ph codeph">WHERE</code> clause to avoid reading the newly inserted rows
+ within the same statement.
+ </p>
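+
+ <p class="p">
+ For example, with a hypothetical table, a <code class="ph codeph">WHERE</code> clause
+ can exclude the rows the statement itself inserts:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Without the WHERE condition, the SELECT might see and re-process
+-- rows inserted by this same statement.
+insert into events select concat(id, '-archived'), s from events
+  where id not like '%-archived';
+</code></pre>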
+
+ <p class="p">
+ Because relationships between tables cannot be enforced by Impala and Kudu, and cannot
+ be committed or rolled back together, do not expect transactional semantics for
+ multi-table operations.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title21" id="impala_kudu__kudu_dml">
+
+ <h2 class="title topictitle2" id="ariaid-title21">Impala DML Support for Kudu Tables (INSERT, UPDATE, DELETE, UPSERT)</h2>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala supports certain DML statements for Kudu tables only. The <code class="ph codeph">UPDATE</code>
+ and <code class="ph codeph">DELETE</code> statements let you modify data within Kudu tables without
+ rewriting substantial amounts of table data. The <code class="ph codeph">UPSERT</code> statement acts
+ as a combination of <code class="ph codeph">INSERT</code> and <code class="ph codeph">UPDATE</code>, inserting rows
+ where the primary key does not already exist, and updating the non-primary key columns
+ where the primary key does already exist in the table.
+ </p>
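+
+ <p class="p">
+ For example, with a hypothetical two-column table:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table user_scores (name string primary key, score bigint)
+  partition by hash (name) partitions 2 stored as kudu;
+
+insert into user_scores values ('alice', 10);
+
+-- 'alice' already exists, so her score is updated;
+-- 'bob' does not exist, so a new row is inserted.
+upsert into user_scores values ('alice', 20), ('bob', 5);
+</code></pre>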
+
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement for Kudu tables honors the unique and <code class="ph codeph">NOT
+ NULL</code> requirements for the primary key columns.
+ </p>
+
+ <p class="p">
+ Because Impala and Kudu do not support transactions, the effects of any
+ <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, or <code class="ph codeph">DELETE</code> statement
+ are immediately visible. For example, you cannot do a sequence of
+ <code class="ph codeph">UPDATE</code> statements and only make the changes visible after all the
+ statements are finished. Also, if a DML statement fails partway through, any rows that
+ were already inserted, deleted, or changed remain in the table; there is no rollback
+ mechanism to undo the changes.
+ </p>
+
+ <p class="p">
+ In particular, an <code class="ph codeph">INSERT ... SELECT</code> statement that refers to the table
+ being inserted into might insert more rows than expected, because the
+ <code class="ph codeph">SELECT</code> part of the statement sees some of the new rows being inserted
+ and processes them again.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement, which involves manipulation of HDFS data files,
+ does not apply to Kudu tables.
+ </p>
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title22" id="impala_kudu__kudu_consistency">
+
+ <h2 class="title topictitle2" id="ariaid-title22">Consistency Considerations for Kudu Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Kudu tables have consistency characteristics such as uniqueness, controlled by the
+ primary key columns, and non-nullable columns. The emphasis for consistency is on
+ preventing duplicate or incomplete data from being stored in a table.
+ </p>
+
+ <p class="p">
+ Currently, Kudu does not enforce strong consistency for order of operations, total
+ success or total failure of a multi-row statement, or data that is read while a write
+ operation is in progress. Changes are applied atomically to each row, but not applied
+ as a single unit to all rows affected by a multi-row DML statement. That is, Kudu does
+ not currently have atomic multi-row statements or isolation between statements.
+ </p>
+
+ <p class="p">
+ If some rows are rejected during a DML operation because of a mismatch with duplicate
+ primary key values, <code class="ph codeph">NOT NULL</code> constraints, and so on, the statement
+ succeeds with a warning. Impala still inserts, deletes, or updates the other rows that
+ are not affected by the constraint violation.
+ </p>
+
+ <p class="p">
+ Consequently, the number of rows affected by a DML operation on a Kudu table might be
+ different from what you expect.
+ </p>
+
+ <p class="p">
+ Because there is no strong consistency guarantee for information being inserted into,
+ deleted from, or updated across multiple tables simultaneously, consider denormalizing
+ the data where practical. That is, if you run separate <code class="ph codeph">INSERT</code>
+ statements to insert related rows into two different tables, one <code class="ph codeph">INSERT</code>
+ might fail while the other succeeds, leaving the data in an inconsistent state. Even if
+ both inserts succeed, a join query might happen during the interval between the
+ completion of the first and second statements, and the query would encounter incomplete
+ and inconsistent data. Denormalizing the data into a single wide table can reduce the
+ possibility of inconsistency due to multi-table operations.
+ </p>
+
+ <p class="p">
+ Information about the number of rows affected by a DML operation is reported in
+ <span class="keyword cmdname">impala-shell</span> output, and in the <code class="ph codeph">PROFILE</code> output, but
+ is not currently reported to HiveServer2 clients such as JDBC or ODBC applications.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title23" id="impala_kudu__kudu_security">
+
+ <h2 class="title topictitle2" id="ariaid-title23">Security Considerations for Kudu Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Security for Kudu tables involves:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Sentry authorization.
+ </p>
+ <div class="p">
+ Access to Kudu tables must be granted to and revoked from roles with the
+ following considerations:
+ <ul class="ul">
+ <li class="li">
+ Only users with the <code class="ph codeph">ALL</code> privilege on
+ <code class="ph codeph">SERVER</code> can create external Kudu tables.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">ALL</code> privilege on <code class="ph codeph">SERVER</code> is
+ required to specify the <code class="ph codeph">kudu.master_addresses</code>
+ property in the <code class="ph codeph">CREATE TABLE</code> statements for managed
+ tables as well as external tables.
+ </li>
+ <li class="li">
+ Access to Kudu tables is enforced at the table level and at the
+ column level.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">SELECT</code>- and <code class="ph codeph">INSERT</code>-specific
+ permissions are supported.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">DELETE</code>, <code class="ph codeph">UPDATE</code>, and
+ <code class="ph codeph">UPSERT</code> operations require the <code class="ph codeph">ALL</code>
+ privilege.
+ </li>
+ </ul>
+ Because non-SQL APIs can access Kudu data without going through Sentry
+ authorization, currently the Sentry support is considered preliminary
+ and subject to change.
+ </div>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Kerberos authentication. See <a class="xref" href="https://kudu.apache.org/docs/security.html" target="_blank">Kudu Security</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ TLS encryption. See <a class="xref" href="https://kudu.apache.org/docs/security.html" target="_blank">Kudu Security</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Lineage tracking.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Auditing.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Redaction of sensitive information from log files.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title24" id="impala_kudu__kudu_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title24">Impala Query Performance for Kudu Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For queries involving Kudu tables, Impala can delegate much of the work of filtering the
+ result set to Kudu, avoiding some of the I/O involved in full table scans of tables
+ containing HDFS data files. This type of optimization is especially effective for
+ partitioned Kudu tables, where the Impala query <code class="ph codeph">WHERE</code> clause refers to
+ one or more primary key columns that are also used as partition key columns. For
+ example, if a partitioned Kudu table uses a <code class="ph codeph">HASH</code> clause for
+ <code class="ph codeph">col1</code> and a <code class="ph codeph">RANGE</code> clause for <code class="ph codeph">col2</code>, a
+ query using a clause such as <code class="ph codeph">WHERE col1 IN (1,2,3) AND col2 > 100</code>
+ can determine exactly which tablet servers contain relevant data, and therefore
+ parallelize the query very efficiently.
+ </p>
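+
+ <p class="p">
+ The <code class="ph codeph">EXPLAIN</code> statement shows whether such predicates are
+ pushed down to Kudu (the table and column names here are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>
+explain select * from kudu_partitioned_table
+  where col1 in (1,2,3) and col2 > 100;
+</code></pre>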
+
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, Impala can push down additional
+ information to optimize join queries involving Kudu tables. If the join clause
+ contains predicates of the form
+ <code class="ph codeph"><var class="keyword varname">column</var> = <var class="keyword varname">expression</var></code>,
+ after Impala constructs a hash table of possible matching values for the
+ join columns from the bigger table (either an HDFS table or a Kudu table), Impala
+ can <span class="q">"push down"</span> the minimum and maximum matching column values to Kudu,
+ so that Kudu can more efficiently locate matching rows in the second (smaller) table.
+ These min/max filters are affected by the <code class="ph codeph">RUNTIME_FILTER_MODE</code>,
+ <code class="ph codeph">RUNTIME_FILTER_WAIT_TIME_MS</code>, and <code class="ph codeph">DISABLE_ROW_RUNTIME_FILTERING</code>
+ query options; the min/max filters are not affected by the
+ <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code>, <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code>,
+ <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code>, and <code class="ph codeph">MAX_NUM_RUNTIME_FILTERS</code>
+ query options.
+ </p>
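+
+ <p class="p">
+ For example, the runtime filtering behavior can be adjusted through query options in
+ <span class="keyword cmdname">impala-shell</span>:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Enable global runtime filters and allow extra time for them to arrive.
+set RUNTIME_FILTER_MODE=GLOBAL;
+set RUNTIME_FILTER_WAIT_TIME_MS=10000;
+</code></pre>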
+
+ <p class="p">
+ See <a class="xref" href="impala_explain.html">EXPLAIN Statement</a> for examples of evaluating the effectiveness of
+ the predicate pushdown for a specific query against a Kudu table.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">TABLESAMPLE</code> clause of the <code class="ph codeph">SELECT</code>
+ statement does not apply to a table reference derived from a view, a subquery,
+ or anything other than a real base table. This clause only works for tables
+ backed by HDFS or HDFS-like data files, therefore it does not apply to Kudu or
+ HBase tables.
+ </p>
+
+
+
+
+ </div>
+
+
+
+
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_langref.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_langref.html b/docs/build3x/html/topics/impala_langref.html
new file mode 100644
index 0000000..a515a63
--- /dev/null
+++ b/docs/build3x/html/topics/impala_langref.html
@@ -0,0 +1,66 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_comments.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_literals.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_operators.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_unsupported.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_porting.html"><met
a name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="langref"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala SQL Language Reference</title></head><body id="langref"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala SQL Language Reference</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala uses SQL as its query language. To protect user investment in skills development and query
+ design, Impala provides a high degree of compatibility with the Hive Query Language (HiveQL):
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Because Impala uses the same metadata store as Hive to record information about table structure and
+ properties, Impala can access tables defined through the native Impala <code class="ph codeph">CREATE TABLE</code>
+ command, or tables created using the Hive data definition language (DDL).
+ </li>
+
+ <li class="li">
+ Impala supports data manipulation (DML) statements similar to the DML component of HiveQL.
+ </li>
+
+ <li class="li">
+ Impala provides many <a class="xref" href="impala_functions.html#builtins">built-in functions</a> with the same
+ names and parameter types as their HiveQL equivalents.
+ </li>
+ </ul>
+
+    <p class="p">
+      Impala supports most of the same <a class="xref" href="impala_langref_sql.html#langref_sql">statements and
+      clauses</a> as HiveQL, including, but not limited to, <code class="ph codeph">JOIN</code>, aggregate functions,
+      <code class="ph codeph">DISTINCT</code>, <code class="ph codeph">UNION ALL</code>, <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">LIMIT</code>, and
+      (uncorrelated) subqueries in the <code class="ph codeph">FROM</code> clause. Impala also supports <code class="ph codeph">INSERT
+      INTO</code> and <code class="ph codeph">INSERT OVERWRITE</code>.
+    </p>
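+
+    <p class="p">
+      For example, the following hypothetical queries (the table and column names are illustrative only)
+      combine several of these clauses in the same way as in HiveQL:
+    </p>
+
+<pre class="pre codeblock"><code>-- JOIN, GROUP BY, ORDER BY, and LIMIT together in one query.
+SELECT c.name, COUNT(o.id) AS num_orders
+  FROM customers c JOIN orders o ON c.id = o.customer_id
+  GROUP BY c.name
+  ORDER BY num_orders DESC
+  LIMIT 10;
+
+-- Uncorrelated subquery in the FROM clause.
+SELECT MAX(total)
+  FROM (SELECT SUM(amount) AS total FROM orders GROUP BY customer_id) t;
+
+-- INSERT INTO appends rows; INSERT OVERWRITE replaces the existing data.
+INSERT INTO top_customers SELECT DISTINCT name FROM customers;
+</code></pre>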
+
+    <p class="p">
+      Impala supports data types with the same names and semantics as the equivalent Hive data types:
+      <code class="ph codeph">STRING</code>, <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>,
+      <code class="ph codeph">BIGINT</code>, <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, <code class="ph codeph">BOOLEAN</code>,
+      and <code class="ph codeph">TIMESTAMP</code>.
+    </p>
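+
+    <p class="p">
+      As an illustration, the following hypothetical table definition uses several of these shared type names:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE product_events
+(
+  id BIGINT,
+  name STRING,
+  quantity INT,
+  price DOUBLE,
+  in_stock BOOLEAN,
+  event_time TIMESTAMP
+);
+</code></pre>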
+
+ <p class="p">
+ For full details about Impala SQL syntax and semantics, see
+ <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a>.
+ </p>
+
+ <p class="p">
+ Most HiveQL <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT</code> statements run unmodified with Impala. For
+ information about Hive syntax not available in Impala, see
+ <a class="xref" href="impala_langref_unsupported.html#langref_hiveql_delta">SQL Differences Between Impala and Hive</a>.
+ </p>
+
+ <p class="p">
+ For a list of the built-in functions available in Impala queries, see
+ <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_comments.html">Comments</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_datatypes.html">Data Types</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_literals.html">Literals</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_operators.html">SQL Operators</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_langref_sql.html">Impala SQL Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_functions.html">Impala Built-In Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_langref_unsupported.html">SQL Diff
erences Between Impala and Hive</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_porting.html">Porting SQL from Other Database Systems to Impala</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_langref_sql.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_langref_sql.html b/docs/build3x/html/topics/impala_langref_sql.html
new file mode 100644
index 0000000..65c6d55
--- /dev/null
+++ b/docs/build3x/html/topics/impala_langref_sql.html
@@ -0,0 +1,28 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ddl.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_dml.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_alter_table.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_alter_view.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_compute_stats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_create_database.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_create_function.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_create_role.html"><meta name
="DC.Relation" scheme="URI" content="../topics/impala_create_table.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_create_view.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_delete.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_describe.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_database.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_function.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_role.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_stats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_table.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_view.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_grant.html"><meta name="DC.Relation" scheme="URI" cont
ent="../topics/impala_insert.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_invalidate_metadata.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_load_data.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_refresh.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_revoke.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_show.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_truncate_table.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_update.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_upsert.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_use.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hints.html"><meta name="prodname"
content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="langref_sql"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala SQL Statements</title></head><body id="langref_sql"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala SQL Statements</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala SQL dialect supports a range of standard elements, plus some extensions for Big Data use cases
+ related to data loading and data warehousing.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In the <span class="keyword cmdname">impala-shell</span> interpreter, a semicolon at the end of each statement is required.
+ Since the semicolon is not actually part of the SQL syntax, we do not include it in the syntax definition
+ of each statement, but we do show it in examples intended to be run in <span class="keyword cmdname">impala-shell</span>.
+ </p>
+ </div>
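+
+    <p class="p">
+      For example, a statement typed into <span class="keyword cmdname">impala-shell</span> ends with a semicolon:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select 1 + 1;
+</code></pre>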
+
+ <p class="p toc all">
+ The following sections show the major SQL statements that you work with in Impala:
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_ddl.html">DDL Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_dml.html">DML Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_alter_table.html">ALTER TABLE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_alter_view.html">ALTER VIEW Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_compute_stats.html">COMPUTE STATS Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_database.html">CREATE DATABASE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_function.html">CREATE FUNCTION Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_role.h
tml">CREATE ROLE Statement (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_table.html">CREATE TABLE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_view.html">CREATE VIEW Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_delete.html">DELETE Statement (Impala 2.8 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_describe.html">DESCRIBE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_database.html">DROP DATABASE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_function.html">DROP FUNCTION Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_role.html">DROP ROLE Statement (Impala 2.0 or higher only)</a></strong><br></li><li cla
ss="link ulchildlink"><strong><a href="../topics/impala_drop_stats.html">DROP STATS Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_table.html">DROP TABLE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_view.html">DROP VIEW Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain.html">EXPLAIN Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_grant.html">GRANT Statement (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_insert.html">INSERT Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_invalidate_metadata.html">INVALIDATE METADATA Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_load_data.html">LOAD DATA Statement</a></strong><br></li><li class=
"link ulchildlink"><strong><a href="../topics/impala_refresh.html">REFRESH Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_revoke.html">REVOKE Statement (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_select.html">SELECT Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_set.html">SET Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_show.html">SHOW Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_truncate_table.html">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_update.html">UPDATE Statement (Impala 2.8 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_upsert.html">UPSERT Statement (Impala 2.8 or higher on
ly)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_use.html">USE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hints.html">Optimizer Hints</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_appx_median.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_appx_median.html b/docs/build3x/html/topics/impala_appx_median.html
new file mode 100644
index 0000000..3003ec0
--- /dev/null
+++ b/docs/build3x/html/topics/impala_appx_median.html
@@ -0,0 +1,132 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="appx_median"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>APPX_MEDIAN Function</title></head><body id="appx_median"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">APPX_MEDIAN Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns a value that is approximately the median (midpoint) of values in the set
+ of input values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>APPX_MEDIAN([DISTINCT | ALL] <var class="keyword varname">expression</var>)
+</code></pre>
+
+ <p class="p">
+ This function works with any input type, because the only requirement is that the type supports less-than and
+ greater-than comparison operators.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the return value represents the estimated midpoint, it might not reflect the precise midpoint value,
+ especially if the cardinality of the input values is very high. If the cardinality is low (up to
+ approximately 20,000), the result is more accurate because the sampling considers all or almost all of the
+ different values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+ arguments which produce a <code class="ph codeph">STRING</code> result
+ </p>
+
+ <p class="p">
+ The return value is always the same as one of the input values, not an <span class="q">"in-between"</span> value produced by
+ averaging.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+ </p>
+
+    <p class="p">
+      The <code class="ph codeph">APPX_MEDIAN</code> function returns only the first 10 characters for
+      string values (<code class="ph codeph">STRING</code>, <code class="ph codeph">VARCHAR</code>,
+      <code class="ph codeph">CHAR</code>). Additional characters are truncated.
+    </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses a table of a million random floating-point numbers ranging up to approximately
+ 50,000. The average is approximately 25,000. Because of the random distribution, we would expect the median
+ to be close to this same number. Computing the precise median is a more intensive operation than computing
+ the average, because it requires keeping track of every distinct value and how many times each occurs. The
+ <code class="ph codeph">APPX_MEDIAN()</code> function uses a sampling algorithm to return an approximate result, which in
+ this case is close to the expected value. To make sure that the value is not substantially out of range due
+ to a skewed distribution, subsequent queries confirm that there are approximately 500,000 values higher than
+ the <code class="ph codeph">APPX_MEDIAN()</code> value, and approximately 500,000 values lower than the
+ <code class="ph codeph">APPX_MEDIAN()</code> value.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select min(x), max(x), avg(x) from million_numbers;
++-------------------+-------------------+-------------------+
+| min(x)            | max(x)            | avg(x)            |
++-------------------+-------------------+-------------------+
+| 4.725693727250069 | 49994.56852674231 | 24945.38563793553 |
++-------------------+-------------------+-------------------+
+[localhost:21000] &gt; select appx_median(x) from million_numbers;
++----------------+
+| appx_median(x) |
++----------------+
+| 24721.6        |
++----------------+
+[localhost:21000] &gt; select count(x) as higher from million_numbers where x &gt; (select appx_median(x) from million_numbers);
++--------+
+| higher |
++--------+
+| 502013 |
++--------+
+[localhost:21000] &gt; select count(x) as lower from million_numbers where x &lt; (select appx_median(x) from million_numbers);
++--------+
+| lower  |
++--------+
+| 497987 |
++--------+
+</code></pre>
+
+ <p class="p">
+ The following example computes the approximate median using a subset of the values from the table, and then
+ confirms that the result is a reasonable estimate for the midpoint.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select appx_median(x) from million_numbers where x between 1000 and 5000;
++-------------------+
+| appx_median(x)    |
++-------------------+
+| 3013.107787358159 |
++-------------------+
+[localhost:21000] &gt; select count(x) as higher from million_numbers where x between 1000 and 5000 and x &gt; 3013.107787358159;
++--------+
+| higher |
++--------+
+| 37692  |
++--------+
+[localhost:21000] &gt; select count(x) as lower from million_numbers where x between 1000 and 5000 and x &lt; 3013.107787358159;
++-------+
+| lower |
++-------+
+| 37089 |
++-------+
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_array.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_array.html b/docs/build3x/html/topics/impala_array.html
new file mode 100644
index 0000000..caddc89
--- /dev/null
+++ b/docs/build3x/html/topics/impala_array.html
@@ -0,0 +1,321 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="array"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ARRAY Complex Type (Impala 2.3 or higher only)</title></head><body id="array"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ARRAY Complex Type (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A complex data type that can represent an arbitrary number of ordered elements.
+ The elements can be scalars or another complex type (<code class="ph codeph">ARRAY</code>,
+ <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> ARRAY &lt; <var class="keyword varname">type</var> &gt;
+
+type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword varname">complex_type</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+    <p class="p">
+      Complex types are often used in combination, for example an <code class="ph codeph">ARRAY</code> of
+      <code class="ph codeph">STRUCT</code> elements. If you are unfamiliar with the Impala complex types,
+      start with <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for
+      background information and usage examples.
+    </p>
+
+ <p class="p">
+ The elements of the array have no names. You refer to the value of the array item using the
+ <code class="ph codeph">ITEM</code> pseudocolumn, or its position in the array with the <code class="ph codeph">POS</code>
+ pseudocolumn. See <a class="xref" href="impala_complex_types.html#item">ITEM and POS Pseudocolumns</a> for information about
+ these pseudocolumns.
+ </p>
+
+
+
+ <p class="p">
+ Each row can have a different number of elements (including none) in the array for that row.
+ </p>
+
+
+
+ <p class="p">
+ When an array contains items of scalar types, you can use aggregation functions on the array elements without using join notation. For
+ example, you can find the <code class="ph codeph">COUNT()</code>, <code class="ph codeph">AVG()</code>, <code class="ph codeph">SUM()</code>, and so on of numeric array
+ elements, or the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> of any scalar array elements by referring to
+ <code class="ph codeph"><var class="keyword varname">table_name</var>.<var class="keyword varname">array_column</var></code> in the <code class="ph codeph">FROM</code> clause of the query. When
+ you need to cross-reference values from the array with scalar values from the same row, such as by including a <code class="ph codeph">GROUP
+ BY</code> clause to produce a separate aggregated result for each row, then the join clause is required.
+ </p>
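+
+    <p class="p">
+      For example, assuming a hypothetical table <code class="ph codeph">readings</code> with a scalar column
+      <code class="ph codeph">sensor_id</code> and a numeric array column <code class="ph codeph">temps</code>,
+      the following sketch shows both forms:
+    </p>
+
+<pre class="pre codeblock"><code>-- Aggregate across the array elements of all rows; no join notation needed.
+SELECT COUNT(item), AVG(item), MAX(item) FROM readings.temps;
+
+-- Cross-reference array values with a scalar column from the same row;
+-- here the join notation and the GROUP BY clause are required.
+SELECT r.sensor_id, MAX(t.item)
+  FROM readings r, r.temps t
+GROUP BY r.sensor_id;
+</code></pre>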
+
+ <p class="p">
+ A common usage pattern with complex types is to have an array as the top-level type for the column:
+ an array of structs, an array of maps, or an array of arrays.
+ For example, you can model a denormalized table by creating a column that is an <code class="ph codeph">ARRAY</code>
+ of <code class="ph codeph">STRUCT</code> elements; each item in the array represents a row from a table that would
+ normally be used in a join query. This kind of data structure lets you essentially denormalize tables by
+ associating multiple rows from one table with the matching row in another table.
+ </p>
+
+    <p class="p">
+      You typically do not create more than one top-level <code class="ph codeph">ARRAY</code> column. If there is
+      some relationship between the elements of multiple arrays, it is more convenient to model the data as
+      a single array whose elements are a complex type (<code class="ph codeph">STRUCT</code> or <code class="ph codeph">MAP</code>)
+      holding the related values.
+    </p>
+
+ <p class="p">
+ You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+ to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+ column and visualize its structure as if it were a table.
+ For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+ <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+ If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+ and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+ you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+ An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+ A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+ representing a column in the table.
+ A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+ <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Columns with this data type can only be used in tables or partitions with the Parquet file format.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Columns with this data type cannot be used as partition key columns in a partitioned table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement does not produce any statistics for columns of this data type.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p" id="array__d6e3285">
+ The maximum length of the column definition for any complex type, including declarations for any nested types,
+ is 4000 characters.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations and Restrictions for Complex Types</a> for a full list of limitations
+ and associated guidelines about complex type columns.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+ <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Many of the complex type examples refer to tables
+ such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+ adapted from the tables used in the TPC-H benchmark.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+ for the table definitions.
+ </div>
+
+    <p class="p">
+      The following example shows how to construct a table with various kinds of <code class="ph codeph">ARRAY</code> columns,
+      both at the top level and nested within other complex types.
+      Whenever an <code class="ph codeph">ARRAY</code> holds elements of a scalar type, as in the <code class="ph codeph">PETS</code>
+      column or the <code class="ph codeph">CHILDREN</code> field, future expansion of the schema is limited.
+      For example, you could not easily evolve the schema to record the kind of pet or the child's birthday alongside the name.
+      Therefore, it is more common to use an <code class="ph codeph">ARRAY</code> whose elements are of <code class="ph codeph">STRUCT</code> type,
+      to associate multiple fields with each array element.
+    </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Practice the <code class="ph codeph">CREATE TABLE</code> and query notation for complex type columns
+ using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably.
+ </div>
+
+
+
+<pre class="pre codeblock"><code>CREATE TABLE array_demo
+(
+ id BIGINT,
+ name STRING,
+-- An ARRAY of scalar type as a top-level column.
+  pets ARRAY &lt;STRING&gt;,
+
+-- An ARRAY with elements of complex type (STRUCT).
+  places_lived ARRAY &lt; STRUCT &lt;
+    place: STRING,
+    start_year: INT
+  &gt;&gt;,
+
+-- An ARRAY as a field (CHILDREN) within a STRUCT.
+-- (The STRUCT is inside another ARRAY, because it is rare
+-- for a STRUCT to be a top-level column.)
+  marriages ARRAY &lt; STRUCT &lt;
+    spouse: STRING,
+    children: ARRAY &lt;STRING&gt;
+  &gt;&gt;,
+
+-- An ARRAY as the value part of a MAP.
+-- The first MAP field (the key) would be a value such as
+-- 'Parent' or 'Grandparent', and the corresponding array would
+-- represent 2 parents, 4 grandparents, and so on.
+  ancestors MAP &lt; STRING, ARRAY &lt;STRING&gt; &gt;
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+ <p class="p">
+ The following example shows how to examine the structure of a table containing one or more <code class="ph codeph">ARRAY</code> columns by using the
+ <code class="ph codeph">DESCRIBE</code> statement. You can visualize each <code class="ph codeph">ARRAY</code> as its own two-column table, with columns
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>DESCRIBE array_demo;
++--------------+---------------------------+
+| name         | type                      |
++--------------+---------------------------+
+| id           | bigint                    |
+| name         | string                    |
+| pets         | array&lt;string&gt;             |
+| marriages    | array&lt;struct&lt;             |
+|              |   spouse:string,          |
+|              |   children:array&lt;string&gt;  |
+|              | &gt;&gt;                        |
+| places_lived | array&lt;struct&lt;             |
+|              |   place:string,           |
+|              |   start_year:int          |
+|              | &gt;&gt;                        |
+| ancestors    | map&lt;string,array&lt;string&gt;&gt; |
++--------------+---------------------------+
+
+DESCRIBE array_demo.pets;
++------+--------+
+| name | type   |
++------+--------+
+| item | string |
+| pos  | bigint |
++------+--------+
+
+DESCRIBE array_demo.marriages;
++------+--------------------------+
+| name | type                     |
++------+--------------------------+
+| item | struct&lt;                  |
+|      |   spouse:string,         |
+|      |   children:array&lt;string&gt; |
+|      | &gt;                        |
+| pos  | bigint                   |
++------+--------------------------+
+
+DESCRIBE array_demo.places_lived;
++------+------------------+
+| name | type             |
++------+------------------+
+| item | struct&lt;          |
+|      |   place:string,  |
+|      |   start_year:int |
+|      | &gt;                |
+| pos  | bigint           |
++------+------------------+
+
+DESCRIBE array_demo.ancestors;
++-------+---------------+
+| name  | type          |
++-------+---------------+
+| key   | string        |
+| value | array&lt;string&gt; |
++-------+---------------+
+
+</code></pre>
+
+ <p class="p">
+ The following example shows queries involving <code class="ph codeph">ARRAY</code> columns containing elements of scalar or complex types. You
+ <span class="q">"unpack"</span> each <code class="ph codeph">ARRAY</code> column by referring to it in a join query, as if it were a separate table with
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns. If the array element is a scalar type, you refer to its value using the
+ <code class="ph codeph">ITEM</code> pseudocolumn. If the array element is a <code class="ph codeph">STRUCT</code>, you refer to the <code class="ph codeph">STRUCT</code> fields
+ using dot notation and the field names. If the array element is another <code class="ph codeph">ARRAY</code> or a <code class="ph codeph">MAP</code>, you use
+ another level of join to unpack the nested collection elements.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>-- Array of scalar values.
+-- Each array element represents a single string, plus we know its position in the array.
+SELECT id, name, pets.pos, pets.item FROM array_demo, array_demo.pets;
+
+-- Array of structs.
+-- Now each array element has named fields, possibly of different types.
+-- You can consider an ARRAY of STRUCT to represent a table inside another table.
+SELECT id, name, places_lived.pos, places_lived.item.place, places_lived.item.start_year
+FROM array_demo, array_demo.places_lived;
+
+-- The .ITEM name is optional for array elements that are structs.
+-- The following query is equivalent to the previous one, with .ITEM
+-- removed from the column references.
+SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year
+ FROM array_demo, array_demo.places_lived;
+
+-- To filter specific items from the array, do comparisons against the .POS or .ITEM
+-- pseudocolumns, or names of struct fields, in the WHERE clause.
+SELECT id, name, pets.item FROM array_demo, array_demo.pets
+ WHERE pets.pos in (0, 1, 3);
+
+SELECT id, name, pets.item FROM array_demo, array_demo.pets
+ WHERE pets.item LIKE 'Mr. %';
+
+SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year
+ FROM array_demo, array_demo.places_lived
+WHERE places_lived.place like '%California%';
+
+</code></pre>
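+
+  <p class="p">
+    As a sketch of the extra level of join mentioned above, the following queries unpack the
+    <code class="ph codeph">MAP</code> and nested <code class="ph codeph">ARRAY</code> columns of the
+    <code class="ph codeph">array_demo</code> table. These examples are illustrative only; the
+    results depend on your data.
+  </p>
+
+<pre class="pre codeblock"><code>-- Map of scalar keys to array values.
+-- The KEY and VALUE pseudocolumns unpack each map entry; because the VALUE
+-- is itself an ARRAY, a second join clause unpacks its elements.
+SELECT ad.id, ad.name, a.key, an.item
+  FROM array_demo ad, ad.ancestors a, a.value an;
+
+-- Array of structs containing another array.
+-- Each additional level of nesting requires one more join clause.
+SELECT ad.id, ad.name, m.item.spouse, c.item
+  FROM array_demo ad, ad.marriages m, m.item.children c;
+</code></pre>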
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a>,
+
+ <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>, <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a>
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_auditing.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_auditing.html b/docs/build3x/html/topics/impala_auditing.html
new file mode 100644
index 0000000..bbdca95
--- /dev/null
+++ b/docs/build3x/html/topics/impala_auditing.html
@@ -0,0 +1,232 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="auditing"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Auditing Impala Operations</title></head><body id="auditing"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Auditing Impala Operations</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ To monitor how Impala data is being used within your organization, ensure
+ that your Impala authorization and authentication policies are effective.
+ To detect attempts at intrusion or unauthorized access to Impala
+ data, you can use the auditing feature in Impala 1.2.1 and higher:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Enable auditing by including the option
+ <code class="ph codeph">-audit_event_log_dir=<var class="keyword varname">directory_path</var></code>
+ in your <span class="keyword cmdname">impalad</span> startup options.
+ The log directory must be a local directory on the
+ server, not an HDFS directory.
+ </li>
+
+ <li class="li">
+ Decide how many queries will be represented in each audit event log file. By default,
+ Impala starts a new audit event log file every 5000 queries. To specify a different number,
+ <span class="ph">include
+ the option <code class="ph codeph">--max_audit_event_log_file_size=<var class="keyword varname">number_of_queries</var></code>
+ in the <span class="keyword cmdname">impalad</span> startup options</span>.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 2.9</span> and higher, you can control how many
+ audit event log files are kept on each host. Specify the option
+ <code class="ph codeph">--max_audit_event_log_files=<var class="keyword varname">number_of_log_files</var></code>
+ in the <span class="keyword cmdname">impalad</span> startup options. Once the limit is reached, older
+ files are rotated out using the same mechanism as for other Impala log files.
+ The default value for this setting is 0, representing an unlimited number of audit
+ event log files.
+ </li>
+
+ <li class="li">
+ Use a cluster manager with governance capabilities to filter, visualize,
+ and produce reports based on the audit logs collected
+ from all the hosts in the cluster.
+ </li>
+ </ul>
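+
+  <p class="p">
+    Putting these options together, the relevant portion of the <span class="keyword cmdname">impalad</span>
+    startup options might look like the following. The directory path and the numeric values are
+    placeholders; substitute values appropriate for your cluster.
+  </p>
+
+<pre class="pre codeblock"><code>impalad ... \
+  -audit_event_log_dir=/var/log/impalad/audit \
+  --max_audit_event_log_file_size=5000 \
+  --max_audit_event_log_files=100 \
+  ...
+</code></pre>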
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="auditing__auditing_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Durability and Performance Considerations for Impala Auditing</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The auditing feature only imposes performance overhead while auditing is enabled.
+ </p>
+
+ <p class="p">
+ Because any Impala host can process a query, enable auditing on all hosts where the
+ <span class="ph"><span class="keyword cmdname">impalad</span> daemon</span>
+ runs. Each host stores its own log
+ files, in a directory in the local filesystem. The log data is periodically flushed to disk (through an
+ <code class="ph codeph">fsync()</code> system call) to avoid loss of audit data in case of a crash.
+ </p>
+
+ <p class="p">
+ The runtime overhead of auditing applies to whichever host serves as the coordinator
+ for the query, that is, the host you connect to when you issue the query. This might
+ be the same host for all queries, or different applications or users might connect to
+ and issue queries through different hosts.
+ </p>
+
+ <p class="p">
+ To avoid excessive I/O overhead on busy coordinator hosts, Impala syncs the audit log
+ data (using the <code class="ph codeph">fsync()</code> system call) periodically rather than after
+ every query. Currently, the <code class="ph codeph">fsync()</code> calls are issued at a fixed
+ interval, every 5 seconds.
+ </p>
+
+ <p class="p">
+ By default, Impala avoids losing any audit log data in the case of an error during a logging operation
+ (such as a disk full error), by immediately shutting down
+ <span class="keyword cmdname">impalad</span> on the host where the auditing problem occurred.
+ <span class="ph">You can override this setting by specifying the option
+ <code class="ph codeph">-abort_on_failed_audit_event=false</code> in the <span class="keyword cmdname">impalad</span> startup options.</span>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="auditing__auditing_format">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Format of the Audit Log Files</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The audit log files represent the query information in JSON format, one query per line.
+ Typically, rather than looking at the log files themselves, you should use cluster-management
+ software to consolidate the log data from all Impala hosts and filter and visualize the results
+ in useful ways. (If you do examine the raw log data, you might run the files through
+ a JSON pretty-printer first.)
+ </p>
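+
+  <p class="p">
+    For example, because each line of the log is a self-contained JSON record, a command-line JSON
+    processor such as <code class="ph codeph">jq</code> (used here as one possible choice; any
+    pretty-printer that accepts one JSON record per line works) can make a raw file readable.
+    The file name is a placeholder:
+  </p>
+
+<pre class="pre codeblock"><code>jq '.' <var class="keyword varname">audit_event_log_file</var>
+</code></pre>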
+
+ <p class="p">
+ All the information about schema objects accessed by the query is encoded in a single nested record on the
+ same line. For example, the audit log for an <code class="ph codeph">INSERT ... SELECT</code> statement records that a
+ select operation occurs on the source table and an insert operation occurs on the destination table. The
+ audit log for a query against a view records the base table accessed by the view, or multiple base tables
+ in the case of a view that includes a join query. Every Impala operation that corresponds to a SQL
+ statement is recorded in the audit logs, whether the operation succeeds or fails. Impala records more
+ information for a successful operation than for a failed one, because an unauthorized query is stopped
+ immediately, before all the query planning is completed.
+ </p>
+
+
+
+ <p class="p">
+ The information logged for each query includes:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Client session state:
+ <ul class="ul">
+ <li class="li">
+ Session ID
+ </li>
+
+ <li class="li">
+ User name
+ </li>
+
+ <li class="li">
+ Network address of the client connection
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ SQL statement details:
+ <ul class="ul">
+ <li class="li">
+ Query ID
+ </li>
+
+ <li class="li">
+ Statement Type - DML, DDL, and so on
+ </li>
+
+ <li class="li">
+ SQL statement text
+ </li>
+
+ <li class="li">
+ Execution start time, in local time
+ </li>
+
+ <li class="li">
+ Execution Status - Details on any errors that were encountered
+ </li>
+
+ <li class="li">
+ Target Catalog Objects:
+ <ul class="ul">
+ <li class="li">
+ Object Type - Table, View, or Database
+ </li>
+
+ <li class="li">
+ Fully qualified object name
+ </li>
+
+ <li class="li">
+ Privilege - How the object is being used (<code class="ph codeph">SELECT</code>, <code class="ph codeph">INSERT</code>,
+ <code class="ph codeph">CREATE</code>, and so on)
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </li>
+ </ul>
+
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="auditing__auditing_exceptions">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Which Operations Are Audited</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The kinds of SQL queries represented in the audit log are:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Queries that are prevented due to lack of authorization.
+ </li>
+
+ <li class="li">
+ Queries that Impala can analyze and parse to determine that they are authorized. The audit data is
+ recorded immediately after Impala finishes its analysis, before the query is actually executed.
+ </li>
+ </ul>
+
+ <p class="p">
+ The audit log does not contain entries for queries that could not be parsed and analyzed. For example, a
+ query that fails due to a syntax error is not recorded in the audit log. Nor does the audit log
+ contain queries that fail because a referenced table does not exist, as long as you would have
+ been authorized to access the table had it existed.
+ </p>
+
+ <p class="p">
+ Certain statements in the <span class="keyword cmdname">impala-shell</span> interpreter, such as <code class="ph codeph">CONNECT</code>,
+ <code class="ph codeph">SUMMARY</code>, <code class="ph codeph">PROFILE</code>, <code class="ph codeph">SET</code>, and
+ <code class="ph codeph">QUIT</code>, do not correspond to actual SQL queries, and these statements are not reflected in
+ the audit log.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_authentication.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_authentication.html b/docs/build3x/html/topics/impala_authentication.html
new file mode 100644
index 0000000..b072c37
--- /dev/null
+++ b/docs/build3x/html/topics/impala_authentication.html
@@ -0,0 +1,37 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_kerberos.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ldap.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mixed_security.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_delegation.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="authentication"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Authentication</title></head><body id="authentication"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Authentication</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Authentication is the mechanism to ensure that only specified hosts and users can connect to Impala. It also
+ verifies that when clients connect to Impala, they are connected to a legitimate server. This feature
+ prevents spoofing such as <dfn class="term">impersonation</dfn> (setting up a phony client system with the same account
+ and group names as a legitimate user) and <dfn class="term">man-in-the-middle attacks</dfn> (intercepting application
+ requests before they reach Impala and eavesdropping on sensitive information in the requests or the results).
+ </p>
+
+ <p class="p">
+ Impala supports authentication using either Kerberos or LDAP.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files
+ owned by the same user (typically <code class="ph codeph">impala</code>). To implement user-level access to different
+ databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>.
+ </div>
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ Once you are finished setting up authentication, move on to authorization, which involves specifying what
+ databases, tables, HDFS directories, and so on can be accessed by particular users when they connect through
+ Impala. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_kerberos.html">Enabling Kerberos Authentication for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_ldap.html">Enabling LDAP Authentication for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mixed_security.html">Using Multiple Authentication Methods with Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_delegation.html">Configuring Impala Delegation for Hue and BI Tools</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_resource_management.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_resource_management.html b/docs/build3x/html/topics/impala_resource_management.html
new file mode 100644
index 0000000..cbc116a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_resource_management.html
@@ -0,0 +1,97 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="resource_management"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Resource Management for Impala</title></head><body id="resource_management"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Resource Management for Impala</h1>
+
+
+ <div class="body conbody">
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The use of the Llama component for integrated resource management within YARN
+ is no longer supported with <span class="keyword">Impala 2.3</span> and higher.
+ The Llama support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher.
+ </p>
+ <p class="p">
+ For clusters running Impala alongside
+ other data management components, you define static service pools to define the resources
+ available to Impala and other components. Then within the area allocated for Impala,
+ you can create dynamic service pools, each with its own settings for the Impala admission control feature.
+ </p>
+ </div>
+
+ <p class="p">
+ You can limit the CPU and memory resources used by Impala, to manage and prioritize workloads on clusters
+ that run jobs from many Hadoop components.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="resource_management__rm_enforcement">
+
+ <h2 class="title topictitle2" id="ariaid-title2">How Resource Limits Are Enforced</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Limits on memory usage are enforced by Impala's process memory limit (the <code class="ph codeph">MEM_LIMIT</code>
+ query option setting). The admission control feature checks this setting to decide how many queries
+ can be safely run at the same time. Then the Impala daemon enforces the limit by activating the
+ spill-to-disk mechanism when necessary, or cancelling a query altogether if the limit is exceeded at runtime.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="resource_management__rm_query_options">
+
+ <h2 class="title topictitle2" id="ariaid-title3">impala-shell Query Options for Resource Management</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Before issuing SQL statements through the <span class="keyword cmdname">impala-shell</span> interpreter, you can use the
+ <code class="ph codeph">SET</code> command to configure the following parameters related to resource management:
+ </p>
+
+ <ul class="ul" id="rm_query_options__ul_nzt_twf_jp">
+ <li class="li">
+ <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_mem_limit.html#mem_limit">MEM_LIMIT Query Option</a>
+ </li>
+
+ </ul>
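+
+  <p class="p">
+    For example, the following <span class="keyword cmdname">impala-shell</span> session sketch sets a
+    per-query memory limit and a more verbose <code class="ph codeph">EXPLAIN</code> output level before
+    examining a query plan. The values and table names are illustrative only.
+  </p>
+
+<pre class="pre codeblock"><code>SET MEM_LIMIT=2g;
+SET EXPLAIN_LEVEL=3;
+EXPLAIN SELECT c1, c2 FROM t1 JOIN t2 USING (id);
+</code></pre>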
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="resource_management__rm_limitations">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Limitations of Resource Management for Impala</h2>
+
+ <div class="body conbody">
+
+
+
+
+
+
+
+ <p class="p">
+ The <code class="ph codeph">MEM_LIMIT</code> query option, and the other resource-related query options,
+ are settable through the ODBC or JDBC interfaces in Impala 2.0 and higher. In earlier releases,
+ these options could not be set through ODBC or JDBC; that limitation no longer applies.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_revoke.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_revoke.html b/docs/build3x/html/topics/impala_revoke.html
new file mode 100644
index 0000000..02cbb59
--- /dev/null
+++ b/docs/build3x/html/topics/impala_revoke.html
@@ -0,0 +1,151 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="revoke"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REVOKE Statement (Impala 2.0 or higher only)</title></head><body id="revoke"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">REVOKE Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">REVOKE</code> statement revokes roles from groups, or revokes
+ privileges on a specified object from roles.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>REVOKE ROLE <var class="keyword varname">role_name</var> FROM GROUP <var class="keyword varname">group_name</var>
+
+REVOKE <var class="keyword varname">privilege</var> ON <var class="keyword varname">object_type</var> <var class="keyword varname">object_name</var>
+ FROM [ROLE] <var class="keyword varname">role_name</var>
+
+<span class="ph">
+ privilege ::= ALL | ALTER | CREATE | DROP | INSERT | REFRESH | SELECT | SELECT(<var class="keyword varname">column_name</var>)
+</span>
+<span class="ph">
+ object_type ::= TABLE | DATABASE | SERVER | URI
+</span>
+</code></pre>
+
+ <p class="p">
+ See <a href="impala_grant.html"><span class="keyword">GRANT Statement (Impala 2.0 or higher only)</span></a> for the required privileges and the scope
+ for SQL operations.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">ALL</code> privilege is a distinct privilege and not a
+ union of all other privileges. Revoking <code class="ph codeph">SELECT</code>,
+ <code class="ph codeph">INSERT</code>, etc. from a role that only has the
+ <code class="ph codeph">ALL</code> privilege has no effect. To reduce the privileges
+ of that role you must <code class="ph codeph">REVOKE ALL</code> and
+ <code class="ph codeph">GRANT</code> the desired privileges.
+ </p>
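+
+  <p class="p">
+    For example, to narrow a role holding only the <code class="ph codeph">ALL</code> privilege on a
+    table down to read-only access (the role and table names here are hypothetical):
+  </p>
+
+<pre class="pre codeblock"><code>REVOKE ALL ON TABLE analysis.web_logs FROM ROLE analyst_role;
+GRANT SELECT ON TABLE analysis.web_logs TO ROLE analyst_role;
+</code></pre>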
+
+ <p class="p">
+ Typically, the object name is an identifier. For URIs, it is a string literal.
+ </p>
+
+ <p class="p">
+ The ability to grant or revoke <code class="ph codeph">SELECT</code> privilege on specific columns is available
+ in <span class="keyword">Impala 2.3</span> and higher. See
+ <span class="xref">the documentation for Apache Sentry</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Required privileges:</strong>
+ </p>
+
+ <p class="p">
+ Only administrative users (those with <code class="ph codeph">ALL</code> privileges on the server, defined in the Sentry
+ policy file) can use this statement.
+ </p>
+ <p class="p">Only Sentry administrative users can revoke the role from a group.</p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <div class="p">
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">REVOKE</code> statements are available in <span class="keyword">Impala 2.0</span> and higher.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 1.4</span> and higher, Impala makes use of any roles and privileges specified by the
+ <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Hive, when your system is configured to
+ use the Sentry service instead of the file-based policy mechanism.
+ </li>
+
+ <li class="li">
+ The Impala <code class="ph codeph">REVOKE</code> statements do not require the
+ <code class="ph codeph">ROLE</code> keyword to be repeated before each role name,
+ unlike the equivalent Hive statements.
+ </li>
+
+ <li class="li">
+ Currently, each Impala <code class="ph codeph">GRANT</code> or <code class="ph codeph">REVOKE</code> statement can only grant or
+ revoke a single privilege to or from a single role.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <div class="p">
+ Access to Kudu tables must be granted to and revoked from roles with the
+ following considerations:
+ <ul class="ul">
+ <li class="li">
+ Only users with the <code class="ph codeph">ALL</code> privilege on
+ <code class="ph codeph">SERVER</code> can create external Kudu tables.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">ALL</code> privilege on <code class="ph codeph">SERVER</code> is
+ required to specify the <code class="ph codeph">kudu.master_addresses</code>
+ property in the <code class="ph codeph">CREATE TABLE</code> statements for managed
+ tables as well as external tables.
+ </li>
+ <li class="li">
+ Access to Kudu tables is enforced at the table level and at the
+ column level.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">SELECT</code>- and <code class="ph codeph">INSERT</code>-specific
+ permissions are supported.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">DELETE</code>, <code class="ph codeph">UPDATE</code>, and
+ <code class="ph codeph">UPSERT</code> operations require the <code class="ph codeph">ALL</code>
+ privilege.
+ </li>
+ </ul>
+ Because non-SQL APIs can access Kudu data without going through Sentry
+ authorization, currently the Sentry support is considered preliminary
+ and subject to change.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_bloom_filter_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_bloom_filter_size.html b/docs/build3x/html/topics/impala_runtime_bloom_filter_size.html
new file mode 100644
index 0000000..7f9466e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_bloom_filter_size.html
@@ -0,0 +1,104 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_bloom_filter_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</title></head><body id="runtime_bloom_filter_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_BLOOM_FILTER_SIZE Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Size (in bytes) of the Bloom filter data structure used by the runtime
+ filtering feature.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, this query option only applies as a fallback, when statistics
+ are not available. By default, Impala estimates the optimal size of the Bloom filter structure
+ regardless of the setting for this option. (This is a change from the original behavior in
+ <span class="keyword">Impala 2.5</span>.)
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, when the value of this query option is used for query planning,
+ it is constrained by the minimum and maximum sizes specified by the
+ <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code> and <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code> query options.
+ The filter size is adjusted upward or downward if necessary to fit within the minimum/maximum range.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 1048576 (1 MB)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Maximum:</strong> 16 MB
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This setting affects optimizations for large and complex queries, such
+ as dynamic partition pruning for partitioned tables, and join optimization
+ for queries that join large tables.
+ Larger filters are more effective at handling
+ higher cardinality input sets, but consume more memory per filter.
+
+ </p>
+
+ <p class="p">
+ If your query filters on high-cardinality columns (for example, millions of different values)
+ and you do not get the expected speedup from the runtime filtering mechanism, consider
+ doing some benchmarks with a higher value for <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code>.
+ The extra memory devoted to the Bloom filter data structures can help make the filtering
+ more accurate.
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ Because the effectiveness of this setting depends so much on query characteristics and data distribution,
+ you typically only use it for specific queries that need some extra tuning, and the ideal value depends
+ on the query. Consider setting this query option immediately before the expensive query and
+ unsetting it immediately afterward.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option affects only Bloom filters, not the min/max filters
+ that are applied to Kudu tables. Therefore, it does not affect the
+ performance of queries against Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filter_max_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filter_max_size.html b/docs/build3x/html/topics/impala_runtime_filter_max_size.html
new file mode 100644
index 0000000..b1cf316
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filter_max_size.html
@@ -0,0 +1,65 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_max_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</title></head><body id="runtime_filter_max_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_MAX_SIZE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code> query option
+ adjusts the settings for the runtime filtering feature.
+ This option defines the maximum size for a filter,
+ regardless of the estimates produced by the planner.
+ This value also overrides any higher number specified for the
+ <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> query option.
+ Filter sizes are rounded up to the nearest power of two.
+ </p>
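The rounding behavior described above can be sketched in a few lines of Python (hypothetical helper name, shown for illustration only; Impala performs this rounding internally):

```python
def round_up_pow2(n: int) -> int:
    """Round a filter size in bytes up to the nearest power of two,
    as described for the runtime filter size options."""
    p = 1
    while p < n:
        p <<= 1
    return p

# A 5 MB estimate becomes an 8 MB (8388608-byte) filter:
print(round_up_pow2(5_000_000))  # -> 8388608
```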
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning use the value from the corresponding <span class="keyword cmdname">impalad</span> startup option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option affects only Bloom filters, not the min/max filters
+ that are applied to Kudu tables. Therefore, it does not affect the
+ performance of queries against Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filter_min_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filter_min_size.html b/docs/build3x/html/topics/impala_runtime_filter_min_size.html
new file mode 100644
index 0000000..fd70cdb
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filter_min_size.html
@@ -0,0 +1,65 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_min_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</title></head><body id="runtime_filter_min_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_MIN_SIZE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code> query option
+ adjusts the settings for the runtime filtering feature.
+ This option defines the minimum size for a filter,
+ regardless of the estimates produced by the planner.
+ This value also overrides any lower number specified for the
+ <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> query option.
+ Filter sizes are rounded up to the nearest power of two.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning use the value from the corresponding <span class="keyword cmdname">impalad</span> startup option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option affects only Bloom filters, not the min/max filters
+ that are applied to Kudu tables. Therefore, it does not affect the
+ performance of queries against Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filter_mode.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filter_mode.html b/docs/build3x/html/topics/impala_runtime_filter_mode.html
new file mode 100644
index 0000000..6ce6b3b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filter_mode.html
@@ -0,0 +1,75 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_mode"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</title></head><body id="runtime_filter_mode"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_MODE Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">RUNTIME_FILTER_MODE</code> query option
+ adjusts the settings for the runtime filtering feature.
+ It turns this feature on and off, and controls how
+ extensively the filters are transmitted between hosts.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric (0, 1, 2)
+ or corresponding mnemonic strings (<code class="ph codeph">OFF</code>, <code class="ph codeph">LOCAL</code>, <code class="ph codeph">GLOBAL</code>).
+ </p>
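The correspondence between the numeric and mnemonic forms can be captured in a trivial lookup table (a sketch for reference, not part of Impala itself):

```python
# Numeric values accepted by SET RUNTIME_FILTER_MODE and their
# equivalent mnemonic strings, as listed above.
RUNTIME_FILTER_MODES = {0: "OFF", 1: "LOCAL", 2: "GLOBAL"}

print(RUNTIME_FILTER_MODES[2])  # -> GLOBAL
```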
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 2 (equivalent to <code class="ph codeph">GLOBAL</code>); the default was 1 (equivalent to <code class="ph codeph">LOCAL</code>) in <span class="keyword">Impala 2.5</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, the default is <code class="ph codeph">GLOBAL</code>.
+ This setting is recommended for a wide variety of workloads, to provide best
+ performance with <span class="q">"out of the box"</span> settings.
+ </p>
+
+ <p class="p">
+ The lowest setting of <code class="ph codeph">LOCAL</code> does a similar level of optimization
+ (such as partition pruning) as in earlier Impala releases.
+ This setting was the default in <span class="keyword">Impala 2.5</span>,
+ to allow for a period of post-upgrade testing for existing workloads.
+ This setting is suitable for workloads with non-performance-critical queries,
+ or if the coordinator node is under heavy CPU or memory pressure.
+ </p>
+
+ <p class="p">
+ You might change the setting to <code class="ph codeph">OFF</code> if your workload contains
+ many queries involving partitioned tables or joins that do not experience a performance
+ increase from the runtime filters feature. If the overhead of producing the runtime filters
+ outweighs the performance benefit for queries, you can turn the feature off entirely.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for details about runtime filtering.
+ See
+ <a class="xref" href="impala_disable_row_runtime_filtering.html#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_wait_time_ms.html#runtime_filter_wait_time_ms">RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</a>,
+ and
+ <a class="xref" href="impala_max_num_runtime_filters.html#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a>
+ for tuning options for runtime filtering.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filter_wait_time_ms.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filter_wait_time_ms.html b/docs/build3x/html/topics/impala_runtime_filter_wait_time_ms.html
new file mode 100644
index 0000000..bcee5c6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filter_wait_time_ms.html
@@ -0,0 +1,51 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_wait_time_ms"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</title></head><body id="runtime_filter_wait_time_ms"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_WAIT_TIME_MS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">RUNTIME_FILTER_WAIT_TIME_MS</code> query option
+ adjusts the settings for the runtime filtering feature.
+ It specifies a time in milliseconds that each scan node waits for
+ runtime filters to be produced by other plan fragments.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning use the value from the corresponding <span class="keyword cmdname">impalad</span> startup option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filtering.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filtering.html b/docs/build3x/html/topics/impala_runtime_filtering.html
new file mode 100644
index 0000000..1280838
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filtering.html
@@ -0,0 +1,533 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filtering"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</title></head><body id="runtime_filtering"><main role="main"><article role="article" aria-labelledby="runtime_filtering__runtime_filters">
+
+ <h1 class="title topictitle1" id="runtime_filtering__runtime_filters">Runtime Filtering for Impala Queries (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ <dfn class="term">Runtime filtering</dfn> is a wide-ranging optimization feature available in
+ <span class="keyword">Impala 2.5</span> and higher. When only a fraction of the data in a table is
+ needed for a query against a partitioned table or to evaluate a join condition,
+ Impala determines the appropriate conditions while the query is running, and
+ broadcasts that information to all the <span class="keyword cmdname">impalad</span> nodes that are reading the table.
+ Those nodes can then avoid unnecessary I/O for irrelevant partition data, and avoid
+ unnecessary network transmission by sending only the subset of rows that match the join keys
+ across the network.
+
+ <p class="p">
+ This feature is primarily used to optimize queries against large partitioned tables
+ (under the name <dfn class="term">dynamic partition pruning</dfn>) and joins of large tables.
+ The information in this section includes concepts, internals, and troubleshooting
+ information for the entire runtime filtering feature.
+ For specific tuning steps for partitioned tables,
+
+ see
+ <a class="xref" href="impala_partitioning.html#dynamic_partition_pruning">Dynamic Partition Pruning</a>.
+
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ When this feature made its debut in <span class="keyword">Impala 2.5</span>,
+ the default setting was <code class="ph codeph">RUNTIME_FILTER_MODE=LOCAL</code>.
+ Now the default is <code class="ph codeph">RUNTIME_FILTER_MODE=GLOBAL</code> in <span class="keyword">Impala 2.6</span> and higher,
+ which enables more wide-ranging and ambitious query optimization without requiring you to
+ explicitly set any query options.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="runtime_filtering__runtime_filtering_concepts">
+ <h2 class="title topictitle2" id="ariaid-title2">Background Information for Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ To understand how runtime filtering works at a detailed level, you must
+ be familiar with some terminology from the field of distributed database technology:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ What a <dfn class="term">plan fragment</dfn> is.
+ Impala decomposes each query into smaller units of work that are distributed across the cluster.
+ Wherever possible, a data block is read, filtered, and aggregated by plan fragments executing
+ on the same host. For some operations, such as joins and combining intermediate results into
+ a final result set, data is transmitted across the network from one DataNode to another.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ What <code class="ph codeph">SCAN</code> and <code class="ph codeph">HASH JOIN</code> plan nodes are, and their role in computing query results:
+ </p>
+ <p class="p">
+ In the Impala query plan, a <dfn class="term">scan node</dfn> performs the I/O to read from the underlying data files.
+ Although this is an expensive operation from the traditional database perspective, Hadoop clusters and Impala are
+ optimized to do this kind of I/O in a highly parallel fashion. The major potential cost savings come from using
+ the columnar Parquet format (where Impala can avoid reading data for unneeded columns) and partitioned tables
+ (where Impala can avoid reading data for unneeded partitions).
+ </p>
+ <p class="p">
+ Most Impala joins use the
+ <a class="xref" href="https://en.wikipedia.org/wiki/Hash_join" target="_blank"><dfn class="term">hash join</dfn></a>
+ mechanism. (Impala only relatively recently added the
+ nested-loop join technique, for certain kinds of non-equijoin queries.)
+ In a hash join, when evaluating join conditions from two tables, Impala constructs a hash table in memory with all
+ the different column values from the table on one side of the join.
+ Then, for each row from the table on the other side of the join, Impala tests whether the relevant column values
+ are in this hash table or not.
+ </p>
+ <p class="p">
+ A <dfn class="term">hash join node</dfn> constructs such an in-memory hash table, then performs the comparisons to
+ identify which rows match the relevant join conditions
+ and should be included in the result set (or at least sent on to the subsequent intermediate stage of
+ query processing). Because some of the input for a hash join might be transmitted across the network from another host,
+ it is especially important from a performance perspective to prune out ahead of time any data that is known to be
+ irrelevant.
+ </p>
+ <p class="p">
+ The more distinct values there are in the columns used as join keys, the larger the in-memory hash table, and
+ thus the more memory is required to process the query.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The difference between a <dfn class="term">broadcast join</dfn> and a <dfn class="term">shuffle join</dfn>.
+ (The Hadoop notion of a shuffle join is sometimes referred to in Impala as a <dfn class="term">partitioned join</dfn>.)
+ In a broadcast join, the table from one side of the join (typically the smaller table)
+ is sent in its entirety to all the hosts involved in the query. Then each host can compare its
+ portion of the data from the other (larger) table against the full set of possible join keys.
+ In a shuffle join, there is no obvious <span class="q">"smaller"</span> table, and so the contents of both tables
+ are divided up, and corresponding portions of the data are transmitted to each host involved in the query.
+ See <a class="xref" href="impala_hints.html#hints">Optimizer Hints</a> for information about how these different kinds of
+ joins are processed.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The notion of the build phase and probe phase when Impala processes a join query.
+ The <dfn class="term">build phase</dfn> is where the rows containing the join key columns, typically for the smaller table,
+ are transmitted across the network and built into an in-memory hash table data structure on one or
+ more destination nodes.
+ The <dfn class="term">probe phase</dfn> is where data is read locally (typically from the larger table) and the join key columns
+ are compared to the values in the in-memory hash table.
+ The corresponding input sources (tables, subqueries, and so on) for these
+ phases are referred to as the <dfn class="term">build side</dfn> and the <dfn class="term">probe side</dfn>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ How to set Impala query options: interactively within an <span class="keyword cmdname">impala-shell</span> session through
+ the <code class="ph codeph">SET</code> command, for a JDBC or ODBC application through the <code class="ph codeph">SET</code> statement, or
+ globally for all <span class="keyword cmdname">impalad</span> daemons through the <code class="ph codeph">default_query_options</code> configuration
+ setting.
+ </p>
+ </li>
+ </ul>
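The build and probe phases described in the list above can be illustrated with a toy sketch (plain Python dictionaries standing in for Impala's native hash join implementation; table contents are made up for illustration):

```python
# Build phase: hash the join-key column of the smaller (build-side) table
# into an in-memory hash table keyed by the join key.
build_side = [(1, "a"), (2, "b"), (4, "d")]          # (id, payload) rows
hash_table = {}
for key, payload in build_side:
    hash_table.setdefault(key, []).append(payload)

# Probe phase: stream the larger (probe-side) table and look up each
# row's join key in the hash table; only matches reach the result set.
probe_side = [(1, "x"), (3, "y"), (4, "z")]
result = [(k, b, p) for k, p in probe_side for b in hash_table.get(k, [])]
print(result)  # -> [(1, 'a', 'x'), (4, 'd', 'z')]
```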
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="runtime_filtering__runtime_filtering_internals">
+ <h2 class="title topictitle2" id="ariaid-title3">Runtime Filtering Internals</h2>
+ <div class="body conbody">
+ <p class="p">
+ The <dfn class="term">filter</dfn> that is transmitted between plan fragments is essentially a list
+ of values for join key columns. When this list of values is transmitted in time to a scan node,
+ Impala can filter out non-matching values immediately after reading them, rather than transmitting
+ the raw data to another host to compare against the in-memory hash table on that host.
+ </p>
+ <p class="p">
+ For HDFS-based tables, this data structure is implemented as a <dfn class="term">Bloom filter</dfn>, which uses
+ a probability-based algorithm to determine all possible matching values. (The probability-based aspect
+ means that the filter might include some non-matching values; if so, those extra values do not cause any inaccuracy
+ in the final results.)
+ </p>
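A minimal Bloom filter sketch makes this property concrete: a value that was added is never rejected, while an absent value can occasionally pass through. This is an illustration only, not Impala's actual (native, cache-optimized) implementation:

```python
import hashlib

class TinyBloom:
    """Toy Bloom filter: false positives are possible, false negatives are not."""
    def __init__(self, nbits=1024, nhashes=3):
        self.nbits, self.nhashes, self.bits = nbits, nhashes, 0

    def _positions(self, value):
        # Derive nhashes bit positions from salted SHA-256 digests.
        for i in range(self.nhashes):
            h = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all(self.bits & (1 << pos) for pos in self._positions(value))

bf = TinyBloom()
for join_key in (101, 202, 303):
    bf.add(join_key)
print(bf.might_contain(202))  # a present join key is never rejected
```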
+ <p class="p">
+ Another kind of filter is the <span class="q">"min-max"</span> filter. It currently only applies to Kudu tables. The
+ filter is a data structure representing a minimum and maximum value. These filters are passed to
+ Kudu to reduce the number of rows returned to Impala when scanning the probe side of the join.
+ </p>
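Conceptually, a min-max filter is just the smallest and largest join key values from the build side, used to skip probe-side rows outside that range. A sketch with made-up key values (the real pushdown happens inside the Kudu scan, not in Python):

```python
# Build side: compute the min and max of the join key values.
build_keys = [42, 7, 19, 88]
lo, hi = min(build_keys), max(build_keys)

# Probe side (conceptually pushed down to Kudu): rows whose key falls
# outside [lo, hi] are skipped and never returned to the join.
probe_keys = [3, 50, 91, 20]
survivors = [k for k in probe_keys if lo <= k <= hi]
print(survivors)  # -> [50, 20]
```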
+ <p class="p">
+ There are different kinds of filters to match the different kinds of joins (partitioned and broadcast).
+ A broadcast filter reflects the complete list of relevant values and can be immediately evaluated by a scan node.
+ A partitioned filter reflects only the values processed by one host in the
+ cluster; all the partitioned filters must be combined into one (by the coordinator node) before the
+ scan nodes can use the results to accurately filter the data as it is read from storage.
+ </p>
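The coordinator's combining step for partitioned filters amounts to OR-ing the partial Bloom filter bitsets together. A sketch, modeling each host's partial filter as an integer bitmask (illustration only):

```python
# Each host produces a partial filter covering only the build-side rows
# it processed; here each partial filter is modeled as a small bitmask.
partial_filters = [0b0010_0001, 0b0100_0000, 0b0000_1000]

# The coordinator ORs the partial filters into one complete filter
# before the scan nodes can use it to filter data as it is read.
combined = 0
for f in partial_filters:
    combined |= f
print(bin(combined))  # -> 0b1101001
```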
+ <p class="p">
+ Broadcast filters are also classified as local or global. With a local broadcast filter, the information
+ in the filter is used by a subsequent query fragment that is running on the same host that produced the filter.
+ A non-local broadcast filter must be transmitted across the network to a query fragment that is running on a
+ different host. Impala designates 3 hosts to each produce non-local broadcast filters, to guard against the
+ possibility of a single slow host taking too long. Depending on the setting of the <code class="ph codeph">RUNTIME_FILTER_MODE</code> query option
+ (<code class="ph codeph">LOCAL</code> or <code class="ph codeph">GLOBAL</code>), Impala either uses a conservative optimization
+ strategy where filters are only consumed on the same host that produced them, or a more aggressive strategy
+ where filters are eligible to be transmitted across the network.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In <span class="keyword">Impala 2.6</span> and higher, the default for runtime filtering is the <code class="ph codeph">GLOBAL</code> setting.
+ </div>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="runtime_filtering__runtime_filtering_file_formats">
+ <h2 class="title topictitle2" id="ariaid-title4">File Format Considerations for Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ Parquet tables get the most benefit from
+ the runtime filtering optimizations. Runtime filtering can speed up
+ join queries against partitioned or unpartitioned Parquet tables,
+ and single-table queries against partitioned Parquet tables.
+ See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for information about
+ using Parquet tables with Impala.
+ </p>
+ <p class="p">
+ For other file formats (text, Avro, RCFile, and SequenceFile),
+ runtime filtering speeds up queries against partitioned tables only.
+ Because partitioned tables can use a mixture of formats, Impala produces
+ the filters in all cases, even if they are not ultimately used to
+ optimize the query.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="runtime_filtering__runtime_filtering_timing">
+ <h2 class="title topictitle2" id="ariaid-title5">Wait Intervals for Runtime Filters</h2>
+ <div class="body conbody">
+ <p class="p">
+ Because it takes time to produce runtime filters, especially for
+ partitioned filters that must be combined by the coordinator node,
+ there is a time interval above which it is more efficient for
+ the scan nodes to go ahead and construct their intermediate result sets,
+ even if that intermediate data is larger than optimal. If it only takes
+ a few seconds to produce the filters, it is worth the extra time if pruning
+ the unnecessary data can save minutes in the overall query time.
+ You can specify the maximum wait time in milliseconds using the
+ <code class="ph codeph">RUNTIME_FILTER_WAIT_TIME_MS</code> query option.
+ </p>
+ <p class="p">
+ By default, each scan node waits for up to 1 second (1000 milliseconds)
+ for filters to arrive. If all filters have not arrived within the
+ specified interval, the scan node proceeds, using whatever filters
+ did arrive to help avoid reading unnecessary data. If a filter arrives
+ after the scan node begins reading data, the scan node applies that
+ filter to the data that is read after the filter arrives, but not to
+ the data that was already read.
+ </p>
+ <p class="p">
+ If the cluster is relatively busy and your workload contains many
+ resource-intensive or long-running queries, consider increasing the wait time
+ so that complicated queries do not miss opportunities for optimization.
+ If the cluster is lightly loaded and your workload contains many small queries
+ taking only a few seconds, consider decreasing the wait time to avoid the
+ 1 second delay for each query.
+ </p>
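+      <p class="p">
+        For example, the following sketch raises the wait time for a single long-running query
+        and then restores the default. The 10-second value and the query itself are illustrative
+        assumptions, not recommendations; choose a value based on your own testing.
+      </p>
+<pre class="pre codeblock"><code>
+-- Allow up to 10 seconds for runtime filters to arrive before scans proceed.
+set runtime_filter_wait_time_ms=10000;
+select c1 from huge_t1 join tiny_t2 on huge_t1.id = tiny_t2.id;  -- hypothetical expensive query
+-- Restore the default of 1000 milliseconds (1 second).
+set runtime_filter_wait_time_ms=1000;
+</code></pre>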
+ </div>
+ </article>
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="runtime_filtering__runtime_filtering_query_options">
+ <h2 class="title topictitle2" id="ariaid-title6">Query Options for Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ See the following sections for information about the query options that control runtime filtering:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The first query option adjusts the <span class="q">"sensitivity"</span> of this feature.
+ <span class="ph">By default, it is set to the highest level (<code class="ph codeph">GLOBAL</code>).
+ (This default applies to <span class="keyword">Impala 2.6</span> and higher.
+ In previous releases, the default was <code class="ph codeph">LOCAL</code>.)</span>
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ The other query options are tuning knobs that you typically only adjust after doing
+ performance testing, and that you might want to change only for the duration of a single
+ expensive query:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_max_num_runtime_filters.html#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_disable_row_runtime_filtering.html#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>;
+ in <span class="keyword">Impala 2.6</span> and higher, this setting acts as a fallback when
+ statistics are not available, rather than as a directive.
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
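+      <p class="p">
+        Because these are query options, you can scope a change to a single
+        <span class="keyword cmdname">impala-shell</span> session. The following sketch adjusts the
+        tuning knobs around one expensive query; the option values and table names are
+        illustrative assumptions, not recommendations.
+      </p>
+<pre class="pre codeblock"><code>
+set runtime_filter_mode=GLOBAL;      -- the default in Impala 2.6 and higher
+set max_num_runtime_filters=20;      -- illustrative value for a query with many joins
+select c1 from huge_t1 join tiny_t2 on huge_t1.id = tiny_t2.id;  -- hypothetical expensive query
+set max_num_runtime_filters=10;      -- back to the default
+</code></pre>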
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="runtime_filtering__runtime_filtering_explain_plan">
+ <h2 class="title topictitle2" id="ariaid-title7">Runtime Filtering and Query Plans</h2>
+ <div class="body conbody">
+ <p class="p">
+ In the same way the query plan displayed by the
+ <code class="ph codeph">EXPLAIN</code> statement includes information
+ about predicates used by each plan fragment, it also
+ includes annotations showing whether a plan fragment
+ produces or consumes a runtime filter.
+ A plan fragment that produces a filter includes an
+ annotation such as
+ <code class="ph codeph">runtime filters: <var class="keyword varname">filter_id</var> <- <var class="keyword varname">table</var>.<var class="keyword varname">column</var></code>,
+ while a plan fragment that consumes a filter includes an annotation such as
+ <code class="ph codeph">runtime filters: <var class="keyword varname">filter_id</var> -> <var class="keyword varname">table</var>.<var class="keyword varname">column</var></code>.
+ <span class="ph">Setting the query option <code class="ph codeph">EXPLAIN_LEVEL=2</code> adds additional
+ annotations showing the type of the filter, either <code class="ph codeph"><var class="keyword varname">filter_id</var>[bloom]</code>
+ (for HDFS-based tables) or <code class="ph codeph"><var class="keyword varname">filter_id</var>[min_max]</code> (for Kudu tables).</span>
+ </p>
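+      <p class="p">
+        For example, to see the <code class="ph codeph">[bloom]</code> or
+        <code class="ph codeph">[min_max]</code> annotations described above, raise the detail level
+        before running <code class="ph codeph">EXPLAIN</code>. (The table names here are hypothetical.)
+      </p>
+<pre class="pre codeblock"><code>
+set explain_level=2;
+explain select c1 from huge_t1 join tiny_t2 on huge_t1.id = tiny_t2.id;
+</code></pre>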
+
+ <p class="p">
+        The following example shows a query that uses a single runtime filter (labelled <code class="ph codeph">RF000</code>)
+        to prune the partitions that are scanned in one stage of the query, based on evaluating the
+        result set of a subquery:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table yy (s string) partitioned by (year int) stored as parquet;
+insert into yy partition (year) values ('1999', 1999), ('2000', 2000),
+ ('2001', 2001), ('2010',2010);
+compute stats yy;
+
+create table yy2 (s string) partitioned by (year int) stored as parquet;
+insert into yy2 partition (year) values ('1999', 1999), ('2000', 2000),
+ ('2001', 2001);
+compute stats yy2;
+
+-- The query reads an unknown number of partitions, whose key values are only
+-- known at run time. The 'runtime filters' lines show how the information about
+-- the partitions is calculated in query fragment 02, and then used in query
+-- fragment 00 to decide which partitions to skip.
+explain select s from yy2 where year in (select year from yy where year between 2000 and 2005);
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=16.00MB VCores=2 |
+| |
+| 04:EXCHANGE [UNPARTITIONED] |
+| | |
+| 02:HASH JOIN [LEFT SEMI JOIN, BROADCAST] |
+| | hash predicates: year = year |
+| | <strong class="ph b">runtime filters: RF000 <- year</strong> |
+| | |
+| |--03:EXCHANGE [BROADCAST] |
+| | | |
+| | 01:SCAN HDFS [dpp.yy] |
+| | partitions=2/4 files=2 size=468B |
+| | |
+| 00:SCAN HDFS [dpp.yy2] |
+| partitions=2/3 files=2 size=468B |
+| <strong class="ph b">runtime filters: RF000 -> year</strong> |
++----------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The query profile (displayed by the <code class="ph codeph">PROFILE</code> command in <span class="keyword cmdname">impala-shell</span>)
+ contains both the <code class="ph codeph">EXPLAIN</code> plan and more detailed information about the internal
+ workings of the query. The profile output includes a section labelled the <span class="q">"filter routing table"</span>,
+ with information about each filter based on its ID.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="runtime_filtering__runtime_filtering_queries">
+ <h2 class="title topictitle2" id="ariaid-title8">Examples of Queries that Benefit from Runtime Filtering</h2>
+ <div class="body conbody">
+
+ <p class="p">
+ In this example, Impala would normally do extra work to interpret the columns
+ <code class="ph codeph">C1</code>, <code class="ph codeph">C2</code>, <code class="ph codeph">C3</code>, and <code class="ph codeph">ID</code>
+ for each row in <code class="ph codeph">HUGE_T1</code>, before checking the <code class="ph codeph">ID</code>
+ value against the in-memory hash table constructed from all the <code class="ph codeph">TINY_T2.ID</code>
+ values. By producing a filter containing all the <code class="ph codeph">TINY_T2.ID</code> values
+ even before the query starts scanning the <code class="ph codeph">HUGE_T1</code> table, Impala
+ can skip the unnecessary work to parse the column info as soon as it determines
+ that an <code class="ph codeph">ID</code> value does not match any of the values from the other table.
+ </p>
+
+ <p class="p">
+ The example shows <code class="ph codeph">COMPUTE STATS</code> statements for both the tables (even
+ though that is a one-time operation after loading data into those tables) because
+ Impala relies on up-to-date statistics to
+ determine which one has more distinct <code class="ph codeph">ID</code> values than the other.
+ That information lets Impala make effective decisions about which table to use to
+ construct the in-memory hash table, and which table to read from disk and
+ compare against the entries in the hash table.
+ </p>
+
+<pre class="pre codeblock"><code>
+COMPUTE STATS huge_t1;
+COMPUTE STATS tiny_t2;
+SELECT c1, c2, c3 FROM huge_t1 JOIN tiny_t2 WHERE huge_t1.id = tiny_t2.id;
+</code></pre>
+
+
+
+ <p class="p">
+ In this example, <code class="ph codeph">T1</code> is a table partitioned by year. The subquery
+ on <code class="ph codeph">T2</code> produces multiple values, and transmits those values as a filter to the plan
+ fragments that are reading from <code class="ph codeph">T1</code>. Any non-matching partitions in <code class="ph codeph">T1</code>
+ are skipped.
+ </p>
+
+<pre class="pre codeblock"><code>
+select c1 from t1 where year in (select distinct year from t2);
+</code></pre>
+
+ <p class="p">
+ Now the <code class="ph codeph">WHERE</code> clause contains an additional test that does not apply to
+ the partition key column.
+ A filter on a column that is not a partition key is called a per-row filter.
+ Because per-row filters only apply for Parquet, <code class="ph codeph">T1</code> must be a Parquet table.
+ </p>
+
+ <p class="p">
+ The subqueries result in two filters being transmitted to
+ the scan nodes that read from <code class="ph codeph">T1</code>. The filter on <code class="ph codeph">YEAR</code> helps the query eliminate
+ entire partitions based on non-matching years. The filter on <code class="ph codeph">C2</code> lets Impala discard
+ rows with non-matching <code class="ph codeph">C2</code> values immediately after reading them. Without runtime filtering,
+ Impala would have to keep the non-matching values in memory, assemble <code class="ph codeph">C1</code>, <code class="ph codeph">C2</code>,
+ and <code class="ph codeph">C3</code> into rows in the intermediate result set, and transmit all the intermediate rows
+ back to the coordinator node, where they would be eliminated only at the very end of the query.
+ </p>
+
+<pre class="pre codeblock"><code>
+select c1, c2, c3 from t1
+ where year in (select distinct year from t2)
+ and c2 in (select other_column from t3);
+</code></pre>
+
+ <p class="p">
+ This example involves a broadcast join.
+ The fact that the <code class="ph codeph">ON</code> clause would
+ return a small number of matching rows (because there
+ are not very many rows in <code class="ph codeph">TINY_T2</code>)
+ means that the corresponding filter is very selective.
+ Therefore, runtime filtering will probably be effective
+ in optimizing this query.
+ </p>
+
+<pre class="pre codeblock"><code>
+select c1 from huge_t1 join [broadcast] tiny_t2
+ on huge_t1.id = tiny_t2.id
+ where huge_t1.year in (select distinct year from tiny_t2)
+ and c2 in (select other_column from t3);
+</code></pre>
+
+ <p class="p">
+ This example involves a shuffle or partitioned join.
+ Assume that most rows in <code class="ph codeph">HUGE_T1</code>
+ have a corresponding row in <code class="ph codeph">HUGE_T2</code>.
+ The fact that the <code class="ph codeph">ON</code> clause could
+ return a large number of matching rows means that
+ the corresponding filter would not be very selective.
+ Therefore, runtime filtering might be less effective
+ in optimizing this query.
+ </p>
+
+<pre class="pre codeblock"><code>
+select c1 from huge_t1 join [shuffle] huge_t2
+ on huge_t1.id = huge_t2.id
+ where huge_t1.year in (select distinct year from huge_t2)
+ and c2 in (select other_column from t3);
+</code></pre>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="runtime_filtering__runtime_filtering_tuning">
+ <h2 class="title topictitle2" id="ariaid-title9">Tuning and Troubleshooting Queries that Use Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ These tuning and troubleshooting procedures apply to queries that are
+ resource-intensive enough, long-running enough, and frequent enough
+ that you can devote special attention to optimizing them individually.
+ </p>
+
+ <p class="p">
+ Use the <code class="ph codeph">EXPLAIN</code> statement and examine the <code class="ph codeph">runtime filters:</code>
+ lines to determine whether runtime filters are being applied to the <code class="ph codeph">WHERE</code> predicates
+ and join clauses that you expect. For example, runtime filtering does not apply to queries that use
+ the nested loop join mechanism due to non-equijoin operators.
+ </p>
+
+ <p class="p">
+ Make sure statistics are up-to-date for all tables involved in the queries.
+ Use the <code class="ph codeph">COMPUTE STATS</code> statement after loading data into non-partitioned tables,
+ and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> after adding new partitions to partitioned tables.
+ </p>
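+      <p class="p">
+        For example, with hypothetical table names:
+      </p>
+<pre class="pre codeblock"><code>
+compute stats huge_t1;           -- after loading data into a non-partitioned table
+compute incremental stats t1;    -- after adding new partitions to a partitioned table
+</code></pre>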
+
+ <p class="p">
+ If join queries involving large tables use unique columns as the join keys,
+ for example joining a primary key column with a foreign key column, the overhead of
+ producing and transmitting the filter might outweigh the performance benefit because
+ not much data could be pruned during the early stages of the query.
+ For such queries, consider setting the query option <code class="ph codeph">RUNTIME_FILTER_MODE=OFF</code>.
+ </p>
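+      <p class="p">
+        For example, this sketch disables runtime filtering for one such primary-key/foreign-key
+        join and then re-enables the default behavior. (The query is hypothetical.)
+      </p>
+<pre class="pre codeblock"><code>
+set runtime_filter_mode=OFF;
+select c1 from huge_t1 join huge_t2 on huge_t1.pk = huge_t2.fk;  -- unique join keys; little to prune
+set runtime_filter_mode=GLOBAL;  -- the default in Impala 2.6 and higher
+</code></pre>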
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="runtime_filtering__runtime_filtering_limits">
+ <h2 class="title topictitle2" id="ariaid-title10">Limitations and Restrictions for Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+        The runtime filtering feature is most effective for the Parquet file format.
+ For other file formats, filtering only applies for partitioned tables.
+ See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering_file_formats">File Format Considerations for Runtime Filtering</a>.
+ For the ways in which runtime filtering works for Kudu tables, see
+ <a class="xref" href="impala_kudu.html#kudu_performance">Impala Query Performance for Kudu Tables</a>.
+ </p>
+
+
+ <p class="p">
+ When the spill-to-disk mechanism is activated on a particular host during a query,
+ that host does not produce any filters while processing that query.
+ This limitation does not affect the correctness of results; it only reduces the
+ amount of optimization that can be applied to the query.
+ </p>
+
+ </div>
+ </article>
+
+
+</article></main></body></html>
diff --git a/docs/build3x/html/topics/impala_scalability.html b/docs/build3x/html/topics/impala_scalability.html
new file mode 100644
index 0000000..f2b6a9f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_scalability.html
@@ -0,0 +1,920 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name
="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="scalability"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Scalability Considerations for Impala</title></head><body id="scalability"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Scalability Considerations for Impala</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section explains how the size of your cluster and the volume of data influences SQL performance and
+ schema design for Impala tables. Typically, adding more cluster capacity reduces problems due to memory
+ limits or disk throughput. On the other hand, larger clusters are more likely to have other kinds of
+ scalability issues, such as a single slow node that causes performance problems for queries.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ <p class="p">
+ A good source of tips related to scalability and performance tuning is the
+ <a class="xref" href="http://www.slideshare.net/cloudera/the-impala-cookbook-42530186" target="_blank">Impala Cookbook</a>
+ presentation. These slides are updated periodically as new features come out and new benchmarks are performed.
+ </p>
+
+ </div>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="scalability__scalability_catalog">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Impact of Many Tables or Partitions on Impala Catalog Performance and Memory Usage</h2>
+
+ <div class="body conbody">
+
+
+
+ <p class="p">
+ Because Hadoop I/O is optimized for reading and writing large files, Impala is optimized for tables
+ containing relatively few, large data files. Schemas containing thousands of tables, or tables containing
+ thousands of partitions, can encounter performance issues during startup or during DDL operations such as
+ <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ Because of a change in the default heap size for the <span class="keyword cmdname">catalogd</span> daemon in
+ <span class="keyword">Impala 2.5</span> and higher, the following procedure to increase the <span class="keyword cmdname">catalogd</span>
+ memory limit might be required following an upgrade to <span class="keyword">Impala 2.5</span> even if not
+ needed previously.
+ </p>
+ </div>
+
+ <div class="p">
+ For schemas with large numbers of tables, partitions, and data files, the <span class="keyword cmdname">catalogd</span>
+ daemon might encounter an out-of-memory error. To increase the memory limit for the
+ <span class="keyword cmdname">catalogd</span> daemon:
+
+ <ol class="ol">
+ <li class="li">
+ <p class="p">
+ Check current memory usage for the <span class="keyword cmdname">catalogd</span> daemon by running the
+ following commands on the host where that daemon runs on your cluster:
+ </p>
+ <pre class="pre codeblock"><code>
+ jcmd <var class="keyword varname">catalogd_pid</var> VM.flags
+ jmap -heap <var class="keyword varname">catalogd_pid</var>
+ </code></pre>
+ </li>
+ <li class="li">
+ <p class="p">
+ Decide on a large enough value for the <span class="keyword cmdname">catalogd</span> heap.
+ You express it as an environment variable value as follows:
+ </p>
+ <pre class="pre codeblock"><code>
+ JAVA_TOOL_OPTIONS="-Xmx8g"
+ </code></pre>
+ </li>
+ <li class="li">
+ <p class="p">
+ On systems not using cluster management software, put this environment variable setting into the
+ startup script for the <span class="keyword cmdname">catalogd</span> daemon, then restart the <span class="keyword cmdname">catalogd</span>
+ daemon.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Use the same <span class="keyword cmdname">jcmd</span> and <span class="keyword cmdname">jmap</span> commands as earlier to
+ verify that the new settings are in effect.
+ </p>
+ </li>
+ </ol>
+ </div>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="scalability__statestore_scalability">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Scalability Considerations for the Impala Statestore</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Before <span class="keyword">Impala 2.1</span>, the statestore sent only one kind of message to its subscribers. This message contained all
+ updates for any topics that a subscriber had subscribed to. It also served to let subscribers know that the
+ statestore had not failed, and conversely the statestore used the success of sending a heartbeat to a
+ subscriber to decide whether or not the subscriber had failed.
+ </p>
+
+ <p class="p">
+ Combining topic updates and failure detection in a single message led to bottlenecks in clusters with large
+ numbers of tables, partitions, and HDFS data blocks. When the statestore was overloaded with metadata
+ updates to transmit, heartbeat messages were sent less frequently, sometimes causing subscribers to time
+ out their connection with the statestore. Increasing the subscriber timeout and decreasing the frequency of
+ statestore heartbeats worked around the problem, but reduced responsiveness when the statestore failed or
+ restarted.
+ </p>
+
+ <p class="p">
+ As of <span class="keyword">Impala 2.1</span>, the statestore now sends topic updates and heartbeats in separate messages. This allows the
+ statestore to send and receive a steady stream of lightweight heartbeats, and removes the requirement to
+ send topic updates according to a fixed schedule, reducing statestore network overhead.
+ </p>
+
+ <p class="p">
+ The statestore now has the following relevant configuration flags for the <span class="keyword cmdname">statestored</span>
+ daemon:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="statestore_scalability__statestore_num_update_threads">
+ <code class="ph codeph">-statestore_num_update_threads</code>
+ </dt>
+
+ <dd class="dd">
+ The number of threads inside the statestore dedicated to sending topic updates. You should not
+ typically need to change this value.
+ <p class="p">
+ <strong class="ph b">Default:</strong> 10
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="statestore_scalability__statestore_update_frequency_ms">
+ <code class="ph codeph">-statestore_update_frequency_ms</code>
+ </dt>
+
+ <dd class="dd">
+ The frequency, in milliseconds, with which the statestore tries to send topic updates to each
+ subscriber. This is a best-effort value; if the statestore is unable to meet this frequency, it sends
+ topic updates as fast as it can. You should not typically need to change this value.
+ <p class="p">
+ <strong class="ph b">Default:</strong> 2000
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="statestore_scalability__statestore_num_heartbeat_threads">
+ <code class="ph codeph">-statestore_num_heartbeat_threads</code>
+ </dt>
+
+ <dd class="dd">
+ The number of threads inside the statestore dedicated to sending heartbeats. You should not typically
+ need to change this value.
+ <p class="p">
+ <strong class="ph b">Default:</strong> 10
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="statestore_scalability__statestore_heartbeat_frequency_ms">
+ <code class="ph codeph">-statestore_heartbeat_frequency_ms</code>
+ </dt>
+
+ <dd class="dd">
+ The frequency, in milliseconds, with which the statestore tries to send heartbeats to each subscriber.
+ This value should be good for large catalogs and clusters up to approximately 150 nodes. Beyond that,
+ you might need to increase this value to make the interval longer between heartbeat messages.
+ <p class="p">
+ <strong class="ph b">Default:</strong> 1000 (one heartbeat message every second)
+ </p>
+ </dd>
+
+
+ </dl>
+
+ <p class="p">
+ If it takes a very long time for a cluster to start up, and <span class="keyword cmdname">impala-shell</span> consistently
+ displays <code class="ph codeph">This Impala daemon is not ready to accept user requests</code>, the statestore might be
+ taking too long to send the entire catalog topic to the cluster. In this case, consider adding
+        <code class="ph codeph">--load_catalog_in_background=false</code> to your catalog service configuration. This setting
+        stops the catalog service from loading the entire catalog into memory at cluster startup. Instead, metadata for
+ each table is loaded when the table is accessed for the first time.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="scalability__scalability_coordinator">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Controlling which Hosts are Coordinators and Executors</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ By default, each host in the cluster that runs the <span class="keyword cmdname">impalad</span>
+ daemon can act as the coordinator for an Impala query, execute the fragments
+ of the execution plan for the query, or both. During highly concurrent
+ workloads for large-scale queries, especially on large clusters, the dual
+ roles can cause scalability issues:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The extra work required for a host to act as the coordinator could interfere
+ with its capacity to perform other work for the earlier phases of the query.
+ For example, the coordinator can experience significant network and CPU overhead
+ during queries containing a large number of query fragments. Each coordinator
+ caches metadata for all table partitions and data files, which can be substantial
+ and contend with memory needed to process joins, aggregations, and other operations
+ performed by query executors.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Having a large number of hosts act as coordinators can cause unnecessary network
+ overhead, or even timeout errors, as each of those hosts communicates with the
+ <span class="keyword cmdname">statestored</span> daemon for metadata updates.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <span class="q">"soft limits"</span> imposed by the admission control feature are more likely
+ to be exceeded when there are a large number of heavily loaded hosts acting as
+ coordinators.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ If such scalability bottlenecks occur, you can explicitly specify that certain
+ hosts act as query coordinators, but not executors for query fragments.
+ These hosts do not participate in I/O-intensive operations such as scans,
+ and CPU-intensive operations such as aggregations.
+ </p>
+
+ <p class="p">
+ Then, you specify that the
+ other hosts act as executors but not coordinators. These hosts do not communicate
+ with the <span class="keyword cmdname">statestored</span> daemon or process the final result sets
+ from queries. You cannot connect to these hosts through clients such as
+ <span class="keyword cmdname">impala-shell</span> or business intelligence tools.
+ </p>
+
+ <p class="p">
+ This feature is available in <span class="keyword">Impala 2.9</span> and higher.
+ </p>
+
+ <p class="p">
+ To use this feature, you specify one of the following startup flags for the
+ <span class="keyword cmdname">impalad</span> daemon on each host:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">is_executor=false</code> for each host that
+ does not act as an executor for Impala queries.
+ These hosts act exclusively as query coordinators.
+ This setting typically applies to a relatively small number of
+ hosts, because the most common topology is to have nearly all
+ DataNodes doing work for query execution.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">is_coordinator=false</code> for each host that
+ does not act as a coordinator for Impala queries.
+ These hosts act exclusively as executors.
+ The number of hosts with this setting typically increases
+ as the cluster grows larger and handles more table partitions,
+ data files, and concurrent queries. As the overhead for query
+ coordination increases, it becomes more important to centralize
+ that work on dedicated hosts.
+ </p>
+ </li>
+ </ul>
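+      <p class="p">
+        For example, the startup flags for a dedicated coordinator host and a dedicated executor
+        host might look like the following sketch. The exact way you pass the flags depends on
+        how you manage <span class="keyword cmdname">impalad</span> startup on your cluster.
+      </p>
+<pre class="pre codeblock"><code>
+impalad -is_executor=false ...      # host acting only as a query coordinator
+impalad -is_coordinator=false ...   # host acting only as an executor
+</code></pre>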
+
+ <p class="p">
+ By default, both of these settings are enabled for each <code class="ph codeph">impalad</code>
+ instance, allowing all such hosts to act as both executors and coordinators.
+ </p>
+
+ <p class="p">
+ For example, on a 100-node cluster, you might specify <code class="ph codeph">is_executor=false</code>
+ for 10 hosts, to dedicate those hosts as query coordinators. Then specify
+ <code class="ph codeph">is_coordinator=false</code> for the remaining 90 hosts. All explicit or
+ load-balanced connections must go to the 10 hosts acting as coordinators. These hosts
+ perform the network communication to keep metadata up-to-date and route query results
+ to the appropriate clients. The remaining 90 hosts perform the intensive I/O, CPU, and
+ memory operations that make up the bulk of the work for each query. If a bottleneck or
+ other performance issue arises on a specific host, you can narrow down the cause more
+ easily because each host is dedicated to specific operations within the overall
+ Impala workload.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="scalability__scalability_buffer_pool">
+ <h2 class="title topictitle2" id="ariaid-title5">Effect of Buffer Pool on Memory Usage (<span class="keyword">Impala 2.10</span> and higher)</h2>
+ <div class="body conbody">
+ <p class="p">
+ The buffer pool feature, available in <span class="keyword">Impala 2.10</span> and higher, changes the
+ way Impala allocates memory during a query. Most of the memory needed is reserved at the
+ beginning of the query, avoiding cases where a query might run for a long time before failing
+ with an out-of-memory error. The actual memory estimates and memory buffers are typically
+ smaller than before, so that more queries can run concurrently or process larger volumes
+ of data than previously.
+ </p>
+ <p class="p">
+ The buffer pool feature includes some query options that you can fine-tune:
+ <a class="xref" href="impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a>,
+ <a class="xref" href="impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_max_row_size.html">MAX_ROW_SIZE Query Option</a>, and
+ <a class="xref" href="impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE Query Option</a>.
+ </p>
+ <p class="p">
+ Most of the effects of the buffer pool are transparent to you as an Impala user.
+ Memory use during spilling is now steadier and more predictable, instead of
+ increasing rapidly as more data is spilled to disk. The main change from a user
+ perspective is the need to increase the <code class="ph codeph">MAX_ROW_SIZE</code> query option
+ setting when querying tables with columns containing long strings, many columns,
+ or other combinations of factors that produce very large rows. If Impala encounters
+ rows that are too large to process with the default query option settings, the query
+ fails with an error message suggesting to increase the <code class="ph codeph">MAX_ROW_SIZE</code>
+ setting.
+ </p>
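+ <p class="p">
+ For example, a session that queries a table with very wide rows might raise the limit
+ before running the query. (The 4 MB value and the table name here are only illustrative;
+ choose a value based on the actual row size reported in the error message.)
+ </p>
+<pre class="pre codeblock"><code>SET MAX_ROW_SIZE=4mb;
+SELECT * FROM wide_table WHERE id = 42;
+</code></pre>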
+ </div>
+ </article>
+
+
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="scalability__spill_to_disk">
+
+ <h2 class="title topictitle2" id="ariaid-title6">SQL Operations that Spill to Disk</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Certain memory-intensive operations write temporary data to disk (known as <dfn class="term">spilling</dfn> to disk)
+ when Impala is close to exceeding its memory limit on a particular host.
+ </p>
+
+ <p class="p">
+ The result is a query that completes successfully, rather than failing with an out-of-memory error. The
+ tradeoff is decreased performance due to the extra disk I/O to write the temporary data and read it back
+ in. The slowdown could potentially be significant. Thus, while this feature improves reliability,
+ you should optimize your queries, system parameters, and hardware configuration to make this spilling a rare occurrence.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.10</span> and higher, also see <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> for
+ changes to Impala memory allocation that might change the details of which queries spill to disk,
+ and how much memory and disk space is involved in the spilling operation.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">What kinds of queries might spill to disk:</strong>
+ </p>
+
+ <p class="p">
+ Several SQL clauses and constructs require memory allocations that could activate the spilling mechanism:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ When a query uses a <code class="ph codeph">GROUP BY</code> clause for columns
+ with millions or billions of distinct values, Impala keeps a
+ similar number of temporary results in memory, to accumulate the
+ aggregate results for each value in the group.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ When large tables are joined together, Impala keeps the values of
+ the join columns from one table in memory, to compare them to
+ incoming values from the other table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ When a large result set is sorted by the <code class="ph codeph">ORDER BY</code>
+ clause, each node sorts its portion of the result set in memory.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">UNION</code> operators
+ build in-memory data structures to represent all values found so
+ far, to eliminate duplicates as the query progresses.
+ </p>
+ </li>
+
+ </ul>
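+ <p class="p">
+ For illustration, hypothetical queries such as the following exercise the constructs
+ listed above and could trigger spilling when the data volume is large enough. (The
+ table and column names are made up.)
+ </p>
+<pre class="pre codeblock"><code>-- High-cardinality GROUP BY: one in-memory entry per distinct user_id.
+SELECT user_id, COUNT(*) FROM events GROUP BY user_id;
+
+-- Join of large tables: join column values from one side are held in memory.
+SELECT f.key, d.name FROM big_facts f JOIN big_dims d ON f.key = d.key;
+
+-- Sorting a large result set: each node sorts its portion in memory.
+SELECT * FROM big_facts ORDER BY event_time;
+
+-- DISTINCT builds an in-memory structure of all values seen so far.
+SELECT DISTINCT session_id FROM events;
+</code></pre>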
+
+ <p class="p">
+ When the spill-to-disk feature is activated for a join node within a query, Impala does not
+ produce any runtime filters for that join operation on that host. Other join nodes within
+ the query are not affected.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">How Impala handles scratch disk space for spilling:</strong>
+ </p>
+
+ <p class="p">
+ By default, intermediate files used during large sort, join, aggregation, or analytic function operations
+ are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span>. These files are removed when the
+ operation finishes. (Multiple concurrent queries can perform operations that use the <span class="q">"spill to disk"</span>
+ technique, without any name conflicts for these temporary files.) You can specify a different location by
+ starting the <span class="keyword cmdname">impalad</span> daemon with the
+ <code class="ph codeph">--scratch_dirs="<var class="keyword varname">path_to_directory</var>"</code> configuration option.
+ You can specify a single directory, or a comma-separated list of directories. The scratch directories must
+ be on the local filesystem, not in HDFS. You might specify different directory paths for different hosts,
+ depending on the capacity and speed
+ of the available storage devices. In <span class="keyword">Impala 2.3</span> or higher, Impala
+ successfully starts (with a warning written to the log) if it cannot create or read and write files
+ in one of the scratch directories. If there is less than 1 GB free on the filesystem where that directory resides,
+ Impala still runs, but writes a warning message to its log. If Impala encounters an error reading or writing
+ files in a scratch directory during a query, Impala logs the error and the query fails.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Memory usage for SQL operators:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.10</span> and higher, the way SQL operators such as <code class="ph codeph">GROUP BY</code>,
+ <code class="ph codeph">DISTINCT</code>, and joins, transition between using additional memory or activating the
+ spill-to-disk feature is changed. The memory required to spill to disk is reserved up front, and you can
+ examine it in the <code class="ph codeph">EXPLAIN</code> plan when the <code class="ph codeph">EXPLAIN_LEVEL</code> query option is
+ set to 2 or higher.
+ </p>
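+ <p class="p">
+ For example, to see the per-operator memory reservations in the plan output
+ (the table and column names here are placeholders):
+ </p>
+<pre class="pre codeblock"><code>SET EXPLAIN_LEVEL=2;
+EXPLAIN SELECT c1, COUNT(*) FROM big_table GROUP BY c1;
+</code></pre>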
+
+ <p class="p">
+ The infrastructure of the spilling feature affects the way the affected SQL operators, such as
+ <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">DISTINCT</code>, and joins, use memory.
+ On each host that participates in the query, each such operator in a query requires memory
+ to store rows of data and other data structures. For each operator that supports spilling
+ to disk, Impala reserves up front an amount of memory sufficient to execute the
+ operator. If an operator accumulates more data than can fit in the reserved memory, it
+ can either reserve more memory to continue processing data in memory or start spilling
+ data to temporary scratch files on disk. Thus, operators with spill-to-disk support
+ can adapt to different memory constraints by using however much memory is available
+ to speed up execution, yet tolerate low memory conditions by spilling data to disk.
+ </p>
+
+ <p class="p">
+ The amount of data depends on the portion of the data being handled by that host, and thus
+ the operator may end up consuming different amounts of memory on different hosts.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> This feature was added to the <code class="ph codeph">ORDER BY</code> clause in Impala 1.4.
+ This feature was extended to cover join queries, aggregation functions, and analytic
+ functions in Impala 2.0. The size of the memory work area required by
+ each operator that spills was reduced from 512 megabytes to 256 megabytes in Impala 2.2.
+ <span class="ph">The spilling mechanism was reworked to take advantage of the
+ Impala buffer pool feature and be more predictable and stable in <span class="keyword">Impala 2.10</span>.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Avoiding queries that spill to disk:</strong>
+ </p>
+
+ <p class="p">
+ Because the extra I/O can impose significant performance overhead on these types of queries, try to avoid
+ this situation by using the following steps:
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Detect how often queries spill to disk, and how much temporary data is written. Refer to the following
+ sources:
+ <ul class="ul">
+ <li class="li">
+ The output of the <code class="ph codeph">PROFILE</code> command in the <span class="keyword cmdname">impala-shell</span>
+ interpreter. This data shows the memory usage for each host and in total across the cluster. The
+ <code class="ph codeph">WriteIoBytes</code> counter reports how much data was written to disk for each operator
+ during the query. (In <span class="keyword">Impala 2.9</span>, the counter was named
+ <code class="ph codeph">ScratchBytesWritten</code>; in <span class="keyword">Impala 2.8</span> and earlier, it was named
+ <code class="ph codeph">BytesWritten</code>.)
+ </li>
+
+ <li class="li">
+ The <span class="ph uicontrol">Queries</span> tab in the Impala debug web user interface. Select the query to
+ examine and click the corresponding <span class="ph uicontrol">Profile</span> link. This data breaks down the
+ memory usage for a single host within the cluster, the host whose web interface you are connected to.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ Use one or more techniques to reduce the possibility of the queries spilling to disk:
+ <ul class="ul">
+ <li class="li">
+ Increase the Impala memory limit if practical, for example, if you can increase the available memory
+ by more than the amount of temporary data written to disk on a particular node. Remember that in
+ Impala 2.0 and later, you can issue <code class="ph codeph">SET MEM_LIMIT</code> as a SQL statement, which lets you
+ fine-tune the memory usage for queries from JDBC and ODBC applications.
+ </li>
+
+ <li class="li">
+ Increase the number of nodes in the cluster, to increase the aggregate memory available to Impala and
+ reduce the amount of memory required on each node.
+ </li>
+
+ <li class="li">
+ Increase the overall memory capacity of each DataNode at the hardware level.
+ </li>
+
+ <li class="li">
+ On a cluster with resources shared between Impala and other Hadoop components, use resource
+ management features to allocate more memory for Impala. See
+ <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a> for details.
+ </li>
+
+ <li class="li">
+ If the memory pressure is due to running many concurrent queries rather than a few memory-intensive
+ ones, consider using the Impala admission control feature to lower the limit on the number of
+ concurrent queries. By spacing out the most resource-intensive queries, you can avoid spikes in
+ memory usage and improve overall response times. See
+ <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details.
+ </li>
+
+ <li class="li">
+ Tune the queries with the highest memory requirements, using one or more of the following techniques:
+ <ul class="ul">
+ <li class="li">
+ Run the <code class="ph codeph">COMPUTE STATS</code> statement for all tables involved in large-scale joins and
+ aggregation queries.
+ </li>
+
+ <li class="li">
+ Minimize your use of <code class="ph codeph">STRING</code> columns in join columns. Prefer numeric values
+ instead.
+ </li>
+
+ <li class="li">
+ Examine the <code class="ph codeph">EXPLAIN</code> plan to understand the execution strategy being used for the
+ most resource-intensive queries. See <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for
+ details.
+ </li>
+
+ <li class="li">
+ If Impala still chooses a suboptimal execution strategy even with statistics available, or if it
+ is impractical to keep the statistics up to date for huge or rapidly changing tables, add hints
+ to the most resource-intensive queries to select the right execution strategy. See
+ <a class="xref" href="impala_hints.html#hints">Optimizer Hints</a> for details.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ If your queries experience substantial performance overhead due to spilling, enable the
+ <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> query option. This option prevents queries whose memory usage
+ is likely to be exorbitant from spilling to disk. See
+ <a class="xref" href="impala_disable_unsafe_spills.html#disable_unsafe_spills">DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</a> for details. As you tune
+ problematic queries using the preceding steps, fewer and fewer will be cancelled by this option
+ setting.
+ </li>
+ </ul>
+ </li>
+ </ol>
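+ <p class="p">
+ As a starting point for the tuning steps above, computing statistics is often the
+ single most effective measure. (The table names are illustrative.)
+ </p>
+<pre class="pre codeblock"><code>COMPUTE STATS big_facts;
+COMPUTE STATS big_dims;
+</code></pre>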
+
+ <p class="p">
+ <strong class="ph b">Testing performance implications of spilling to disk:</strong>
+ </p>
+
+ <p class="p">
+ To artificially provoke spilling, to test this feature and understand the performance implications, use a
+ test environment with a memory limit of at least 2 GB. Issue the <code class="ph codeph">SET</code> command with no
+ arguments to check the current setting for the <code class="ph codeph">MEM_LIMIT</code> query option. Set the query
+ option <code class="ph codeph">DISABLE_UNSAFE_SPILLS=true</code>. This option limits the spill-to-disk feature to prevent
+ runaway disk usage from queries that are known in advance to be suboptimal. Within
+ <span class="keyword cmdname">impala-shell</span>, run a query that you expect to be memory-intensive, based on the criteria
+ explained earlier. A self-join of a large table is a good candidate:
+ </p>
+
+<pre class="pre codeblock"><code>select count(*) from big_table a join big_table b using (column_with_many_values);
+</code></pre>
+
+ <p class="p">
+ Issue the <code class="ph codeph">PROFILE</code> command to get a detailed breakdown of the memory usage on each node
+ during the query.
+
+ </p>
+
+
+
+ <p class="p">
+ Set the <code class="ph codeph">MEM_LIMIT</code> query option to a value that is smaller than the peak memory usage
+ reported in the profile output. Now try the memory-intensive query again.
+ </p>
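+ <p class="p">
+ For example, if the profile reported a peak usage of roughly 1.5 GB on a host, you
+ might constrain the next run as follows. (The exact value depends on your own
+ profile output.)
+ </p>
+<pre class="pre codeblock"><code>SET MEM_LIMIT=1g;
+SELECT COUNT(*) FROM big_table a JOIN big_table b USING (column_with_many_values);
+</code></pre>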
+
+ <p class="p">
+ Check if the query fails with a message like the following:
+ </p>
+
+<pre class="pre codeblock"><code>WARNINGS: Spilling has been disabled for plans that do not have stats and are not hinted
+to prevent potentially bad plans from using too many cluster resources. Compute stats on
+these tables, hint the plan or disable this behavior via query options to enable spilling.
+</code></pre>
+
+ <p class="p">
+ If so, the query could have consumed substantial temporary disk space, slowing down so much that it would
+ not complete in any reasonable time. Rather than rely on the spill-to-disk feature in this case, issue the
+ <code class="ph codeph">COMPUTE STATS</code> statement for the table or tables in your sample query. Then run the query
+ again, check the peak memory usage again in the <code class="ph codeph">PROFILE</code> output, and adjust the memory
+ limit again if necessary to be lower than the peak memory usage.
+ </p>
+
+ <p class="p">
+ At this point, you have a query that is memory-intensive, but Impala can optimize it efficiently so that
+ the memory usage is not exorbitant. You have set an artificial constraint through the
+ <code class="ph codeph">MEM_LIMIT</code> option so that the query would normally fail with an out-of-memory error. But
+ the automatic spill-to-disk feature means that the query should actually succeed, at the expense of some
+ extra disk I/O to read and write temporary work data.
+ </p>
+
+ <p class="p">
+ Try the query again, and confirm that it succeeds. Examine the <code class="ph codeph">PROFILE</code> output again. This
+ time, look for lines of this form:
+ </p>
+
+<pre class="pre codeblock"><code>- SpilledPartitions: <var class="keyword varname">N</var>
+</code></pre>
+
+ <p class="p">
+ If you see any such lines with <var class="keyword varname">N</var> greater than 0, that indicates the query would have
+ failed in Impala releases prior to 2.0, but now it succeeded because of the spill-to-disk feature. Examine
+ the total time taken by the <code class="ph codeph">AGGREGATION_NODE</code> or other query fragments containing non-zero
+ <code class="ph codeph">SpilledPartitions</code> values. Compare the times to similar fragments that did not spill, for
+ example in the <code class="ph codeph">PROFILE</code> output when the same query is run with a higher memory limit. This
+ gives you an idea of the performance penalty of the spill operation for a particular query with a
+ particular memory limit. If you make the memory limit just a little lower than the peak memory usage, the
+ query only needs to write a small amount of temporary data to disk. The lower you set the memory limit, the
+ more temporary data is written and the slower the query becomes.
+ </p>
+
+ <p class="p">
+ Now repeat this procedure for actual queries used in your environment. Use the
+ <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> setting to identify cases where queries used more memory than
+ necessary due to lack of statistics on the relevant tables and columns, and issue <code class="ph codeph">COMPUTE
+ STATS</code> where necessary.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">When to use DISABLE_UNSAFE_SPILLS:</strong>
+ </p>
+
+ <p class="p">
+ You might wonder why you would not leave <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> turned on all the time. Whether and
+ how frequently to use this option depends on your system environment and workload.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> is suitable for an environment with ad hoc queries whose performance
+ characteristics and memory usage are not known in advance. It prevents <span class="q">"worst-case scenario"</span> queries
+ that use large amounts of memory unnecessarily. Thus, you might turn this option on within a session while
+ developing new SQL code, even though it is turned off for existing applications.
+ </p>
+
+ <p class="p">
+ Organizations where table and column statistics are generally up-to-date might leave this option turned on
+ all the time, again to avoid worst-case scenarios for untested queries or if a problem in the ETL pipeline
+ results in a table with no statistics. Turning on <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> lets you <span class="q">"fail
+ fast"</span> in this case and immediately gather statistics or tune the problematic queries.
+ </p>
+
+ <p class="p">
+ Some organizations might leave this option turned off. For example, you might have tables large enough that
+ the <code class="ph codeph">COMPUTE STATS</code> statement takes substantial time to run, making it impractical to re-run it after
+ loading new data. If you have examined the <code class="ph codeph">EXPLAIN</code> plans of your queries and know that
+ they are operating efficiently, you might leave <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> turned off. In that
+ case, you know that any queries that spill will not go overboard with their memory consumption.
+ </p>
+
+ </div>
+ </article>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title7" id="scalability__complex_query">
+<h2 class="title topictitle2" id="ariaid-title7">Limits on Query Size and Complexity</h2>
+<div class="body conbody">
+<p class="p">
+There are hardcoded limits on the maximum size and complexity of queries.
+Currently, the maximum number of expressions in a query is 2000.
+You might exceed the limits with large or deeply nested queries
+produced by business intelligence tools or other query generators.
+</p>
+<p class="p">
+If you have the ability to customize such queries or the query generation
+logic that produces them, replace sequences of repetitive expressions
+with single operators such as <code class="ph codeph">IN</code> or <code class="ph codeph">BETWEEN</code>
+that can represent multiple values or ranges.
+For example, instead of a large number of <code class="ph codeph">OR</code> clauses:
+</p>
+<pre class="pre codeblock"><code>WHERE val = 1 OR val = 2 OR val = 6 OR val = 100 ...
+</code></pre>
+<p class="p">
+use a single <code class="ph codeph">IN</code> clause:
+</p>
+<pre class="pre codeblock"><code>WHERE val IN (1,2,6,100,...)</code></pre>
+</div>
+</article>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title8" id="scalability__scalability_io">
+<h2 class="title topictitle2" id="ariaid-title8">Scalability Considerations for Impala I/O</h2>
+<div class="body conbody">
+<p class="p">
+Impala parallelizes its I/O operations aggressively,
+therefore the more disks you can attach to each host, the better.
+Impala retrieves data from disk so quickly using
+bulk read operations on large blocks, that most queries
+are CPU-bound rather than I/O-bound.
+</p>
+<p class="p">
+Because the kind of sequential scanning typically done by
+Impala queries does not benefit much from the random-access
+capabilities of SSDs, spinning disks typically provide
+the most cost-effective kind of storage for Impala data,
+with little or no performance penalty as compared to SSDs.
+</p>
+<p class="p">
+Resource management features such as YARN, Llama, and admission control
+typically constrain the amount of memory, CPU, or overall number of
+queries in a high-concurrency environment.
+Currently, there is no throttling mechanism for Impala I/O.
+</p>
+</div>
+</article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="scalability__big_tables">
+ <h2 class="title topictitle2" id="ariaid-title9">Scalability Considerations for Table Layout</h2>
+ <div class="body conbody">
+ <p class="p">
+ Due to the overhead of retrieving and updating table metadata
+ in the metastore database, try to limit the number of columns
+ in a table to a maximum of approximately 2000.
+ Although Impala can handle wider tables than this, the metastore overhead
+ can become significant, leading to query performance that is slower
+ than expected based on the actual data volume.
+ </p>
+ <p class="p">
+ To minimize overhead related to the metastore database and Impala query planning,
+ try to limit the number of partitions for any partitioned table to a few tens of thousands.
+ </p>
+ <p class="p">
+ If the volume of data within a table makes it impractical to run exploratory
+ queries, consider using the <code class="ph codeph">TABLESAMPLE</code> clause to limit query processing
+ to only a percentage of data within the table. This technique reduces the overhead
+ for query startup, I/O to read the data, and the amount of network, CPU, and memory
+ needed to process intermediate results during the query. See <a class="xref" href="impala_tablesample.html">TABLESAMPLE Clause</a>
+ for details.
+ </p>
+ </div>
+ </article>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title10" id="scalability__kerberos_overhead_cluster_size">
+<h2 class="title topictitle2" id="ariaid-title10">Kerberos-Related Network Overhead for Large Clusters</h2>
+<div class="body conbody">
+<p class="p">
+When Impala starts up, or after each <code class="ph codeph">kinit</code> refresh, Impala sends a number of
+simultaneous requests to the KDC. For a cluster with 100 hosts, the KDC might be able to process
+all the requests within roughly 5 seconds. For a cluster with 1000 hosts, the time to process
+the requests would be roughly 500 seconds. Impala also makes a number of DNS requests at the same
+time as these Kerberos-related requests.
+</p>
+<p class="p">
+While these authentication requests are being processed, any submitted Impala queries will fail.
+During this period, the KDC and DNS may be slow to respond to requests from components other than Impala,
+so other secure services might be affected temporarily.
+</p>
+ <p class="p">
+ In <span class="keyword">Impala 2.12</span> or earlier, to reduce the
+ frequency of the <code class="ph codeph">kinit</code> renewal that initiates a new set
+ of authentication requests, increase the <code class="ph codeph">kerberos_reinit_interval</code>
+ configuration setting for the <code class="ph codeph">impalad</code> daemons. Currently,
+ the default is 60 minutes. Consider using a higher value such as 360 (6 hours).
+ </p>
+ <p class="p">
+ The <code class="ph codeph">kerberos_reinit_interval</code> configuration setting is removed
+ in <span class="keyword">Impala 3.0</span>, and the above step is no longer needed.
+ </p>
+
+</div>
+</article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="scalability__scalability_hotspots">
+ <h2 class="title topictitle2" id="ariaid-title11">Avoiding CPU Hotspots for HDFS Cached Data</h2>
+ <div class="body conbody">
+ <p class="p">
+ You can use the HDFS caching feature, described in <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a>,
+ with Impala to reduce I/O and memory-to-memory copying for frequently accessed tables or partitions.
+ </p>
+ <p class="p">
+ In the early days of this feature, you might have found that enabling HDFS caching
+ resulted in little or no performance improvement, because it could result in
+ <span class="q">"hotspots"</span>: instead of the I/O to read the table data being parallelized across
+ the cluster, the I/O was reduced but the CPU load to process the data blocks
+ might be concentrated on a single host.
+ </p>
+ <p class="p">
+ To avoid hotspots, include the <code class="ph codeph">WITH REPLICATION</code> clause with the
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements for tables that use HDFS caching.
+ This clause allows more than one host to cache the relevant data blocks, so the CPU load
+ can be shared, reducing the load on any one host.
+ See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>
+ for details.
+ </p>
+ <p class="p">
+ Hotspots with high CPU load for HDFS cached data could still arise in some cases, due to
+ the way that Impala schedules the work of processing data blocks on different hosts.
+ In <span class="keyword">Impala 2.5</span> and higher, scheduling improvements mean that the work for
+ HDFS cached data is divided better among all the hosts that have cached replicas
+ for a particular data block. When more than one host has a cached replica for a data block,
+ Impala assigns the work of processing that block to whichever host has done the least work
+ (in terms of number of bytes read) for the current query. If hotspots persist even with this
+ load-based scheduling algorithm, you can enable the query option <code class="ph codeph">SCHEDULE_RANDOM_REPLICA=TRUE</code>
+ to further distribute the CPU load. This setting causes Impala to randomly pick a host to process a cached
+ data block if the scheduling algorithm encounters a tie when deciding which host has done the
+ least work.
+ </p>
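+ <p class="p">
+ For example, a session that observes persistent hotspots might enable the option
+ before rerunning the affected query:
+ </p>
+<pre class="pre codeblock"><code>SET SCHEDULE_RANDOM_REPLICA=TRUE;
+</code></pre>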
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="scalability__scalability_file_handle_cache">
+ <h2 class="title topictitle2" id="ariaid-title12">Scalability Considerations for NameNode Traffic with File Handle Caching</h2>
+ <div class="body conbody">
+ <p class="p">
+ One scalability aspect that affects heavily loaded clusters is the load on the HDFS
+ NameNode, from looking up the details as each HDFS file is opened. Impala queries
+ often access many different HDFS files, for example if a query does a full table scan
+ on a table with thousands of partitions, each partition containing multiple data files.
+ Accessing each column of a Parquet file also involves a separate <span class="q">"open"</span> call,
+ further increasing the load on the NameNode. High NameNode overhead can add startup time
+ (that is, increase latency) to Impala queries, and reduce overall throughput for non-Impala
+ workloads that also require accessing HDFS files.
+ </p>
+ <p class="p"> In <span class="keyword">Impala 2.10</span> and higher, you can reduce
+ NameNode overhead by enabling a caching feature for HDFS file handles.
+ Data files that are accessed by different queries, or even multiple
+ times within the same query, can be accessed without a new <span class="q">"open"</span>
+ call and without fetching the file details again from the NameNode. </p>
+ <p class="p">
+ Because this feature only involves HDFS data files, it does not apply to non-HDFS tables,
+ such as Kudu or HBase tables, or tables that store their data on cloud services such as
+ S3 or ADLS. Any read operations that perform remote reads also skip the cached file handles.
+ </p>
+ <p class="p"> The feature is enabled by default with 20,000 file handles to be
+ cached. To change the value, set the configuration option
+ <code class="ph codeph">max_cached_file_handles</code> to a non-zero value for each
+ <span class="keyword cmdname">impalad</span> daemon. From the initial default value of
+ 20000, adjust upward if NameNode request load is still significant, or
+ downward if it is more important to reduce the extra memory usage on
+ each host. Each cache entry consumes 6 KB, meaning that caching 20,000
+ file handles requires up to 120 MB on each Impala executor. The exact
+ memory usage varies depending on how many file handles have actually
+ been cached; memory is freed as file handles are evicted from the cache. </p>
+ <p class="p">
+ If a manual HDFS operation moves a file to the HDFS Trashcan while the file handle is cached,
+ Impala still accesses the contents of that file. This is a change from prior behavior. Previously,
+ accessing a file that was in the trashcan would cause an error. This behavior only applies to
+ non-Impala methods of removing HDFS files, not the Impala mechanisms such as <code class="ph codeph">TRUNCATE TABLE</code>
+ or <code class="ph codeph">DROP TABLE</code>.
+ </p>
+ <p class="p">
+ If files are removed, replaced, or appended by HDFS operations outside of Impala, the way to bring the
+ file information up to date is to run the <code class="ph codeph">REFRESH</code> statement on the table.
+ </p>
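+ <p class="p">
+ For example, after files are added to a table's HDFS directory outside of Impala
+ (the table name is a placeholder):
+ </p>
+<pre class="pre codeblock"><code>REFRESH external_table;
+</code></pre>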
+ <p class="p">
+ File handle cache entries are evicted as the cache fills up, or based on a timeout period
+ when they have not been accessed for some time.
+ </p>
+ <p class="p">
+ To evaluate the effectiveness of file handle caching for a particular workload, issue the
+ <code class="ph codeph">PROFILE</code> statement in <span class="keyword cmdname">impala-shell</span> or examine query
+ profiles in the Impala web UI. Look for the ratio of <code class="ph codeph">CachedFileHandlesHitCount</code>
+ (ideally, should be high) to <code class="ph codeph">CachedFileHandlesMissCount</code> (ideally, should be low).
+ Before starting any evaluation, run some representative queries to <span class="q">"warm up"</span> the cache,
+ because the first time each data file is accessed is always recorded as a cache miss.
+ To see metrics about file handle caching for each <span class="keyword cmdname">impalad</span> instance,
+ examine the <span class="ph uicontrol">/metrics</span> page in the Impala web UI, in particular the fields
+ <span class="ph uicontrol">impala-server.io.mgr.cached-file-handles-miss-count</span>,
+ <span class="ph uicontrol">impala-server.io.mgr.cached-file-handles-hit-count</span>, and
+ <span class="ph uicontrol">impala-server.io.mgr.num-cached-file-handles</span>.
+ </p>
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_schedule_random_replica.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_schedule_random_replica.html b/docs/build3x/html/topics/impala_schedule_random_replica.html
new file mode 100644
index 0000000..85f724d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_schedule_random_replica.html
@@ -0,0 +1,83 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="schedule_random_replica"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</title></head><body id="schedule_random_replica"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SCHEDULE_RANDOM_REPLICA Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option fine-tunes the algorithm for deciding which host
+ processes each HDFS data block. It only applies to tables and partitions that are not enabled
+ for the HDFS caching feature.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ In the presence of HDFS cached replicas, Impala randomizes
+ which host processes each cached data block.
+ To ensure that HDFS data blocks are cached on more
+ than one host, use the <code class="ph codeph">WITH REPLICATION</code> clause along with
+ the <code class="ph codeph">CACHED IN</code> clause in a
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statement.
+ Specify a replication value greater than or equal to the HDFS block replication factor.
+ </p>
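+
+ <p class="p">
+ For example, the following sketch (the cache pool name <code class="ph codeph">pool1</code> and table name
+ <code class="ph codeph">t1</code> are hypothetical) caches each data block on three hosts:
+ </p>
+
+<pre class="pre codeblock"><code>-- Assumes an HDFS cache pool named pool1 already exists.
+CREATE TABLE t1 (x INT)
+  CACHED IN 'pool1' WITH REPLICATION = 3;
+
+-- Or, for an existing table:
+ALTER TABLE t1 SET CACHED IN 'pool1' WITH REPLICATION = 3;
+</code></pre>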
+
+ <p class="p">
+ The <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option applies to tables and partitions
+ that <em class="ph i">do not</em> use HDFS caching.
+ By default, Impala estimates how much work each host has done for
+ the query, and selects the host that has the lowest workload.
+ This algorithm is intended to reduce CPU hotspots arising when the
+ same host is selected to process multiple data blocks, but hotspots
+ might still arise for some combinations of queries and data layout.
+ When the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> option is enabled,
+ Impala further randomizes the scheduling algorithm for non-HDFS cached blocks,
+ which can further reduce the chance of CPU hotspots.
+ </p>
+
+ <p class="p">
+ This query option works in conjunction with the work scheduling improvements
+ in <span class="keyword">Impala 2.5</span> and higher. The scheduling improvements
+ distribute the processing for cached HDFS data blocks to minimize hotspots:
+ if a data block is cached on more than one host, Impala chooses which host
+ to process each block based on which host has read the fewest bytes during
+ the current query. Enable the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> option if CPU hotspots
+ still persist in cases where hosts are <span class="q">"tied"</span> in terms of
+ the amount of work done; by default, Impala picks the first eligible host
+ in this case.
+ </p>
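+
+ <p class="p">
+ For example, to enable the option for the current session in <span class="keyword cmdname">impala-shell</span>:
+ </p>
+
+<pre class="pre codeblock"><code>SET SCHEDULE_RANDOM_REPLICA=true;
+-- Run the affected queries, then restore the default if desired:
+SET SCHEDULE_RANDOM_REPLICA=false;
+</code></pre>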
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a>,
+ <a class="xref" href="impala_scalability.html#scalability_hotspots">Avoiding CPU Hotspots for HDFS Cached Data</a>
+ , <a class="xref" href="impala_replica_preference.html#replica_preference">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_schema_design.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_schema_design.html b/docs/build3x/html/topics/impala_schema_design.html
new file mode 100644
index 0000000..31285d6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_schema_design.html
@@ -0,0 +1,184 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_planning.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="schema_design"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Guidelines for Designing Impala Schemas</title></head><body id="schema_design"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Guidelines for Designing Impala Schemas</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The guidelines in this topic help you to construct an optimized and scalable schema, one that integrates well
+ with your existing data management processes. Use these guidelines as a checklist when doing any
+ proof-of-concept work, porting exercise, or before deploying to production.
+ </p>
+
+ <p class="p">
+ If you are adapting an existing database or Hive schema for use with Impala, read the guidelines in this
+ section and then see <a class="xref" href="impala_porting.html#porting">Porting SQL from Other Database Systems to Impala</a> for specific porting and compatibility tips.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ <section class="section" id="schema_design__schema_design_text_vs_binary"><h2 class="title sectiontitle">Prefer binary file formats over text-based formats.</h2>
+
+
+
+ <p class="p">
+ To save space and improve memory usage and query performance, use binary file formats for any large or
+ intensively queried tables. Parquet file format is the most efficient for data warehouse-style analytic
+ queries. Avro is the other binary file format that Impala supports, which you might already have as part of
+ a Hadoop ETL pipeline.
+ </p>
+
+ <p class="p">
+ Although Impala can create and query tables with the RCFile and SequenceFile file formats, such tables are
+ relatively bulky due to the text-based nature of those formats, and are not optimized for data
+ warehouse-style queries due to their row-oriented layout. Impala does not support <code class="ph codeph">INSERT</code>
+ operations for tables with these file formats.
+ </p>
+
+ <p class="p">
+ Guidelines:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ For an efficient and scalable format for large, performance-critical tables, use the Parquet file format.
+ </li>
+
+ <li class="li">
+ To deliver intermediate data during the ETL process, in a format that can also be used by other Hadoop
+ components, Avro is a reasonable choice.
+ </li>
+
+ <li class="li">
+ For convenient import of raw data, use a text table instead of RCFile or SequenceFile, and convert to
+ Parquet in a later stage of the ETL process.
+ </li>
+ </ul>
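+
+ <p class="p">
+ As an illustration of the last guideline (table and column names here are hypothetical), raw data can land
+ in a text table for convenient import, then be converted to Parquet in a later ETL stage:
+ </p>
+
+<pre class="pre codeblock"><code>-- Convenient target for raw data files moved into HDFS.
+CREATE TABLE raw_events (event_time STRING, detail STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+  STORED AS TEXTFILE;
+
+-- Efficient binary format for the large, intensively queried table.
+CREATE TABLE events STORED AS PARQUET
+  AS SELECT * FROM raw_events;
+</code></pre>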
+ </section>
+
+ <section class="section" id="schema_design__schema_design_compression"><h2 class="title sectiontitle">Use Snappy compression where practical.</h2>
+
+
+
+ <p class="p">
+ Snappy compression involves low CPU overhead to decompress, while still providing substantial space
+ savings. In cases where you have a choice of compression codecs, such as with the Parquet and Avro file
+ formats, use Snappy compression unless you find a compelling reason to use a different codec.
+ </p>
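+
+ <p class="p">
+ For example, Snappy is the default codec for Parquet files written by Impala; you can also set it
+ explicitly for the session before inserting data (table names here are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>SET COMPRESSION_CODEC=snappy;
+INSERT INTO parquet_table SELECT * FROM text_table;
+</code></pre>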
+ </section>
+
+ <section class="section" id="schema_design__schema_design_numeric_types"><h2 class="title sectiontitle">Prefer numeric types over strings.</h2>
+
+
+
+ <p class="p">
+ If you have numeric values that you could treat as either strings or numbers (such as
+ <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, and <code class="ph codeph">DAY</code> for partition key columns), define
+ them as the smallest applicable integer types. For example, <code class="ph codeph">YEAR</code> can be
+ <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">MONTH</code> and <code class="ph codeph">DAY</code> can be <code class="ph codeph">TINYINT</code>.
+ Although you might not see any difference in the way partitioned tables or text files are laid out on disk,
+ using numeric types will save space in binary formats such as Parquet, and in memory when doing queries,
+ particularly resource-intensive queries such as joins.
+ </p>
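+
+ <p class="p">
+ For example, a hypothetical partitioned table can declare its partition key columns with the smallest
+ applicable integer types:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE sales (id BIGINT, amount DOUBLE)
+  PARTITIONED BY (year SMALLINT, month TINYINT, day TINYINT)
+  STORED AS PARQUET;
+</code></pre>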
+ </section>
+
+
+
+ <section class="section" id="schema_design__schema_design_partitioning"><h2 class="title sectiontitle">Partition, but do not over-partition.</h2>
+
+
+
+ <p class="p">
+ Partitioning is an important aspect of performance tuning for Impala. Follow the procedures in
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> to set up partitioning for your biggest, most
+ intensively queried tables.
+ </p>
+
+ <p class="p">
+ If you are moving to Impala from a traditional database system, or just getting started in the Big Data
+ field, you might not have enough data volume to take advantage of Impala parallel queries with your
+ existing partitioning scheme. For example, if you have only a few tens of megabytes of data per day,
+ partitioning by <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, and <code class="ph codeph">DAY</code> columns might be
+ too granular. Most of your cluster might be sitting idle during queries that target a single day, or each
+ node might have very little work to do. Consider reducing the number of partition key columns so that each
+ partition directory contains several gigabytes worth of data.
+ </p>
+
+ <p class="p">
+ For example, consider a Parquet table where each data file is 1 HDFS block, with a maximum block size of 1
+ GB. (In Impala 2.0 and later, the default Parquet block size is reduced to 256 MB. For this exercise, let's
+ assume you have bumped the size back up to 1 GB by setting the query option
+ <code class="ph codeph">PARQUET_FILE_SIZE=1g</code>.) If you have a 10-node cluster, you need 10 data files (up to 10 GB)
+ to give each node some work to do for a query. But each core on each machine can process a separate data
+ block in parallel. With 16-core machines on a 10-node cluster, a query could process up to 160 GB fully in
+ parallel. If there are only a few data files per partition, not only are most cluster nodes sitting idle
+ during queries, but so are most cores on those machines.
+ </p>
+
+ <p class="p">
+ You can reduce the Parquet block size to as low as 128 MB or 64 MB to increase the number of files per
+ partition and improve parallelism. But also consider reducing the level of partitioning so that analytic
+ queries have enough data to work with.
+ </p>
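+
+ <p class="p">
+ For example, to produce more, smaller files per partition during a subsequent insert
+ (table names here are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>SET PARQUET_FILE_SIZE=128m;
+INSERT OVERWRITE sales_parquet PARTITION (year, month, day)
+  SELECT * FROM sales_staging;
+</code></pre>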
+ </section>
+
+ <section class="section" id="schema_design__schema_design_compute_stats"><h2 class="title sectiontitle">Always compute stats after loading data.</h2>
+
+
+
+ <p class="p">
+ Impala makes extensive use of statistics about data in the overall table and in each column, to help plan
+ resource-intensive operations such as join queries and inserting into partitioned Parquet tables. Because
+ this information is only available after data is loaded, run the <code class="ph codeph">COMPUTE STATS</code> statement
+ on a table after loading or replacing data in a table or partition.
+ </p>
+
+ <p class="p">
+ Having accurate statistics can make the difference between a successful operation and one that fails due to
+ an out-of-memory error or a timeout. When you encounter performance or capacity issues, always use the
+ <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW COLUMN STATS</code> statements to check whether statistics are present and up-to-date for all tables
+ in the query.
+ </p>
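+
+ <p class="p">
+ For example, after loading data into a hypothetical table:
+ </p>
+
+<pre class="pre codeblock"><code>COMPUTE STATS sales;
+SHOW TABLE STATS sales;   -- verify the #Rows figure is populated
+SHOW COLUMN STATS sales;  -- verify the per-column figures are populated
+</code></pre>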
+
+ <p class="p">
+ When doing a join query, Impala consults the statistics for each joined table to determine their relative
+ sizes and to estimate the number of rows produced in each join stage. When doing an <code class="ph codeph">INSERT</code>
+ into a Parquet table, Impala consults the statistics for the source table to determine how to distribute
+ the work of constructing the data files for each partition.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for the syntax of the <code class="ph codeph">COMPUTE
+ STATS</code> statement, and <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for all the performance
+ considerations for table and column statistics.
+ </p>
+ </section>
+
+ <section class="section" id="schema_design__schema_design_explain"><h2 class="title sectiontitle">Verify sensible execution plans with EXPLAIN and SUMMARY.</h2>
+
+
+
+ <p class="p">
+ Before executing a resource-intensive query, use the <code class="ph codeph">EXPLAIN</code> statement to get an overview
+ of how Impala intends to parallelize the query and distribute the work. If you see that the query plan is
+ inefficient, you can take tuning steps such as changing file formats, using partitioned tables, running the
+ <code class="ph codeph">COMPUTE STATS</code> statement, or adding query hints. For information about all of these
+ techniques, see <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a>.
+ </p>
+
+ <p class="p">
+ After you run a query, you can see performance-related information about how it actually ran by issuing the
+ <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span>. Prior to Impala 1.4, you would use
+ the <code class="ph codeph">PROFILE</code> command, but its highly technical output was only useful for the most
+ experienced users. <code class="ph codeph">SUMMARY</code>, new in Impala 1.4, summarizes the most useful information for
+ all stages of execution, aggregated across all nodes rather than split out for each node.
+ </p>
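+
+ <p class="p">
+ For example, in <span class="keyword cmdname">impala-shell</span> (the query and table names are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>EXPLAIN SELECT c.name, SUM(o.total) FROM customers c JOIN orders o ON c.id = o.cust_id GROUP BY c.name;
+-- If the plan looks reasonable, run the query, then examine how it actually ran:
+SELECT c.name, SUM(o.total) FROM customers c JOIN orders o ON c.id = o.cust_id GROUP BY c.name;
+SUMMARY;
+</code></pre>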
+ </section>
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_planning.html">Planning for Impala Deployment</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_schema_objects.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_schema_objects.html b/docs/build3x/html/topics/impala_schema_objects.html
new file mode 100644
index 0000000..147bb50
--- /dev/null
+++ b/docs/build3x/html/topics/impala_schema_objects.html
@@ -0,0 +1,48 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aliases.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_databases.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions_overview.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_identifiers.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_tables.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_views.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name=
"DC.Format" content="XHTML"><meta name="DC.Identifier" content="schema_objects"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Schema Objects and Object Names</title></head><body id="schema_objects"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Schema Objects and Object Names</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ With Impala, you work with schema objects that are familiar to database users: primarily databases, tables, views,
+ and functions. The SQL syntax to work with these objects is explained in
+ <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a>. This section explains the conceptual knowledge you need to
+ work with these objects and the various ways to specify their names.
+ </p>
+
+ <p class="p">
+ Within a table, partitions can also be considered a kind of object. Partitioning is an important subject for
+ Impala, with its own documentation section covering use cases and performance considerations. See
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for details.
+ </p>
+
+ <p class="p">
+ Impala does not have a counterpart of the <span class="q">"tablespace"</span> notion from some database systems. By default,
+ all the data files for a database, table, or partition are located within nested folders within the HDFS file
+ system. You can also specify a particular HDFS location for a given Impala table or partition. The raw data
+ for these objects is represented as a collection of data files, providing the flexibility to load data by
+ simply moving files into the expected HDFS location.
+ </p>
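+
+ <p class="p">
+ For example, a sketch of specifying an explicit HDFS location (the path and table name are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>CREATE EXTERNAL TABLE logs (msg STRING)
+  LOCATION '/user/impala/data/logs';
+</code></pre>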
+
+ <p class="p">
+ Information about the schema objects is held in the
+ <a class="xref" href="impala_hadoop.html#intro_metastore">metastore</a> database. This database is shared between
+ Impala and Hive, allowing each to create, drop, and query the other's databases, tables, and so on. When
+ Impala makes a change to schema objects through a <code class="ph codeph">CREATE</code>, <code class="ph codeph">ALTER</code>,
+ <code class="ph codeph">DROP</code>, <code class="ph codeph">INSERT</code>, or <code class="ph codeph">LOAD DATA</code> statement, it broadcasts those
+ changes to all nodes in the cluster through the <a class="xref" href="impala_components.html#intro_catalogd">catalog
+ service</a>. When you make such changes through Hive or by manipulating HDFS files directly, you use
+ the <a class="xref" href="impala_refresh.html#refresh">REFRESH</a> or
+ <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA</a> statements on the
+ Impala side to recognize the newly loaded data, new tables, and so on.
+ </p>
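+
+ <p class="p">
+ For example (the table name is hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>-- After new data files are added to an existing table outside Impala:
+REFRESH t1;
+
+-- After tables are created, dropped, or altered outside Impala:
+INVALIDATE METADATA t1;
+</code></pre>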
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_aliases.html">Overview of Impala Aliases</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_databases.html">Overview of Impala Databases</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_functions_overview.html">Overview of Impala Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_identifiers.html">Overview of Impala Identifiers</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_tables.html">Overview of Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_views.html">Overview of Impala Views</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Refer
ence</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_scratch_limit.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_scratch_limit.html b/docs/build3x/html/topics/impala_scratch_limit.html
new file mode 100644
index 0000000..a743dca
--- /dev/null
+++ b/docs/build3x/html/topics/impala_scratch_limit.html
@@ -0,0 +1,77 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="scratch_limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SCRATCH_LIMIT Query Option</title></head><body id="scratch_limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SCRATCH_LIMIT Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Specifies the maximum amount of disk storage, in bytes, that any Impala query can consume
+ on any host through the <span class="q">"spill to disk"</span> mechanism, which handles
+ queries that exceed the memory limit.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ Specify the size in bytes, or with a trailing <code class="ph codeph">m</code> or <code class="ph codeph">g</code> character to indicate
+ megabytes or gigabytes. For example:
+ </p>
+
+
+<pre class="pre codeblock"><code>-- 128 megabytes.
+set SCRATCH_LIMIT=134217728;
+
+-- 512 megabytes.
+set SCRATCH_LIMIT=512m;
+
+-- 1 gigabyte.
+set SCRATCH_LIMIT=1g;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ A value of zero turns off the spill to disk feature for queries
+ in the current session, causing them to fail immediately if they
+ exceed the memory limit.
+ </p>
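+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>-- Disable spilling; queries that exceed the memory limit fail immediately.
+SET SCRATCH_LIMIT=0;
+
+-- Restore the default, unlimited spill space.
+SET SCRATCH_LIMIT=-1;
+</code></pre>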
+
+ <p class="p">
+ The amount of memory used per host for a query is limited by the
+ <code class="ph codeph">MEM_LIMIT</code> query option.
+ </p>
+
+ <p class="p">
+ The more Impala daemon hosts in the cluster, the less memory is used on each host,
+ and therefore the less scratch space is required for queries that
+ exceed the memory limit.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric, with optional unit specifier
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> -1 (amount of spill space is unlimited)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_scalability.html#spill_to_disk">SQL Operations that Spill to Disk</a>,
+ <a class="xref" href="impala_mem_limit.html#mem_limit">MEM_LIMIT Query Option</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_security.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_security.html b/docs/build3x/html/topics/impala_security.html
new file mode 100644
index 0000000..e8b1588
--- /dev/null
+++ b/docs/build3x/html/topics/impala_security.html
@@ -0,0 +1,99 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_guidelines.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_files.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_install.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_metastore.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_webui.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ssl.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authorization.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="DC.Relation" scheme="URI" content="../topics/i
mpala_auditing.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_lineage.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Security</title></head><body id="security"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Impala Security</span></h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala includes a fine-grained authorization framework for Hadoop, based on Apache Sentry.
+ Sentry authorization was added in Impala 1.1.0. Together with the Kerberos
+ authentication framework, Sentry takes Hadoop security to a new level needed for the requirements of
+ highly regulated industries such as healthcare, financial services, and government. Impala also includes
+ an auditing capability, added in Impala 1.1.1: Impala generates audit data that can be
+ consumed, filtered, and visualized by cluster-management components focused on governance.
+ </p>
+
+ <p class="p">
+ The Impala security features have several objectives. At the most basic level, security prevents
+ accidents or mistakes that could disrupt application processing, delete or corrupt data, or reveal data to
+ unauthorized users. More advanced security features and practices can harden the system against malicious
+ users trying to gain unauthorized access or perform other disallowed operations. The auditing feature
+ provides a way to confirm that no unauthorized access occurred, and detect whether any such attempts were
+ made. This is a critical set of features for production deployments in large organizations that handle
+ important or sensitive data. It sets the stage for multi-tenancy, where multiple applications run
+ concurrently and are prevented from interfering with each other.
+ </p>
+
+ <p class="p">
+ The material in this section presumes that you are already familiar with administering secure Linux systems.
+ That is, you should know the general security practices for Linux and Hadoop, and their associated commands
+ and configuration files. For example, you should know how to create Linux users and groups, manage Linux
+ group membership, set Linux and HDFS file permissions and ownership, and designate the default permissions
+ and ownership for new files. You should be familiar with the configuration of the nodes in your Hadoop
+ cluster, and know how to apply configuration changes or run a set of commands across all the nodes.
+ </p>
+
+ <p class="p">
+ The security features are divided into these broad categories:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm">
+ authorization
+ </dt>
+
+ <dd class="dd">
+ Which users are allowed to access which resources, and what operations are they allowed to perform?
+ Impala relies on the open source Sentry project for authorization. By default (when authorization is not
+ enabled), Impala does all read and write operations with the privileges of the <code class="ph codeph">impala</code>
+ user, which is suitable for a development/test environment but not for a secure production environment.
+ When authorization is enabled, Impala uses the OS user ID of the user who runs
+ <span class="keyword cmdname">impala-shell</span> or other client program, and associates various privileges with each
+ user. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details about setting up and managing
+ authorization.
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm">
+ authentication
+ </dt>
+
+ <dd class="dd">
+ How does Impala verify the identity of the user to confirm that they really are allowed to exercise the
+ privileges assigned to that user? Impala relies on the Kerberos subsystem for authentication. See
+ <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for details about setting up and managing authentication.
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm">
+ auditing
+ </dt>
+
+ <dd class="dd">
+ What operations were attempted, and did they succeed or not? This feature provides a way to look back and
+ diagnose whether attempts were made to perform unauthorized operations. You use this information to track
+ down suspicious activity, and to see where changes are needed in authorization policies. The audit data
+ produced by this feature can be collected and presented in a user-friendly form by cluster-management
+ software. See <a class="xref" href="impala_auditing.html#auditing">Auditing Impala Operations</a> for details about setting up and managing
+ auditing.
+ </dd>
+
+
+ </dl>
+
+ <p class="p toc"></p>
+
+
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_security_guidelines.html">Security Guidelines for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_security_files.html">Securing Impala Data and Log Files</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_security_install.html">Installation Considerations for Impala Security</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_security_metastore.html">Securing the Hive Metastore Database</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_security_webui.html">Securing the Impala Web User Interface</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_ssl.html">Configuring TLS/SSL for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_authorization.html
">Enabling Sentry Authorization for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_authentication.html">Impala Authentication</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_auditing.html">Auditing Impala Operations</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_lineage.html">Viewing Lineage Information for Impala Data</a></strong><br></li></ul></nav></article></main></body></html>
[22/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
Posted by mi...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_math_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_math_functions.html b/docs/build3x/html/topics/impala_math_functions.html
new file mode 100644
index 0000000..9987e34
--- /dev/null
+++ b/docs/build3x/html/topics/impala_math_functions.html
@@ -0,0 +1,1711 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="math_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Mathematical Functions</title></head><body id="math_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Mathematical Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Mathematical functions, or arithmetic functions, perform numeric calculations that are typically more complex
+ than basic addition, subtraction, multiplication, and division. For example, these functions include
+ trigonometric, logarithmic, and base conversion operations.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In Impala, exponentiation uses the <code class="ph codeph">pow()</code> function rather than an exponentiation operator
+ such as <code class="ph codeph">**</code>.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The mathematical functions operate mainly on these data types: <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>,
+ <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>, <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a>,
+ <a class="xref" href="impala_float.html#float">FLOAT Data Type</a>, and <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 3.0 or higher only)</a>. For the operators that
+ perform the standard operations such as addition, subtraction, multiplication, and division, see
+ <a class="xref" href="impala_operators.html#arithmetic_operators">Arithmetic Operators</a>.
+ </p>
+
+ <p class="p">
+ Functions that perform bitwise operations are explained in <a class="xref" href="impala_bit_functions.html#bit_functions">Impala Bit Functions</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Function reference:</strong>
+ </p>
+
+ <p class="p">
+ Impala supports the following mathematical functions:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="math_functions__abs">
+ <code class="ph codeph">abs(numeric_type a)</code>
+
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the absolute value of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Use this function to ensure all return values are positive. This is different from
+ the <code class="ph codeph">positive()</code> function, which returns its argument unchanged (even if the argument
+ was negative).
+ </p>
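+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example shows how <code class="ph codeph">abs()</code> turns a negative input value into its
+ positive counterpart:
+ </p>
+<pre class="pre codeblock"><code>select abs(-10);
++----------+
+| abs(-10) |
++----------+
+| 10       |
++----------+
+</code></pre>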
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__acos">
+ <code class="ph codeph">acos(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the arccosine of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__asin">
+ <code class="ph codeph">asin(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the arcsine of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__atan">
+ <code class="ph codeph">atan(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the arctangent of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__atan2">
+ <code class="ph codeph">atan2(double a, double b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the arctangent of the two arguments, with the signs of the arguments used to determine the
+ quadrant of the result.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__bin">
+ <code class="ph codeph">bin(bigint a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the binary representation of an integer value, that is, a string of 0 and 1
+ digits.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
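+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ For example, the decimal value 8 is 1000 in binary:
+ </p>
+<pre class="pre codeblock"><code>select bin(8);
++--------+
+| bin(8) |
++--------+
+| 1000   |
++--------+
+</code></pre>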
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__ceil">
+ <code class="ph codeph">ceil(double a)</code>,
+ <code class="ph codeph">ceil(decimal(p,s) a)</code>,
+ <code class="ph codeph" id="math_functions__ceiling">ceiling(double a)</code>,
+ <code class="ph codeph">ceiling(decimal(p,s) a)</code>,
+ <code class="ph codeph" id="math_functions__dceil">dceil(double a)</code>,
+ <code class="ph codeph">dceil(decimal(p,s) a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the smallest integer that is greater than or equal to the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
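+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how the result is rounded up toward positive infinity,
+ for both positive and negative inputs:
+ </p>
+<pre class="pre codeblock"><code>select ceil(5.4);
++-----------+
+| ceil(5.4) |
++-----------+
+| 6         |
++-----------+
+
+select ceiling(-5.4);
++---------------+
+| ceiling(-5.4) |
++---------------+
+| -5            |
++---------------+
+</code></pre>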
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__conv">
+ <code class="ph codeph">conv(bigint num, int from_base, int to_base), conv(string num, int from_base, int
+ to_base)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a string representation of an integer value in a particular base. The input value
+ can be a string, for example to convert a hexadecimal number such as <code class="ph codeph">fce2</code> to decimal. To
+ use the return value as a number (for example, when converting to base 10), use <code class="ph codeph">CAST()</code>
+ to convert to the appropriate type.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
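+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples convert a hexadecimal string to its decimal representation, and a
+ decimal integer to binary:
+ </p>
+<pre class="pre codeblock"><code>select conv('fce2', 16, 10);
++----------------------+
+| conv('fce2', 16, 10) |
++----------------------+
+| 64738                |
++----------------------+
+
+select conv(100, 10, 2);
++------------------+
+| conv(100, 10, 2) |
++------------------+
+| 1100100          |
++------------------+
+</code></pre>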
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__cos">
+ <code class="ph codeph">cos(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the cosine of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__cosh">
+ <code class="ph codeph">cosh(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the hyperbolic cosine of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__cot">
+ <code class="ph codeph">cot(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the cotangent of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__degrees">
+ <code class="ph codeph">degrees(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Converts argument value from radians to degrees.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__e">
+ <code class="ph codeph">e()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the
+ <a class="xref" href="https://en.wikipedia.org/wiki/E_(mathematical_constant)" target="_blank">mathematical
+ constant e</a>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__exp">
+ <code class="ph codeph">exp(double a)</code>,
+ <code class="ph codeph" id="math_functions__dexp">dexp(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the
+ <a class="xref" href="https://en.wikipedia.org/wiki/E_(mathematical_constant)" target="_blank">mathematical
+ constant e</a> raised to the power of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__factorial">
+ <code class="ph codeph">factorial(integer_type a)</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Computes the <a class="xref" href="https://en.wikipedia.org/wiki/Factorial" target="_blank">factorial</a> of an integer value.
+ It works with any integer type.
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> You can use either the <code class="ph codeph">factorial()</code> function or the <code class="ph codeph">!</code> operator.
+ The factorial of 0 is 1, and the <code class="ph codeph">factorial()</code> function also returns 1 for any negative value.
+ The maximum positive value for the input argument is 20; a value of 21 or greater overflows the
+ range for a <code class="ph codeph">BIGINT</code> and causes an error.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">bigint</code>
+ </p>
+<pre class="pre codeblock"><code>select factorial(5);
++--------------+
+| factorial(5) |
++--------------+
+| 120          |
++--------------+
+
+select 5!;
++-----+
+| 5!  |
++-----+
+| 120 |
++-----+
+
+select factorial(0);
++--------------+
+| factorial(0) |
++--------------+
+| 1            |
++--------------+
+
+select factorial(-100);
++-----------------+
+| factorial(-100) |
++-----------------+
+| 1               |
++-----------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__floor">
+ <code class="ph codeph">floor(double a)</code>,
+ <code class="ph codeph">floor(decimal(p,s) a)</code>,
+ <code class="ph codeph" id="math_functions__dfloor">dfloor(double a)</code>,
+ <code class="ph codeph">dfloor(decimal(p,s) a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the largest integer that is less than or equal to the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input type
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__fmod">
+ <code class="ph codeph">fmod(double a, double b), fmod(float a, float b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the modulus of a floating-point number. Equivalent to the <code class="ph codeph">%</code> arithmetic operator.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">float</code> or <code class="ph codeph">double</code>, depending on type of arguments
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.1.1
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Because this function operates on <code class="ph codeph">DOUBLE</code> or <code class="ph codeph">FLOAT</code>
+ values, it is subject to potential rounding errors for values that cannot be
+ represented precisely. Prefer to use whole numbers, or values that you know
+ can be represented precisely by the <code class="ph codeph">DOUBLE</code> or <code class="ph codeph">FLOAT</code>
+ types.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show equivalent operations with the <code class="ph codeph">fmod()</code>
+ function and the <code class="ph codeph">%</code> arithmetic operator, for values not subject
+ to any rounding error.
+ </p>
+<pre class="pre codeblock"><code>select fmod(10,3);
++-------------+
+| fmod(10, 3) |
++-------------+
+| 1           |
++-------------+
+
+select fmod(5.5,2);
++--------------+
+| fmod(5.5, 2) |
++--------------+
+| 1.5          |
++--------------+
+
+select 10 % 3;
++--------+
+| 10 % 3 |
++--------+
+| 1      |
++--------+
+
+select 5.5 % 2;
++---------+
+| 5.5 % 2 |
++---------+
+| 1.5     |
++---------+
+</code></pre>
+ <p class="p">
+ The following examples show operations with the <code class="ph codeph">fmod()</code>
+ function for values that cannot be represented precisely by the
+ <code class="ph codeph">DOUBLE</code> or <code class="ph codeph">FLOAT</code> types, and thus are
+ subject to rounding error. <code class="ph codeph">fmod(9.9,3.0)</code> returns a value
+ slightly different from the expected 0.9 because of rounding.
+ <code class="ph codeph">fmod(9.9,3.3)</code> returns a value quite different from
+ the expected value of 0 because of rounding error during intermediate
+ calculations.
+ </p>
+<pre class="pre codeblock"><code>select fmod(9.9,3.0);
++--------------------+
+| fmod(9.9, 3.0)     |
++--------------------+
+| 0.8999996185302734 |
++--------------------+
+
+select fmod(9.9,3.3);
++-------------------+
+| fmod(9.9, 3.3)    |
++-------------------+
+| 3.299999713897705 |
++-------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__fnv_hash">
+ <code class="ph codeph">fnv_hash(type v)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a consistent 64-bit value derived from the input argument, for convenience of
+ implementing hashing logic in an application.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BIGINT</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ You might use the return value in an application where you perform load balancing, bucketing, or some
+ other technique to divide processing or storage.
+ </p>
+ <p class="p">
+ Because the result can be any 64-bit value, to restrict the value to a particular range, you can use an
+ expression that includes the <code class="ph codeph">ABS()</code> function and the <code class="ph codeph">%</code> (modulo)
+ operator. For example, to produce a hash value in the range 0-9, you could use the expression
+ <code class="ph codeph">ABS(FNV_HASH(x)) % 10</code>.
+ </p>
+ <p class="p">
+ This function implements the same algorithm that Impala uses internally for hashing, on systems where
+ the CRC32 instructions are not available.
+ </p>
+ <p class="p">
+ This function implements the
+ <a class="xref" href="http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function" target="_blank">Fowler–Noll–Vo
+ hash function</a>, in particular the FNV-1a variation. This is not a perfect hash function: some
+ combinations of values could produce the same result value. It is not suitable for cryptographic use.
+ </p>
+ <p class="p">
+ Similar input values of different types could produce different hash values, for example the same
+ numeric value represented as <code class="ph codeph">SMALLINT</code> or <code class="ph codeph">BIGINT</code>,
+ <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>, or <code class="ph codeph">DECIMAL(5,2)</code> or
+ <code class="ph codeph">DECIMAL(20,5)</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table h (x int, s string);
+[localhost:21000] > insert into h values (0, 'hello'), (1,'world'), (1234567890,'antidisestablishmentarianism');
+[localhost:21000] > select x, fnv_hash(x) from h;
++------------+----------------------+
+| x          | fnv_hash(x)          |
++------------+----------------------+
+| 0          | -2611523532599129963 |
+| 1          | 4307505193096137732  |
+| 1234567890 | 3614724209955230832  |
++------------+----------------------+
+[localhost:21000] > select s, fnv_hash(s) from h;
++------------------------------+---------------------+
+| s                            | fnv_hash(s)         |
++------------------------------+---------------------+
+| hello                        | 6414202926103426347 |
+| world                        | 6535280128821139475 |
+| antidisestablishmentarianism | -209330013948433970 |
++------------------------------+---------------------+
+[localhost:21000] > select s, abs(fnv_hash(s)) % 10 from h;
++------------------------------+-------------------------+
+| s                            | abs(fnv_hash(s)) % 10.0 |
++------------------------------+-------------------------+
+| hello                        | 8                       |
+| world                        | 6                       |
+| antidisestablishmentarianism | 4                       |
++------------------------------+-------------------------+</code></pre>
+ <p class="p">
+ For short argument values, the high-order bits of the result have relatively low entropy:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table b (x boolean);
+[localhost:21000] > insert into b values (true), (true), (false), (false);
+[localhost:21000] > select x, fnv_hash(x) from b;
++-------+---------------------+
+| x     | fnv_hash(x)         |
++-------+---------------------+
+| true  | 2062020650953872396 |
+| true  | 2062020650953872396 |
+| false | 2062021750465500607 |
+| false | 2062021750465500607 |
++-------+---------------------+</code></pre>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.2.2
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__greatest">
+ <code class="ph codeph">greatest(bigint a[, bigint b ...])</code>, <code class="ph codeph">greatest(double a[, double b ...])</code>,
+ <code class="ph codeph">greatest(decimal(p,s) a[, decimal(p,s) b ...])</code>, <code class="ph codeph">greatest(string a[, string b
+ ...])</code>, <code class="ph codeph">greatest(timestamp a[, timestamp b ...])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the largest value from a list of expressions.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
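+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select greatest(3, 7, 5);
++-------------------+
+| greatest(3, 7, 5) |
++-------------------+
+| 7                 |
++-------------------+
+</code></pre>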
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__hex">
+ <code class="ph codeph">hex(bigint a), hex(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the hexadecimal representation of an integer value, or of the characters in a
+ string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
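+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the hexadecimal representation of an integer, and of the
+ characters in a string:
+ </p>
+<pre class="pre codeblock"><code>select hex(16);
++---------+
+| hex(16) |
++---------+
+| 10      |
++---------+
+
+select hex('abc');
++------------+
+| hex('abc') |
++------------+
+| 616263     |
++------------+
+</code></pre>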
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__is_inf">
+ <code class="ph codeph">is_inf(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests whether a value is equal to the special value <span class="q">"inf"</span>, signifying infinity.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">boolean</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Infinity and NaN can be specified in text data files as <code class="ph codeph">inf</code> and <code class="ph codeph">nan</code>
+ respectively, and Impala interprets them as these special values. They can also be produced by certain
+ arithmetic expressions; for example, <code class="ph codeph">1/0</code> returns <code class="ph codeph">Infinity</code> and
+ <code class="ph codeph">pow(-1, 0.5)</code> returns <code class="ph codeph">NaN</code>. Or you can cast the literal values, such as <code class="ph codeph">CAST('nan' AS
+ DOUBLE)</code> or <code class="ph codeph">CAST('inf' AS DOUBLE)</code>.
+ </p>
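+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ For example, because <code class="ph codeph">1/0</code> evaluates to <code class="ph codeph">Infinity</code> as described above,
+ <code class="ph codeph">is_inf()</code> returns <code class="ph codeph">true</code> for that expression:
+ </p>
+<pre class="pre codeblock"><code>select is_inf(1/0);
++---------------+
+| is_inf(1 / 0) |
++---------------+
+| true          |
++---------------+
+</code></pre>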
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__is_nan">
+ <code class="ph codeph">is_nan(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests whether a value is equal to the special value <span class="q">"NaN"</span>, signifying <span class="q">"not a
+ number"</span>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">boolean</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Infinity and NaN can be specified in text data files as <code class="ph codeph">inf</code> and <code class="ph codeph">nan</code>
+ respectively, and Impala interprets them as these special values. They can also be produced by certain
+ arithmetic expressions; for example, <code class="ph codeph">1/0</code> returns <code class="ph codeph">Infinity</code> and
+ <code class="ph codeph">pow(-1, 0.5)</code> returns <code class="ph codeph">NaN</code>. Or you can cast the literal values, such as <code class="ph codeph">CAST('nan' AS
+ DOUBLE)</code> or <code class="ph codeph">CAST('inf' AS DOUBLE)</code>.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__least">
+ <code class="ph codeph">least(bigint a[, bigint b ...])</code>, <code class="ph codeph">least(double a[, double b ...])</code>,
+ <code class="ph codeph">least(decimal(p,s) a[, decimal(p,s) b ...])</code>, <code class="ph codeph">least(string a[, string b
+ ...])</code>, <code class="ph codeph">least(timestamp a[, timestamp b ...])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the smallest value from a list of expressions.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
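+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ For string arguments, the smallest value is determined by alphabetic order:
+ </p>
+<pre class="pre codeblock"><code>select least('apple', 'banana', 'pear');
++----------------------------------+
+| least('apple', 'banana', 'pear') |
++----------------------------------+
+| apple                            |
++----------------------------------+
+</code></pre>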
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__ln">
+ <code class="ph codeph">ln(double a)</code>,
+ <code class="ph codeph" id="math_functions__dlog1">dlog1(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Returns the
+ <a class="xref" href="https://en.wikipedia.org/wiki/Natural_logarithm" target="_blank">natural
+ logarithm</a> of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__log">
+ <code class="ph codeph">log(double base, double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the logarithm of the second argument to the specified base.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__log10">
+ <code class="ph codeph">log10(double a)</code>,
+ <code class="ph codeph" id="math_functions__dlog10">dlog10(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Returns the logarithm of the argument to the base 10.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__log2">
+ <code class="ph codeph">log2(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the logarithm of the argument to the base 2.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__max_int">
+ <code class="ph codeph">max_int(), <span class="ph" id="math_functions__max_tinyint">max_tinyint()</span>, <span class="ph" id="math_functions__max_smallint">max_smallint()</span>,
+ <span class="ph" id="math_functions__max_bigint">max_bigint()</span></code>
+ </dt>
+
+ <dd class="dd">
+
+
+
+
+ <strong class="ph b">Purpose:</strong> Returns the largest value of the associated integral type.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> The same as the integral type being checked.
+ </p>
+ <p class="p">
+
+ <strong class="ph b">Usage notes:</strong> Use the corresponding <code class="ph codeph">min_</code> and <code class="ph codeph">max_</code> functions to
+ check if all values in a column are within the allowed range, before copying data or altering column
+ definitions. If not, switch to the next higher integral type or to a <code class="ph codeph">DECIMAL</code> with
+ sufficient precision.
+ </p>
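+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example shows the largest value of each integral type:
+ </p>
+<pre class="pre codeblock"><code>select max_tinyint(), max_smallint(), max_int(), max_bigint();
++---------------+----------------+------------+---------------------+
+| max_tinyint() | max_smallint() | max_int()  | max_bigint()        |
++---------------+----------------+------------+---------------------+
+| 127           | 32767          | 2147483647 | 9223372036854775807 |
++---------------+----------------+------------+---------------------+
+</code></pre>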
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__min_int">
+ <code class="ph codeph">min_int(), <span class="ph" id="math_functions__min_tinyint">min_tinyint()</span>, <span class="ph" id="math_functions__min_smallint">min_smallint()</span>,
+ <span class="ph" id="math_functions__min_bigint">min_bigint()</span></code>
+ </dt>
+
+ <dd class="dd">
+
+
+
+
+ <strong class="ph b">Purpose:</strong> Returns the smallest value of the associated integral type (a negative number).
+ <p class="p">
+ <strong class="ph b">Return type:</strong> The same as the integral type being checked.
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Use the corresponding <code class="ph codeph">min_</code> and <code class="ph codeph">max_</code> functions to
+ check if all values in a column are within the allowed range, before copying data or altering column
+ definitions. If not, switch to the next higher integral type or to a <code class="ph codeph">DECIMAL</code> with
+ sufficient precision.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__mod">
+ <code class="ph codeph">mod(<var class="keyword varname">numeric_type</var> a, <var class="keyword varname">same_type</var> b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the modulus of a number. Equivalent to the <code class="ph codeph">%</code> arithmetic operator.
+ Works with any size integer type, any size floating-point type, and <code class="ph codeph">DECIMAL</code>
+ with any precision and scale.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Because this function works with <code class="ph codeph">DECIMAL</code> values, prefer it over <code class="ph codeph">fmod()</code>
+ when working with fractional values. It is not subject to the rounding errors that make
+ <code class="ph codeph">fmod()</code> problematic with floating-point numbers.
+ The <code class="ph codeph">%</code> arithmetic operator now uses the <code class="ph codeph">mod()</code> function
+ in cases where its arguments can be interpreted as <code class="ph codeph">DECIMAL</code> values,
+ increasing the accuracy of that operator.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how the <code class="ph codeph">mod()</code> function works for
+ whole numbers and fractional values, and how the <code class="ph codeph">%</code> operator
+ works the same way. In the case of <code class="ph codeph">mod(9.9,3)</code>,
+ the type conversion for the second argument results in the first argument
+ being interpreted as <code class="ph codeph">DOUBLE</code>, so to produce an accurate
+ <code class="ph codeph">DECIMAL</code> result requires casting the second argument
+ or writing it as a <code class="ph codeph">DECIMAL</code> literal, 3.0.
+ </p>
+<pre class="pre codeblock"><code>select mod(10,3);
++------------+
+| mod(10, 3) |
++------------+
+| 1          |
++------------+
+
+select mod(5.5,2);
++-------------+
+| mod(5.5, 2) |
++-------------+
+| 1.5         |
++-------------+
+
+select 10 % 3;
++--------+
+| 10 % 3 |
++--------+
+| 1      |
++--------+
+
+select 5.5 % 2;
++---------+
+| 5.5 % 2 |
++---------+
+| 1.5     |
++---------+
+
+select mod(9.9,3.3);
++---------------+
+| mod(9.9, 3.3) |
++---------------+
+| 0.0           |
++---------------+
+
+select mod(9.9,3);
++--------------------+
+| mod(9.9, 3)        |
++--------------------+
+| 0.8999996185302734 |
++--------------------+
+
+select mod(9.9, cast(3 as decimal(2,1)));
++-----------------------------------+
+| mod(9.9, cast(3 as decimal(2,1))) |
++-----------------------------------+
+| 0.9                               |
++-----------------------------------+
+
+select mod(9.9,3.0);
++---------------+
+| mod(9.9, 3.0) |
++---------------+
+| 0.9           |
++---------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__murmur_hash">
+ <code class="ph codeph">murmur_hash(type v)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a consistent 64-bit value derived from the input argument, computed with the
+ <a class="xref" href="https://en.wikipedia.org/wiki/MurmurHash" target="_blank">MurmurHash2</a> non-cryptographic hash function.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BIGINT</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ You might use the return value in an application where you perform load balancing, bucketing, or some
+ other technique to divide processing or storage. The function performs well for all kinds of keys,
+ such as numbers, ASCII strings, and UTF-8 strings, and can be recommended as a general-purpose hashing function.
+ </p>
+ <p class="p">
+ Compared with <code class="ph codeph">fnv_hash()</code>: <code class="ph codeph">murmur_hash()</code> is based on the MurmurHash2 algorithm,
+ while <code class="ph codeph">fnv_hash()</code> is based on the FNV-1a algorithm. Both show very good randomness and performance
+ compared with other well-known hash algorithms, but MurmurHash2 is slightly better on both counts.
+ See <a class="xref" href="https://www.strchr.com/hash_functions" target="_blank">[1]</a><a class="xref" href="https://aras-p.info/blog/2016/08/09/More-Hash-Function-Tests" target="_blank">[2]</a><a class="xref" href="https://www.strchr.com/hash_functions" target="_blank">[3]</a> for details.
+ </p>
+ <p class="p">
+ Similar input values of different types could produce different hash values, for example the same
+ numeric value represented as <code class="ph codeph">SMALLINT</code> or <code class="ph codeph">BIGINT</code>,
+ <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>, or <code class="ph codeph">DECIMAL(5,2)</code> or
+ <code class="ph codeph">DECIMAL(20,5)</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table h (x int, s string);
+[localhost:21000] > insert into h values (0, 'hello'), (1,'world'), (1234567890,'antidisestablishmentarianism');
+[localhost:21000] > select x, murmur_hash(x) from h;
++------------+----------------------+
+| x          | murmur_hash(x)       |
++------------+----------------------+
+| 0          | 6960269033020761575  |
+| 1          | -780611581681153783  |
+| 1234567890 | -5754914572385924334 |
++------------+----------------------+
+[localhost:21000] > select s, murmur_hash(s) from h;
++------------------------------+----------------------+
+| s                            | murmur_hash(s)       |
++------------------------------+----------------------+
+| hello                        | 2191231550387646743  |
+| world                        | 5568329560871645431  |
+| antidisestablishmentarianism | -2261804666958489663 |
++------------------------------+----------------------+</code></pre>
+ <p class="p">
+        For short argument values, the high-order bits of the result have relatively higher entropy than those of <code class="ph codeph">fnv_hash()</code>:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table b (x boolean);
+[localhost:21000] > insert into b values (true), (true), (false), (false);
+[localhost:21000] > select x, murmur_hash(x) from b;
++-------+----------------------+
+| x | murmur_hash(x) |
++-------+----------------------+
+| true | -5720937396023583481 |
+| true | -5720937396023583481 |
+| false | 6351753276682545529 |
+| false | 6351753276682545529 |
++-------+----------------------+</code></pre>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 2.12.0
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__negative">
+ <code class="ph codeph">negative(numeric_type a)</code>
+
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument with the sign reversed; returns a positive value if the argument was
+ already negative.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Use <code class="ph codeph">-abs(a)</code> instead if you need to ensure all return values are
+ negative.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__pi">
+ <code class="ph codeph">pi()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the constant pi.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__pmod">
+ <code class="ph codeph">pmod(bigint a, bigint b), pmod(double a, double b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the positive modulus of a number.
+ Primarily for <a class="xref" href="https://issues.apache.org/jira/browse/HIVE-656" target="_blank">HiveQL compatibility</a>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code> or <code class="ph codeph">double</code>, depending on type of arguments
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how the <code class="ph codeph">fmod()</code> function sometimes returns a negative value
+ depending on the sign of its arguments, and the <code class="ph codeph">pmod()</code> function returns the same value
+ as <code class="ph codeph">fmod()</code>, but sometimes with the sign flipped.
+ </p>
+<pre class="pre codeblock"><code>select fmod(-5,2);
++-------------+
+| fmod(-5, 2) |
++-------------+
+| -1 |
++-------------+
+
+select pmod(-5,2);
++-------------+
+| pmod(-5, 2) |
++-------------+
+| 1 |
++-------------+
+
+select fmod(-5,-2);
++--------------+
+| fmod(-5, -2) |
++--------------+
+| -1 |
++--------------+
+
+select pmod(-5,-2);
++--------------+
+| pmod(-5, -2) |
++--------------+
+| -1 |
++--------------+
+
+select fmod(5,-2);
++-------------+
+| fmod(5, -2) |
++-------------+
+| 1 |
++-------------+
+
+select pmod(5,-2);
++-------------+
+| pmod(5, -2) |
++-------------+
+| -1 |
++-------------+
+</code></pre>
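The sign behavior shown above corresponds to two standard definitions of modulus: `fmod()` takes the sign of the numerator (C-style truncated modulus), while `pmod()` takes the sign of the denominator (floored modulus). As a sketch in Python rather than Impala SQL, `math.fmod()` gives the C-style results and Python's `%` operator happens to use the floored semantics:

```python
import math

# C-style truncated modulus: the result takes the sign of the numerator.
print(math.fmod(-5, 2))   # -1.0
print(math.fmod(5, -2))   #  1.0

# Floored modulus: the result takes the sign of the denominator
# (matching the pmod() results shown above).
print(-5 % 2)             #  1
print(5 % -2)             # -1
print(-5 % -2)            # -1
```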
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__positive">
+ <code class="ph codeph">positive(numeric_type a)</code>
+
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the original argument unchanged (even if the argument is negative).
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Use <code class="ph codeph">abs()</code> instead if you need to ensure all return values are
+ positive.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__pow">
+ <code class="ph codeph">pow(double a, double p)</code>,
+ <code class="ph codeph" id="math_functions__power">power(double a, double p)</code>,
+ <code class="ph codeph" id="math_functions__dpow">dpow(double a, double p)</code>,
+ <code class="ph codeph" id="math_functions__fpow">fpow(double a, double p)</code>
+ </dt>
+
+ <dd class="dd">
+
+
+
+
+ <strong class="ph b">Purpose:</strong> Returns the first argument raised to the power of the second argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__precision">
+ <code class="ph codeph">precision(<var class="keyword varname">numeric_expression</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Computes the precision (number of decimal digits) needed to represent the type of the
+ argument expression as a <code class="ph codeph">DECIMAL</code> value.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Typically used in combination with the <code class="ph codeph">scale()</code> function, to determine the appropriate
+ <code class="ph codeph">DECIMAL(<var class="keyword varname">precision</var>,<var class="keyword varname">scale</var>)</code> type to declare in a
+ <code class="ph codeph">CREATE TABLE</code> statement or <code class="ph codeph">CAST()</code> function.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <div class="p">
+ The following examples demonstrate how to check the precision and scale of numeric literals or other
+ numeric expressions. Impala represents numeric literals in the smallest appropriate type. 5 is a
+ <code class="ph codeph">TINYINT</code> value, which ranges from -128 to 127, therefore 3 decimal digits are needed to
+ represent the entire range, and because it is an integer value there are no fractional digits. 1.333 is
+ interpreted as a <code class="ph codeph">DECIMAL</code> value, with 4 digits total and 3 digits after the decimal point.
+<pre class="pre codeblock"><code>[localhost:21000] > select precision(5), scale(5);
++--------------+----------+
+| precision(5) | scale(5) |
++--------------+----------+
+| 3 | 0 |
++--------------+----------+
+[localhost:21000] > select precision(1.333), scale(1.333);
++------------------+--------------+
+| precision(1.333) | scale(1.333) |
++------------------+--------------+
+| 4 | 3 |
++------------------+--------------+
+[localhost:21000] > with t1 as
+ ( select cast(12.34 as decimal(20,2)) x union select cast(1 as decimal(8,6)) x )
+ select precision(x), scale(x) from t1 limit 1;
++--------------+----------+
+| precision(x) | scale(x) |
++--------------+----------+
+| 24 | 6 |
++--------------+----------+
+</code></pre>
+ </div>
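For `DECIMAL` literals, the precision/scale computation can be reasoned about with Python's `decimal` module. This is a rough analogue for intuition only: it covers decimal literals like `1.333`, but does not reproduce Impala's integer-type rules (for example, `precision(5)` is 3 in Impala because 5 is typed as `TINYINT`).

```python
from decimal import Decimal

def literal_precision_scale(text: str):
    """Rough analogue of precision()/scale() for a DECIMAL literal (illustrative only)."""
    t = Decimal(text).as_tuple()
    scale = max(0, -t.exponent)             # digits to the right of the decimal point
    precision = max(len(t.digits), scale)   # total significant digits
    return precision, scale

print(literal_precision_scale("1.333"))   # (4, 3)
print(literal_precision_scale("12.34"))   # (4, 2)
```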
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__quotient">
+ <code class="ph codeph">quotient(bigint numerator, bigint denominator)</code>,
+ <code class="ph codeph">quotient(double numerator, double denominator)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the first argument divided by the second argument, discarding any fractional
+ part. Avoids promoting integer arguments to <code class="ph codeph">DOUBLE</code> as happens with the <code class="ph codeph">/</code> SQL
+ operator. <span class="ph">Also includes an overload that accepts <code class="ph codeph">DOUBLE</code> arguments,
+ discards the fractional part of each argument value before dividing, and again returns <code class="ph codeph">BIGINT</code>.
+ With integer arguments, this function works the same as the <code class="ph codeph">DIV</code> operator.</span>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">bigint</code>
+ </p>
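As a sketch of the semantics described above (truncate each argument to an integer, then divide and discard the fractional part, i.e. truncate toward zero), in Python rather than Impala SQL:

```python
import math

def quotient(a, b):
    """Truncate both arguments to integers, then divide, truncating toward zero."""
    a, b = math.trunc(a), math.trunc(b)
    q = a // b
    # Python's // floors, so adjust by one when the result is negative and inexact.
    if q < 0 and q * b != a:
        q += 1
    return q

print(quotient(7, 2))       # 3
print(quotient(-7, 2))      # -3
print(quotient(7.7, 2.2))   # 3  (fractional parts are discarded before dividing)
```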
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__radians">
+ <code class="ph codeph">radians(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Converts argument value from degrees to radians.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__rand">
+ <code class="ph codeph">rand()</code>, <code class="ph codeph">rand(int seed)</code>,
+ <code class="ph codeph" id="math_functions__random">random()</code>,
+ <code class="ph codeph">random(int seed)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a random value between 0 and 1. After <code class="ph codeph">rand()</code> is called with a
+ seed argument, it produces a consistent random sequence based on the seed value.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ <p class="p">
+        <strong class="ph b">Usage notes:</strong> Currently, the random sequence is reset for each query, so repeated runs of the same
+        query produce the same sequence of values from <code class="ph codeph">rand()</code>. To generate a different
+        sequence for each query, pass a unique seed value to each call to
+        <code class="ph codeph">rand()</code>. For example, <code class="ph codeph">select rand(unix_timestamp()) from ...</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how <code class="ph codeph">rand()</code> can produce sequences of varying predictability,
+ so that you can reproduce query results involving random values or generate unique sequences of random
+ values for each query.
+ When <code class="ph codeph">rand()</code> is called with no argument, it generates the same sequence of values each time,
+ regardless of the ordering of the result set.
+ When <code class="ph codeph">rand()</code> is called with a constant integer, it generates a different sequence of values,
+ but still always the same sequence for the same seed value.
+ If you pass in a seed value that changes, such as the return value of the expression <code class="ph codeph">unix_timestamp(now())</code>,
+ each query will use a different sequence of random values, potentially more useful in probability calculations although
+ more difficult to reproduce at a later time. Therefore, the final two examples with an unpredictable seed value
+ also include the seed in the result set, to make it possible to reproduce the same random sequence later.
+ </p>
+<pre class="pre codeblock"><code>select x, rand() from three_rows;
++---+-----------------------+
+| x | rand() |
++---+-----------------------+
+| 1 | 0.0004714746030380365 |
+| 2 | 0.5895895192351144 |
+| 3 | 0.4431900859080209 |
++---+-----------------------+
+
+select x, rand() from three_rows order by x desc;
++---+-----------------------+
+| x | rand() |
++---+-----------------------+
+| 3 | 0.0004714746030380365 |
+| 2 | 0.5895895192351144 |
+| 1 | 0.4431900859080209 |
++---+-----------------------+
+
+select x, rand(1234) from three_rows order by x;
++---+----------------------+
+| x | rand(1234) |
++---+----------------------+
+| 1 | 0.7377511392057646 |
+| 2 | 0.009428468537250751 |
+| 3 | 0.208117277924026 |
++---+----------------------+
+
+select x, rand(1234) from three_rows order by x desc;
++---+----------------------+
+| x | rand(1234) |
++---+----------------------+
+| 3 | 0.7377511392057646 |
+| 2 | 0.009428468537250751 |
+| 1 | 0.208117277924026 |
++---+----------------------+
+
+select x, unix_timestamp(now()), rand(unix_timestamp(now()))
+ from three_rows order by x;
++---+-----------------------+-----------------------------+
+| x | unix_timestamp(now()) | rand(unix_timestamp(now())) |
++---+-----------------------+-----------------------------+
+| 1 | 1440777752 | 0.002051228658320023 |
+| 2 | 1440777752 | 0.5098743483004506 |
+| 3 | 1440777752 | 0.9517714925817081 |
++---+-----------------------+-----------------------------+
+
+select x, unix_timestamp(now()), rand(unix_timestamp(now()))
+ from three_rows order by x desc;
++---+-----------------------+-----------------------------+
+| x | unix_timestamp(now()) | rand(unix_timestamp(now())) |
++---+-----------------------+-----------------------------+
+| 3 | 1440777761 | 0.9985985015512437 |
+| 2 | 1440777761 | 0.3251255333074953 |
+| 1 | 1440777761 | 0.02422675025846192 |
++---+-----------------------+-----------------------------+
+</code></pre>
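The effect of seeding can be illustrated with any seeded pseudo-random generator. The following Python sketch shows the general idea only; it is not Impala's generator, so the values themselves differ from the examples above:

```python
import random

# The same seed always reproduces the same sequence...
a = random.Random(1234)
b = random.Random(1234)
print([round(a.random(), 3) for _ in range(3)])
print([round(b.random(), 3) for _ in range(3)])  # identical to the line above

# ...while a varying seed (like unix_timestamp() in the examples above)
# produces a new sequence each time.
c = random.Random(5678)
print([round(c.random(), 3) for _ in range(3)])  # a different sequence
```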
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__round">
+ <code class="ph codeph">round(double a)</code>,
+ <code class="ph codeph">round(double a, int d)</code>,
+ <code class="ph codeph">round(decimal a, int_type d)</code>,
+ <code class="ph codeph" id="math_functions__dround">dround(double a)</code>,
+ <code class="ph codeph">dround(double a, int d)</code>,
+ <code class="ph codeph">dround(decimal(p,s) a, int_type d)</code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Rounds a floating-point value. By default (with a
+ single argument), rounds to the nearest integer. Values ending in .5
+ are rounded up for positive numbers, down for negative numbers (that
+ is, away from zero). The optional second argument specifies how many
+ digits to leave after the decimal point; values greater than zero
+ produce a floating-point return value rounded to the requested number
+ of digits to the right of the decimal point.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input type
+ </p>
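Note that this rule (exact halves round away from zero) differs from the banker's rounding (round-half-to-even) used by, for example, Python's built-in `round()`. A Python sketch of the half-away-from-zero rule described above, for illustration only:

```python
import math

def round_half_away(x: float, d: int = 0) -> float:
    """Round to d digits, with exact halves rounded away from zero."""
    scale = 10 ** d
    return math.copysign(math.floor(abs(x) * scale + 0.5), x) / scale

print(round_half_away(2.5))     #  3.0  (Python's built-in round(2.5) gives 2)
print(round_half_away(-2.5))    # -3.0
print(round_half_away(1.25, 1)) #  1.3
```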
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__scale">
+ <code class="ph codeph">scale(<var class="keyword varname">numeric_expression</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Computes the scale (number of decimal digits to the right of the decimal point) needed to
+ represent the type of the argument expression as a <code class="ph codeph">DECIMAL</code> value.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Typically used in combination with the <code class="ph codeph">precision()</code> function, to determine the
+ appropriate <code class="ph codeph">DECIMAL(<var class="keyword varname">precision</var>,<var class="keyword varname">scale</var>)</code> type to
+ declare in a <code class="ph codeph">CREATE TABLE</code> statement or <code class="ph codeph">CAST()</code> function.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <div class="p">
+ The following examples demonstrate how to check the precision and scale of numeric literals or other
+ numeric expressions. Impala represents numeric literals in the smallest appropriate type. 5 is a
+ <code class="ph codeph">TINYINT</code> value, which ranges from -128 to 127, therefore 3 decimal digits are needed to
+ represent the entire range, and because it is an integer value there are no fractional digits. 1.333 is
+ interpreted as a <code class="ph codeph">DECIMAL</code> value, with 4 digits total and 3 digits after the decimal point.
+<pre class="pre codeblock"><code>[localhost:21000] > select precision(5), scale(5);
++--------------+----------+
+| precision(5) | scale(5) |
++--------------+----------+
+| 3 | 0 |
++--------------+----------+
+[localhost:21000] > select precision(1.333), scale(1.333);
++------------------+--------------+
+| precision(1.333) | scale(1.333) |
++------------------+--------------+
+| 4 | 3 |
++------------------+--------------+
+[localhost:21000] > with t1 as
+ ( select cast(12.34 as decimal(20,2)) x union select cast(1 as decimal(8,6)) x )
+ select precision(x), scale(x) from t1 limit 1;
++--------------+----------+
+| precision(x) | scale(x) |
++--------------+----------+
+| 24 | 6 |
++--------------+----------+
+</code></pre>
+ </div>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__sign">
+ <code class="ph codeph">sign(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns -1, 0, or 1 to indicate the signedness of the argument value.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__sin">
+ <code class="ph codeph">sin(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the sine of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__sinh">
+ <code class="ph codeph">sinh(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the hyperbolic sine of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__sqrt">
+ <code class="ph codeph">sqrt(double a)</code>,
+ <code class="ph codeph" id="math_functions__dsqrt">dsqrt(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Returns the square root of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__tan">
+ <code class="ph codeph">tan(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the tangent of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__tanh">
+ <code class="ph codeph">tanh(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the hyperbolic tangent of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__truncate">
+ <code class="ph codeph">truncate(double_or_decimal a[, digits_to_leave])</code>,
+ <span class="ph" id="math_functions__dtrunc"><code class="ph codeph">dtrunc(double_or_decimal a[, digits_to_leave])</code></span>,
+ <span class="ph" id="math_functions__trunc_number"><code class="ph codeph">trunc(double_or_decimal a[, digits_to_leave])</code></span>
+ </dt>
+
+ <dd class="dd">
+
+
+
+ <strong class="ph b">Purpose:</strong> Removes some or all fractional digits from a numeric value.
+ <p class="p">
+ <strong class="ph b">Arguments:</strong>
+ With a single floating-point argument, removes all fractional digits, leaving an
+ integer value. The optional second argument specifies the number of fractional digits
+ to include in the return value, and only applies when the argument type is
+ <code class="ph codeph">DECIMAL</code>. A second argument of 0 truncates to a whole integer value.
+        A second argument of negative N sets N digits to 0 on the left side of the decimal point.
+ </p>
+ <p class="p">
+ <strong class="ph b">Scale argument:</strong> The scale argument applies only when truncating
+ <code class="ph codeph">DECIMAL</code> values. It is an integer specifying how many
+ significant digits to leave to the right of the decimal point.
+ A scale argument of 0 truncates to a whole integer value. A scale
+ argument of negative N sets N digits to 0 on the left side of the decimal
+ point.
+ </p>
+ <p class="p">
+ <code class="ph codeph">truncate()</code>, <code class="ph codeph">dtrunc()</code>,
+ <span class="ph">and <code class="ph codeph">trunc()</code></span> are aliases for the
+ same function.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input type
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> The <code class="ph codeph">trunc()</code> alias was added in
+ <span class="keyword">Impala 2.10</span>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ You can also pass a <code class="ph codeph">DOUBLE</code> argument, or <code class="ph codeph">DECIMAL</code>
+ argument with optional scale, to the <code class="ph codeph">dtrunc()</code> or
+        <code class="ph codeph">truncate()</code> functions. Using the <code class="ph codeph">trunc()</code>
+ function for numeric values is common with other industry-standard database
+ systems, so you might find such <code class="ph codeph">trunc()</code> calls in code that you
+ are porting to Impala.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">trunc()</code> function also has a signature that applies to
+ <code class="ph codeph">TIMESTAMP</code> values. See <a class="xref" href="impala_datetime_functions.html">Impala Date and Time Functions</a>
+ for details.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples demonstrate the <code class="ph codeph">truncate()</code>
+ and <code class="ph codeph">dtrunc()</code> signatures for this function:
+ </p>
+<pre class="pre codeblock"><code>select truncate(3.45);
++----------------+
+| truncate(3.45) |
++----------------+
+| 3 |
++----------------+
+
+select truncate(-3.45);
++-----------------+
+| truncate(-3.45) |
++-----------------+
+| -3 |
++-----------------+
+
+select truncate(3.456,1);
++--------------------+
+| truncate(3.456, 1) |
++--------------------+
+| 3.4 |
++--------------------+
+
+select dtrunc(3.456,1);
++------------------+
+| dtrunc(3.456, 1) |
++------------------+
+| 3.4 |
++------------------+
+
+select truncate(3.456,2);
++--------------------+
+| truncate(3.456, 2) |
++--------------------+
+| 3.45 |
++--------------------+
+
+select truncate(3.456,7);
++--------------------+
+| truncate(3.456, 7) |
++--------------------+
+| 3.4560000 |
++--------------------+
+</code></pre>
+ <p class="p">
+ The following examples demonstrate using <code class="ph codeph">trunc()</code> with
+ <code class="ph codeph">DECIMAL</code> or <code class="ph codeph">DOUBLE</code> values, and with
+ an optional scale argument for <code class="ph codeph">DECIMAL</code> values.
+ (The behavior is the same for the <code class="ph codeph">truncate()</code> and
+ <code class="ph codeph">dtrunc()</code> aliases also.)
+ </p>
+<pre class="pre codeblock"><code>
+create table t1 (d decimal(20,7));
+
+-- By default, no digits to the right of the decimal point.
+insert into t1 values (1.1), (2.22), (3.333), (4.4444), (5.55555);
+select trunc(d) from t1 order by d;
++----------+
+| trunc(d) |
++----------+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
++----------+
+
+-- 1 digit to the right of the decimal point.
+select trunc(d,1) from t1 order by d;
++-------------+
+| trunc(d, 1) |
++-------------+
+| 1.1 |
+| 2.2 |
+| 3.3 |
+| 4.4 |
+| 5.5 |
++-------------+
+
+-- 2 digits to the right of the decimal point,
+-- including trailing zeroes if needed.
+select trunc(d,2) from t1 order by d;
++-------------+
+| trunc(d, 2) |
++-------------+
+| 1.10 |
+| 2.22 |
+| 3.33 |
+| 4.44 |
+| 5.55 |
++-------------+
+
+insert into t1 values (9999.9999), (8888.8888);
+
+-- Negative scale truncates digits to the left
+-- of the decimal point.
+select trunc(d,-2) from t1 where d > 100 order by d;
++--------------+
+| trunc(d, -2) |
++--------------+
+| 8800 |
+| 9900 |
++--------------+
+
+-- The scale of the result is adjusted to match the
+-- scale argument.
+select trunc(d,2),
+ precision(trunc(d,2)) as p,
+ scale(trunc(d,2)) as s
+from t1 order by d;
++-------------+----+---+
+| trunc(d, 2) | p | s |
++-------------+----+---+
+| 1.10 | 15 | 2 |
+| 2.22 | 15 | 2 |
+| 3.33 | 15 | 2 |
+| 4.44 | 15 | 2 |
+| 5.55 | 15 | 2 |
+| 8888.88 | 15 | 2 |
+| 9999.99 | 15 | 2 |
++-------------+----+---+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+create table dbl (d double);
+
+insert into dbl values
+ (1.1), (2.22), (3.333), (4.4444), (5.55555),
+ (8888.8888), (9999.9999);
+
+-- With double values, there is no optional scale argument.
+select trunc(d) from dbl order by d;
++----------+
+| trunc(d) |
++----------+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
+| 8888 |
+| 9999 |
++----------+
+</code></pre>
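The `DECIMAL` behavior above, including the negative-scale case, can be sketched with Python's `decimal` module. This is an analogue for illustration only, not Impala's implementation:

```python
from decimal import Decimal, ROUND_DOWN

def trunc_decimal(d: Decimal, digits: int = 0) -> Decimal:
    """Truncate toward zero, keeping `digits` fractional places.
    A negative `digits` zeroes out digits left of the decimal point."""
    quantum = Decimal(1).scaleb(-digits)   # e.g. digits=1 -> 0.1, digits=-2 -> 1E+2
    return d.quantize(quantum, rounding=ROUND_DOWN)

print(trunc_decimal(Decimal("3.456"), 1))            # 3.4
print(trunc_decimal(Decimal("-3.45")))               # -3
print(int(trunc_decimal(Decimal("9999.9999"), -2)))  # 9900
```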
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__unhex">
+ <code class="ph codeph">unhex(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a string of characters with ASCII values corresponding to pairs of hexadecimal
+ digits in the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
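The behavior corresponds to decoding pairs of hexadecimal digits into bytes. A rough Python analogue, assuming the input is valid hex and the resulting bytes are ASCII:

```python
def unhex(s: str) -> str:
    """Decode pairs of hex digits into the corresponding ASCII characters."""
    return bytes.fromhex(s).decode("ascii")

print(unhex("4869"))          # Hi
print(unhex("696D70616C61"))  # impala
```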
+ </dd>
+
+
+ </dl>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_max.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_max.html b/docs/build3x/html/topics/impala_max.html
new file mode 100644
index 0000000..00c6e64
--- /dev/null
+++ b/docs/build3x/html/topics/impala_max.html
@@ -0,0 +1,298 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX Function</title></head><body id="max"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAX Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns the maximum value from a set of numbers. Opposite of the
+      <code class="ph codeph">MIN</code> function. Its single argument can be a numeric column, or the numeric result of a function
+ or expression applied to the column value. Rows with a <code class="ph codeph">NULL</code> value for the specified column
+ are ignored. If the table is empty, or all the values supplied to <code class="ph codeph">MAX</code> are
+ <code class="ph codeph">NULL</code>, <code class="ph codeph">MAX</code> returns <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>MAX([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre>
+
+ <p class="p">
+ When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of
+ grouping values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong> In Impala 2.0 and higher, this function can be used as an analytic function, but with restrictions on any window clause.
+ For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause is only allowed if the start
+ bound is <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+ arguments which produce a <code class="ph codeph">STRING</code> result
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+ <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+ query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+ for the kinds of queries that this option applies to, and slight differences in how partitions are
+ evaluated when this query option is enabled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name      | count | sum | avg  | minimum   | maximum        | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA      |     5 |  50 |   10 | ALGERIA   | MOZAMBIQUE     |             5 |
+| AMERICA     |     5 |  47 |  9.4 | ARGENTINA | UNITED STATES  |             5 |
+| ASIA        |     5 |  68 | 13.6 | CHINA     | VIETNAM        |             5 |
+| EUROPE      |     5 |  77 | 15.4 | FRANCE    | UNITED KINGDOM |             5 |
+| MIDDLE EAST |     5 |  58 | 11.6 | EGYPT     | SAUDI ARABIA   |             5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- Find the largest value for this column in the table.
+select max(c1) from t1;
+-- Find the largest value for this column from a subset of the table.
+select max(c1) from t1 where month = 'January' and year = '2013';
+-- Find the largest value from a set of numeric function results.
+select max(length(s)) from t1;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, max(purchase_price) from store_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select max(distinct x) from t1;
+</code></pre>
+
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">MAX()</code> in an analytic context. They use a table
+containing integers from 1 to 10. Notice how <code class="ph codeph">MAX()</code> is reported for each input value, in
+contrast to a <code class="ph codeph">GROUP BY</code> clause, which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, max(x) over (partition by property) as max from int_t where property in ('odd','even');
++----+----------+-----+
+| x  | property | max |
++----+----------+-----+
+|  2 | even     |  10 |
+|  4 | even     |  10 |
+|  6 | even     |  10 |
+|  8 | even     |  10 |
+| 10 | even     |  10 |
+|  1 | odd      |   9 |
+|  3 | odd      |   9 |
+|  5 | odd      |   9 |
+|  7 | odd      |   9 |
+|  9 | odd      |   9 |
++----+----------+-----+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">MAX()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to display the largest value of <code class="ph codeph">X</code>
+encountered up to each row in the result set. The examples use two columns in the <code class="ph codeph">ORDER BY</code>
+clause to produce a sequence of values that rises and falls, to illustrate how the <code class="ph codeph">MAX()</code>
+result only increases or stays the same throughout each partition within the result set.
+The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+and therefore all three of these examples produce the same results:
+
+<pre class="pre codeblock"><code>select x, property,
+ max(x) <strong class="ph b">over (order by property, x desc)</strong> as 'maximum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | maximum to this point |
++---+----------+-----------------------+
+| 7 | prime    |                     7 |
+| 5 | prime    |                     7 |
+| 3 | prime    |                     7 |
+| 2 | prime    |                     7 |
+| 9 | square   |                     9 |
+| 4 | square   |                     9 |
+| 1 | square   |                     9 |
++---+----------+-----------------------+
+
+select x, property,
+ max(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">rows between unbounded preceding and current row</strong>
+ ) as 'maximum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | maximum to this point |
++---+----------+-----------------------+
+| 7 | prime    |                     7 |
+| 5 | prime    |                     7 |
+| 3 | prime    |                     7 |
+| 2 | prime    |                     7 |
+| 9 | square   |                     9 |
+| 4 | square   |                     9 |
+| 1 | square   |                     9 |
++---+----------+-----------------------+
+
+select x, property,
+ max(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">range between unbounded preceding and current row</strong>
+ ) as 'maximum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | maximum to this point |
++---+----------+-----------------------+
+| 7 | prime    |                     7 |
+| 5 | prime    |                     7 |
+| 3 | prime    |                     7 |
+| 2 | prime    |                     7 |
+| 9 | square   |                     9 |
+| 4 | square   |                     9 |
+| 1 | square   |                     9 |
++---+----------+-----------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running maximum taking into account all rows before
+and 1 row after the current row.
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code> clause.
+Because of an extra Impala restriction on the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> functions in an
+analytic context, the lower bound must be <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+<pre class="pre codeblock"><code>select x, property,
+ max(x) over
+ (
+ <strong class="ph b">order by property, x</strong>
+ <strong class="ph b">rows between unbounded preceding and 1 following</strong>
+ ) as 'local maximum'
+from int_t where property in ('prime','square');
++---+----------+---------------+
+| x | property | local maximum |
++---+----------+---------------+
+| 2 | prime    |             3 |
+| 3 | prime    |             5 |
+| 5 | prime    |             7 |
+| 7 | prime    |             7 |
+| 1 | square   |             7 |
+| 4 | square   |             9 |
+| 9 | square   |             9 |
++---+----------+---------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+ max(x) over
+ (
+ <strong class="ph b">order by property, x</strong>
+ <strong class="ph b">range between unbounded preceding and 1 following</strong>
+ ) as 'local maximum'
+from int_t where property in ('prime','square');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>, <a class="xref" href="impala_min.html#min">MIN Function</a>,
+ <a class="xref" href="impala_avg.html#avg">AVG Function</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_max_errors.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_max_errors.html b/docs/build3x/html/topics/impala_max_errors.html
new file mode 100644
index 0000000..0773474
--- /dev/null
+++ b/docs/build3x/html/topics/impala_max_errors.html
@@ -0,0 +1,40 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_errors"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_ERRORS Query Option</title></head><body id="max_errors"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAX_ERRORS Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">MAX_ERRORS</code> query option sets the maximum number of non-fatal errors for any particular
+ query that are recorded in the Impala log file. For example, if a billion-row table had a non-fatal data error in every
+ row, you could diagnose the problem without all billion errors being logged. A value of 0, or leaving the option
+ unspecified, uses the built-in default of 1000.
+ </p>
+
+ <p class="p">
+ This option only controls how many errors are reported. To specify whether Impala continues or halts when it
+ encounters such errors, use the <code class="ph codeph">ABORT_ON_ERROR</code> option.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning 1000 errors)
+ </p>
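+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The option can be changed for a session through the <code class="ph codeph">SET</code> statement.
+ The values below are illustrative rather than recommendations:
+ </p>
+
+<pre class="pre codeblock"><code>-- Record up to 10000 non-fatal errors per query in the log file.
+set max_errors=10000;
+-- Revert to the built-in default of 1000 logged errors.
+set max_errors=0;
+</code></pre>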
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_abort_on_error.html#abort_on_error">ABORT_ON_ERROR Query Option</a>,
+ <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_max_num_runtime_filters.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_max_num_runtime_filters.html b/docs/build3x/html/topics/impala_max_num_runtime_filters.html
new file mode 100644
index 0000000..8e728e8
--- /dev/null
+++ b/docs/build3x/html/topics/impala_max_num_runtime_filters.html
@@ -0,0 +1,75 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_num_runtime_filters"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</title></head><body id="max_num_runtime_filters"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAX_NUM_RUNTIME_FILTERS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">MAX_NUM_RUNTIME_FILTERS</code> query option
+ sets an upper limit on the number of runtime filters that can be produced for each query.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 10
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Each runtime filter imposes some memory overhead on the query.
+ Depending on the setting of the <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code>
+ query option, each filter might consume between 1 and 16 megabytes
+ per plan fragment. There are typically 5 or fewer filters per plan fragment.
+ </p>
+
+ <p class="p">
+ Impala evaluates the effectiveness of each filter, and keeps the
+ ones that eliminate the largest number of partitions or rows.
+ Therefore, this setting can protect against
+ potential problems due to excessive memory overhead for filter production,
+ while still allowing a high level of optimization for suitable queries.
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, adjust this query option only when tuning such queries,
+ typically those that combine large partitioned tables with joins against other large tables.
+ </p>
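+
+ <p class="p">
+ For example, on a workload known to benefit from only a few filters, you could lower the limit
+ through the <code class="ph codeph">SET</code> statement. The values below are illustrative rather than
+ tuning recommendations:
+ </p>
+
+<pre class="pre codeblock"><code>-- Allow at most 5 runtime filters per query to reduce memory overhead.
+set max_num_runtime_filters=5;
+-- Restore the default limit.
+set max_num_runtime_filters=10;
+</code></pre>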
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option affects only Bloom filters, not the min/max filters
+ that are applied to Kudu tables. Therefore, it does not affect the
+ performance of queries against Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_show.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_show.html b/docs/build3x/html/topics/impala_show.html
new file mode 100644
index 0000000..6683296
--- /dev/null
+++ b/docs/build3x/html/topics/impala_show.html
@@ -0,0 +1,1525 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version"
content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="show"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SHOW Statement</title></head><body id="show"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SHOW Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">SHOW</code> statement is a flexible way to get information about different types of Impala
+ objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SHOW DATABASES [[LIKE] '<var class="keyword varname">pattern</var>']
+SHOW SCHEMAS [[LIKE] '<var class="keyword varname">pattern</var>'] - an alias for SHOW DATABASES
+SHOW TABLES [IN <var class="keyword varname">database_name</var>] [[LIKE] '<var class="keyword varname">pattern</var>']
+<span class="ph">SHOW [AGGREGATE | ANALYTIC] FUNCTIONS [IN <var class="keyword varname">database_name</var>] [[LIKE] '<var class="keyword varname">pattern</var>']</span>
+<span class="ph">SHOW CREATE TABLE [<var class="keyword varname">database_name</var>].<var class="keyword varname">table_name</var></span>
+<span class="ph">SHOW TABLE STATS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var></span>
+<span class="ph">SHOW COLUMN STATS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var></span>
+<span class="ph">SHOW PARTITIONS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var></span>
+<span class="ph">SHOW <span class="ph">[RANGE]</span> PARTITIONS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var></span>
+SHOW FILES IN [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var> <span class="ph">[PARTITION (<var class="keyword varname">key_col_expression</var> [, <var class="keyword varname">key_col_expression</var>])]</span>
+
+<span class="ph">SHOW ROLES
+SHOW CURRENT ROLES
+SHOW ROLE GRANT GROUP <var class="keyword varname">group_name</var>
+SHOW GRANT ROLE <var class="keyword varname">role_name</var></span>
+</code></pre>
+
+
+
+
+
+
+
+ <p class="p">
+ Issue a <code class="ph codeph">SHOW <var class="keyword varname">object_type</var></code> statement to see the appropriate objects in the
+ current database, or <code class="ph codeph">SHOW <var class="keyword varname">object_type</var> IN <var class="keyword varname">database_name</var></code>
+ to see objects in a specific database.
+ </p>
+
+ <p class="p">
+ The optional <var class="keyword varname">pattern</var> argument is a quoted string literal, using Unix-style
+ <code class="ph codeph">*</code> wildcards and allowing <code class="ph codeph">|</code> for alternation. The preceding
+ <code class="ph codeph">LIKE</code> keyword is also optional. All object names are stored in lowercase, so use all
+ lowercase letters in the pattern string. For example:
+ </p>
+
+<pre class="pre codeblock"><code>show databases 'a*';
+show databases like 'a*';
+show tables in some_db like '*fact*';
+use some_db;
+show tables '*dim*|*fact*';</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="show__show_files">
+
+ <h2 class="title topictitle2" id="ariaid-title2">SHOW FILES Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW FILES</code> statement displays the files that constitute a specified table,
+ or a partition within a partitioned table. This syntax is available in <span class="keyword">Impala 2.2</span> and higher
+ only. The output includes the names of the files, the size of each file, and the applicable partition
+ for a partitioned table. The size includes a suffix of <code class="ph codeph">B</code> for bytes,
+ <code class="ph codeph">MB</code> for megabytes, and <code class="ph codeph">GB</code> for gigabytes.
+ </p>
+
+ <div class="p">
+ In <span class="keyword">Impala 2.8</span> and higher, you can use general
+ expressions with operators such as <code class="ph codeph"><</code>, <code class="ph codeph">IN</code>,
+ <code class="ph codeph">LIKE</code>, and <code class="ph codeph">BETWEEN</code> in the <code class="ph codeph">PARTITION</code>
+ clause, instead of only equality operators. For example:
+<pre class="pre codeblock"><code>
+show files in sample_table partition (j < 5);
+show files in sample_table partition (k = 3, l between 1 and 10);
+show files in sample_table partition (month like 'J%');
+
+</code></pre>
+ </div>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ This statement applies to tables and partitions stored on HDFS, or in the Amazon Simple Storage Service (S3).
+ It does not apply to views.
+ It does not apply to tables mapped onto HBase <span class="ph">or Kudu</span>,
+ because those data management systems do not use the same file-based storage layout.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ You can use this statement to verify the results of your ETL process: that is, that
+ the expected files are present, with the expected sizes. You can examine the file information
+ to detect conditions such as empty files, missing files, or inefficient layouts due to
+ a large number of small files. When you use <code class="ph codeph">INSERT</code> statements to copy
+ from one table to another, you can see how the file layout changes due to file format
+ conversions, compaction of small input files into large data blocks, and
+ multiple output files from parallel queries and partitioned inserts.
+ </p>
+
+ <p class="p">
+ The output from this statement does not include files that Impala considers to be hidden
+ or invisible, such as those whose names start with a dot or an underscore, or that
+ end with the suffixes <code class="ph codeph">.copying</code> or <code class="ph codeph">.tmp</code>.
+ </p>
+
+ <p class="p">
+ The information for partitioned tables complements the output of the <code class="ph codeph">SHOW PARTITIONS</code>
+ statement, which summarizes information about each partition. <code class="ph codeph">SHOW PARTITIONS</code>
+ produces some output for each partition, while <code class="ph codeph">SHOW FILES</code> does not
+ produce any output for empty partitions because they do not include any data files.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read
+ permission for all the table files, read and execute permission for all the directories that make up the table,
+ and execute permission for the database directory and all its parent directories.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows a <code class="ph codeph">SHOW FILES</code> statement
+ for an unpartitioned table using text format:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table unpart_text (x bigint, s string);
+[localhost:21000] > insert into unpart_text (x, s) select id, name
+ > from oreilly.sample_data limit 20e6;
+[localhost:21000] > show files in unpart_text;
++------------------------------------------------------------------------------+----------+-----------+
+| path | size | partition |
++------------------------------------------------------------------------------+----------+-----------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/35665776ef85cfaf_1012432410_data.0. | 448.31MB | |
++------------------------------------------------------------------------------+----------+-----------+
+[localhost:21000] > insert into unpart_text (x, s) select id, name from oreilly.sample_data limit 100e6;
+[localhost:21000] > show files in unpart_text;
++--------------------------------------------------------------------------------------+----------+-----------+
+| path | size | partition |
++--------------------------------------------------------------------------------------+----------+-----------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/35665776ef85cfaf_1012432410_data.0. | 448.31MB | |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/ac3dba252a8952b8_1663177415_data.0. | 2.19GB | |
++--------------------------------------------------------------------------------------+----------+-----------+
+</code></pre>
+
+ <p class="p">
+ This example illustrates how, after issuing some <code class="ph codeph">INSERT ... VALUES</code> statements,
+ the table now contains some tiny files of just a few bytes. Such small files could cause inefficient processing of
+ parallel queries that are expecting multi-megabyte input files. The example shows how you might compact the small files by doing
+ an <code class="ph codeph">INSERT ... SELECT</code> into a different table, possibly converting the data to Parquet in the process:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > insert into unpart_text values (10,'hello'), (20, 'world');
+[localhost:21000] > insert into unpart_text values (-1,'foo'), (-1000, 'bar');
+[localhost:21000] > show files in unpart_text;
++--------------------------------------------------------------------------------------+----------+
+| path | size |
++--------------------------------------------------------------------------------------+----------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/4f11b8bdf8b6aa92_238145083_data.0. | 18B      |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/35665776ef85cfaf_1012432410_data.0. | 448.31MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/ac3dba252a8952b8_1663177415_data.0. | 2.19GB   |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/cfb8252452445682_1868457216_data.0. | 17B      |
++--------------------------------------------------------------------------------------+----------+
+[localhost:21000] > create table unpart_parq stored as parquet as select * from unpart_text;
++---------------------------+
+| summary |
++---------------------------+
+| Inserted 120000002 row(s) |
++---------------------------+
+[localhost:21000] > show files in unpart_parq;
++------------------------------------------------------------------------------------------+----------+
+| path | size |
++------------------------------------------------------------------------------------------+----------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630184_549959007_data.0.parq | 255.36MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630184_549959007_data.1.parq | 178.52MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630185_549959007_data.0.parq | 255.37MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630185_549959007_data.1.parq | 57.71MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630186_2141167244_data.0.parq | 255.40MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630186_2141167244_data.1.parq | 175.52MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630187_1006832086_data.0.parq | 255.40MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630187_1006832086_data.1.parq | 214.61MB |
++------------------------------------------------------------------------------------------+----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows a <code class="ph codeph">SHOW FILES</code> statement for a partitioned text table
+ with data in two different partitions, and two empty partitions.
+ The partitions with no data are not represented in the <code class="ph codeph">SHOW FILES</code> output.
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table part_text (x bigint, y int, s string)
+ > partitioned by (year bigint, month bigint, day bigint);
+[localhost:21000] > insert overwrite part_text (x, y, s) partition (year=2014,month=1,day=1)
+ > select id, val, name from oreilly.normalized_parquet
+where id between 1 and 1000000;
+[localhost:21000] > insert overwrite part_text (x, y, s) partition (year=2014,month=1,day=2)
+ > select id, val, name from oreilly.normalized_parquet
+ > where id between 1000001 and 2000000;
+[localhost:21000] > alter table part_text add partition (year=2014,month=1,day=3);
+[localhost:21000] > alter table part_text add partition (year=2014,month=1,day=4);
+[localhost:21000] > show partitions part_text;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+--------+-------------------+
+| year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+-----+-------+--------+---------+--------------+-------------------+--------+-------------------+
+| 2014  | 1     | 1   | -1    | 4      | 25.16MB | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2014  | 1     | 2   | -1    | 4      | 26.22MB | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2014  | 1     | 3   | -1    | 0      | 0B      | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2014  | 1     | 4   | -1    | 0      | 0B      | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| Total |       |     | -1    | 8      | 51.38MB | 0B           |                   |        |                   |
++-------+-------+-----+-------+--------+---------+--------------+-------------------+--------+-------------------+
+[localhost:21000] > show files in part_text;
++---------------------------------------------------------------------------------------------------------+--------+-------------------------+
+| path | size | partition |
++---------------------------------------------------------------------------------------------------------+--------+-------------------------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=1/80732d9dc80689f_1418645991_data.0. | 5.77MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=1/80732d9dc8068a0_1418645991_data.0. | 6.25MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=1/80732d9dc8068a1_147082319_data.0. | 7.16MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=1/80732d9dc8068a2_2111411753_data.0. | 5.98MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=2/21a828cf494b5bbb_501271652_data.0. | 6.42MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=2/21a828cf494b5bbc_501271652_data.0. | 6.62MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=2/21a828cf494b5bbd_1393490200_data.0. | 6.98MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=2/21a828cf494b5bbe_1393490200_data.0. | 6.20MB | year=2014/month=1/day=2 |
++---------------------------------------------------------------------------------------------------------+--------+-------------------------+
+</code></pre>
+ <p class="p">
+ The following example shows a <code class="ph codeph">SHOW FILES</code> statement for a partitioned Parquet table.
+ The number and sizes of files are different from the equivalent partitioned text table
+ used in the previous example, because <code class="ph codeph">INSERT</code> operations for Parquet tables
+ are parallelized differently than for text tables. (Also, the amount of data is so small
+ that it can be written to Parquet without involving all the hosts in this 4-node cluster.)
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table part_parq (x bigint, y int, s string) partitioned by (year bigint, month bigint, day bigint) stored as parquet;
+[localhost:21000] > insert into part_parq partition (year,month,day) select x, y, s, year, month, day from partitioned_text;
+[localhost:21000] > show partitions part_parq;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+
+| year | month | day | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+
+| 2014 | 1 | 1 | -1 | 3 | 17.89MB | NOT CACHED | NOT CACHED | PARQUET | false |
+| 2014 | 1 | 2 | -1 | 3 | 17.89MB | NOT CACHED | NOT CACHED | PARQUET | false |
+| Total | | | -1 | 6 | 35.79MB | 0B | | | |
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+
+[localhost:21000] > show files in part_parq;
++-----------------------------------------------------------------------------------------------+--------+-------------------------+
+| path | size | partition |
++-----------------------------------------------------------------------------------------------+--------+-------------------------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=1/1134113650_data.0.parq | 4.49MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=1/617567880_data.0.parq | 5.14MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=1/2099499416_data.0.parq | 8.27MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=2/945567189_data.0.parq | 8.80MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=2/2145850112_data.0.parq | 4.80MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=2/665613448_data.0.parq | 4.29MB | year=2014/month=1/day=2 |
++-----------------------------------------------------------------------------------------------+--------+-------------------------+
+</code></pre>
+<p class="p">
+ The following example shows output from the <code class="ph codeph">SHOW FILES</code> statement
+ for a table where the data files are stored in Amazon S3:
+</p>
+<pre class="pre codeblock"><code>[localhost:21000] > show files in s3_testing.sample_data_s3;
++-----------------------------------------------------------------------+---------+
+| path | size |
++-----------------------------------------------------------------------+---------+
+| s3a://impala-demo/sample_data/e065453cba1988a6_1733868553_data.0.parq | 24.84MB |
++-----------------------------------------------------------------------+---------+
+</code></pre>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="show__show_roles">
+
+ <h2 class="title topictitle2" id="ariaid-title3">SHOW ROLES Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW ROLES</code> statement displays roles. This syntax is available in <span class="keyword">Impala 2.0</span> and later
+ only, when you are using the Sentry authorization framework along with the Sentry service, as described in
+ <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>. It does not apply when you use the Sentry framework
+ with privileges defined in a policy file.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ Depending on the roles set up within your organization by the <code class="ph codeph">CREATE ROLE</code> statement, the
+ output might look something like this:
+ </p>
+
+<pre class="pre codeblock"><code>show roles;
++-----------+
+| role_name |
++-----------+
+| analyst |
+| role1 |
+| sales |
+| superuser |
+| test_role |
++-----------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="show__show_current_role">
+
+ <h2 class="title topictitle2" id="ariaid-title4">SHOW CURRENT ROLE</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW CURRENT ROLE</code> statement displays roles assigned to the current user. This syntax
+ is available in <span class="keyword">Impala 2.0</span> and later only, when you are using the Sentry authorization framework along with
+ the Sentry service, as described in <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>. It does not
+ apply when you use the Sentry framework with privileges defined in a policy file.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ Depending on the roles set up within your organization by the <code class="ph codeph">CREATE ROLE</code> statement, the
+ output might look something like this:
+ </p>
+
+<pre class="pre codeblock"><code>show current roles;
++-----------+
+| role_name |
++-----------+
+| role1 |
+| superuser |
++-----------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="show__show_role_grant">
+
+ <h2 class="title topictitle2" id="ariaid-title5">SHOW ROLE GRANT Statement</h2>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">SHOW ROLE GRANT</code> statement lists all the roles assigned to the specified group. This
+ statement is only allowed for Sentry administrative users and other users that are part of the specified
+ group. This syntax is available in <span class="keyword">Impala 2.0</span> and later only, when you are using the Sentry authorization
+ framework along with the Sentry service, as described in
+ <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>. It does not apply when you use the Sentry framework
+ with privileges defined in a policy file.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
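+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The statement takes the form <code class="ph codeph">SHOW ROLE GRANT GROUP <var class="keyword varname">group_name</var></code>.
+ The group name <code class="ph codeph">analysts</code> and the role name in this example are hypothetical;
+ depending on the groups and roles set up within your organization, the output might look something like this:
+ </p>
+
+<pre class="pre codeblock"><code>show role grant group analysts;
++-----------+
+| role_name |
++-----------+
+| analyst |
++-----------+
+</code></pre>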
+
+
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="show__show_grant_role">
+
+ <h2 class="title topictitle2" id="ariaid-title6">SHOW GRANT ROLE Statement</h2>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">SHOW GRANT ROLE</code> statement lists all the grants for the given role name. This statement
+ is only allowed for Sentry administrative users and other users that have been granted the specified role.
+ This syntax is available in <span class="keyword">Impala 2.0</span> and later only, when you are using the Sentry authorization framework
+ along with the Sentry service, as described in <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>. It
+ does not apply when you use the Sentry framework with privileges defined in a policy file.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
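+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The statement takes the form <code class="ph codeph">SHOW GRANT ROLE <var class="keyword varname">role_name</var></code>.
+ The output contains one row for each privilege granted to the role, identifying the object
+ (server, database, table, or URI) and the privilege level. The role name
+ <code class="ph codeph">analyst</code> is hypothetical, and the exact output columns vary by Impala version;
+ the output might look something like this:
+ </p>
+
+<pre class="pre codeblock"><code>show grant role analyst;
++----------+----------+-------+--------+-----+-----------+
+| scope | database | table | column | uri | privilege |
++----------+----------+-------+--------+-----+-----------+
+| database | sales_db | | | | select |
++----------+----------+-------+--------+-----+-----------+
+</code></pre>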
+
+
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="show__show_databases">
+
+ <h2 class="title topictitle2" id="ariaid-title7">SHOW DATABASES</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW DATABASES</code> statement is often the first one you issue when connecting to an
+ instance for the first time. You typically issue <code class="ph codeph">SHOW DATABASES</code> to see the names you can
+ specify in a <code class="ph codeph">USE <var class="keyword varname">db_name</var></code> statement, then after switching to a database
+ you issue <code class="ph codeph">SHOW TABLES</code> to see the names you can specify in <code class="ph codeph">SELECT</code> and
+ <code class="ph codeph">INSERT</code> statements.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, the output includes a second column showing any associated comment
+ for each database.
+ </p>
+
+ <p class="p">
+ The output of <code class="ph codeph">SHOW DATABASES</code> includes the special <code class="ph codeph">_impala_builtins</code>
+ database, which lets you view definitions of built-in functions, as described under <code class="ph codeph">SHOW
+ FUNCTIONS</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ This example shows how you might locate a particular table on an unfamiliar system. The
+ <code class="ph codeph">DEFAULT</code> database is the one you initially connect to; a database with that name is present
+ on every system. You can issue <code class="ph codeph">SHOW TABLES IN <var class="keyword varname">db_name</var></code> without going
+ into a database, or <code class="ph codeph">SHOW TABLES</code> once you are inside a particular database.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > show databases;
++------------------+----------------------------------------------+
+| name | comment |
++------------------+----------------------------------------------+
+| _impala_builtins | System database for Impala builtin functions |
+| default | Default Hive database |
+| file_formats | |
++------------------+----------------------------------------------+
+Returned 3 row(s) in 0.02s
+[localhost:21000] > show tables in file_formats;
++--------------------+
+| name |
++--------------------+
+| parquet_table |
+| rcfile_table |
+| sequencefile_table |
+| textfile_table |
++--------------------+
+Returned 4 row(s) in 0.01s
+[localhost:21000] > use file_formats;
+[localhost:21000] > show tables like '*parq*';
++--------------------+
+| name |
++--------------------+
+| parquet_table |
++--------------------+
+Returned 1 row(s) in 0.01s</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_databases.html#databases">Overview of Impala Databases</a>, <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a>,
+ <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>, <a class="xref" href="impala_use.html#use">USE Statement</a>,
+ <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>,
+ <a class="xref" href="impala_show.html#show_functions">SHOW FUNCTIONS Statement</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="show__show_tables">
+
+ <h2 class="title topictitle2" id="ariaid-title8">SHOW TABLES Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Displays the names of tables. By default, lists tables in the current database, or with the
+ <code class="ph codeph">IN</code> clause, in a specified database. By default, lists all tables, or with the
+ <code class="ph codeph">LIKE</code> clause, only those whose names match a pattern with <code class="ph codeph">*</code> wildcards.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and execute
+ permissions for all directories that are part of the table.
+ (A table could span multiple different HDFS directories if it is partitioned.
+ The directories could be widely scattered because a partition can reside
+ in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples demonstrate the <code class="ph codeph">SHOW TABLES</code> statement.
+ If the database contains no tables, the result set is empty.
+ If the database does contain tables, <code class="ph codeph">SHOW TABLES IN <var class="keyword varname">db_name</var></code>
+ lists all the table names. <code class="ph codeph">SHOW TABLES</code> with no qualifiers lists
+ all the table names in the current database.
+ </p>
+
+<pre class="pre codeblock"><code>create database empty_db;
+show tables in empty_db;
+Fetched 0 row(s) in 0.11s
+
+create database full_db;
+create table full_db.t1 (x int);
+create table full_db.t2 like full_db.t1;
+
+show tables in full_db;
++------+
+| name |
++------+
+| t1 |
+| t2 |
++------+
+
+use full_db;
+show tables;
++------+
+| name |
++------+
+| t1 |
+| t2 |
++------+
+</code></pre>
+
+ <p class="p">
+ This example demonstrates how <code class="ph codeph">SHOW TABLES LIKE '<var class="keyword varname">wildcard_pattern</var>'</code>
+ lists table names that match a pattern, or multiple alternative patterns.
+ Because you can do wildcard matches for table names, it is helpful to establish naming conventions
+ so that you can conveniently locate groups of related tables.
+ </p>
+
+<pre class="pre codeblock"><code>create table fact_tbl (x int);
+create table dim_tbl_1 (s string);
+create table dim_tbl_2 (s string);
+
+/* Asterisk is the wildcard character. Only 2 out of the 3 just-created tables are returned. */
+show tables like 'dim*';
++-----------+
+| name |
++-----------+
+| dim_tbl_1 |
+| dim_tbl_2 |
++-----------+
+
+/* We are already in the FULL_DB database, but just to be sure we can specify the database name also. */
+show tables in full_db like 'dim*';
++-----------+
+| name |
++-----------+
+| dim_tbl_1 |
+| dim_tbl_2 |
++-----------+
+
+/* The pipe character separates multiple wildcard patterns. */
+show tables like '*dim*|t*';
++-----------+
+| name |
++-----------+
+| dim_tbl_1 |
+| dim_tbl_2 |
+| t1 |
+| t2 |
++-----------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>, <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>,
+ <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>, <a class="xref" href="impala_show.html#show_create_table">SHOW CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>,
+ <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>,
+ <a class="xref" href="impala_show.html#show_functions">SHOW FUNCTIONS Statement</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="show__show_create_table">
+
+ <h2 class="title topictitle2" id="ariaid-title9">SHOW CREATE TABLE Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ As a schema changes over time, you might run a <code class="ph codeph">CREATE TABLE</code> statement followed by several
+ <code class="ph codeph">ALTER TABLE</code> statements. To capture the cumulative effect of all those statements,
+ <code class="ph codeph">SHOW CREATE TABLE</code> displays a <code class="ph codeph">CREATE TABLE</code> statement that would reproduce
+ the current structure of a table. You can use this output in scripts that set up or clone a group of
+ tables, rather than trying to reproduce the original sequence of <code class="ph codeph">CREATE TABLE</code> and
+ <code class="ph codeph">ALTER TABLE</code> statements. When creating variations on the original table, or cloning the
+ original table on a different system, you might need to edit the <code class="ph codeph">SHOW CREATE TABLE</code> output
+ to change things such as the database name, <code class="ph codeph">LOCATION</code> field, and so on that might be
+ different on the destination system.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ For Kudu tables:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The column specifications include attributes such as <code class="ph codeph">NULL</code>,
+ <code class="ph codeph">NOT NULL</code>, <code class="ph codeph">ENCODING</code>, and <code class="ph codeph">COMPRESSION</code>.
+ If you do not specify those attributes in the original <code class="ph codeph">CREATE TABLE</code> statement,
+ the <code class="ph codeph">SHOW CREATE TABLE</code> output displays the defaults that were used.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The specifications of any <code class="ph codeph">RANGE</code> clauses are not displayed in full.
+ To see the definition of the range clauses for a Kudu table, use the <code class="ph codeph">SHOW RANGE PARTITIONS</code> statement.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">TBLPROPERTIES</code> output reflects the Kudu master address
+ and the internal Kudu name associated with the Impala table.
+ </p>
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>
+show CREATE TABLE numeric_grades_default_letter;
++------------------------------------------------------------------------------------------------+
+| result |
++------------------------------------------------------------------------------------------------+
+| CREATE TABLE user.numeric_grades_default_letter ( |
+| score TINYINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
+| letter_grade STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION DEFAULT '-', |
+| student STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
+| PRIMARY KEY (score) |
+| ) |
+| PARTITION BY <strong class="ph b">RANGE (score) (...)</strong> |
+| STORED AS KUDU |
+| TBLPROPERTIES ('kudu.master_addresses'='vd0342.example.com:7051', |
+| 'kudu.table_name'='impala::USER.numeric_grades_default_letter') |
++------------------------------------------------------------------------------------------------+
+
+show range partitions numeric_grades_default_letter;
++--------------------+
+| RANGE (score) |
++--------------------+
+| 0 <= VALUES < 50 |
+| 50 <= VALUES < 65 |
+| 65 <= VALUES < 80 |
+| 80 <= VALUES < 100 |
++--------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how various clauses from the <code class="ph codeph">CREATE TABLE</code> statement are
+ represented in the output of <code class="ph codeph">SHOW CREATE TABLE</code>.
+ </p>
+
+<pre class="pre codeblock"><code>create table show_create_table_demo (id int comment "Unique ID", y double, s string)
+ partitioned by (year smallint)
+ stored as parquet;
+
+show create table show_create_table_demo;
++----------------------------------------------------------------------------------------+
+| result |
++----------------------------------------------------------------------------------------+
+| CREATE TABLE scratch.show_create_table_demo ( |
+| id INT COMMENT 'Unique ID', |
+| y DOUBLE, |
+| s STRING |
+| ) |
+| PARTITIONED BY ( |
+| year SMALLINT |
+| ) |
+| STORED AS PARQUET |
+| LOCATION 'hdfs://127.0.0.1:8020/user/hive/warehouse/scratch.db/show_create_table_demo' |
+| TBLPROPERTIES ('transient_lastDdlTime'='1418152582') |
++----------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how, after a sequence of <code class="ph codeph">ALTER TABLE</code> statements, the output
+ from <code class="ph codeph">SHOW CREATE TABLE</code> represents the current state of the table. This output could be
+ used to create a matching table rather than executing the original <code class="ph codeph">CREATE TABLE</code> and
+ sequence of <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+<pre class="pre codeblock"><code>alter table show_create_table_demo drop column s;
+alter table show_create_table_demo set fileformat textfile;
+
+show create table show_create_table_demo;
++----------------------------------------------------------------------------------------+
+| result |
++----------------------------------------------------------------------------------------+
+| CREATE TABLE scratch.show_create_table_demo ( |
+| id INT COMMENT 'Unique ID', |
+| y DOUBLE |
+| ) |
+| PARTITIONED BY ( |
+| year SMALLINT |
+| ) |
+| STORED AS TEXTFILE |
+| LOCATION 'hdfs://127.0.0.1:8020/user/hive/warehouse/demo.db/show_create_table_demo' |
+| TBLPROPERTIES ('transient_lastDdlTime'='1418152638') |
++----------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>, <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>,
+ <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="show__show_table_stats">
+
+ <h2 class="title topictitle2" id="ariaid-title10">SHOW TABLE STATS Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW COLUMN STATS</code> variants are important for
+ tuning performance and diagnosing performance issues, especially with the largest tables and the most
+ complex join queries.
+ </p>
+
+ <p class="p">
+ Any values that are not available (because the <code class="ph codeph">COMPUTE STATS</code> statement has not been run
+ yet) are displayed as <code class="ph codeph">-1</code>.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">SHOW TABLE STATS</code> provides some general information about the table, such as the number of
+ files, overall size of the data, whether some or all of the data is in the HDFS cache, and the file format,
+ that is useful whether or not you have run the <code class="ph codeph">COMPUTE STATS</code> statement. A
+ <code class="ph codeph">-1</code> in the <code class="ph codeph">#Rows</code> output column indicates that the <code class="ph codeph">COMPUTE
+ STATS</code> statement has never been run for this table. If the table is partitioned, <code class="ph codeph">SHOW TABLE
+ STATS</code> provides this information for each partition. (It produces the same output as the
+ <code class="ph codeph">SHOW PARTITIONS</code> statement in this case.)
+ </p>
+
+ <p class="p">
+ The output of <code class="ph codeph">SHOW COLUMN STATS</code> is primarily only useful after the <code class="ph codeph">COMPUTE
+ STATS</code> statement has been run on the table. A <code class="ph codeph">-1</code> in the <code class="ph codeph">#Distinct
+ Values</code> output column indicates that the <code class="ph codeph">COMPUTE STATS</code> statement has never been
+ run for this table. Currently, Impala always leaves the <code class="ph codeph">#Nulls</code> column as
+ <code class="ph codeph">-1</code>, even after <code class="ph codeph">COMPUTE STATS</code> has been run.
+ </p>
+
+ <p class="p">
+ These <code class="ph codeph">SHOW</code> statements work on actual tables only, not on views.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because Kudu tables do not have characteristics derived from HDFS, such
+ as number of files, file format, and HDFS cache status, the output of
+ <code class="ph codeph">SHOW TABLE STATS</code> reflects different characteristics
+ that apply to Kudu tables. If the Kudu table is created with the
+ clause <code class="ph codeph">PARTITIONS 20</code>, then the result set of
+ <code class="ph codeph">SHOW TABLE STATS</code> consists of 20 rows, each representing
+ one of the numbered partitions. For example:
+ </p>
+
+<pre class="pre codeblock"><code>
+show table stats kudu_table;
++--------+-----------+----------+-----------------------+------------+
+| # Rows | Start Key | Stop Key | Leader Replica | # Replicas |
++--------+-----------+----------+-----------------------+------------+
+| -1 | | 00000001 | host.example.com:7050 | 3 |
+| -1 | 00000001 | 00000002 | host.example.com:7050 | 3 |
+| -1 | 00000002 | 00000003 | host.example.com:7050 | 3 |
+| -1 | 00000003 | 00000004 | host.example.com:7050 | 3 |
+| -1 | 00000004 | 00000005 | host.example.com:7050 | 3 |
+...
+</code></pre>
+
+ <p class="p">
+ Impala does not compute the number of rows for each partition for
+ Kudu tables. Therefore, you do not need to re-run <code class="ph codeph">COMPUTE STATS</code>
+ when you see -1 in the <code class="ph codeph"># Rows</code> column of the output from
+ <code class="ph codeph">SHOW TABLE STATS</code>. That column always shows -1 for
+ all Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show how the <code class="ph codeph">SHOW TABLE STATS</code> statement displays physical
+ information about a table and the associated data files:
+ </p>
+
+<pre class="pre codeblock"><code>show table stats store_sales;
++-------+--------+----------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+--------+----------+--------------+--------+-------------------+
+| -1 | 1 | 370.45MB | NOT CACHED | TEXT | false |
++-------+--------+----------+--------------+--------+-------------------+
+
+show table stats customer;
++-------+--------+---------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+--------+---------+--------------+--------+-------------------+
+| -1 | 1 | 12.60MB | NOT CACHED | TEXT | false |
++-------+--------+---------+--------------+--------+-------------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how, after a <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL
+ STATS</code> statement, the <code class="ph codeph">#Rows</code> field is now filled in. Because the
+ <code class="ph codeph">STORE_SALES</code> table in this example is not partitioned, the <code class="ph codeph">COMPUTE INCREMENTAL
+ STATS</code> statement produces regular stats rather than incremental stats, therefore the
+ <code class="ph codeph">Incremental stats</code> field remains <code class="ph codeph">false</code>.
+ </p>
+
+<pre class="pre codeblock"><code>compute stats customer;
++------------------------------------------+
+| summary |
++------------------------------------------+
+| Updated 1 partition(s) and 18 column(s). |
++------------------------------------------+
+
+show table stats customer;
++--------+--------+---------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++--------+--------+---------+--------------+--------+-------------------+
+| 100000 | 1 | 12.60MB | NOT CACHED | TEXT | false |
++--------+--------+---------+--------------+--------+-------------------+
+
+compute incremental stats store_sales;
++------------------------------------------+
+| summary |
++------------------------------------------+
+| Updated 1 partition(s) and 23 column(s). |
++------------------------------------------+
+
+show table stats store_sales;
++---------+--------+----------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++---------+--------+----------+--------------+--------+-------------------+
+| 2880404 | 1 | 370.45MB | NOT CACHED | TEXT | false |
++---------+--------+----------+--------------+--------+-------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and execute
+ permissions for all directories that are part of the table.
+ (A table could span multiple different HDFS directories if it is partitioned.
+ The directories could be widely scattered because a partition can reside
+ in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+ The Impala user must also have execute
+ permission for the database directory, and any parent directories of the database directory in HDFS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>, <a class="xref" href="impala_show.html#show_column_stats">SHOW COLUMN STATS Statement</a>
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for usage information and examples.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="show__show_column_stats">
+
+ <h2 class="title topictitle2" id="ariaid-title11">SHOW COLUMN STATS Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW COLUMN STATS</code> variants are important for
+ tuning performance and diagnosing performance issues, especially with the largest tables and the most
+ complex join queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+        The output for <code class="ph codeph">SHOW COLUMN STATS</code> includes
+        the relevant information for Kudu tables. Column statistics that
+        originate in the underlying Kudu storage layer are also represented
+        in the metastore database that Impala uses.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show the output of the <code class="ph codeph">SHOW COLUMN STATS</code> statement for some tables,
+ before the <code class="ph codeph">COMPUTE STATS</code> statement is run. Impala deduces some information, such as
+        maximum and average size for fixed-length columns, and leaves any unknown values as <code class="ph codeph">-1</code>.
+ </p>
+
+<pre class="pre codeblock"><code>show column stats customer;
++------------------------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++------------------------+--------+------------------+--------+----------+----------+
+| c_customer_sk | INT | -1 | -1 | 4 | 4 |
+| c_customer_id | STRING | -1 | -1 | -1 | -1 |
+| c_current_cdemo_sk | INT | -1 | -1 | 4 | 4 |
+| c_current_hdemo_sk | INT | -1 | -1 | 4 | 4 |
+| c_current_addr_sk | INT | -1 | -1 | 4 | 4 |
+| c_first_shipto_date_sk | INT | -1 | -1 | 4 | 4 |
+| c_first_sales_date_sk | INT | -1 | -1 | 4 | 4 |
+| c_salutation | STRING | -1 | -1 | -1 | -1 |
+| c_first_name | STRING | -1 | -1 | -1 | -1 |
+| c_last_name | STRING | -1 | -1 | -1 | -1 |
+| c_preferred_cust_flag | STRING | -1 | -1 | -1 | -1 |
+| c_birth_day | INT | -1 | -1 | 4 | 4 |
+| c_birth_month | INT | -1 | -1 | 4 | 4 |
+| c_birth_year | INT | -1 | -1 | 4 | 4 |
+| c_birth_country | STRING | -1 | -1 | -1 | -1 |
+| c_login | STRING | -1 | -1 | -1 | -1 |
+| c_email_address | STRING | -1 | -1 | -1 | -1 |
+| c_last_review_date | STRING | -1 | -1 | -1 | -1 |
++------------------------+--------+------------------+--------+----------+----------+
+
+show column stats store_sales;
++-----------------------+-------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++-----------------------+-------+------------------+--------+----------+----------+
+| ss_sold_date_sk | INT | -1 | -1 | 4 | 4 |
+| ss_sold_time_sk | INT | -1 | -1 | 4 | 4 |
+| ss_item_sk | INT | -1 | -1 | 4 | 4 |
+| ss_customer_sk | INT | -1 | -1 | 4 | 4 |
+| ss_cdemo_sk | INT | -1 | -1 | 4 | 4 |
+| ss_hdemo_sk | INT | -1 | -1 | 4 | 4 |
+| ss_addr_sk | INT | -1 | -1 | 4 | 4 |
+| ss_store_sk | INT | -1 | -1 | 4 | 4 |
+| ss_promo_sk | INT | -1 | -1 | 4 | 4 |
+| ss_ticket_number | INT | -1 | -1 | 4 | 4 |
+| ss_quantity | INT | -1 | -1 | 4 | 4 |
+| ss_wholesale_cost | FLOAT | -1 | -1 | 4 | 4 |
+| ss_list_price | FLOAT | -1 | -1 | 4 | 4 |
+| ss_sales_price | FLOAT | -1 | -1 | 4 | 4 |
+| ss_ext_discount_amt | FLOAT | -1 | -1 | 4 | 4 |
+| ss_ext_sales_price | FLOAT | -1 | -1 | 4 | 4 |
+| ss_ext_wholesale_cost | FLOAT | -1 | -1 | 4 | 4 |
+| ss_ext_list_price | FLOAT | -1 | -1 | 4 | 4 |
+| ss_ext_tax | FLOAT | -1 | -1 | 4 | 4 |
+| ss_coupon_amt | FLOAT | -1 | -1 | 4 | 4 |
+| ss_net_paid | FLOAT | -1 | -1 | 4 | 4 |
+| ss_net_paid_inc_tax | FLOAT | -1 | -1 | 4 | 4 |
+| ss_net_profit | FLOAT | -1 | -1 | 4 | 4 |
++-----------------------+-------+------------------+--------+----------+----------+
+</code></pre>
+
+ <p class="p">
+ The following examples show the output of the <code class="ph codeph">SHOW COLUMN STATS</code> statement for some tables,
+ after the <code class="ph codeph">COMPUTE STATS</code> statement is run. Now most of the <code class="ph codeph">-1</code> values are
+ changed to reflect the actual table data. The <code class="ph codeph">#Nulls</code> column remains <code class="ph codeph">-1</code>
+ because Impala does not use the number of <code class="ph codeph">NULL</code> values to influence query planning.
+ </p>
+
+<pre class="pre codeblock"><code>compute stats customer;
++------------------------------------------+
+| summary |
++------------------------------------------+
+| Updated 1 partition(s) and 18 column(s). |
++------------------------------------------+
+
+compute stats store_sales;
++------------------------------------------+
+| summary |
++------------------------------------------+
+| Updated 1 partition(s) and 23 column(s). |
++------------------------------------------+
+
+show column stats customer;
++------------------------+--------+------------------+--------+----------+--------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++------------------------+--------+------------------+--------+----------+--------+
+| c_customer_sk | INT | 139017 | -1 | 4 | 4 |
+| c_customer_id | STRING | 111904 | -1 | 16 | 16 |
+| c_current_cdemo_sk | INT | 95837 | -1 | 4 | 4 |
+| c_current_hdemo_sk | INT | 8097 | -1 | 4 | 4 |
+| c_current_addr_sk | INT | 57334 | -1 | 4 | 4 |
+| c_first_shipto_date_sk | INT | 4374 | -1 | 4 | 4 |
+| c_first_sales_date_sk | INT | 4409 | -1 | 4 | 4 |
+| c_salutation | STRING | 7 | -1 | 4 | 3.1308 |
+| c_first_name | STRING | 3887 | -1 | 11 | 5.6356 |
+| c_last_name | STRING | 4739 | -1 | 13 | 5.9106 |
+| c_preferred_cust_flag | STRING | 3 | -1 | 1 | 0.9656 |
+| c_birth_day | INT | 31 | -1 | 4 | 4 |
+| c_birth_month | INT | 12 | -1 | 4 | 4 |
+| c_birth_year | INT | 71 | -1 | 4 | 4 |
+| c_birth_country | STRING | 205 | -1 | 20 | 8.4001 |
+| c_login | STRING | 1 | -1 | 0 | 0 |
+| c_email_address | STRING | 94492 | -1 | 46 | 26.485 |
+| c_last_review_date | STRING | 349 | -1 | 7 | 6.7561 |
++------------------------+--------+------------------+--------+----------+--------+
+
+show column stats store_sales;
++-----------------------+-------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++-----------------------+-------+------------------+--------+----------+----------+
+| ss_sold_date_sk | INT | 4395 | -1 | 4 | 4 |
+| ss_sold_time_sk | INT | 63617 | -1 | 4 | 4 |
+| ss_item_sk | INT | 19463 | -1 | 4 | 4 |
+| ss_customer_sk | INT | 122720 | -1 | 4 | 4 |
+| ss_cdemo_sk | INT | 242982 | -1 | 4 | 4 |
+| ss_hdemo_sk | INT | 8097 | -1 | 4 | 4 |
+| ss_addr_sk | INT | 70770 | -1 | 4 | 4 |
+| ss_store_sk | INT | 6 | -1 | 4 | 4 |
+| ss_promo_sk | INT | 355 | -1 | 4 | 4 |
+| ss_ticket_number | INT | 304098 | -1 | 4 | 4 |
+| ss_quantity | INT | 105 | -1 | 4 | 4 |
+| ss_wholesale_cost | FLOAT | 9600 | -1 | 4 | 4 |
+| ss_list_price | FLOAT | 22191 | -1 | 4 | 4 |
+| ss_sales_price | FLOAT | 20693 | -1 | 4 | 4 |
+| ss_ext_discount_amt | FLOAT | 228141 | -1 | 4 | 4 |
+| ss_ext_sales_price | FLOAT | 433550 | -1 | 4 | 4 |
+| ss_ext_wholesale_cost | FLOAT | 406291 | -1 | 4 | 4 |
+| ss_ext_list_price | FLOAT | 574871 | -1 | 4 | 4 |
+| ss_ext_tax | FLOAT | 91806 | -1 | 4 | 4 |
+| ss_coupon_amt | FLOAT | 228141 | -1 | 4 | 4 |
+| ss_net_paid | FLOAT | 493107 | -1 | 4 | 4 |
+| ss_net_paid_inc_tax | FLOAT | 653523 | -1 | 4 | 4 |
+| ss_net_profit | FLOAT | 611934 | -1 | 4 | 4 |
++-----------------------+-------+------------------+--------+----------+----------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and execute
+ permissions for all directories that are part of the table.
+ (A table could span multiple different HDFS directories if it is partitioned.
+ The directories could be widely scattered because a partition can reside
+ in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+ The Impala user must also have execute
+ permission for the database directory, and any parent directories of the database directory in HDFS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>, <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for usage information and examples.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="show__show_partitions">
+
+ <h2 class="title topictitle2" id="ariaid-title12">SHOW PARTITIONS Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ <code class="ph codeph">SHOW PARTITIONS</code> displays information about each partition for a partitioned table. (The
+ output is the same as the <code class="ph codeph">SHOW TABLE STATS</code> statement, but <code class="ph codeph">SHOW PARTITIONS</code>
+ only works on a partitioned table.) Because it displays table statistics for all partitions, the output is
+ more informative if you have run the <code class="ph codeph">COMPUTE STATS</code> statement after creating all the
+ partitions. See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for details. For example, on a
+ <code class="ph codeph">CENSUS</code> table partitioned on the <code class="ph codeph">YEAR</code> column:
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ The optional <code class="ph codeph">RANGE</code> clause only applies to Kudu tables. It displays only the partitions
+ defined by the <code class="ph codeph">RANGE</code> clause of <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code>.
+ </p>
+
+ <p class="p">
+ Although you can specify <code class="ph codeph"><</code> or
+ <code class="ph codeph"><=</code> comparison operators when defining
+ range partitions for Kudu tables, Kudu rewrites them if necessary
+ to represent each range as
+ <code class="ph codeph"><var class="keyword varname">low_bound</var> <= VALUES < <var class="keyword varname">high_bound</var></code>.
+ This rewriting might involve incrementing one of the boundary values
+ or appending a <code class="ph codeph">\0</code> for string values, so that the
+ partition covers the same range as originally specified.
+ </p>
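+
+      <p class="p">
+        As an illustrative sketch of this rewriting (the table, column, and
+        boundary values here are hypothetical, not from a reference schema),
+        a range partition declared with an inclusive upper bound is displayed
+        by <code class="ph codeph">SHOW RANGE PARTITIONS</code> with the boundary
+        incremented and the comparison rewritten to the canonical form:
+      </p>
+
+<pre class="pre codeblock"><code>
+create table events_by_year (year int primary key, details string)
+  partition by range (year)
+  (partition 2000 <= values <= 2009)
+  stored as kudu;
+
+show range partitions events_by_year;
++------------------------+
+| RANGE (year)           |
++------------------------+
+| 2000 <= VALUES < 2010  |
++------------------------+
+</code></pre>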
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows the output for a Parquet, text, or other
+ HDFS-backed table partitioned on the <code class="ph codeph">YEAR</code> column:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > show partitions census;
++-------+-------+--------+------+---------+
+| year | #Rows | #Files | Size | Format |
++-------+-------+--------+------+---------+
+| 2000 | -1 | 0 | 0B | TEXT |
+| 2004 | -1 | 0 | 0B | TEXT |
+| 2008 | -1 | 0 | 0B | TEXT |
+| 2010 | -1 | 0 | 0B | TEXT |
+| 2011 | 4 | 1 | 22B | TEXT |
+| 2012 | 4 | 1 | 22B | TEXT |
+| 2013 | 1 | 1 | 231B | PARQUET |
+| Total | 9 | 3 | 275B | |
++-------+-------+--------+------+---------+
+</code></pre>
+
+ <p class="p">
+ The following example shows the output for a Kudu table
+ using the hash partitioning mechanism. The number of
+        rows in the result set corresponds to the value used
+ in the <code class="ph codeph">PARTITIONS <var class="keyword varname">N</var></code>
+ clause of <code class="ph codeph">CREATE TABLE</code>.
+ </p>
+
+<pre class="pre codeblock"><code>
+show partitions million_rows_hash;
+
++--------+-----------+----------+-----------------------+--
+| # Rows | Start Key | Stop Key | Leader Replica | # Replicas
++--------+-----------+----------+-----------------------+--
+| -1 | | 00000001 | n236.example.com:7050 | 3
+| -1 | 00000001 | 00000002 | n236.example.com:7050 | 3
+| -1 | 00000002 | 00000003 | n336.example.com:7050 | 3
+| -1 | 00000003 | 00000004 | n238.example.com:7050 | 3
+| -1 | 00000004 | 00000005 | n338.example.com:7050 | 3
+...
+| -1 | 0000002E | 0000002F | n240.example.com:7050 | 3
+| -1 | 0000002F | 00000030 | n336.example.com:7050 | 3
+| -1 | 00000030 | 00000031 | n240.example.com:7050 | 3
+| -1 | 00000031 | | n334.example.com:7050 | 3
++--------+-----------+----------+-----------------------+--
+Fetched 50 row(s) in 0.05s
+
+</code></pre>
+
+ <p class="p">
+ The following example shows the output for a Kudu table
+ using the range partitioning mechanism:
+ </p>
+
+<pre class="pre codeblock"><code>
+show range partitions million_rows_range;
++-----------------------+
+| RANGE (id) |
++-----------------------+
+| VALUES < "A" |
+| "A" <= VALUES < "[" |
+| "a" <= VALUES < "{" |
+| "{" <= VALUES < "~\0" |
++-----------------------+
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and execute
+ permissions for all directories that are part of the table.
+ (A table could span multiple different HDFS directories if it is partitioned.
+ The directories could be widely scattered because a partition can reside
+ in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+ The Impala user must also have execute
+ permission for the database directory, and any parent directories of the database directory in HDFS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for usage information and examples.
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>, <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="show__show_functions">
+
+ <h2 class="title topictitle2" id="ariaid-title13">SHOW FUNCTIONS Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ By default, <code class="ph codeph">SHOW FUNCTIONS</code> displays user-defined functions (UDFs) and <code class="ph codeph">SHOW
+ AGGREGATE FUNCTIONS</code> displays user-defined aggregate functions (UDAFs) associated with a particular
+ database. The output from <code class="ph codeph">SHOW FUNCTIONS</code> includes the argument signature of each function.
+ You specify this argument signature as part of the <code class="ph codeph">DROP FUNCTION</code> statement. You might have
+ several UDFs with the same name, each accepting different argument data types.
+ </p>
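+
+      <p class="p">
+        For example, if a database contained two overloaded UDFs both named
+        <code class="ph codeph">my_func</code>, the signatures shown by
+        <code class="ph codeph">SHOW FUNCTIONS</code> identify which overload a
+        <code class="ph codeph">DROP FUNCTION</code> statement removes. (The
+        function name and signatures here are illustrative only.)
+      </p>
+
+<pre class="pre codeblock"><code>show functions;
++-------------+-----------------+-------------+---------------+
+| return type | signature       | binary type | is persistent |
++-------------+-----------------+-------------+---------------+
+| BIGINT      | my_func(BIGINT) | NATIVE      | true          |
+| STRING      | my_func(STRING) | NATIVE      | true          |
++-------------+-----------------+-------------+---------------+
+
+drop function my_func(string);
+</code></pre>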
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, the <code class="ph codeph">SHOW FUNCTIONS</code> output includes
+ a new column, labelled <code class="ph codeph">is persistent</code>. This property is <code class="ph codeph">true</code> for
+ Impala built-in functions, C++ UDFs, and Java UDFs created using the new <code class="ph codeph">CREATE FUNCTION</code>
+ syntax with no signature. It is <code class="ph codeph">false</code> for Java UDFs created using the old
+ <code class="ph codeph">CREATE FUNCTION</code> syntax that includes the types for the arguments and return value.
+        Any functions with <code class="ph codeph">false</code> shown for this property must be re-created with the
+        <code class="ph codeph">CREATE FUNCTION</code> statement each time the Impala catalog server is restarted.
+ See <code class="ph codeph">CREATE FUNCTION</code> for information on switching to the new syntax, so that
+ Java UDFs are preserved across restarts. Java UDFs that are persisted this way are also easier
+ to share across Impala and Hive.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ To display Impala built-in functions, specify the special database name <code class="ph codeph">_impala_builtins</code>:
+ </p>
+
+<pre class="pre codeblock"><code>show functions in _impala_builtins;
++--------------+-------------------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++--------------+-------------------------------------------------+-------------+---------------+
+| BIGINT | abs(BIGINT) | BUILTIN | true |
+| DECIMAL(*,*) | abs(DECIMAL(*,*)) | BUILTIN | true |
+| DOUBLE | abs(DOUBLE) | BUILTIN | true |
+| FLOAT | abs(FLOAT) | BUILTIN | true |
++--------------+-------------------------------------------------+-------------+---------------+
+...
+
+show functions in _impala_builtins like '*week*';
++-------------+------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+------------------------------+-------------+---------------+
+| INT | dayofweek(TIMESTAMP) | BUILTIN | true |
+| INT | weekofyear(TIMESTAMP) | BUILTIN | true |
+| TIMESTAMP | weeks_add(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | weeks_add(TIMESTAMP, INT) | BUILTIN | true |
+| TIMESTAMP | weeks_sub(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | weeks_sub(TIMESTAMP, INT) | BUILTIN | true |
++-------------+------------------------------+-------------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_functions_overview.html#functions">Overview of Impala Functions</a>, <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>,
+ <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>,
+ <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>,
+ <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>
+ </p>
+ </div>
+ </article>
+
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_explain.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_explain.html b/docs/build3x/html/topics/impala_explain.html
new file mode 100644
index 0000000..7768124
--- /dev/null
+++ b/docs/build3x/html/topics/impala_explain.html
@@ -0,0 +1,296 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="explain"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>EXPLAIN Statement</title></head><body id="explain"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">EXPLAIN Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Returns the execution plan for a statement, showing the low-level mechanisms that Impala will use to read the
+ data, divide the work among nodes in the cluster, and transmit intermediate and final results across the
+      network. Use <code class="ph codeph">EXPLAIN</code> followed by a complete <code class="ph codeph">SELECT</code> query.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>EXPLAIN { <var class="keyword varname">select_query</var> | <var class="keyword varname">ctas_stmt</var> | <var class="keyword varname">insert_stmt</var> }
+</code></pre>
+
+ <p class="p">
+ The <var class="keyword varname">select_query</var> is a <code class="ph codeph">SELECT</code> statement, optionally prefixed by a
+ <code class="ph codeph">WITH</code> clause. See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for details.
+ </p>
+
+ <p class="p">
+ The <var class="keyword varname">insert_stmt</var> is an <code class="ph codeph">INSERT</code> statement that inserts into or overwrites an
+ existing table. It can use either the <code class="ph codeph">INSERT ... SELECT</code> or <code class="ph codeph">INSERT ...
+ VALUES</code> syntax. See <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for details.
+ </p>
+
+ <p class="p">
+ The <var class="keyword varname">ctas_stmt</var> is a <code class="ph codeph">CREATE TABLE</code> statement using the <code class="ph codeph">AS
+ SELECT</code> clause, typically abbreviated as a <span class="q">"CTAS"</span> operation. See
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for details.
+ </p>
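+
+    <p class="p">
+      A minimal invocation looks like the following. (The table name
+      <code class="ph codeph">sample_table</code> and the filter column are
+      illustrative only; the plan output depends on your schema and cluster.)
+    </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from sample_table where year = 2018;
+</code></pre>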
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ You can interpret the output to judge whether the query is performing efficiently, and adjust the query
+ and/or the schema if not. For example, you might change the tests in the <code class="ph codeph">WHERE</code> clause, add
+ hints to make join operations more efficient, introduce subqueries, change the order of tables in a join, add
+ or change partitioning for a table, collect column statistics and/or table statistics in Hive, or any other
+ performance tuning steps.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">EXPLAIN</code> output reminds you if table or column statistics are missing from any table
+ involved in the query. These statistics are important for optimizing queries involving large tables or
+ multi-table joins. See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for how to gather statistics,
+ and <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for how to use this information for query tuning.
+ </p>
+
+ <div class="p">
+ Read the <code class="ph codeph">EXPLAIN</code> plan from bottom to top:
+ <ul class="ul">
+ <li class="li">
+ The last part of the plan shows the low-level details such as the expected amount of data that will be
+ read, where you can judge the effectiveness of your partitioning strategy and estimate how long it will
+ take to scan a table based on total data size and the size of the cluster.
+ </li>
+
+ <li class="li">
+ As you work your way up, next you see the operations that will be parallelized and performed on each
+ Impala node.
+ </li>
+
+ <li class="li">
+ At the higher levels, you see how data flows when intermediate result sets are combined and transmitted
+ from one node to another.
+ </li>
+
+ <li class="li">
+ See <a class="xref" href="../shared/../topics/impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a> for details about the
+ <code class="ph codeph">EXPLAIN_LEVEL</code> query option, which lets you customize how much detail to show in the
+ <code class="ph codeph">EXPLAIN</code> plan depending on whether you are doing high-level or low-level tuning,
+ dealing with logical or physical aspects of the query.
+ </li>
+ </ul>
+ </div>
+
+      <p class="p">
+        If you come from a traditional database background and are not familiar with data warehousing, keep in mind
+        that Impala is optimized for full table scans across very large tables. The structure and distribution of
+        this data is typically not suitable for the kind of indexing and single-row lookups that are common in OLTP
+        environments. It is common to see a query scan an entire large table; that is not necessarily an indication
+        of an inefficient query. Of course, if you can reduce the volume of scanned data by orders of magnitude, for
+        example with a query that reads only certain partitions of a partitioned table, you might be able to
+        optimize the query so that it executes in seconds rather than minutes.
+      </p>
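+
+      <p class="p">
+        As a hypothetical sketch of partition pruning (the table name and sizes below are illustrative, not
+        output from a real cluster), a <code class="ph codeph">WHERE</code> clause on a partition key column lets
+        Impala skip partitions, which shows up in the <code class="ph codeph">partitions=</code> line of the
+        scan node:
+      </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt; explain select count(*) from sales where year = 2018;
+...
+| 00:SCAN HDFS [default.sales]
+|    partitions=1/4 size=1.25GB
+...
+</code></pre>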
+
+ <p class="p">
+ For more information and examples to help you interpret <code class="ph codeph">EXPLAIN</code> output, see
+ <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Extended EXPLAIN output:</strong>
+ </p>
+
+ <p class="p">
+ For performance tuning of complex queries, and capacity planning (such as using the admission control and
+ resource management features), you can enable more detailed and informative output for the
+ <code class="ph codeph">EXPLAIN</code> statement. In the <span class="keyword cmdname">impala-shell</span> interpreter, issue the command
+ <code class="ph codeph">SET EXPLAIN_LEVEL=<var class="keyword varname">level</var></code>, where <var class="keyword varname">level</var> is an integer
+ from 0 to 3 or corresponding mnemonic values <code class="ph codeph">minimal</code>, <code class="ph codeph">standard</code>,
+ <code class="ph codeph">extended</code>, or <code class="ph codeph">verbose</code>.
+ </p>
+
+ <p class="p">
+ When extended <code class="ph codeph">EXPLAIN</code> output is enabled, <code class="ph codeph">EXPLAIN</code> statements print
+ information about estimated memory requirements, minimum number of virtual cores, and so on.
+
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a> for details and examples.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ This example shows how the standard <code class="ph codeph">EXPLAIN</code> output moves from the lowest (physical) level to
+ the higher (logical) levels. The query begins by scanning a certain amount of data; each node performs an
+ aggregation operation (evaluating <code class="ph codeph">COUNT(*)</code>) on some subset of data that is local to that
+ node; the intermediate results are transmitted back to the coordinator node (labelled here as the
+ <code class="ph codeph">EXCHANGE</code> node); lastly, the intermediate results are summed to display the final result.
+ </p>
+
+<pre class="pre codeblock" id="explain__explain_plan_simple"><code>[impalad-host:21000] > explain select count(*) from customer_address;
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=42.00MB VCores=1 |
+| |
+| 03:AGGREGATE [MERGE FINALIZE] |
+| | output: sum(count(*)) |
+| | |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+| 00:SCAN HDFS [default.customer_address] |
+| partitions=1/1 size=5.25MB |
++----------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ These examples show how the extended <code class="ph codeph">EXPLAIN</code> output becomes more accurate and informative as
+ statistics are gathered by the <code class="ph codeph">COMPUTE STATS</code> statement. Initially, much of the information
+ about data size and distribution is marked <span class="q">"unavailable"</span>. Impala can determine the raw data size, but
+ not the number of rows or number of distinct values for each column without additional analysis. The
+ <code class="ph codeph">COMPUTE STATS</code> statement performs this analysis, so a subsequent <code class="ph codeph">EXPLAIN</code>
+ statement has additional information to use in deciding how to optimize the distributed query.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set explain_level=extended;
+EXPLAIN_LEVEL set to extended
+[localhost:21000] &gt; explain select x from t1;
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=32.00MB VCores=1 |
+| |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | hosts=1 per-host-mem=unavailable |
+<strong class="ph b">| | tuple-ids=0 row-size=4B cardinality=unavailable |</strong>
+| | |
+| 00:SCAN HDFS [default.t1, PARTITION=RANDOM]              |
+| partitions=1/1 size=36B |
+<strong class="ph b">| table stats: unavailable |</strong>
+<strong class="ph b">| column stats: unavailable |</strong>
+| hosts=1 per-host-mem=32.00MB |
+<strong class="ph b">| tuple-ids=0 row-size=4B cardinality=unavailable |</strong>
++----------------------------------------------------------+
+</code></pre>
+
+<pre class="pre codeblock"><code>[localhost:21000] > compute stats t1;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+[localhost:21000] > explain select x from t1;
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=64.00MB VCores=1 |
+| |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | hosts=1 per-host-mem=unavailable |
+| | tuple-ids=0 row-size=4B cardinality=0 |
+| | |
+| 00:SCAN HDFS [default.t1, PARTITION=RANDOM] |
+| partitions=1/1 size=36B |
+<strong class="ph b">| table stats: 0 rows total |</strong>
+<strong class="ph b">| column stats: all |</strong>
+| hosts=1 per-host-mem=64.00MB |
+<strong class="ph b">| tuple-ids=0 row-size=4B cardinality=0 |</strong>
++----------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+ <p class="p">
+ If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+ identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+ other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read
+ and execute permissions for all applicable directories in all source tables
+ for the query that is being explained.
+ (A <code class="ph codeph">SELECT</code> operation could read files from multiple different HDFS directories
+ if the source table is partitioned.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">EXPLAIN</code> statement displays equivalent plan
+ information for queries against Kudu tables as for queries
+ against HDFS-based tables.
+ </p>
+
+ <p class="p">
+ To see which predicates Impala can <span class="q">"push down"</span> to Kudu for
+ efficient evaluation, without transmitting unnecessary rows back
+ to Impala, look for the <code class="ph codeph">kudu predicates</code> item in
+ the scan phase of the query. The label <code class="ph codeph">kudu predicates</code>
+ indicates a condition that can be evaluated efficiently on the Kudu
+ side. The label <code class="ph codeph">predicates</code> in a <code class="ph codeph">SCAN KUDU</code>
+ node indicates a condition that is evaluated by Impala.
+ For example, in a table with primary key column <code class="ph codeph">X</code>
+ and non-primary key column <code class="ph codeph">Y</code>, you can see that
+ some operators in the <code class="ph codeph">WHERE</code> clause are evaluated
+ immediately by Kudu and others are evaluated later by Impala:
+ </p>
+
+<pre class="pre codeblock"><code>
+EXPLAIN SELECT x,y from kudu_table WHERE
+ x = 1 AND y NOT IN (2,3) AND z = 1
+ AND a IS NOT NULL AND b > 0 AND length(s) > 5;
++----------------
+| Explain String
++----------------
+...
+| 00:SCAN KUDU [kudu_table]
+| predicates: y NOT IN (2, 3), length(s) > 5
+| kudu predicates: a IS NOT NULL, b > 0, x = 1, z = 1
+</code></pre>
+
+    <p class="p">
+      Only binary predicates, <code class="ph codeph">IS NULL</code> and <code class="ph codeph">IS NOT NULL</code>
+      predicates (in <span class="keyword">Impala 2.9</span> and higher), and <code class="ph codeph">IN</code> predicates
+      can be pushed to Kudu, and only when they contain literal values that exactly match the types in the
+      Kudu table and do not require any casting.
+    </p>
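+
+    <p class="p">
+      As a hypothetical sketch of the casting restriction (the exact predicate rendering in the plan can vary
+      by release), a comparison whose literal forces an implicit cast of an <code class="ph codeph">INT</code>
+      column, such as comparing it to a floating-point literal, would be expected to appear under
+      <code class="ph codeph">predicates</code> rather than <code class="ph codeph">kudu predicates</code>:
+    </p>
+
+<pre class="pre codeblock"><code>EXPLAIN SELECT x FROM kudu_table WHERE x = 1.5;
+...
+| 00:SCAN KUDU [kudu_table]
+|    predicates: CAST(x AS DOUBLE) = 1.5
+</code></pre>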
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_select.html#select">SELECT Statement</a>,
+ <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>,
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_explain_plan.html#explain_plan">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_explain_level.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_explain_level.html b/docs/build3x/html/topics/impala_explain_level.html
new file mode 100644
index 0000000..23d901a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_explain_level.html
@@ -0,0 +1,342 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="explain_level"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>EXPLAIN_LEVEL Query Option</title></head><body id="explain_level"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">EXPLAIN_LEVEL Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Controls the amount of detail provided in the output of the <code class="ph codeph">EXPLAIN</code> statement. The basic
+ output can help you identify high-level performance issues such as scanning a higher volume of data or more
+ partitions than you expect. The higher levels of detail show how intermediate results flow between nodes and
+ how different SQL operations such as <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP BY</code>, joins, and
+ <code class="ph codeph">WHERE</code> clauses are implemented within a distributed query.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code> or <code class="ph codeph">INT</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">1</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Arguments:</strong>
+ </p>
+
+ <p class="p">
+ The allowed range of numeric values for this option is 0 to 3:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">0</code> or <code class="ph codeph">MINIMAL</code>: A barebones list, one line per operation. Primarily useful
+ for checking the join order in very long queries where the regular <code class="ph codeph">EXPLAIN</code> output is too
+ long to read easily.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">1</code> or <code class="ph codeph">STANDARD</code>: The default level of detail, showing the logical way that
+ work is split up for the distributed query.
+ </li>
+
+ <li class="li">
+        <code class="ph codeph">2</code> or <code class="ph codeph">EXTENDED</code>: Includes additional detail about how the query planner
+        uses statistics in its decision-making process, so that you can understand how a query could be tuned by
+        gathering statistics, using query hints, adding or removing predicates, and so on.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">3</code> or <code class="ph codeph">VERBOSE</code>: The maximum level of detail, showing how work is split up
+ within each node into <span class="q">"query fragments"</span> that are connected in a pipeline. This extra detail is
+ primarily useful for low-level performance testing and tuning within Impala itself, rather than for
+ rewriting the SQL code at the user level.
+ </li>
+ </ul>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Prior to Impala 1.3, the allowed argument range for <code class="ph codeph">EXPLAIN_LEVEL</code> was 0 to 1: level 0 had
+ the mnemonic <code class="ph codeph">NORMAL</code>, and level 1 was <code class="ph codeph">VERBOSE</code>. In Impala 1.3 and higher,
+ <code class="ph codeph">NORMAL</code> is not a valid mnemonic value, and <code class="ph codeph">VERBOSE</code> still applies to the
+ highest level of detail but now corresponds to level 3. You might need to adjust the values if you have any
+ older <code class="ph codeph">impala-shell</code> script files that set the <code class="ph codeph">EXPLAIN_LEVEL</code> query option.
+ </div>
+
+ <p class="p">
+ Changing the value of this option controls the amount of detail in the output of the <code class="ph codeph">EXPLAIN</code>
+ statement. The extended information from level 2 or 3 is especially useful during performance tuning, when
+ you need to confirm whether the work for the query is distributed the way you expect, particularly for the
+ most resource-intensive operations such as join queries against large tables, queries against tables with
+ large numbers of partitions, and insert operations for Parquet tables. The extended information also helps to
+ check estimated resource usage when you use the admission control or resource management features explained
+ in <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a>. See
+ <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> for the syntax of the <code class="ph codeph">EXPLAIN</code> statement, and
+ <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for details about how to use the extended information.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ As always, read the <code class="ph codeph">EXPLAIN</code> output from bottom to top. The lowest lines represent the
+ initial work of the query (scanning data files), the lines in the middle represent calculations done on each
+ node and how intermediate results are transmitted from one node to another, and the topmost lines represent
+ the final results being sent back to the coordinator node.
+ </p>
+
+ <p class="p">
+ The numbers in the left column are generated internally during the initial planning phase and do not
+ represent the actual order of operations, so it is not significant if they appear out of order in the
+ <code class="ph codeph">EXPLAIN</code> output.
+ </p>
+
+ <p class="p">
+ At all <code class="ph codeph">EXPLAIN</code> levels, the plan contains a warning if any tables in the query are missing
+ statistics. Use the <code class="ph codeph">COMPUTE STATS</code> statement to gather statistics for each table and suppress
+ this warning. See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details about how the statistics help
+ query performance.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">PROFILE</code> command in <span class="keyword cmdname">impala-shell</span> always starts with an explain plan
+ showing full detail, the same as with <code class="ph codeph">EXPLAIN_LEVEL=3</code>. <span class="ph">After the explain
+ plan comes the executive summary, the same output as produced by the <code class="ph codeph">SUMMARY</code> command in
+ <span class="keyword cmdname">impala-shell</span>.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ These examples use a trivial, empty table to illustrate how the essential aspects of query planning are shown
+ in <code class="ph codeph">EXPLAIN</code> output:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (x int, s string);
+[localhost:21000] > set explain_level=1;
+[localhost:21000] > explain select count(*) from t1;
++------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=10.00MB VCores=1 |
+| WARNING: The following tables are missing relevant table and/or column |
+| statistics. |
+| explain_plan.t1 |
+| |
+| 03:AGGREGATE [MERGE FINALIZE] |
+| | output: sum(count(*)) |
+| | |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+| 00:SCAN HDFS [explain_plan.t1] |
+| partitions=1/1 size=0B |
++------------------------------------------------------------------------+
+[localhost:21000] > explain select * from t1;
++------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
+| WARNING: The following tables are missing relevant table and/or column |
+| statistics. |
+| explain_plan.t1 |
+| |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+| 00:SCAN HDFS [explain_plan.t1] |
+| partitions=1/1 size=0B |
++------------------------------------------------------------------------+
+[localhost:21000] > set explain_level=2;
+[localhost:21000] > explain select * from t1;
++------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
+| WARNING: The following tables are missing relevant table and/or column |
+| statistics. |
+| explain_plan.t1 |
+| |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | hosts=0 per-host-mem=unavailable |
+| | tuple-ids=0 row-size=19B cardinality=unavailable |
+| | |
+| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] |
+| partitions=1/1 size=0B |
+| table stats: unavailable |
+| column stats: unavailable |
+| hosts=0 per-host-mem=0B |
+| tuple-ids=0 row-size=19B cardinality=unavailable |
++------------------------------------------------------------------------+
+[localhost:21000] > set explain_level=3;
+[localhost:21000] > explain select * from t1;
++------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
+<strong class="ph b">| WARNING: The following tables are missing relevant table and/or column |</strong>
+<strong class="ph b">| statistics. |</strong>
+<strong class="ph b">| explain_plan.t1 |</strong>
+| |
+| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED] |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
+| hosts=0 per-host-mem=unavailable |
+| tuple-ids=0 row-size=19B cardinality=unavailable |
+| |
+| F00:PLAN FRAGMENT [PARTITION=RANDOM] |
+| DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] |
+| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] |
+| partitions=1/1 size=0B |
+<strong class="ph b">| table stats: unavailable |</strong>
+<strong class="ph b">| column stats: unavailable |</strong>
+| hosts=0 per-host-mem=0B |
+| tuple-ids=0 row-size=19B cardinality=unavailable |
++------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ As the warning message demonstrates, most of the information needed for Impala to do efficient query
+ planning, and for you to understand the performance characteristics of the query, requires running the
+ <code class="ph codeph">COMPUTE STATS</code> statement for the table:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > compute stats t1;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+[localhost:21000] > explain select * from t1;
++------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
+| |
+| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED] |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
+| hosts=0 per-host-mem=unavailable |
+| tuple-ids=0 row-size=20B cardinality=0 |
+| |
+| F00:PLAN FRAGMENT [PARTITION=RANDOM] |
+| DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] |
+| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] |
+| partitions=1/1 size=0B |
+<strong class="ph b">| table stats: 0 rows total |</strong>
+<strong class="ph b">| column stats: all |</strong>
+| hosts=0 per-host-mem=0B |
+| tuple-ids=0 row-size=20B cardinality=0 |
++------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ Joins and other complicated, multi-part queries are the ones where you most commonly need to examine the
+ <code class="ph codeph">EXPLAIN</code> output and customize the amount of detail in the output. This example shows the
+ default <code class="ph codeph">EXPLAIN</code> output for a three-way join query, then the equivalent output with a
+ <code class="ph codeph">[SHUFFLE]</code> hint to change the join mechanism between the first two tables from a broadcast
+ join to a shuffle join.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set explain_level=1;
+[localhost:21000] > explain select one.*, two.*, three.* from t1 one, t1 two, t1 three where one.x = two.x and two.x = three.x;
++---------------------------------------------------------+
+| Explain String |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
+| |
+| 07:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+<strong class="ph b">| 04:HASH JOIN [INNER JOIN, BROADCAST] |</strong>
+| | hash predicates: two.x = three.x |
+| | |
+<strong class="ph b">| |--06:EXCHANGE [BROADCAST] |</strong>
+| | | |
+| | 02:SCAN HDFS [explain_plan.t1 three] |
+| | partitions=1/1 size=0B |
+| | |
+<strong class="ph b">| 03:HASH JOIN [INNER JOIN, BROADCAST] |</strong>
+| | hash predicates: one.x = two.x |
+| | |
+<strong class="ph b">| |--05:EXCHANGE [BROADCAST] |</strong>
+| | | |
+| | 01:SCAN HDFS [explain_plan.t1 two] |
+| | partitions=1/1 size=0B |
+| | |
+| 00:SCAN HDFS [explain_plan.t1 one] |
+| partitions=1/1 size=0B |
++---------------------------------------------------------+
+[localhost:21000] > explain select one.*, two.*, three.*
+ > from t1 one join [shuffle] t1 two join t1 three
+ > where one.x = two.x and two.x = three.x;
++---------------------------------------------------------+
+| Explain String |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
+| |
+| 08:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+<strong class="ph b">| 04:HASH JOIN [INNER JOIN, BROADCAST] |</strong>
+| | hash predicates: two.x = three.x |
+| | |
+<strong class="ph b">| |--07:EXCHANGE [BROADCAST] |</strong>
+| | | |
+| | 02:SCAN HDFS [explain_plan.t1 three] |
+| | partitions=1/1 size=0B |
+| | |
+<strong class="ph b">| 03:HASH JOIN [INNER JOIN, PARTITIONED] |</strong>
+| | hash predicates: one.x = two.x |
+| | |
+<strong class="ph b">| |--06:EXCHANGE [PARTITION=HASH(two.x)] |</strong>
+| | | |
+| | 01:SCAN HDFS [explain_plan.t1 two] |
+| | partitions=1/1 size=0B |
+| | |
+<strong class="ph b">| 05:EXCHANGE [PARTITION=HASH(one.x)] |</strong>
+| | |
+| 00:SCAN HDFS [explain_plan.t1 one] |
+| partitions=1/1 size=0B |
++---------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ For a join involving many different tables, the default <code class="ph codeph">EXPLAIN</code> output might stretch over
+ several pages, and the only details you care about might be the join order and the mechanism (broadcast or
+ shuffle) for joining each pair of tables. In that case, you might set <code class="ph codeph">EXPLAIN_LEVEL</code> to its
+ lowest value of 0, to focus on just the join order and join mechanism for each stage. The following example
+ shows how the rows from the first and second joined tables are hashed and divided among the nodes of the
+ cluster for further filtering; then the entire contents of the third table are broadcast to all nodes for the
+ final stage of join processing.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set explain_level=0;
+[localhost:21000] > explain select one.*, two.*, three.*
+ > from t1 one join [shuffle] t1 two join t1 three
+ > where one.x = two.x and two.x = three.x;
++---------------------------------------------------------+
+| Explain String |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
+| |
+| 08:EXCHANGE [PARTITION=UNPARTITIONED] |
+<strong class="ph b">| 04:HASH JOIN [INNER JOIN, BROADCAST] |</strong>
+<strong class="ph b">| |--07:EXCHANGE [BROADCAST] |</strong>
+| | 02:SCAN HDFS [explain_plan.t1 three] |
+<strong class="ph b">| 03:HASH JOIN [INNER JOIN, PARTITIONED] |</strong>
+<strong class="ph b">| |--06:EXCHANGE [PARTITION=HASH(two.x)] |</strong>
+| | 01:SCAN HDFS [explain_plan.t1 two] |
+<strong class="ph b">| 05:EXCHANGE [PARTITION=HASH(one.x)] |</strong>
+| 00:SCAN HDFS [explain_plan.t1 one] |
++---------------------------------------------------------+
+</code></pre>
+
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_explain_plan.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_explain_plan.html b/docs/build3x/html/topics/impala_explain_plan.html
new file mode 100644
index 0000000..020c28b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_explain_plan.html
@@ -0,0 +1,592 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="explain_plan"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</title>
</head><body id="explain_plan"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ To understand the high-level performance considerations for Impala queries, read the output of the
+ <code class="ph codeph">EXPLAIN</code> statement for the query. You can get the <code class="ph codeph">EXPLAIN</code> plan without
+ actually running the query itself.
+ </p>
+
+    <p class="p">
+      For an overview of the physical performance characteristics for a query, issue the <code class="ph codeph">SUMMARY</code>
+      command in <span class="keyword cmdname">impala-shell</span> immediately after executing a query. This condensed information
+      shows which phases of execution took the most time, and how the estimates for memory usage and number of rows
+      at each phase compare to the actual values.
+    </p>
+
+    <p class="p">
+      To understand the detailed performance characteristics for a query, issue the <code class="ph codeph">PROFILE</code>
+      command in <span class="keyword cmdname">impala-shell</span> immediately after executing a query. This low-level information
+      includes physical details about memory, CPU, I/O, and network usage, and thus is only available after the
+      query is actually run.
+    </p>
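+
+    <p class="p">
+      For example, a typical session runs the query and then issues the <code class="ph codeph">SUMMARY</code>
+      command. (The output below is an illustrative sketch; the exact columns and values vary by release,
+      cluster size, and data volume.)
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select count(*) from customer_address;
+...
+[localhost:21000] &gt; summary;
++--------------+--------+----------+----------+--------+------------+
+| Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows |
++--------------+--------+----------+----------+--------+------------+
+| 03:AGGREGATE | 1      | 1.03ms   | 1.03ms   | 1      | 1          |
+| 02:EXCHANGE  | 1      | 0ns      | 0ns      | 4      | 1          |
+| 01:AGGREGATE | 4      | 30.79ms  | 39.14ms  | 4      | 1          |
+| 00:SCAN HDFS | 4      | 409.33ms | 672.15ms | 50.00K | -1         |
++--------------+--------+----------+----------+--------+------------+
+</code></pre>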
+
+ <p class="p toc inpage"></p>
+
+ <p class="p">
+ Also, see <a class="xref" href="impala_hbase.html#hbase_performance">Performance Considerations for the Impala-HBase Integration</a>
+ and <a class="xref" href="impala_s3.html#s3_performance">Understanding and Tuning Impala Query Performance for S3 Data</a>
+ for examples of interpreting
+ <code class="ph codeph">EXPLAIN</code> plans for queries against HBase tables
+ <span class="ph">and data stored in the Amazon Simple Storage System (S3)</span>.
+ </p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="explain_plan__perf_explain">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Using the EXPLAIN Plan for Performance Tuning</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph"><a class="xref" href="impala_explain.html#explain">EXPLAIN</a></code> statement gives you an outline
+ of the logical steps that a query will perform, such as how the work will be distributed among the nodes
+ and how intermediate results will be combined to produce the final result set. Because these details
+ are available before you actually run the query, you can use them to check that the query will not
+ operate in some unexpected or inefficient way.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>[impalad-host:21000] > explain select count(*) from customer_address;
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=42.00MB VCores=1 |
+| |
+| 03:AGGREGATE [MERGE FINALIZE] |
+| | output: sum(count(*)) |
+| | |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+| 00:SCAN HDFS [default.customer_address] |
+| partitions=1/1 size=5.25MB |
++----------------------------------------------------------+
+</code></pre>
+
+ <div class="p">
+ Read the <code class="ph codeph">EXPLAIN</code> plan from bottom to top:
+ <ul class="ul">
+ <li class="li">
+ The last part of the plan shows the low-level details such as the expected amount of data that will be
+ read, where you can judge the effectiveness of your partitioning strategy and estimate how long it will
+ take to scan a table based on total data size and the size of the cluster.
+ </li>
+
+ <li class="li">
+ As you work your way up, next you see the operations that will be parallelized and performed on each
+ Impala node.
+ </li>
+
+ <li class="li">
+ At the higher levels, you see how data flows when intermediate result sets are combined and transmitted
+ from one node to another.
+ </li>
+
+ <li class="li">
+ See <a class="xref" href="../shared/../topics/impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a> for details about the
+ <code class="ph codeph">EXPLAIN_LEVEL</code> query option, which lets you customize how much detail to show in the
+ <code class="ph codeph">EXPLAIN</code> plan depending on whether you are doing high-level or low-level tuning,
+ dealing with logical or physical aspects of the query.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ The <code class="ph codeph">EXPLAIN</code> plan is also printed at the beginning of the query profile report described in
+ <a class="xref" href="#perf_profile">Using the Query Profile for Performance Tuning</a>, for convenience in examining both the logical and physical aspects of the
+ query side-by-side.
+ </p>
+
+ <p class="p">
+ The amount of detail displayed in the <code class="ph codeph">EXPLAIN</code> output is controlled by the
+ <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL</a> query option. You typically
+ increase this setting from <code class="ph codeph">standard</code> to <code class="ph codeph">extended</code> (or from <code class="ph codeph">1</code>
+ to <code class="ph codeph">2</code>) when double-checking the presence of table and column statistics during performance
+ tuning, or when estimating query resource usage in conjunction with the resource management features.
+ </p>
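+
+ <p class="p">
+ As a sketch, using the <code class="ph codeph">customer_address</code> table from the earlier example, you
+ might temporarily raise the detail level while double-checking statistics, then restore the default
+ afterward. (The exact plan output is elided here.)
+ </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] > set explain_level=extended;
+[impalad-host:21000] > explain select count(*) from customer_address;
+...
+[impalad-host:21000] > set explain_level=standard;
+</code></pre>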
+
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="explain_plan__perf_summary">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Using the SUMMARY Report for Performance Tuning</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph"><a class="xref" href="impala_shell_commands.html#shell_commands">SUMMARY</a></code> command within
+ the <span class="keyword cmdname">impala-shell</span> interpreter gives you an easy-to-digest overview of the timings for the
+ different phases of execution for a query. Like the <code class="ph codeph">EXPLAIN</code> plan, it makes
+ potential performance bottlenecks easy to spot. Like the <code class="ph codeph">PROFILE</code> output, it is
+ available only after the query is run, and so displays actual timing numbers.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">SUMMARY</code> report is also printed at the beginning of the query profile report described
+ in <a class="xref" href="#perf_profile">Using the Query Profile for Performance Tuning</a>, for convenience in examining high-level and low-level aspects of the query
+ side-by-side.
+ </p>
+
+ <p class="p">
+ For example, here is a query involving an aggregate function, on a single-node VM. The different stages of
+ the query and their timings are shown (rolled up for all nodes), along with estimated and actual values
+ used in planning the query. In this case, the <code class="ph codeph">AVG()</code> function is computed for a subset of
+ data on each node (stage 01) and then the aggregated results from all nodes are combined at the end (stage
+ 03). You can see which stages took the most time, and whether any estimates were substantially different
+ than the actual data distribution. (When examining the time values, be sure to consider the suffixes such
+ as <code class="ph codeph">us</code> for microseconds and <code class="ph codeph">ms</code> for milliseconds, rather than just looking
+ for the largest numbers.)
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select avg(ss_sales_price) from store_sales where ss_coupon_amt = 0;
++---------------------+
+| avg(ss_sales_price) |
++---------------------+
+| 37.80770926328327 |
++---------------------+
+[localhost:21000] > summary;
++--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
+| 03:AGGREGATE | 1 | 1.03ms | 1.03ms | 1 | 1 | 48.00 KB | -1 B | MERGE FINALIZE |
+| 02:EXCHANGE | 1 | 0ns | 0ns | 1 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 01:AGGREGATE | 1 | 30.79ms | 30.79ms | 1 | 1 | 80.00 KB | 10.00 MB | |
+| 00:SCAN HDFS | 1 | 5.45s | 5.45s | 2.21M | -1 | 64.05 MB | 432.00 MB | tpc.store_sales |
++--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
+</code></pre>
+
+ <p class="p">
+ Notice how the longest initial phase of the query is measured in seconds (s), while later phases working on
+ smaller intermediate results are measured in milliseconds (ms) or even nanoseconds (ns).
+ </p>
+
+ <p class="p">
+ Here is an example from a more complicated query, as it would appear in the <code class="ph codeph">PROFILE</code>
+ output:
+ </p>
+
+<pre class="pre codeblock"><code>Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
+------------------------------------------------------------------------------------------------------------------------
+09:MERGING-EXCHANGE 1 79.738us 79.738us 5 5 0 -1.00 B UNPARTITIONED
+05:TOP-N 3 84.693us 88.810us 5 5 12.00 KB 120.00 B
+04:AGGREGATE 3 5.263ms 6.432ms 5 5 44.00 KB 10.00 MB MERGE FINALIZE
+08:AGGREGATE 3 16.659ms 27.444ms 52.52K 600.12K 3.20 MB 15.11 MB MERGE
+07:EXCHANGE 3 2.644ms 5.1ms 52.52K 600.12K 0 0 HASH(o_orderpriority)
+03:AGGREGATE 3 342.913ms 966.291ms 52.52K 600.12K 10.80 MB 15.11 MB
+02:HASH JOIN 3 2s165ms 2s171ms 144.87K 600.12K 13.63 MB 941.01 KB INNER JOIN, BROADCAST
+|--06:EXCHANGE 3 8.296ms 8.692ms 57.22K 15.00K 0 0 BROADCAST
+| 01:SCAN HDFS 2 1s412ms 1s978ms 57.22K 15.00K 24.21 MB 176.00 MB tpch.orders o
+00:SCAN HDFS 3 8s032ms 8s558ms 3.79M 600.12K 32.29 MB 264.00 MB tpch.lineitem l
+</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="explain_plan__perf_profile">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Using the Query Profile for Performance Tuning</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">PROFILE</code> command, available in the <span class="keyword cmdname">impala-shell</span> interpreter,
+ produces a detailed low-level report showing how the most recent query was executed. Unlike the
+ <code class="ph codeph">EXPLAIN</code> plan described in <a class="xref" href="#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a>, this information is only available
+ after the query has finished. It shows physical details such as the number of bytes read, maximum memory
+ usage, and so on for each node. You can use this information to determine if the query is I/O-bound or
+ CPU-bound, whether some network condition is imposing a bottleneck, whether a slowdown is affecting some
+ nodes but not others, and to check that recommended configuration settings such as short-circuit local
+ reads are in effect.
+ </p>
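+
+ <p class="p">
+ For example, to confirm that short-circuit local reads are in effect, look for counters such as the
+ following in the <code class="ph codeph">HDFS_SCAN_NODE</code> sections of the profile. If
+ <code class="ph codeph">BytesReadShortCircuit</code> is close to <code class="ph codeph">BytesRead</code>,
+ short-circuit reads are being used. (The values shown are illustrative.)
+ </p>
+
+<pre class="pre codeblock"><code>      - BytesRead: 960.00 KB
+      - BytesReadLocal: 960.00 KB
+      - BytesReadShortCircuit: 960.00 KB
+</code></pre>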
+
+ <p class="p">
+ By default, time values in the profile output reflect the wall-clock time taken by an operation.
+ For values denoting system time or user time, the type of time being measured is reflected in the metric
+ name, such as <code class="ph codeph">ScannerThreadsSysTime</code> or <code class="ph codeph">ScannerThreadsUserTime</code>.
+ For example, a multi-threaded I/O operation might show a small figure for wall-clock time,
+ while the corresponding system time is larger, representing the sum of the CPU time taken by each thread.
+ Or a wall-clock time figure might be larger because it counts time spent waiting, while
+ the corresponding system and user time figures only measure the time while the operation
+ is actively using CPU cycles.
+ </p>
+
+ <p class="p">
+ The <a class="xref" href="impala_explain_plan.html#perf_explain"><code class="ph codeph">EXPLAIN</code> plan</a> is also printed
+ at the beginning of the query profile report, for convenience in examining both the logical and physical
+ aspects of the query side-by-side. The
+ <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL</a> query option also controls the
+ verbosity of the <code class="ph codeph">EXPLAIN</code> output printed by the <code class="ph codeph">PROFILE</code> command.
+ </p>
+
+
+
+ <p class="p">
+ Here is an example of a query profile, from a relatively straightforward query on a single-node
+ pseudo-distributed cluster, to keep the output brief.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > profile;
+Query Runtime Profile:
+Query (id=6540a03d4bee0691:4963d6269b210ebd):
+ Summary:
+ Session ID: ea4a197f1c7bf858:c74e66f72e3a33ba
+ Session Type: BEESWAX
+ Start Time: 2013-12-02 17:10:30.263067000
+ End Time: 2013-12-02 17:10:50.932044000
+ Query Type: QUERY
+ Query State: FINISHED
+ Query Status: OK
+ Impala Version: impalad version 1.2.1 RELEASE (build edb5af1bcad63d410bc5d47cc203df3a880e9324)
+ User: doc_demo
+ Network Address: 127.0.0.1:49161
+ Default Db: stats_testing
+ Sql Statement: select t1.s, t2.s from t1 join t2 on (t1.id = t2.parent)
+ Plan:
+----------------
+Estimated Per-Host Requirements: Memory=2.09GB VCores=2
+
+PLAN FRAGMENT 0
+ PARTITION: UNPARTITIONED
+
+ 4:EXCHANGE
+ cardinality: unavailable
+ per-host memory: unavailable
+ tuple ids: 0 1
+
+PLAN FRAGMENT 1
+ PARTITION: RANDOM
+
+ STREAM DATA SINK
+ EXCHANGE ID: 4
+ UNPARTITIONED
+
+ 2:HASH JOIN
+ | join op: INNER JOIN (BROADCAST)
+ | hash predicates:
+ | t1.id = t2.parent
+ | cardinality: unavailable
+ | per-host memory: 2.00GB
+ | tuple ids: 0 1
+ |
+ |----3:EXCHANGE
+ | cardinality: unavailable
+ | per-host memory: 0B
+ | tuple ids: 1
+ |
+ 0:SCAN HDFS
+ table=stats_testing.t1 #partitions=1/1 size=33B
+ table stats: unavailable
+ column stats: unavailable
+ cardinality: unavailable
+ per-host memory: 32.00MB
+ tuple ids: 0
+
+PLAN FRAGMENT 2
+ PARTITION: RANDOM
+
+ STREAM DATA SINK
+ EXCHANGE ID: 3
+ UNPARTITIONED
+
+ 1:SCAN HDFS
+ table=stats_testing.t2 #partitions=1/1 size=960.00KB
+ table stats: unavailable
+ column stats: unavailable
+ cardinality: unavailable
+ per-host memory: 96.00MB
+ tuple ids: 1
+----------------
+ Query Timeline: 20s670ms
+ - Start execution: 2.559ms (2.559ms)
+ - Planning finished: 23.587ms (21.27ms)
+ - Rows available: 666.199ms (642.612ms)
+ - First row fetched: 668.919ms (2.719ms)
+ - Unregister query: 20s668ms (20s000ms)
+ ImpalaServer:
+ - ClientFetchWaitTimer: 19s637ms
+ - RowMaterializationTimer: 167.121ms
+ Execution Profile 6540a03d4bee0691:4963d6269b210ebd:(Active: 837.815ms, % non-child: 0.00%)
+ Per Node Peak Memory Usage: impala-1.example.com:22000(7.42 MB)
+ - FinalizationTimer: 0ns
+ Coordinator Fragment:(Active: 195.198ms, % non-child: 0.00%)
+ MemoryUsage(500.0ms): 16.00 KB, 7.42 MB, 7.33 MB, 7.10 MB, 6.94 MB, 6.71 MB, 6.56 MB, 6.40 MB, 6.17 MB, 6.02 MB, 5.79 MB, 5.63 MB, 5.48 MB, 5.25 MB, 5.09 MB, 4.86 MB, 4.71 MB, 4.47 MB, 4.32 MB, 4.09 MB, 3.93 MB, 3.78 MB, 3.55 MB, 3.39 MB, 3.16 MB, 3.01 MB, 2.78 MB, 2.62 MB, 2.39 MB, 2.24 MB, 2.08 MB, 1.85 MB, 1.70 MB, 1.54 MB, 1.31 MB, 1.16 MB, 948.00 KB, 790.00 KB, 553.00 KB, 395.00 KB, 237.00 KB
+ ThreadUsage(500.0ms): 1
+ - AverageThreadTokens: 1.00
+ - PeakMemoryUsage: 7.42 MB
+ - PrepareTime: 36.144us
+ - RowsProduced: 98.30K (98304)
+ - TotalCpuTime: 20s449ms
+ - TotalNetworkWaitTime: 191.630ms
+ - TotalStorageWaitTime: 0ns
+ CodeGen:(Active: 150.679ms, % non-child: 77.19%)
+ - CodegenTime: 0ns
+ - CompileTime: 139.503ms
+ - LoadTime: 10.7ms
+ - ModuleFileSize: 95.27 KB
+ EXCHANGE_NODE (id=4):(Active: 194.858ms, % non-child: 99.83%)
+ - BytesReceived: 2.33 MB
+ - ConvertRowBatchTime: 2.732ms
+ - DataArrivalWaitTime: 191.118ms
+ - DeserializeRowBatchTimer: 14.943ms
+ - FirstBatchArrivalWaitTime: 191.117ms
+ - PeakMemoryUsage: 7.41 MB
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 504.49 K/sec
+ - SendersBlockedTimer: 0ns
+ - SendersBlockedTotalTimer(*): 0ns
+ Averaged Fragment 1:(Active: 442.360ms, % non-child: 0.00%)
+ split sizes: min: 33.00 B, max: 33.00 B, avg: 33.00 B, stddev: 0.00
+ completion times: min:443.720ms max:443.720ms mean: 443.720ms stddev:0ns
+ execution rates: min:74.00 B/sec max:74.00 B/sec mean:74.00 B/sec stddev:0.00 /sec
+ num instances: 1
+ - AverageThreadTokens: 1.00
+ - PeakMemoryUsage: 6.06 MB
+ - PrepareTime: 7.291ms
+ - RowsProduced: 98.30K (98304)
+ - TotalCpuTime: 784.259ms
+ - TotalNetworkWaitTime: 388.818ms
+ - TotalStorageWaitTime: 3.934ms
+ CodeGen:(Active: 312.862ms, % non-child: 70.73%)
+ - CodegenTime: 2.669ms
+ - CompileTime: 302.467ms
+ - LoadTime: 9.231ms
+ - ModuleFileSize: 95.27 KB
+ DataStreamSender (dst_id=4):(Active: 80.63ms, % non-child: 18.10%)
+ - BytesSent: 2.33 MB
+ - NetworkThroughput(*): 35.89 MB/sec
+ - OverallThroughput: 29.06 MB/sec
+ - PeakMemoryUsage: 5.33 KB
+ - SerializeBatchTime: 26.487ms
+ - ThriftTransmitTime(*): 64.814ms
+ - UncompressedRowBatchSize: 6.66 MB
+ HASH_JOIN_NODE (id=2):(Active: 362.25ms, % non-child: 3.92%)
+ - BuildBuckets: 1.02K (1024)
+ - BuildRows: 98.30K (98304)
+ - BuildTime: 12.622ms
+ - LoadFactor: 0.00
+ - PeakMemoryUsage: 6.02 MB
+ - ProbeRows: 3
+ - ProbeTime: 3.579ms
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 271.54 K/sec
+ EXCHANGE_NODE (id=3):(Active: 344.680ms, % non-child: 77.92%)
+ - BytesReceived: 1.15 MB
+ - ConvertRowBatchTime: 2.792ms
+ - DataArrivalWaitTime: 339.936ms
+ - DeserializeRowBatchTimer: 9.910ms
+ - FirstBatchArrivalWaitTime: 199.474ms
+ - PeakMemoryUsage: 156.00 KB
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 285.20 K/sec
+ - SendersBlockedTimer: 0ns
+ - SendersBlockedTotalTimer(*): 0ns
+ HDFS_SCAN_NODE (id=0):(Active: 13.616us, % non-child: 0.00%)
+ - AverageHdfsReadThreadConcurrency: 0.00
+ - AverageScannerThreadConcurrency: 0.00
+ - BytesRead: 33.00 B
+ - BytesReadLocal: 33.00 B
+ - BytesReadShortCircuit: 33.00 B
+ - NumDisksAccessed: 1
+ - NumScannerThreadsStarted: 1
+ - PeakMemoryUsage: 46.00 KB
+ - PerReadThreadRawHdfsThroughput: 287.52 KB/sec
+ - RowsRead: 3
+ - RowsReturned: 3
+ - RowsReturnedRate: 220.33 K/sec
+ - ScanRangesComplete: 1
+ - ScannerThreadsInvoluntaryContextSwitches: 26
+ - ScannerThreadsTotalWallClockTime: 55.199ms
+ - DelimiterParseTime: 2.463us
+ - MaterializeTupleTime(*): 1.226us
+ - ScannerThreadsSysTime: 0ns
+ - ScannerThreadsUserTime: 42.993ms
+ - ScannerThreadsVoluntaryContextSwitches: 1
+ - TotalRawHdfsReadTime(*): 112.86us
+ - TotalReadThroughput: 0.00 /sec
+ Averaged Fragment 2:(Active: 190.120ms, % non-child: 0.00%)
+ split sizes: min: 960.00 KB, max: 960.00 KB, avg: 960.00 KB, stddev: 0.00
+ completion times: min:191.736ms max:191.736ms mean: 191.736ms stddev:0ns
+ execution rates: min:4.89 MB/sec max:4.89 MB/sec mean:4.89 MB/sec stddev:0.00 /sec
+ num instances: 1
+ - AverageThreadTokens: 0.00
+ - PeakMemoryUsage: 906.33 KB
+ - PrepareTime: 3.67ms
+ - RowsProduced: 98.30K (98304)
+ - TotalCpuTime: 403.351ms
+ - TotalNetworkWaitTime: 34.999ms
+ - TotalStorageWaitTime: 108.675ms
+ CodeGen:(Active: 162.57ms, % non-child: 85.24%)
+ - CodegenTime: 3.133ms
+ - CompileTime: 148.316ms
+ - LoadTime: 12.317ms
+ - ModuleFileSize: 95.27 KB
+ DataStreamSender (dst_id=3):(Active: 70.620ms, % non-child: 37.14%)
+ - BytesSent: 1.15 MB
+ - NetworkThroughput(*): 23.30 MB/sec
+ - OverallThroughput: 16.23 MB/sec
+ - PeakMemoryUsage: 5.33 KB
+ - SerializeBatchTime: 22.69ms
+ - ThriftTransmitTime(*): 49.178ms
+ - UncompressedRowBatchSize: 3.28 MB
+ HDFS_SCAN_NODE (id=1):(Active: 118.839ms, % non-child: 62.51%)
+ - AverageHdfsReadThreadConcurrency: 0.00
+ - AverageScannerThreadConcurrency: 0.00
+ - BytesRead: 960.00 KB
+ - BytesReadLocal: 960.00 KB
+ - BytesReadShortCircuit: 960.00 KB
+ - NumDisksAccessed: 1
+ - NumScannerThreadsStarted: 1
+ - PeakMemoryUsage: 869.00 KB
+ - PerReadThreadRawHdfsThroughput: 130.21 MB/sec
+ - RowsRead: 98.30K (98304)
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 827.20 K/sec
+ - ScanRangesComplete: 15
+ - ScannerThreadsInvoluntaryContextSwitches: 34
+ - ScannerThreadsTotalWallClockTime: 189.774ms
+ - DelimiterParseTime: 15.703ms
+ - MaterializeTupleTime(*): 3.419ms
+ - ScannerThreadsSysTime: 1.999ms
+ - ScannerThreadsUserTime: 44.993ms
+ - ScannerThreadsVoluntaryContextSwitches: 118
+ - TotalRawHdfsReadTime(*): 7.199ms
+ - TotalReadThroughput: 0.00 /sec
+ Fragment 1:
+ Instance 6540a03d4bee0691:4963d6269b210ebf (host=impala-1.example.com:22000):(Active: 442.360ms, % non-child: 0.00%)
+ Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/33.00 B
+ MemoryUsage(500.0ms): 69.33 KB
+ ThreadUsage(500.0ms): 1
+ - AverageThreadTokens: 1.00
+ - PeakMemoryUsage: 6.06 MB
+ - PrepareTime: 7.291ms
+ - RowsProduced: 98.30K (98304)
+ - TotalCpuTime: 784.259ms
+ - TotalNetworkWaitTime: 388.818ms
+ - TotalStorageWaitTime: 3.934ms
+ CodeGen:(Active: 312.862ms, % non-child: 70.73%)
+ - CodegenTime: 2.669ms
+ - CompileTime: 302.467ms
+ - LoadTime: 9.231ms
+ - ModuleFileSize: 95.27 KB
+ DataStreamSender (dst_id=4):(Active: 80.63ms, % non-child: 18.10%)
+ - BytesSent: 2.33 MB
+ - NetworkThroughput(*): 35.89 MB/sec
+ - OverallThroughput: 29.06 MB/sec
+ - PeakMemoryUsage: 5.33 KB
+ - SerializeBatchTime: 26.487ms
+ - ThriftTransmitTime(*): 64.814ms
+ - UncompressedRowBatchSize: 6.66 MB
+ HASH_JOIN_NODE (id=2):(Active: 362.25ms, % non-child: 3.92%)
+ ExecOption: Build Side Codegen Enabled, Probe Side Codegen Enabled, Hash Table Built Asynchronously
+ - BuildBuckets: 1.02K (1024)
+ - BuildRows: 98.30K (98304)
+ - BuildTime: 12.622ms
+ - LoadFactor: 0.00
+ - PeakMemoryUsage: 6.02 MB
+ - ProbeRows: 3
+ - ProbeTime: 3.579ms
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 271.54 K/sec
+ EXCHANGE_NODE (id=3):(Active: 344.680ms, % non-child: 77.92%)
+ - BytesReceived: 1.15 MB
+ - ConvertRowBatchTime: 2.792ms
+ - DataArrivalWaitTime: 339.936ms
+ - DeserializeRowBatchTimer: 9.910ms
+ - FirstBatchArrivalWaitTime: 199.474ms
+ - PeakMemoryUsage: 156.00 KB
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 285.20 K/sec
+ - SendersBlockedTimer: 0ns
+ - SendersBlockedTotalTimer(*): 0ns
+ HDFS_SCAN_NODE (id=0):(Active: 13.616us, % non-child: 0.00%)
+ Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/33.00 B
+ Hdfs Read Thread Concurrency Bucket: 0:0% 1:0%
+ File Formats: TEXT/NONE:1
+ ExecOption: Codegen enabled: 1 out of 1
+ - AverageHdfsReadThreadConcurrency: 0.00
+ - AverageScannerThreadConcurrency: 0.00
+ - BytesRead: 33.00 B
+ - BytesReadLocal: 33.00 B
+ - BytesReadShortCircuit: 33.00 B
+ - NumDisksAccessed: 1
+ - NumScannerThreadsStarted: 1
+ - PeakMemoryUsage: 46.00 KB
+ - PerReadThreadRawHdfsThroughput: 287.52 KB/sec
+ - RowsRead: 3
+ - RowsReturned: 3
+ - RowsReturnedRate: 220.33 K/sec
+ - ScanRangesComplete: 1
+ - ScannerThreadsInvoluntaryContextSwitches: 26
+ - ScannerThreadsTotalWallClockTime: 55.199ms
+ - DelimiterParseTime: 2.463us
+ - MaterializeTupleTime(*): 1.226us
+ - ScannerThreadsSysTime: 0ns
+ - ScannerThreadsUserTime: 42.993ms
+ - ScannerThreadsVoluntaryContextSwitches: 1
+ - TotalRawHdfsReadTime(*): 112.86us
+ - TotalReadThroughput: 0.00 /sec
+ Fragment 2:
+ Instance 6540a03d4bee0691:4963d6269b210ec0 (host=impala-1.example.com:22000):(Active: 190.120ms, % non-child: 0.00%)
+ Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:15/960.00 KB
+ - AverageThreadTokens: 0.00
+ - PeakMemoryUsage: 906.33 KB
+ - PrepareTime: 3.67ms
+ - RowsProduced: 98.30K (98304)
+ - TotalCpuTime: 403.351ms
+ - TotalNetworkWaitTime: 34.999ms
+ - TotalStorageWaitTime: 108.675ms
+ CodeGen:(Active: 162.57ms, % non-child: 85.24%)
+ - CodegenTime: 3.133ms
+ - CompileTime: 148.316ms
+ - LoadTime: 12.317ms
+ - ModuleFileSize: 95.27 KB
+ DataStreamSender (dst_id=3):(Active: 70.620ms, % non-child: 37.14%)
+ - BytesSent: 1.15 MB
+ - NetworkThroughput(*): 23.30 MB/sec
+ - OverallThroughput: 16.23 MB/sec
+ - PeakMemoryUsage: 5.33 KB
+ - SerializeBatchTime: 22.69ms
+ - ThriftTransmitTime(*): 49.178ms
+ - UncompressedRowBatchSize: 3.28 MB
+ HDFS_SCAN_NODE (id=1):(Active: 118.839ms, % non-child: 62.51%)
+ Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:15/960.00 KB
+ Hdfs Read Thread Concurrency Bucket: 0:0% 1:0%
+ File Formats: TEXT/NONE:15
+ ExecOption: Codegen enabled: 15 out of 15
+ - AverageHdfsReadThreadConcurrency: 0.00
+ - AverageScannerThreadConcurrency: 0.00
+ - BytesRead: 960.00 KB
+ - BytesReadLocal: 960.00 KB
+ - BytesReadShortCircuit: 960.00 KB
+ - NumDisksAccessed: 1
+ - NumScannerThreadsStarted: 1
+ - PeakMemoryUsage: 869.00 KB
+ - PerReadThreadRawHdfsThroughput: 130.21 MB/sec
+ - RowsRead: 98.30K (98304)
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 827.20 K/sec
+ - ScanRangesComplete: 15
+ - ScannerThreadsInvoluntaryContextSwitches: 34
+ - ScannerThreadsTotalWallClockTime: 189.774ms
+ - DelimiterParseTime: 15.703ms
+ - MaterializeTupleTime(*): 3.419ms
+ - ScannerThreadsSysTime: 1.999ms
+ - ScannerThreadsUserTime: 44.993ms
+ - ScannerThreadsVoluntaryContextSwitches: 118
+ - TotalRawHdfsReadTime(*): 7.199ms
+ - TotalReadThroughput: 0.00 /sec</code></pre>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_faq.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_faq.html b/docs/build3x/html/topics/impala_faq.html
new file mode 100644
index 0000000..ce2ca4c
--- /dev/null
+++ b/docs/build3x/html/topics/impala_faq.html
@@ -0,0 +1,21 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="faq"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Frequently Asked Questions</title></head><body id="faq"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Frequently Asked Questions</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists frequently asked questions for Apache Impala,
+ the interactive SQL engine for Hadoop.
+ </p>
+
+ <p class="p">
+ This section is under construction.
+ </p>
+
+ </div>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_file_formats.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_file_formats.html b/docs/build3x/html/topics/impala_file_formats.html
new file mode 100644
index 0000000..c24d6a6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_file_formats.html
@@ -0,0 +1,236 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_txtfile.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_avro.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_rcfile.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_seqfile.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="file_formats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>How Impala Works with Hado
op File Formats</title></head><body id="file_formats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">How Impala Works with Hadoop File Formats</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ Impala supports several familiar file formats used in Apache Hadoop. Impala can load and query data files
+ produced by other Hadoop components such as Pig or MapReduce, and data files produced by Impala can also
+ be used by other components. The following sections discuss the procedures, limitations, and performance
+ considerations for using each file format with Impala.
+ </p>
+
+ <p class="p">
+ The file format used for an Impala table has significant performance consequences. Some file formats include
+ compression support that affects the size of data on disk and, consequently, the amount of I/O and CPU
+ resources required to deserialize the data. Because query processing typically begins by reading and
+ decompressing data, the I/O and CPU resources required can be a limiting factor in query performance.
+ Compressing the data reduces the total number of bytes transferred from disk to memory, which shortens the
+ transfer time, at the cost of the extra CPU work needed to decompress the content.
+ </p>
+
+ <p class="p">
+ Impala can query files encoded with most of the popular file formats and compression codecs used in Hadoop.
+ Impala can create tables in, and insert data into, some file formats but not others; for file formats
+ that Impala cannot write to, create the table in Hive, issue the <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+ statement in <code class="ph codeph">impala-shell</code>, and query the table through Impala. Some file formats are
+ structured, meaning they include metadata and built-in compression. Supported formats include:
+ </p>
+
+ <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">File Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="file_formats__entry__1">
+ File Type
+ </th>
+ <th class="entry nocellnorowborder" id="file_formats__entry__2">
+ Format
+ </th>
+ <th class="entry nocellnorowborder" id="file_formats__entry__3">
+ Compression Codecs
+ </th>
+ <th class="entry nocellnorowborder" id="file_formats__entry__4">
+ Impala Can CREATE?
+ </th>
+ <th class="entry nocellnorowborder" id="file_formats__entry__5">
+ Impala Can INSERT?
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row" id="file_formats__parquet_support">
+ <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+ <a class="xref" href="impala_parquet.html#parquet">Parquet</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+ Snappy, gzip; currently Snappy by default
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">
+ Yes.
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+ Yes: <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, and query.
+ </td>
+ </tr>
+ <tr class="row" id="file_formats__txtfile_support">
+ <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+ <a class="xref" href="impala_txtfile.html#txtfile">Text</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+ Unstructured
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+ LZO, gzip, bzip2, Snappy
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">
+ Yes. For <code class="ph codeph">CREATE TABLE</code> with no <code class="ph codeph">STORED AS</code> clause, the default file
+ format is uncompressed text, with values separated by ASCII <code class="ph codeph">0x01</code> characters
+ (typically represented as Ctrl-A).
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+ Yes: <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, and query.
+ If LZO compression is used, you must create the table and load data in Hive. If other kinds of
+ compression are used, you must load data through <code class="ph codeph">LOAD DATA</code>, Hive, or manually in
+ HDFS.
+
+
+ </td>
+ </tr>
+ <tr class="row" id="file_formats__avro_support">
+ <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+ <a class="xref" href="impala_avro.html#avro">Avro</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+ Snappy, gzip, deflate, bzip2
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">
+ Yes, in Impala 1.4.0 and higher. Before that, create the table using Hive.
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+ No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+ <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+ </td>
+
+ </tr>
+ <tr class="row" id="file_formats__rcfile_support">
+ <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+ <a class="xref" href="impala_rcfile.html#rcfile">RCFile</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+ Snappy, gzip, deflate, bzip2
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">
+ Yes.
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+ No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+ <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+ </td>
+
+ </tr>
+ <tr class="row" id="file_formats__sequencefile_support">
+ <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+ <a class="xref" href="impala_seqfile.html#seqfile">SequenceFile</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+ Snappy, gzip, deflate, bzip2
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">Yes.</td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+ No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+ <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+ </td>
+
+ </tr>
+ </tbody></table>
+
+ <p class="p">
+ Impala can only query the file formats listed in the preceding table.
+ In particular, Impala does not support the ORC file format.
+ </p>
+
+ <p class="p">
+ Impala supports the following compression codecs:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Snappy. Recommended for its effective balance between compression ratio and decompression speed. Snappy
+ compression is very fast, but gzip provides greater space savings. Supported for text files in Impala 2.0
+ and higher.
+
+ </li>
+
+ <li class="li">
+ Gzip. Recommended when achieving the highest level of compression (and therefore greatest disk-space
+ savings) is desired. Supported for text files in Impala 2.0 and higher.
+ </li>
+
+ <li class="li">
+ Deflate. Not supported for text files.
+ </li>
+
+ <li class="li">
+ Bzip2. Supported for text files in Impala 2.0 and higher.
+
+ </li>
+
+ <li class="li">
+ <p class="p"> LZO, for text files only. Impala can query
+ LZO-compressed text tables, but currently cannot create them or insert
+ data into them; perform these operations in Hive. </p>
+ </li>
+ </ul>
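+
+      <p class="p">
+        For example, in <span class="keyword">Impala 2.0</span> and higher, a gzip-compressed text file can be
+        queried as soon as it is placed in the table's data directory and the table metadata is refreshed.
+        (The table, database, and file names in this sketch are hypothetical.)
+      </p>
+
+<pre class="pre codeblock"><code>$ gzip -k measurements.csv
+$ hdfs dfs -put measurements.csv.gz /user/hive/warehouse/logs.db/measurements/
+[localhost:21000] > refresh measurements;
+[localhost:21000] > select count(*) from measurements;</code></pre>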
+ </div>
+
+ <nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_txtfile.html">Using Text Data Files with Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet.html">Using the Parquet File Format with Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_avro.html">Using the Avro File Format with Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_rcfile.html">Using the RCFile File Format with Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_seqfile.html">Using the SequenceFile File Format with Impala Tables</a></strong><br></li></ul></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="file_formats__file_format_choosing">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Choosing the File Format for a Table</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Different file formats and compression codecs work better for different data sets. While Impala typically
+ provides performance gains regardless of file format, choosing the proper format for your data can yield
+ further performance improvements. Use the following considerations to decide which combination of file
+ format and compression to use for a particular table:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If you are working with existing files that are already in a supported file format, use the same format
+ for the Impala table where practical. If the original format does not yield acceptable query performance
+ or resource usage, consider creating a new Impala table with different file format or compression
+ characteristics, and doing a one-time conversion by copying the data to the new table using the
+ <code class="ph codeph">INSERT</code> statement. Depending on the file format, you might run the
+ <code class="ph codeph">INSERT</code> statement in <code class="ph codeph">impala-shell</code> or in Hive.
+ </li>
+
+ <li class="li">
+ Text files are convenient to produce through many different tools, and are human-readable for ease of
+ verification and debugging. Those characteristics are why text is the default format for an Impala
+ <code class="ph codeph">CREATE TABLE</code> statement. When performance and resource usage are the primary
+ considerations, use one of the other file formats and consider using compression. A typical workflow
+ might involve bringing data into an Impala table by copying CSV or TSV files into the appropriate data
+ directory, and then using the <code class="ph codeph">INSERT ... SELECT</code> syntax to copy the data into a table
+ using a different, more compact file format.
+ </li>
+
+ <li class="li">
+        If your architecture involves storing data to be queried in memory, do not compress the data. There are no
+        I/O savings because the data does not need to be read from disk, but there is a CPU cost to decompress the
+ data.
+ </li>
+ </ul>
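+
+      <p class="p">
+        The CSV-to-Parquet conversion workflow described above might look like the following
+        sketch, using hypothetical table and column names:
+      </p>
+
+<pre class="pre codeblock"><code>-- Staging table in the default text format, matching the incoming CSV files.
+create table staging_events (event_id bigint, event_name string)
+  row format delimited fields terminated by ',';
+
+-- After copying the CSV files into the table directory and running
+-- REFRESH staging_events, do a one-time conversion to a compact binary format.
+create table events stored as parquet as select * from staging_events;</code></pre>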
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_live_summary.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_live_summary.html b/docs/build3x/html/topics/impala_live_summary.html
new file mode 100644
index 0000000..d0792a1
--- /dev/null
+++ b/docs/build3x/html/topics/impala_live_summary.html
@@ -0,0 +1,177 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="live_summary"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</title></head><body id="live_summary"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">LIVE_SUMMARY Query Option (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ For queries submitted through the <span class="keyword cmdname">impala-shell</span> command,
+ displays the same output as the <code class="ph codeph">SUMMARY</code> command,
+ with the measurements updated in real time as the query progresses.
+ When the query finishes, the final <code class="ph codeph">SUMMARY</code> output remains
+ visible in the <span class="keyword cmdname">impala-shell</span> console output.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+      any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Command-line equivalent:</strong>
+ </p>
+ <p class="p">
+ You can enable this query option within <span class="keyword cmdname">impala-shell</span>
+ by starting the shell with the <code class="ph codeph">--live_summary</code>
+ command-line option.
+ You can still turn this setting off and on again within the shell through the
+ <code class="ph codeph">SET</code> command.
+ </p>
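+    <p class="p">
+      For example, the following hypothetical session starts the shell with the option
+      enabled, then turns it off again partway through:
+    </p>
+<pre class="pre codeblock"><code>$ impala-shell --live_summary
+[localhost:21000] > select count(*) from huge_table;
+...live summary table updates here while the query runs...
+[localhost:21000] > set live_summary=false;</code></pre>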
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+      The live summary output can be useful for monitoring long-running queries,
+      to see which phase of execution takes up the most time, or whether some hosts
+      take much longer than others for certain operations, dragging down overall performance.
+ By making the information available in real time, this feature lets you decide what
+ action to take even before you cancel a query that is taking much longer than normal.
+ </p>
+ <p class="p">
+ For example, you might see the HDFS scan phase taking a long time, and therefore revisit
+ performance-related aspects of your schema design such as constructing a partitioned table,
+ switching to the Parquet file format, running the <code class="ph codeph">COMPUTE STATS</code> statement
+ for the table, and so on.
+ Or you might see a wide variation between the average and maximum times for all hosts to
+ perform some phase of the query, and therefore investigate if one particular host
+ needed more memory or was experiencing a network problem.
+ </p>
+ <p class="p">
+ The output from this query option is printed to standard error. The output is only displayed in interactive mode,
+ that is, not when the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options are used.
+ </p>
+ <p class="p">
+ For a simple and concise way of tracking the progress of an interactive query, see
+ <a class="xref" href="impala_live_progress.html#live_progress">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ currently do not produce any output during <code class="ph codeph">COMPUTE STATS</code> operations.
+ </p>
+ <div class="p">
+ Because the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ are available only within the <span class="keyword cmdname">impala-shell</span> interpreter:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You cannot change these query options through the SQL <code class="ph codeph">SET</code>
+ statement using the JDBC or ODBC interfaces. The <code class="ph codeph">SET</code>
+ command in <span class="keyword cmdname">impala-shell</span> recognizes these names as
+ shell-only options.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Be careful when using <span class="keyword cmdname">impala-shell</span> on a pre-<span class="keyword">Impala 2.3</span>
+ system to connect to a system running <span class="keyword">Impala 2.3</span> or higher.
+ The older <span class="keyword cmdname">impala-shell</span> does not recognize these
+ query option names. Upgrade <span class="keyword cmdname">impala-shell</span> on the
+ systems where you intend to use these query options.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Likewise, the <span class="keyword cmdname">impala-shell</span> command relies on
+ some information only available in <span class="keyword">Impala 2.3</span> and higher
+ to prepare live progress reports and query summaries. The
+ <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code>
+ query options have no effect when <span class="keyword cmdname">impala-shell</span> connects
+ to a cluster running an older version of Impala.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows a series of <code class="ph codeph">LIVE_SUMMARY</code> reports that
+      are displayed during the course of a query, showing how the numbers increase as
+      the different phases of the distributed query progress. When you do the same
+ in <span class="keyword cmdname">impala-shell</span>, only a single report is displayed at any one time,
+ with each update overwriting the previous numbers.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set live_summary=true;
+LIVE_SUMMARY set to true
+[localhost:21000] > select count(*) from customer t1 cross join customer t2;
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE |
+| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 03:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | 10.00 MB | |
+| 02:NESTED LOOP JOIN | 0 | 0ns | 0ns | 0 | 22.50B | 0 B | 0 B | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE | 0 | 0ns | 0ns | 0 | 150.00K | 0 B | 0 B | BROADCAST |
+| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 |
+| 00:SCAN HDFS | 0 | 0ns | 0ns | 0 | 150.00K | 0 B | 64.00 MB | tpch.customer t1 |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE |
+| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | |
+| 02:NESTED LOOP JOIN | 1 | 17.62s | 17.62s | 81.14M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST |
+| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 |
+| 00:SCAN HDFS | 1 | 247.53ms | 247.53ms | 1.02K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE |
+| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | |
+| 02:NESTED LOOP JOIN | 1 | 61.85s | 61.85s | 283.43M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST |
+| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 |
+| 00:SCAN HDFS | 1 | 247.59ms | 247.59ms | 2.05K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
+</code></pre>
+
+
+
+
+ <p class="p">
+ To see how the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ work in real time, see <a class="xref" href="https://asciinema.org/a/1rv7qippo0fe7h5k1b6k4nexk" target="_blank">this animated demo</a>.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_load_data.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_load_data.html b/docs/build3x/html/topics/impala_load_data.html
new file mode 100644
index 0000000..82f689f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_load_data.html
@@ -0,0 +1,322 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="load_data"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LOAD DATA Statement</title></head><body id="load_data"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">LOAD DATA Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">LOAD DATA</code> statement streamlines the ETL process for an internal Impala table by moving a
+ data file or all the data files in a directory from an HDFS location into the Impala data directory for that
+ table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LOAD DATA INPATH '<var class="keyword varname">hdfs_file_or_directory_path</var>' [OVERWRITE] INTO TABLE <var class="keyword varname">tablename</var>
+ [PARTITION (<var class="keyword varname">partcol1</var>=<var class="keyword varname">val1</var>, <var class="keyword varname">partcol2</var>=<var class="keyword varname">val2</var> ...)]</code></pre>
+
+ <p class="p">
+ When the <code class="ph codeph">LOAD DATA</code> statement operates on a partitioned table,
+      it always operates on one partition at a time. Specify the <code class="ph codeph">PARTITION</code> clause
+ and list all the partition key columns, with a constant value specified for each.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DML (but still affected by
+ <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The loaded data files are moved, not copied, into the Impala data directory.
+ </li>
+
+ <li class="li">
+ You can specify the HDFS path of a single file to be moved, or the HDFS path of a directory to move all the
+ files inside that directory. You cannot specify any sort of wildcard to take only some of the files from a
+ directory. When loading a directory full of data files, keep all the data files at the top level, with no
+ nested directories underneath.
+ </li>
+
+ <li class="li">
+ Currently, the Impala <code class="ph codeph">LOAD DATA</code> statement only imports files from HDFS, not from the local
+ filesystem. It does not support the <code class="ph codeph">LOCAL</code> keyword of the Hive <code class="ph codeph">LOAD DATA</code>
+ statement. You must specify a path, not an <code class="ph codeph">hdfs://</code> URI.
+ </li>
+
+ <li class="li">
+ In the interest of speed, only limited error checking is done. If the loaded files have the wrong file
+ format, different columns than the destination table, or other kind of mismatch, Impala does not raise any
+ error for the <code class="ph codeph">LOAD DATA</code> statement. Querying the table afterward could produce a runtime
+ error or unexpected results. Currently, the only checking the <code class="ph codeph">LOAD DATA</code> statement does is
+ to avoid mixing together uncompressed and LZO-compressed text files in the same table.
+ </li>
+
+ <li class="li">
+ When you specify an HDFS directory name as the <code class="ph codeph">LOAD DATA</code> argument, any hidden files in
+ that directory (files whose names start with a <code class="ph codeph">.</code>) are not moved to the Impala data
+ directory.
+ </li>
+
+ <li class="li">
+ The operation fails if the source directory contains any non-hidden directories.
+        Prior to <span class="keyword">Impala 2.5</span>, if the source directory contained any subdirectory, even a hidden one such as
+ <span class="ph filepath">_impala_insert_staging</span>, the <code class="ph codeph">LOAD DATA</code> statement would fail.
+ In <span class="keyword">Impala 2.5</span> and higher, <code class="ph codeph">LOAD DATA</code> ignores hidden subdirectories in the
+ source directory, and only fails if any of the subdirectories are non-hidden.
+ </li>
+
+ <li class="li">
+ The loaded data files retain their original names in the new location, unless a name conflicts with an
+ existing data file, in which case the name of the new file is modified slightly to be unique. (The
+ name-mangling is a slight difference from the Hive <code class="ph codeph">LOAD DATA</code> statement, which replaces
+ identically named files.)
+ </li>
+
+ <li class="li">
+ By providing an easy way to transport files from known locations in HDFS into the Impala data directory
+ structure, the <code class="ph codeph">LOAD DATA</code> statement lets you avoid memorizing the locations and layout of
+        the HDFS directory tree containing the Impala databases and tables. (For a quick way to check the location of
+ the data files for an Impala table, issue the statement <code class="ph codeph">DESCRIBE FORMATTED
+ <var class="keyword varname">table_name</var></code>.)
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">PARTITION</code> clause is especially convenient for ingesting new data for a partitioned
+ table. As you receive new data for a time period, geographic region, or other division that corresponds to
+ one or more partitioning columns, you can load that data straight into the appropriate Impala data
+ directory, which might be nested several levels down if the table is partitioned by multiple columns. When
+ the table is partitioned, you must specify constant values for all the partitioning columns.
+ </li>
+ </ul>
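+
+    <p class="p">
+      For example, with a hypothetical table partitioned by year and month, a new
+      month of data could be ingested directly into the corresponding partition:
+    </p>
+
+<pre class="pre codeblock"><code>load data inpath '/user/doc_demo/sales_2018_05'
+  into table sales partition (year=2018, month=5);</code></pre>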
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because Impala currently cannot create Parquet data files containing complex types
+ (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>), the
+ <code class="ph codeph">LOAD DATA</code> statement is especially important when working with
+ tables containing complex type columns. You create the Parquet data files outside
+ Impala, then use either <code class="ph codeph">LOAD DATA</code>, an external table, or HDFS-level
+ file operations followed by <code class="ph codeph">REFRESH</code> to associate the data files with
+ the corresponding table.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types.
+ </p>
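+
+    <p class="p">
+      A sketch of that workflow, using hypothetical names, where the Parquet files
+      are produced outside Impala (for example, by Hive or Spark):
+    </p>
+
+<pre class="pre codeblock"><code>create table events (id bigint, tags array&lt;string&gt;) stored as parquet;
+-- Parquet files written by another engine, already in HDFS:
+load data inpath '/user/doc_demo/complex_events' into table events;</code></pre>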
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+ STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+ table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+ are very large, used in join queries, or both.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ First, we use a trivial Python script to write different numbers of strings (one per line) into files stored
+ in the <code class="ph codeph">doc_demo</code> HDFS user account. (Substitute the path for your own HDFS user account when
+ doing <span class="keyword cmdname">hdfs dfs</span> operations like these.)
+ </p>
+
+<pre class="pre codeblock"><code>$ random_strings.py 1000 | hdfs dfs -put - /user/doc_demo/thousand_strings.txt
+$ random_strings.py 100 | hdfs dfs -put - /user/doc_demo/hundred_strings.txt
+$ random_strings.py 10 | hdfs dfs -put - /user/doc_demo/ten_strings.txt</code></pre>
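The `random_strings.py` helper is not shown in this topic; one plausible implementation (the function name and string length here are assumptions) is the following sketch:

```python
#!/usr/bin/env python
# Hypothetical sketch of the random_strings.py helper used above:
# prints N random lowercase strings, one per line, for piping into hdfs dfs -put.
import random
import string
import sys

def random_strings(n, length=12):
    """Return n random lowercase ASCII strings of the given length."""
    return ["".join(random.choice(string.ascii_lowercase) for _ in range(length))
            for _ in range(n)]

if __name__ == "__main__" and len(sys.argv) > 1:
    for s in random_strings(int(sys.argv[1])):
        print(s)
```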
+
+ <p class="p">
+ Next, we create a table and load an initial set of data into it. Remember, unless you specify a
+ <code class="ph codeph">STORED AS</code> clause, Impala tables default to <code class="ph codeph">TEXTFILE</code> format with Ctrl-A (hex
+ 01) as the field delimiter. This example uses a single-column table, so the delimiter is not significant. For
+ large-scale ETL jobs, you would typically use binary format data files such as Parquet or Avro, and load them
+ into Impala tables that use the corresponding file format.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (s string);
+[localhost:21000] > load data inpath '/user/doc_demo/thousand_strings.txt' into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.61s
+[localhost:21000] > select count(*) from t1;
+Query finished, fetching results ...
++------+
+| _c0 |
++------+
+| 1000 |
++------+
+Returned 1 row(s) in 0.67s
+[localhost:21000] > load data inpath '/user/doc_demo/thousand_strings.txt' into table t1;
+ERROR: AnalysisException: INPATH location '/user/doc_demo/thousand_strings.txt' does not exist. </code></pre>
+
+ <p class="p">
+ As indicated by the message at the end of the previous example, the data file was moved from its original
+ location. The following example illustrates how the data file was moved into the Impala data directory for
+ the destination table, keeping its original filename:
+ </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -ls /user/hive/warehouse/load_data_testing.db/t1
+Found 1 items
+-rw-r--r-- 1 doc_demo doc_demo 13926 2013-06-26 15:40 /user/hive/warehouse/load_data_testing.db/t1/thousand_strings.txt</code></pre>
+
+ <p class="p">
+ The following example demonstrates the difference between the <code class="ph codeph">INTO TABLE</code> and
+      <code class="ph codeph">OVERWRITE INTO TABLE</code> clauses. The table already contains 1000 rows. After issuing the
+ <code class="ph codeph">LOAD DATA</code> statement with the <code class="ph codeph">INTO TABLE</code> clause, the table contains 100 more
+ rows, for a total of 1100. After issuing the <code class="ph codeph">LOAD DATA</code> statement with the <code class="ph codeph">OVERWRITE
+ INTO TABLE</code> clause, the former contents are gone, and now the table only contains the 10 rows from
+ the just-loaded data file.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > load data inpath '/user/doc_demo/hundred_strings.txt' into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 2 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.24s
+[localhost:21000] > select count(*) from t1;
+Query finished, fetching results ...
++------+
+| _c0 |
++------+
+| 1100 |
++------+
+Returned 1 row(s) in 0.55s
+[localhost:21000] > load data inpath '/user/doc_demo/ten_strings.txt' overwrite into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.26s
+[localhost:21000] > select count(*) from t1;
+Query finished, fetching results ...
++-----+
+| _c0 |
++-----+
+| 10 |
++-----+
+Returned 1 row(s) in 0.62s</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Amazon Simple Storage Service (S3).
+ The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and
+ partitions is specified by an <code class="ph codeph">s3a://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into S3 using the normal S3 transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the S3 data.
+ </p>
+ <p class="p">
+ Because of differences between S3 and traditional filesystems, DML operations
+ for S3 tables can take longer than for tables on HDFS. For example, both the
+ <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+ to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+ the files are moved from a temporary staging directory to the final destination directory.)
+ Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+ actually copies the data files from one location to another and then removes the original files.
+ In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+ to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+ that a problem during statement execution could leave data in an inconsistent state.
+ It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+ See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+ </p>
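+ <p class="p">
+ As a sketch (the table and partition names are hypothetical), you might enable the
+ option for a session before running an <code class="ph codeph">INSERT</code>:
+ </p>
+<pre class="pre codeblock"><code>-- Tradeoff: skips the staging-directory step for a faster INSERT, but a
+-- failure partway through the statement can leave partial data behind.
+SET S3_SKIP_INSERT_STAGING=true;
+INSERT INTO sales_s3 PARTITION (year=2018) SELECT * FROM staging_sales;</code></pre>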
+ <p class="p">See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.</p>
+
+ <p class="p">
+ <strong class="ph b">ADLS considerations:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Azure Data Lake Store (ADLS).
+ The syntax of the DML statements is the same as for any other tables, because the ADLS location for tables and
+ partitions is specified by an <code class="ph codeph">adl://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the ADLS data.
+ </p>
+ <p class="p">See <a class="xref" href="impala_adls.html#adls">Using Impala with the Azure Data Lake Store (ADLS)</a> for details about reading and writing ADLS data with Impala.</p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and write
+ permissions for the files in the source directory, and write
+ permission for the destination directory.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement cannot be used with Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement cannot be used with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement is an alternative to the
+ <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement.
+ Use <code class="ph codeph">LOAD DATA</code>
+ when you have the data files in HDFS but outside of any Impala table.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement is also an alternative
+ to the <code class="ph codeph">CREATE EXTERNAL TABLE</code> statement. Use
+ <code class="ph codeph">LOAD DATA</code> when it is appropriate to move the
+ data files under Impala control rather than querying them
+ from their original location. See <a class="xref" href="impala_tables.html#external_tables">External Tables</a>
+ for information about working with external tables.
+ </p>
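+ <p class="p">
+ For example (all names here are hypothetical), moving existing HDFS files under the
+ control of an Impala table might look like:
+ </p>
+<pre class="pre codeblock"><code>-- The files in /user/etl/incoming are moved, not copied,
+-- into the directory for the SALES table.
+LOAD DATA INPATH '/user/etl/incoming' INTO TABLE sales;
+-- Confirm the newly loaded rows are visible.
+SELECT COUNT(*) FROM sales;</code></pre>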
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_logging.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_logging.html b/docs/build3x/html/topics/impala_logging.html
new file mode 100644
index 0000000..a7cff05
--- /dev/null
+++ b/docs/build3x/html/topics/impala_logging.html
@@ -0,0 +1,423 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="logging"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala Logging</title></head><body id="logging"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Impala Logging</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala logs record information about:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Any errors Impala encountered. If Impala experienced a serious error during startup, you must diagnose and
+ troubleshoot that problem before you can do anything further with Impala.
+ </li>
+
+ <li class="li">
+ How Impala is configured.
+ </li>
+
+ <li class="li">
+ Jobs Impala has completed.
+ </li>
+ </ul>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Formerly, the logs contained the query profile for each query, showing low-level details of how the work is
+ distributed among nodes and how intermediate and final results are transmitted across the network. To save
+ space, those query profiles are now stored in zlib-compressed files in
+ <span class="ph filepath">/var/log/impala/profiles</span>. You can access them through the Impala web user interface.
+ For example, at <code class="ph codeph">http://<var class="keyword varname">impalad-node-hostname</var>:25000/queries</code>, each query
+ is followed by a <code class="ph codeph">Profile</code> link leading to a page showing extensive analytical data for the
+ query execution.
+ </p>
+
+ <p class="p">
+ The auditing feature introduced in Impala 1.1.1 produces a separate set of audit log files when
+ enabled. See <a class="xref" href="impala_auditing.html#auditing">Auditing Impala Operations</a> for details.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, you can control how many
+ audit event log files are kept on each host through the
+ <code class="ph codeph">--max_audit_event_log_files</code> startup option for the
+ <span class="keyword cmdname">impalad</span> daemon, similar to the <code class="ph codeph">--max_log_files</code>
+ option for regular log files.
+ </p>
+
+ <p class="p">
+ The lineage feature introduced in Impala 2.2.0 produces a separate lineage log file when
+ enabled. See <a class="xref" href="impala_lineage.html#lineage">Viewing Lineage Information for Impala Data</a> for details.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="logging__logs_details">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Locations and Names of Impala Log Files</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ By default, the log files are under the directory <span class="ph filepath">/var/log/impala</span>.
+ To change log file locations, modify the defaults file described in
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a>.
+ </li>
+
+ <li class="li">
+ The significant files for the <code class="ph codeph">impalad</code> process are <span class="ph filepath">impalad.INFO</span>,
+ <span class="ph filepath">impalad.WARNING</span>, and <span class="ph filepath">impalad.ERROR</span>. You might also see a file
+ <span class="ph filepath">impalad.FATAL</span>, although this is only present in rare conditions.
+ </li>
+
+ <li class="li">
+ The significant files for the <code class="ph codeph">statestored</code> process are
+ <span class="ph filepath">statestored.INFO</span>, <span class="ph filepath">statestored.WARNING</span>, and
+ <span class="ph filepath">statestored.ERROR</span>. You might also see a file <span class="ph filepath">statestored.FATAL</span>,
+ although this is only present in rare conditions.
+ </li>
+
+ <li class="li">
+ The significant files for the <code class="ph codeph">catalogd</code> process are <span class="ph filepath">catalogd.INFO</span>,
+ <span class="ph filepath">catalogd.WARNING</span>, and <span class="ph filepath">catalogd.ERROR</span>. You might also see a file
+ <span class="ph filepath">catalogd.FATAL</span>, although this is only present in rare conditions.
+ </li>
+
+ <li class="li">
+ Examine the <code class="ph codeph">.INFO</code> files to see configuration settings for the processes.
+ </li>
+
+ <li class="li">
+ Examine the <code class="ph codeph">.WARNING</code> files to see all kinds of problem information, including such
+ things as suboptimal settings and also serious runtime errors.
+ </li>
+
+ <li class="li">
+ Examine the <code class="ph codeph">.ERROR</code> and/or <code class="ph codeph">.FATAL</code> files to see only the most serious
+ errors, such as when a process crashes or queries fail to complete. These messages are also in the
+ <code class="ph codeph">.WARNING</code> file.
+ </li>
+
+ <li class="li">
+ A new set of log files is produced each time the associated daemon is restarted. These log files have
+ long names including a timestamp. The <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and
+ <code class="ph codeph">.ERROR</code> files are physically represented as symbolic links to the latest applicable log
+ files.
+ </li>
+
+ <li class="li">
+ The init script for the <code class="ph codeph">impala-server</code> service also produces a consolidated log file
+ <code class="ph codeph">/var/log/impala/impala-server.log</code>, with all the same information as the
+ corresponding <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code> files.
+ </li>
+
+ <li class="li">
+ The init script for the <code class="ph codeph">impala-state-store</code> service also produces a consolidated log file
+ <code class="ph codeph">/var/log/impala/impala-state-store.log</code>, with all the same information as the
+ corresponding <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code> files.
+ </li>
+ </ul>
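+ <p class="p">
+ For example, on a host with default settings you might confirm which timestamped
+ file the <code class="ph codeph">.INFO</code> symbolic link currently points to (the exact
+ file names will vary):
+ </p>
+<pre class="pre codeblock"><code># The .INFO, .WARNING, and .ERROR names are symbolic links to the
+# latest timestamped log files for the current daemon instance.
+ls -l /var/log/impala/impalad.INFO</code></pre>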
+
+ <p class="p">
+ Impala records log messages using the <code class="ph codeph">glog</code> logging library, so some messages
+ refer to C++ file names. Logging is affected by:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">GLOG_v</code> environment variable specifies which types of messages are logged. See
+ <a class="xref" href="#log_levels">Setting Logging Levels</a> for details.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">--logbuflevel</code> startup flag for the <span class="keyword cmdname">impalad</span> daemon specifies how
+ often the log information is written to disk. The default is 0, meaning that the log is immediately
+ flushed to disk when Impala outputs an important message such as a warning or an error, while less
+ important messages, such as informational ones, are buffered in memory rather than being flushed to disk
+ immediately.
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="logging__logs_managing">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Managing Impala Logs</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ When you have traced an issue back to a specific system, review the Impala log files on that host.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="logging__logs_rotate">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Rotating Impala Logs</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala periodically switches the physical files representing the current log files, after which it is safe
+ to remove the old files if they are no longer needed.
+ </p>
+
+ <p class="p">
+ Impala can automatically remove older unneeded log files, a feature known as <dfn class="term">log rotation</dfn>.
+
+ </p>
+
+ <p class="p">
+ In Impala 2.2 and higher, the <code class="ph codeph">--max_log_files</code> configuration option specifies how many log
+ files to keep at each severity level. You can specify an appropriate setting for each Impala-related daemon
+ (<span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">statestored</span>, and <span class="keyword cmdname">catalogd</span>). The default
+ value is 10, meaning that Impala preserves the latest 10 log files for each severity level
+ (<code class="ph codeph">INFO</code>, <code class="ph codeph">WARNING</code>, <code class="ph codeph">ERROR</code>, and <code class="ph codeph">FATAL</code>).
+ Impala checks periodically whether any old logs need to be removed, at the interval specified in the
+ <code class="ph codeph">logbufsecs</code> setting (every 5 seconds by default).
+ </p>
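+ <p class="p">
+ As an illustration, assuming a defaults file that passes startup flags through an
+ <code class="ph codeph">IMPALA_SERVER_ARGS</code> variable (the exact variable name depends on
+ how your installation starts the daemons), keeping 20 log files per severity level
+ might look like:
+ </p>
+<pre class="pre codeblock"><code># Hypothetical excerpt from the Impala defaults file.
+IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} -max_log_files=20"</code></pre>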
+
+
+
+ <p class="p">
+ A value of 0 preserves all log files, in which case you would set up manual log rotation using your
+ Linux tool or technique of choice. A value of 1 preserves only the very latest log file.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="logging__logs_debug">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Reviewing Impala Logs</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ By default, the Impala logs are stored under <code class="ph codeph">/var/log/impala/</code>. The most comprehensive log,
+ showing informational, warning, and error messages, is the file named <span class="ph filepath">impalad.INFO</span>.
+ View log file contents by using the web interface or by examining the contents of the log file. (When you
+ examine the logs through the file system, you can troubleshoot problems by reading the
+ <span class="ph filepath">impalad.WARNING</span> and/or <span class="ph filepath">impalad.ERROR</span> files, which contain the
+ subsets of messages indicating potential problems.)
+ </p>
+
+ <p class="p">
+ On a machine named <code class="ph codeph">impala.example.com</code> with default settings, you could view the Impala
+ logs on that machine by using a browser to access <code class="ph codeph">http://impala.example.com:25000/logs</code>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The web interface limits the amount of logging information displayed. To view every log entry, access the
+ log files directly through the file system.
+ </p>
+ </div>
+
+ <p class="p">
+ You can view the contents of the <code class="ph codeph">impalad.INFO</code> log file in the file system. With the
+ default configuration settings, the start of the log file appears as follows:
+ </p>
+
+<pre class="pre codeblock"><code>[user@example impalad]$ pwd
+/var/log/impalad
+[user@example impalad]$ more impalad.INFO
+Log file created at: 2013/01/07 08:42:12
+Running on machine: impala.example.com
+Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
+I0107 08:42:12.292155 14876 daemon.cc:34] impalad version 0.4 RELEASE (build 9d7fadca0461ab40b9e9df8cdb47107ec6b27cff)
+Built on Fri, 21 Dec 2012 12:55:19 PST
+I0107 08:42:12.292484 14876 daemon.cc:35] Using hostname: impala.example.com
+I0107 08:42:12.292706 14876 logging.cc:76] Flags (see also /varz are on debug webserver):
+--dump_ir=false
+--module_output=
+--be_port=22000
+--classpath=
+--hostname=impala.example.com</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The preceding example shows only a small part of the log file. Impala log files are often several megabytes
+ in size.
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="logging__log_format">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Understanding Impala Log Contents</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The logs store information about Impala startup options. This information appears once for each time Impala
+ is started and may include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Machine name.
+ </li>
+
+ <li class="li">
+ Impala version number.
+ </li>
+
+ <li class="li">
+ Flags used to start Impala.
+ </li>
+
+ <li class="li">
+ CPU information.
+ </li>
+
+ <li class="li">
+ The number of available disks.
+ </li>
+ </ul>
+
+ <p class="p">
+ There is information about each job Impala has run. Because each Impala job creates an additional set of
+ data about queries, the amount of job-specific data may be very large. Logs may contain detailed
+ information on jobs. These detailed log entries may include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The composition of the query.
+ </li>
+
+ <li class="li">
+ The degree of data locality.
+ </li>
+
+ <li class="li">
+ Statistics on data throughput and response times.
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="logging__log_levels">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Setting Logging Levels</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala uses the GLOG system, which supports three logging levels. You can adjust logging levels
+ by exporting variable settings. To change logging settings manually, use a command
+ similar to the following on each node before starting <code class="ph codeph">impalad</code>:
+ </p>
+
+<pre class="pre codeblock"><code>export GLOG_v=1</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ For performance reasons, do not enable the most verbose logging level of 3 unless there is
+ no other alternative for troubleshooting.
+ </div>
+
+ <p class="p">
+ For more information on how to configure GLOG, including how to set variable logging levels for different
+ system components, see
+ <a class="xref" href="https://github.com/google/glog" target="_blank">documentation for the glog project on GitHub</a>.
+ </p>
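+ <p class="p">
+ As a sketch, glog also honors a <code class="ph codeph">GLOG_vmodule</code> environment
+ variable for raising verbosity only in particular source files; the module names
+ below are illustrative examples, not a definitive list:
+ </p>
+<pre class="pre codeblock"><code># Verbose logging for selected modules only; the default level applies elsewhere.
+export GLOG_vmodule=hdfs-scan-node=3,exec-node=2</code></pre>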
+
+ <section class="section" id="log_levels__loglevels_details"><h3 class="title sectiontitle">Understanding What is Logged at Different Logging Levels</h3>
+
+
+
+ <p class="p">
+ As logging levels increase, the categories of information logged are cumulative. For example, GLOG_v=2
+ records everything GLOG_v=1 records, as well as additional information.
+ </p>
+
+ <p class="p">
+ Increasing logging levels imposes performance overhead and increases log size. Where practical, use
+ GLOG_v=1 for most cases: this level has minimal performance impact but still captures useful
+ troubleshooting information.
+ </p>
+
+ <p class="p">
+ Additional information logged at each level is as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ GLOG_v=1 - The default level. Logs information about each connection and query that is initiated to an
+ <code class="ph codeph">impalad</code> instance, including runtime profiles.
+ </li>
+
+ <li class="li">
+ GLOG_v=2 - Everything from the previous level plus information for each RPC initiated. This level also
+ records query execution progress information, including details on each file that is read.
+ </li>
+
+ <li class="li">
+ GLOG_v=3 - Everything from the previous level plus logging of every row that is read. This level is
+ only applicable for the most serious troubleshooting and tuning scenarios, because it can produce
+ exceptionally large and detailed log files, potentially leading to its own set of performance and
+ capacity problems.
+ </li>
+ </ul>
+
+ </section>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="logging__redaction">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Redacting Sensitive Information from Impala Log Files</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ <dfn class="term">Log redaction</dfn> is a security feature that prevents sensitive information from being displayed in
+ locations used by administrators for monitoring and troubleshooting, such as log files and the Impala debug web
+ user interface. You configure regular expressions that match sensitive types of information processed by your
+ system, such as credit card numbers or tax IDs, and literals matching these patterns are obfuscated wherever
+ they would normally be recorded in log files or displayed in administration or debugging user interfaces.
+ </p>
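+ <p class="p">
+ As a hedged sketch of what such a rule might look like (the exact rules-file format
+ and the startup flag that enables it depend on your distribution), a redaction
+ rules file could contain an entry such as:
+ </p>
+<pre class="pre codeblock"><code>{
+ "version": 1,
+ "rules": [
+ {
+ "description": "Redact 16-digit credit card numbers",
+ "search": "\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}",
+ "replace": "XXXX-XXXX-XXXX-XXXX"
+ }
+ ]
+}</code></pre>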
+
+ <p class="p">
+ In a security context, the log redaction feature is complementary to the Sentry authorization framework.
+ Sentry prevents unauthorized users from being able to directly access table data. Redaction prevents
+ administrators or support personnel from seeing the smaller amounts of sensitive or personally identifying
+ information (PII) that might appear in queries issued by those authorized users.
+ </p>
+
+ <p class="p">
+ See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details about how to enable this feature and set
+ up the regular expressions to detect and redact sensitive information within SQL statement text.
+ </p>
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_map.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_map.html b/docs/build3x/html/topics/impala_map.html
new file mode 100644
index 0000000..3325a9b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_map.html
@@ -0,0 +1,331 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="map"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAP Complex Type (Impala 2.3 or higher only)</title></head><body id="map"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAP Complex Type (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A complex data type representing an arbitrary set of key-value pairs.
+ The key part is a scalar type, while the value part can be a scalar or
+ another complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+ or <code class="ph codeph">MAP</code>).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> MAP < <var class="keyword varname">primitive_type</var>, <var class="keyword varname">type</var> >
+
+type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword varname">complex_type</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because complex types are often used in combination,
+ for example an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+ elements, if you are unfamiliar with the Impala complex types,
+ start with <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for
+ background information and usage examples.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">MAP</code> complex data type represents a set of key-value pairs.
+ Each element of the map is indexed by a primitive type such as <code class="ph codeph">BIGINT</code> or
+ <code class="ph codeph">STRING</code>, letting you define sequences that are not continuous or categories with arbitrary names.
+ You might find it convenient for modelling data produced in other languages, such as a
+ Python dictionary or Java HashMap, where a single scalar value serves as the lookup key.
+ </p>
+
+ <p class="p">
+ In a big data context, the keys in a map column might represent a numeric sequence of events during a
+ manufacturing process, or <code class="ph codeph">TIMESTAMP</code> values corresponding to sensor observations.
+ The map itself is inherently unordered, so you choose whether to make the key values significant
+ (such as a recorded <code class="ph codeph">TIMESTAMP</code>) or synthetic (such as a randomly generated universally unique ID).
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Behind the scenes, the <code class="ph codeph">MAP</code> type is implemented in a similar way as the
+ <code class="ph codeph">ARRAY</code> type. Impala does not enforce any uniqueness constraint on the
+ <code class="ph codeph">KEY</code> values, and the <code class="ph codeph">KEY</code> values are processed by
+ looping through the elements of the <code class="ph codeph">MAP</code> rather than by a constant-time lookup.
+ Therefore, this type is primarily for ease of understanding when importing data and
+ algorithms from non-SQL contexts, rather than optimizing the performance of key lookups.
+ </div>
+
+ <p class="p">
+ You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+ to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+ column and visualize its structure as if it were a table.
+ For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+ <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+ If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+ and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+ you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+ An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+ A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+ representing a column in the table.
+ A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+ <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Columns with this data type can only be used in tables or partitions with the Parquet file format.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Columns with this data type cannot be used as partition key columns in a partitioned table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement does not produce any statistics for columns of this data type.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p" id="map__d6e3285">
+ The maximum length of the column definition for any complex type, including declarations for any nested types,
+ is 4000 characters.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations and Restrictions for Complex Types</a> for a full list of limitations
+ and associated guidelines about complex type columns.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+ <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Many of the complex type examples refer to tables
+ such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+ adapted from the tables used in the TPC-H benchmark.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+ for the table definitions.
+ </div>
+
+ <p class="p">
+ The following example shows a table with various kinds of <code class="ph codeph">MAP</code> columns,
+ both at the top level and nested within other complex types.
+ Each row represents information about a specific country, with complex type fields
+ of various levels of nesting to represent different information associated
+ with the country: factual measurements such as area and population,
+ notable people in different categories, geographic features such as
+ cities, points of interest within each city, and mountains with associated facts.
+ Practice the <code class="ph codeph">CREATE TABLE</code> and query notation for complex type columns
+ using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE map_demo
+(
+ country_id BIGINT,
+
+-- Numeric facts about each country, looked up by name.
+-- For example, 'Area':1000, 'Population':999999.
+-- Using a MAP instead of a STRUCT because there could be
+-- a different set of facts for each country.
+ metrics MAP <STRING, BIGINT>,
+
+-- MAP whose value part is an ARRAY.
+-- For example, the key 'Famous Politicians' could represent an array of 10 elements,
+-- while the key 'Famous Actors' could represent an array of 20 elements.
+ notables MAP <STRING, ARRAY <STRING>>,
+
+-- MAP that is a field within a STRUCT.
+-- (The STRUCT is inside another ARRAY, because it is rare
+-- for a STRUCT to be a top-level column.)
+-- For example, city #1 might have points of interest with key 'Zoo',
+-- representing an array of 3 different zoos.
+-- City #2 might have completely different kinds of points of interest.
+-- Because the set of field names is potentially large, and most entries could be blank,
+-- a MAP makes more sense than a STRUCT to represent such a sparse data structure.
+ cities ARRAY < STRUCT <
+ name: STRING,
+ points_of_interest: MAP <STRING, ARRAY <STRING>>
+ >>,
+
+-- MAP that is an element within an ARRAY. The MAP is inside a STRUCT field to associate
+-- the mountain name with all the facts about the mountain.
+-- The "key" of the map (the first STRING field) represents the name of some fact whose value
+-- can be expressed as an integer, such as 'Height', 'Year First Climbed', and so on.
+ mountains ARRAY < STRUCT < name: STRING, facts: MAP <STRING, INT > > >
+)
+STORED AS PARQUET;
+
+</code></pre>
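+
+ <p class="p">
+ As a sketch of the query notation the table above is meant for (hypothetical data;
+ <code class="ph codeph">map_demo</code> as created here is empty), a
+ <code class="ph codeph">MAP</code> column is queried with join notation, which exposes
+ its <code class="ph codeph">key</code> and <code class="ph codeph">value</code> pseudocolumns:
+ </p>
+
+<pre class="pre codeblock"><code>-- Flatten the METRICS map into one row per country/fact combination.
+SELECT country_id, m.key, m.value
+  FROM map_demo, map_demo.metrics AS m
+WHERE m.key = 'Population';
+</code></pre>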
+
+<pre class="pre codeblock"><code>DESCRIBE map_demo;
++------------+------------------------------------------------+
+| name | type |
++------------+------------------------------------------------+
+| country_id | bigint |
+| metrics | map<string,bigint> |
+| notables | map<string,array<string>> |
+| cities | array<struct< |
+| | name:string, |
+| | points_of_interest:map<string,array<string>> |
+| | >> |
+| mountains | array<struct< |
+| | name:string, |
+| | facts:map<string,int> |
+| | >> |
++------------+------------------------------------------------+
+
+DESCRIBE map_demo.metrics;
++-------+--------+
+| name | type |
++-------+--------+
+| key | string |
+| value | bigint |
++-------+--------+
+
+DESCRIBE map_demo.notables;
++-------+---------------+
+| name | type |
++-------+---------------+
+| key | string |
+| value | array<string> |
++-------+---------------+
+
+DESCRIBE map_demo.notables.value;
++------+--------+
+| name | type |
++------+--------+
+| item | string |
+| pos | bigint |
++------+--------+
+
+DESCRIBE map_demo.cities;
++------+------------------------------------------------+
+| name | type |
++------+------------------------------------------------+
+| item | struct< |
+| | name:string, |
+| | points_of_interest:map<string,array<string>> |
+| | > |
+| pos | bigint |
++------+------------------------------------------------+
+
+DESCRIBE map_demo.cities.item.points_of_interest;
++-------+---------------+
+| name | type |
++-------+---------------+
+| key | string |
+| value | array<string> |
++-------+---------------+
+
+DESCRIBE map_demo.cities.item.points_of_interest.value;
++------+--------+
+| name | type |
++------+--------+
+| item | string |
+| pos | bigint |
++------+--------+
+
+DESCRIBE map_demo.mountains;
++------+-------------------------+
+| name | type |
++------+-------------------------+
+| item | struct< |
+| | name:string, |
+| | facts:map<string,int> |
+| | > |
+| pos | bigint |
++------+-------------------------+
+
+DESCRIBE map_demo.mountains.item.facts;
++-------+--------+
+| name | type |
++-------+--------+
+| key | string |
+| value | int |
++-------+--------+
+
+</code></pre>
+
+ <p class="p">
+ The following example shows a table that uses a variety of data types for the <code class="ph codeph">MAP</code>
+ <span class="q">"key"</span> field. Typically, you use <code class="ph codeph">BIGINT</code> or <code class="ph codeph">STRING</code> for
+ numeric or character-based keys, to avoid worrying about exceeding any size or length constraints.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE map_demo_obscure
+(
+ id BIGINT,
+ m1 MAP <INT, INT>,
+ m2 MAP <SMALLINT, INT>,
+ m3 MAP <TINYINT, INT>,
+ m4 MAP <TIMESTAMP, INT>,
+ m5 MAP <BOOLEAN, INT>,
+ m6 MAP <CHAR(5), INT>,
+ m7 MAP <VARCHAR(25), INT>,
+ m8 MAP <FLOAT, INT>,
+ m9 MAP <DOUBLE, INT>,
+ m10 MAP <DECIMAL(12,2), INT>
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+<pre class="pre codeblock"><code>CREATE TABLE celebrities (name STRING, birth_year MAP < STRING, SMALLINT >) STORED AS PARQUET;
+-- A typical row might represent values with 2 different birth years, such as:
+-- ("Joe Movie Star", { "real": 1972, "claimed": 1977 })
+
+CREATE TABLE countries (name STRING, famous_leaders MAP < INT, STRING >) STORED AS PARQUET;
+-- A typical row might represent values with different leaders, with key values corresponding to their numeric sequence, such as:
+-- ("United States", { 1: "George Washington", 3: "Thomas Jefferson", 16: "Abraham Lincoln" })
+
+CREATE TABLE airlines (name STRING, special_meals MAP < STRING, MAP < STRING, STRING > >) STORED AS PARQUET;
+-- A typical row might represent values with multiple kinds of meals, each with several components:
+-- ("Elegant Airlines",
+-- {
+-- "vegetarian": { "breakfast": "pancakes", "snack": "cookies", "dinner": "rice pilaf" },
+-- "gluten free": { "breakfast": "oatmeal", "snack": "fruit", "dinner": "chicken" }
+-- } )
+</code></pre>
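+
+ <p class="p">
+ A nested <code class="ph codeph">MAP</code> such as the <code class="ph codeph">SPECIAL_MEALS</code> column
+ is queried by chaining the join notation through each level. The following
+ hypothetical sketch assumes the table is populated:
+ </p>
+
+<pre class="pre codeblock"><code>-- One row per airline / meal type / course combination.
+SELECT name, meals.key AS meal_type, parts.key AS course, parts.value AS dish
+  FROM airlines, airlines.special_meals AS meals, meals.value AS parts;
+</code></pre>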
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a>,
+ <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>,
+ <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>
+
+ </p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_stats.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_stats.html b/docs/build3x/html/topics/impala_perf_stats.html
new file mode 100644
index 0000000..c4bdf0c
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_stats.html
@@ -0,0 +1,1192 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Table and Column Statistics</title></head><body id="perf_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Table and Column Statistics</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala can do better optimization for complex or multi-table queries when it has access to
+ statistics about the volume of data and how the values are distributed. Impala uses this
+ information to help parallelize and distribute the work for a query. For example,
+ optimizing join queries requires a way of determining if one table is <span class="q">"bigger"</span> than
+ another, which is a function of the number of rows and the average row size for each
+ table. The following sections describe the categories of statistics Impala can work with,
+ and how to produce them and keep them up to date.
+ </p>
+
+ <p class="p toc inpage all"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="perf_table_stats__table_stats" id="perf_stats__perf_table_stats">
+
+ <h2 class="title topictitle2" id="perf_table_stats__table_stats">Overview of Table Statistics</h2>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala query planner can make use of statistics about entire tables and partitions.
+ This information includes physical characteristics such as the number of rows, number of
+ data files, the total size of the data files, and the file format. For partitioned
+ tables, the numbers are calculated per partition, and as totals for the whole table.
+ This metadata is stored in the metastore database, and can be updated by either Impala
+ or Hive. If a number is not available, the value -1 is used as a placeholder. Some
+ numbers, such as number and total sizes of data files, are always kept up to date
+ because they can be calculated cheaply, as part of gathering HDFS block metadata.
+ </p>
+
+ <p class="p">
+ The following example shows table stats for an unpartitioned Parquet table. The values
+ for the number and sizes of files are always available. Initially, the number of rows is
+ not known, because it requires a potentially expensive scan through the entire table,
+ and so that value is displayed as -1. The <code class="ph codeph">COMPUTE STATS</code> statement fills
+ in any unknown table stats values.
+ </p>
+
+<pre class="pre codeblock"><code>
+show table stats parquet_snappy;
++-------+--------+---------+--------------+-------------------+---------+-------------------+...
+| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |...
++-------+--------+---------+--------------+-------------------+---------+-------------------+...
+| -1 | 96 | 23.35GB | NOT CACHED | NOT CACHED | PARQUET | false |...
++-------+--------+---------+--------------+-------------------+---------+-------------------+...
+
+compute stats parquet_snappy;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 6 column(s). |
++-----------------------------------------+
+
+
+show table stats parquet_snappy;
++------------+--------+---------+--------------+-------------------+---------+-------------------+...
+| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |...
++------------+--------+---------+--------------+-------------------+---------+-------------------+...
+| 1000000000 | 96 | 23.35GB | NOT CACHED | NOT CACHED | PARQUET | false |...
++------------+--------+---------+--------------+-------------------+---------+-------------------+...
+</code></pre>
+
+ <p class="p">
+ Impala performs some optimizations using this metadata on its own, and other
+ optimizations by using a combination of table and column statistics.
+ </p>
+
+ <p class="p">
+ To check that table statistics are available for a table, and see the details of those
+ statistics, use the statement <code class="ph codeph">SHOW TABLE STATS
+ <var class="keyword varname">table_name</var></code>. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for
+ details.
+ </p>
+
+ <p class="p">
+ If you use the Hive-based methods of gathering statistics, see
+ <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/StatsDev" target="_blank">the
+ Hive wiki</a> for information about the required configuration on the Hive side.
+ Where practical, use the Impala <code class="ph codeph">COMPUTE STATS</code> statement to avoid
+ potential configuration and scalability issues with the statistics-gathering process.
+ </p>
+
+ <p class="p">
+ If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>,
+ Impala can only use the resulting column statistics if the table is unpartitioned.
+ Impala cannot use Hive-generated column statistics for a partitioned table.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="perf_column_stats__column_stats" id="perf_stats__perf_column_stats">
+
+ <h2 class="title topictitle2" id="perf_column_stats__column_stats">Overview of Column Statistics</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala query planner can make use of statistics about individual columns when that
+ metadata is available in the metastore database. This technique is most valuable for
+ columns compared across tables in <a class="xref" href="impala_perf_joins.html#perf_joins">join
+ queries</a>, to help estimate how many rows the query will retrieve from each table.
+ <span class="ph"> These statistics are also important for correlated subqueries using the
+ <code class="ph codeph">EXISTS()</code> or <code class="ph codeph">IN()</code> operators, which are processed
+ internally the same way as join queries.</span>
+ </p>
+
+ <p class="p">
+ The following example shows column stats for an unpartitioned Parquet table. The values
+ for the maximum and average sizes of some types are always available, because those
+ figures are constant for numeric and other fixed-size types. Initially, the number of
+ distinct values is not known, because it requires a potentially expensive scan through
+ the entire table, and so that value is displayed as -1. The same applies to maximum and
+ average sizes of variable-sized types, such as <code class="ph codeph">STRING</code>. The
+ <code class="ph codeph">COMPUTE STATS</code> statement fills in most unknown column stats values. (It
+ does not record the number of <code class="ph codeph">NULL</code> values, because currently Impala
+ does not use that figure for query optimization.)
+ </p>
+
+<pre class="pre codeblock"><code>
+show column stats parquet_snappy;
++-------------+----------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++-------------+----------+------------------+--------+----------+----------+
+| id | BIGINT | -1 | -1 | 8 | 8 |
+| val | INT | -1 | -1 | 4 | 4 |
+| zerofill | STRING | -1 | -1 | -1 | -1 |
+| name | STRING | -1 | -1 | -1 | -1 |
+| assertion | BOOLEAN | -1 | -1 | 1 | 1 |
+| location_id | SMALLINT | -1 | -1 | 2 | 2 |
++-------------+----------+------------------+--------+----------+----------+
+
+compute stats parquet_snappy;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 6 column(s). |
++-----------------------------------------+
+
+show column stats parquet_snappy;
++-------------+----------+------------------+--------+----------+-------------------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++-------------+----------+------------------+--------+----------+-------------------+
+| id | BIGINT | 183861280 | -1 | 8 | 8 |
+| val | INT | 139017 | -1 | 4 | 4 |
+| zerofill | STRING | 101761 | -1 | 6 | 6 |
+| name | STRING | 145636240 | -1 | 22 | 13.00020027160645 |
+| assertion | BOOLEAN | 2 | -1 | 1 | 1 |
+| location_id | SMALLINT | 339 | -1 | 2 | 2 |
++-------------+----------+------------------+--------+----------+-------------------+
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ For column statistics to be effective in Impala, you also need to have table
+ statistics for the applicable tables, as described in
+ <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a>. When you use the Impala
+ <code class="ph codeph">COMPUTE STATS</code> statement, both table and column statistics are
+ automatically gathered at the same time, for all columns in the table.
+ </p>
+ </div>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span> Prior to Impala 1.4.0,
+ <code class="ph codeph">COMPUTE STATS</code> counted the number of
+ <code class="ph codeph">NULL</code> values in each column and recorded that figure
+ in the metastore database. Because Impala does not currently use the
+ <code class="ph codeph">NULL</code> count during query planning, Impala 1.4.0 and
+ higher speeds up the <code class="ph codeph">COMPUTE STATS</code> statement by
+ skipping this <code class="ph codeph">NULL</code> counting. </div>
+
+ <p class="p">
+ To check whether column statistics are available for a particular set of columns, use
+ the <code class="ph codeph">SHOW COLUMN STATS <var class="keyword varname">table_name</var></code> statement, or check
+ the extended <code class="ph codeph">EXPLAIN</code> output for a query against that table that refers
+ to those columns. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> and
+ <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> for details.
+ </p>
+
+ <p class="p">
+ If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>,
+ Impala can only use the resulting column statistics if the table is unpartitioned.
+ Impala cannot use Hive-generated column statistics for a partitioned table.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="perf_stats_partitions__stats_partitions" id="perf_stats__perf_stats_partitions">
+
+ <h2 class="title topictitle2" id="perf_stats_partitions__stats_partitions">How Table and Column Statistics Work for Partitioned Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When you use Impala for <span class="q">"big data"</span>, you are highly likely to use partitioning for
+ your biggest tables, the ones representing data that can be logically divided based on
+ dates, geographic regions, or similar criteria. The table and column statistics are
+ especially useful for optimizing queries on such tables. For example, a query involving
+ one year might involve substantially more or less data than a query involving a
+ different year, or a range of several years. Each query might be optimized differently
+ as a result.
+ </p>
+
+ <p class="p">
+ The following examples show how table and column stats work with a partitioned table.
+ The table for this example is partitioned by year, month, and day. For simplicity, the
+ sample data consists of 5 partitions, all from the same year and month. Table stats are
+ collected independently for each partition. (In fact, the <code class="ph codeph">SHOW
+ PARTITIONS</code> statement displays exactly the same information as <code class="ph codeph">SHOW
+ TABLE STATS</code> for a partitioned table.) Column stats apply to the entire table,
+ not to individual partitions. Because the partition key column values are represented as
+ HDFS directories, their characteristics are typically known in advance, even when the
+ values for non-key columns are shown as -1.
+ </p>
+
+<pre class="pre codeblock"><code>
+show partitions year_month_day;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| year | month | day | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| 2013 | 12 | 1 | -1 | 1 | 2.51MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 2 | -1 | 1 | 2.53MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 3 | -1 | 1 | 2.52MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 4 | -1 | 1 | 2.51MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 5 | -1 | 1 | 2.52MB | NOT CACHED | NOT CACHED | PARQUET |...
+| Total | | | -1 | 5 | 12.58MB | 0B | | |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+
+show table stats year_month_day;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| year | month | day | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| 2013 | 12 | 1 | -1 | 1 | 2.51MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 2 | -1 | 1 | 2.53MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 3 | -1 | 1 | 2.52MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 4 | -1 | 1 | 2.51MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 5 | -1 | 1 | 2.52MB | NOT CACHED | NOT CACHED | PARQUET |...
+| Total | | | -1 | 5 | 12.58MB | 0B | | |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+
+show column stats year_month_day;
++-----------+---------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++-----------+---------+------------------+--------+----------+----------+
+| id | INT | -1 | -1 | 4 | 4 |
+| val | INT | -1 | -1 | 4 | 4 |
+| zfill | STRING | -1 | -1 | -1 | -1 |
+| name | STRING | -1 | -1 | -1 | -1 |
+| assertion | BOOLEAN | -1 | -1 | 1 | 1 |
+| year | INT | 1 | 0 | 4 | 4 |
+| month | INT | 1 | 0 | 4 | 4 |
+| day | INT | 5 | 0 | 4 | 4 |
++-----------+---------+------------------+--------+----------+----------+
+
+compute stats year_month_day;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 5 partition(s) and 5 column(s). |
++-----------------------------------------+
+
+show table stats year_month_day;
++-------+-------+-----+--------+--------+---------+--------------+-------------------+---------+...
+| year | month | day | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format |...
++-------+-------+-----+--------+--------+---------+--------------+-------------------+---------+...
+| 2013 | 12 | 1 | 93606 | 1 | 2.51MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 2 | 94158 | 1 | 2.53MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 3 | 94122 | 1 | 2.52MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 4 | 93559 | 1 | 2.51MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 5 | 93845 | 1 | 2.52MB | NOT CACHED | NOT CACHED | PARQUET |...
+| Total | | | 469290 | 5 | 12.58MB | 0B | | |...
++-------+-------+-----+--------+--------+---------+--------------+-------------------+---------+...
+
+show column stats year_month_day;
++-----------+---------+------------------+--------+----------+-------------------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++-----------+---------+------------------+--------+----------+-------------------+
+| id | INT | 511129 | -1 | 4 | 4 |
+| val | INT | 364853 | -1 | 4 | 4 |
+| zfill | STRING | 311430 | -1 | 6 | 6 |
+| name | STRING | 471975 | -1 | 22 | 13.00160026550293 |
+| assertion | BOOLEAN | 2 | -1 | 1 | 1 |
+| year | INT | 1 | 0 | 4 | 4 |
+| month | INT | 1 | 0 | 4 | 4 |
+| day | INT | 5 | 0 | 4 | 4 |
++-----------+---------+------------------+--------+----------+-------------------+
+</code></pre>
+
+ <p class="p">
+ If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>,
+ Impala can only use the resulting column statistics if the table is unpartitioned.
+ Impala cannot use Hive-generated column statistics for a partitioned table.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="perf_stats__perf_generating_stats">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Generating Table and Column Statistics</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Use the <code class="ph codeph">COMPUTE STATS</code> family of commands to collect table and
+ column statistics. The <code class="ph codeph">COMPUTE STATS</code> variants offer
+ different tradeoffs between computation cost, staleness, and maintenance
+ workflows, which are explained below.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ For a particular table, use either <code class="ph codeph">COMPUTE STATS</code> or
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, but never combine the two or
+ alternate between them. If you switch from <code class="ph codeph">COMPUTE STATS</code> to
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> during the lifetime of a table, or
+ vice versa, drop all statistics by running <code class="ph codeph">DROP STATS</code> before
+ making the switch.
+ </p>
+ </div>
+
+
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="perf_generating_stats__concept_y2f_nfl_mdb">
+
+ <h3 class="title topictitle3" id="ariaid-title6">COMPUTE STATS</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> command collects and sets the table-level
+ and partition-level row counts as well as all column statistics for a given
+ table. The collection process is CPU-intensive and can take a long time to
+ complete for very large tables.
+ </p>
+ <div class="p">
+ To speed up <code class="ph codeph">COMPUTE STATS</code>, consider the following options,
+ which can be combined.
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Limit the number of columns for which statistics are collected to increase
+ the efficiency of <code class="ph codeph">COMPUTE STATS</code>. Queries benefit from statistics for
+ columns used in filters, join conditions, and <code class="ph codeph">GROUP BY</code> or
+ <code class="ph codeph">PARTITION BY</code> clauses; other columns are good candidates to exclude.
+ This feature is available in Impala 2.12 and higher.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Set the <code class="ph codeph">MT_DOP</code> query option to use more threads within each participating
+ impalad to compute the statistics faster, but not more efficiently. Note
+ that computing stats on a large table with a high <code class="ph codeph">MT_DOP</code> value can
+ negatively affect other queries running at the same time if
+ <code class="ph codeph">COMPUTE STATS</code> claims most CPU cycles.
+ This feature is available in Impala 2.8 and higher.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Consider the experimental extrapolation and sampling features (see below)
+ to further increase the efficiency of computing stats.
+ </p>
+ </li>
+ </ul>
+ </div>
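+
+ <p class="p">
+ For example, the first two options above might be combined as follows. (The
+ table and column names are hypothetical; the column list syntax requires
+ Impala 2.12 or higher.)
+ </p>
+
+<pre class="pre codeblock"><code>-- Use extra threads within each impalad for this session.
+SET MT_DOP=4;
+
+-- Collect statistics only for the columns used in filters, joins, or GROUP BY.
+COMPUTE STATS sales (customer_id, store_id, sale_date);
+</code></pre>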
+
+ <p class="p">
+ <code class="ph codeph">COMPUTE STATS</code> is intended to be run periodically,
+ for example weekly, or on demand when the contents of a table have changed
+ significantly. Due to the high resource utilization and long response
+ time of <code class="ph codeph">COMPUTE STATS</code>, it is most practical to run it
+ in a scheduled maintenance window where the Impala cluster is idle
+ enough to accommodate the expensive operation. The degree of change that
+ qualifies as <span class="q">"significant"</span> depends on the query workload, but typically,
+ if 30% of the rows have changed, it is recommended to recompute
+ statistics.
+ </p>
+
+ <p class="p">
+ If you reload a complete new set of data for a table, but the number of rows and
+ number of distinct values for each column is relatively unchanged from before, you
+ do not need to recompute stats for the table.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title7" id="concept_y2f_nfl_mdb__experimental_stats_features">
+ <h4 class="title topictitle4" id="ariaid-title7">Experimental: Extrapolation and Sampling</h4>
+ <div class="body conbody">
+ <div class="p">
+ Impala 2.12 and higher includes two experimental features to alleviate
+ common issues for computing and maintaining statistics on very large tables.
+ The following shortcomings are improved upon:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Newly added partitions do not have row count statistics. Table scans
+ that only access those new partitions are treated as not having stats.
+ Similarly, table scans that access both new and old partitions estimate
+ the scan cardinality based on those old partitions that have stats, and
+ the new partitions without stats are treated as having 0 rows.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The row counts of existing partitions become stale when data is added
+ or dropped.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Computing stats for tables with 100,000 or more partitions might fail
+ or be very slow due to the high cost of updating the partition metadata
+ in the Hive Metastore.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ With transient compute resources, it is important to minimize the time
+ from starting a new cluster to successfully running queries.
+ Since the cluster might be relatively short-lived, users might prefer to
+ quickly collect stats that are <span class="q">"good enough"</span> as opposed to spending
+ a lot of time and resources on computing full-fidelity stats.
+ </p>
+ </li>
+ </ul>
+ For very large tables, it is often wasteful or impractical to run a full
+ <code class="ph codeph">COMPUTE STATS</code> on a frequent basis to address the scenarios above.
+ </div>
+ <p class="p">
+ The sampling feature makes <code class="ph codeph">COMPUTE STATS</code> more efficient by processing a
+ fraction of the table data, and the extrapolation feature aims to reduce
+ the frequency at which <code class="ph codeph">COMPUTE STATS</code> needs to be re-run by estimating
+ the row count of new and modified partitions.
+ </p>
+ <p class="p">
+ The sampling and extrapolation features are disabled by default.
+ They can be enabled globally or for specific tables, as follows.
+ Set the impalad start-up flag <code class="ph codeph">--enable_stats_extrapolation</code>
+ to enable the features globally. To enable them only for a specific table, set
+ the <code class="ph codeph">impala.enable.stats.extrapolation</code> table property to
+ <code class="ph codeph">true</code> for the desired table. The table-level property
+ overrides the global setting, so it is also possible to enable sampling and
+ extrapolation globally, but disable them for specific tables by setting the
+ table property to <code class="ph codeph">false</code>. For example:
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE test_table SET TBLPROPERTIES("impala.enable.stats.extrapolation"="true");
+</code></pre>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Why are these features experimental? Due to their probabilistic nature,
+ it is possible that these features perform pathologically poorly on tables
+ with extreme data, file, or size distributions. Because it is not feasible
+ to test all possible scenarios, we only cautiously advertise these new
+ capabilities. That said, the features have been thoroughly tested and
+ are considered functionally stable. If you decide to give these features
+ a try, please tell us about your experience at user@impala.apache.org!
+ We rely on user feedback to guide future improvements in statistics
+ collection.
+ </div>
+ </div>
+
+ <article class="topic concept nested4" aria-labelledby="ariaid-title8" id="experimental_stats_features__experimental_stats_extrapolation">
+ <h5 class="title topictitle5" id="ariaid-title8">Stats Extrapolation</h5>
+ <div class="body conbody">
+ <p class="p">
+ The main idea of stats extrapolation is to estimate the row count of new
+ and modified partitions based on the result of the last <code class="ph codeph">COMPUTE STATS</code>.
+ Enabling stats extrapolation changes the behavior of <code class="ph codeph">COMPUTE STATS</code>,
+ as well as the cardinality estimation of table scans. <code class="ph codeph">COMPUTE STATS</code> no
+ longer computes and stores per-partition row counts, and instead, only
+ computes a table-level row count together with the total number of file
+ bytes in the table at that time. No partition metadata is modified. The
+ input cardinality of a table scan is estimated by converting the data
+ volume of relevant partitions to a row count, based on the table-level
+ row count and file bytes statistics. It is assumed that within the same
+ table, different sets of files with the same data volume correspond
+ to a similar number of rows on average. With extrapolation enabled,
+ the scan cardinality estimation ignores per-partition row counts. It
+ relies only on the table-level statistics and the scanned data volume.
+ </p>
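+ <p class="p">
+ As an illustration with hypothetical numbers: suppose the last
+ <code class="ph codeph">COMPUTE STATS</code> recorded 1,000,000 rows across 100 MB of file
+ bytes for the whole table. A scan that touches partitions totaling 25 MB of file
+ bytes is then estimated as:
+ </p>
+
+<pre class="pre codeblock"><code>estimated rows = table row count * (scanned bytes / total file bytes)
+               = 1,000,000 * (25 MB / 100 MB)
+               = 250,000
+</code></pre>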
+ <p class="p">
+ The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">EXPLAIN</code> commands distinguish between row counts
+ stored in the Hive Metastore, and the row counts extrapolated based on the
+ above process. Consult the <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">EXPLAIN</code> documentation
+ for more details.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested4" aria-labelledby="ariaid-title9" id="experimental_stats_features__experimental_stats_sampling">
+ <h5 class="title topictitle5" id="ariaid-title9">Sampling</h5>
+ <div class="body conbody">
+ <p class="p">
+ A <code class="ph codeph">TABLESAMPLE</code> clause may be added to <code class="ph codeph">COMPUTE STATS</code> to limit the
+ percentage of data to be processed. The final statistics are obtained
+ by extrapolating the statistics from the data sample over the entire table.
+ The extrapolated statistics are stored in the Hive Metastore, just as if no
+ sampling was used. The following example runs <code class="ph codeph">COMPUTE STATS</code> over a 10 percent
+ data sample:
+ </p>
+
+<pre class="pre codeblock"><code>COMPUTE STATS test_table TABLESAMPLE SYSTEM(10);
+</code></pre>
+ <p class="p">
+ We have found that a 10 percent sampling rate typically offers a good
+ tradeoff between statistics accuracy and execution cost. A sampling rate
+ well below 10 percent has shown poor results and is not recommended.
+ </p>
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ Sampling-based techniques sacrifice result accuracy for execution
+ efficiency, so results may vary for different tables and columns
+ depending on their data distribution. The extrapolation procedure Impala
+ uses for estimating the number of distinct values per column is inherently
+ non-deterministic, so results may vary between runs of
+ COMPUTE STATS TABLESAMPLE, even if no data has changed.
+ </div>
+ </div>
+ </article>
+ </article>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="perf_generating_stats__concept_bmk_pfl_mdb">
+
+ <h3 class="title topictitle3" id="ariaid-title10">COMPUTE INCREMENTAL STATS</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In Impala 2.1.0 and higher, you can use the
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> and
+ <code class="ph codeph">DROP INCREMENTAL STATS</code> commands.
+ The <code class="ph codeph">INCREMENTAL</code> clauses work with incremental statistics,
+ a specialized feature for partitioned tables.
+ </p>
+
+ <p class="p">
+ When you compute incremental statistics for a partitioned table, by default Impala only
+ processes those partitions that do not yet have incremental statistics. By processing
+ only newly added partitions, you can keep statistics up to date without incurring the
+ overhead of reprocessing the entire table each time.
+ </p>
+
+ <p class="p">
+ You can also compute or drop statistics for a specified subset of partitions by
+ including a <code class="ph codeph">PARTITION</code> clause in the
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or <code class="ph codeph">DROP INCREMENTAL STATS</code>
+ statement.
+ </p>
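+ <p class="p">
+ For example, with a hypothetical table <code class="ph codeph">sales</code> partitioned
+ by <code class="ph codeph">year</code>, you could limit either operation to a single
+ partition:
+ </p>
+<pre class="pre codeblock"><code>-- Compute incremental stats for one partition only.
+COMPUTE INCREMENTAL STATS sales PARTITION (year=2017);
+
+-- Remove the incremental stats for that same partition.
+DROP INCREMENTAL STATS sales PARTITION (year=2017);
+</code></pre>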
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ For a table with a huge number of partitions and many columns, the approximately 400 bytes
+ of metadata per column per partition can add up to significant memory overhead, as it must
+ be cached on the <span class="keyword cmdname">catalogd</span> host and on every <span class="keyword cmdname">impalad</span> host
+ that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB,
+ you might experience service downtime.
+ </p>
+ <p class="p">
+ When you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on a table for the first time,
+ the statistics are computed from scratch, regardless of whether the table already
+ has statistics. Therefore, expect a one-time resource-intensive operation
+ for scanning the entire table when running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ for the first time on a given table.
+ </p>
+ </div>
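+ <p class="p">
+ A rough back-of-the-envelope calculation, using assumed table dimensions,
+ shows how the per-column, per-partition overhead adds up:
+ </p>
+<pre class="pre codeblock"><code>-- Hypothetical table with 20,000 partitions and 100 columns:
+--   20,000 partitions * 100 columns * 400 bytes = 800,000,000 bytes
+-- That is roughly 800 MB of metadata, cached on the catalogd host and on
+-- every coordinator-eligible impalad host.
+</code></pre>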
+
+ <p class="p">
+ The metadata for incremental statistics is handled differently from the original style
+ of statistics:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Issuing a <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> without a partition
+ clause causes Impala to compute incremental stats for all partitions that
+ do not already have incremental stats. This might be the entire table when
+ running the command for the first time, but subsequent runs should only
+ update new partitions. You can force updating a partition that already has
+ incremental stats by issuing a <code class="ph codeph">DROP INCREMENTAL STATS</code>
+ before running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW PARTITIONS</code>
+ statements now include an additional column showing whether incremental statistics
+ are available for each column. A partition could already be covered by the original
+ type of statistics based on a prior <code class="ph codeph">COMPUTE STATS</code> statement, as
+ indicated by a value other than <code class="ph codeph">-1</code> under the <code class="ph codeph">#Rows</code>
+ column. Impala query planning uses either kind of statistics when available.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> takes more time than <code class="ph codeph">COMPUTE
+ STATS</code> for the same volume of data. Therefore it is most suitable for tables
+ with large data volume where new partitions are added frequently, making it
+ impractical to run a full <code class="ph codeph">COMPUTE STATS</code> operation for each new
+ partition. For unpartitioned tables, or partitioned tables that are loaded once and
+ not updated with new partitions, use the original <code class="ph codeph">COMPUTE STATS</code>
+ syntax.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> uses some memory in the
+ <span class="keyword cmdname">catalogd</span> process, proportional to the number of partitions and
+ number of columns in the applicable table. The memory overhead is approximately 400
+ bytes for each column in each partition. This memory is reserved in the
+ <span class="keyword cmdname">catalogd</span> daemon, the <span class="keyword cmdname">statestored</span> daemon, and
+ in each instance of the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In cases where new files are added to an existing partition, issue a
+ <code class="ph codeph">REFRESH</code> statement for the table, followed by a <code class="ph codeph">DROP
+ INCREMENTAL STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> sequence
+ for the changed partition.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DROP INCREMENTAL STATS</code> statement operates only on a single
+ partition at a time. To remove statistics (whether incremental or not) from all
+ partitions of a table, issue a <code class="ph codeph">DROP STATS</code> statement with no
+ <code class="ph codeph">INCREMENTAL</code> or <code class="ph codeph">PARTITION</code> clauses.
+ </p>
+ </li>
+ </ul>
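+ <p class="p">
+ The refresh-then-recompute sequence for a changed partition might look like
+ the following, using hypothetical table and partition names:
+ </p>
+<pre class="pre codeblock"><code>-- New data files were added directly to the year=2017 partition directory.
+REFRESH sales;
+DROP INCREMENTAL STATS sales PARTITION (year=2017);
+COMPUTE INCREMENTAL STATS sales PARTITION (year=2017);
+</code></pre>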
+
+ <p class="p">
+ The following considerations apply to incremental statistics when the structure of an
+ existing table is changed (known as <dfn class="term">schema evolution</dfn>):
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ If you use an <code class="ph codeph">ALTER TABLE</code> statement to drop a column, the existing
+ statistics remain valid and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> does not
+ rescan any partitions.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use an <code class="ph codeph">ALTER TABLE</code> statement to add a column, Impala rescans
+ all partitions and fills in the appropriate column-level values the next time you
+ run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use an <code class="ph codeph">ALTER TABLE</code> statement to change the data type of a
+ column, Impala rescans all partitions and fills in the appropriate column-level
+ values the next time you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use an <code class="ph codeph">ALTER TABLE</code> statement to change the file format of a
+ table, the existing statistics remain valid and a subsequent <code class="ph codeph">COMPUTE
+ INCREMENTAL STATS</code> does not rescan any partitions.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> and
+ <a class="xref" href="impala_drop_stats.html#drop_stats">DROP STATS Statement</a> for syntax details.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="perf_stats__perf_stats_checking">
+
+ <h2 class="title topictitle2" id="ariaid-title11">Detecting Missing Statistics</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can check whether a specific table has statistics using the <code class="ph codeph">SHOW TABLE
+ STATS</code> statement (for any table) or the <code class="ph codeph">SHOW PARTITIONS</code>
+ statement (for a partitioned table). Both statements display the same information. If a
+ table or a partition does not have any statistics, the <code class="ph codeph">#Rows</code> field
+ contains <code class="ph codeph">-1</code>. Once you compute statistics for the table or partition,
+ the <code class="ph codeph">#Rows</code> field changes to an accurate value.
+ </p>
+
+ <p class="p">
+ The following example shows a table that initially does not have any statistics. The
+ <code class="ph codeph">SHOW TABLE STATS</code> statement displays different values for
+ <code class="ph codeph">#Rows</code> before and after the <code class="ph codeph">COMPUTE STATS</code> operation.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table no_stats (x int);
+[localhost:21000] > show table stats no_stats;
++-------+--------+------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+--------+------+--------------+--------+-------------------+
+| -1 | 0 | 0B | NOT CACHED | TEXT | false |
++-------+--------+------+--------------+--------+-------------------+
+[localhost:21000] > compute stats no_stats;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+[localhost:21000] > show table stats no_stats;
++-------+--------+------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+--------+------+--------------+--------+-------------------+
+| 0 | 0 | 0B | NOT CACHED | TEXT | false |
++-------+--------+------+--------------+--------+-------------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows a similar progression with a partitioned table. Initially,
+ <code class="ph codeph">#Rows</code> is <code class="ph codeph">-1</code>. After a <code class="ph codeph">COMPUTE STATS</code>
+ operation, <code class="ph codeph">#Rows</code> changes to an accurate value. Any newly added
+ partition starts with no statistics, meaning that you must collect statistics after
+ adding a new partition.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table no_stats_partitioned (x int) partitioned by (year smallint);
+[localhost:21000] > show table stats no_stats_partitioned;
++-------+-------+--------+------+--------------+--------+-------------------+
+| year | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+-------+--------+------+--------------+--------+-------------------+
+| Total | -1 | 0 | 0B | 0B | | |
++-------+-------+--------+------+--------------+--------+-------------------+
+[localhost:21000] > show partitions no_stats_partitioned;
++-------+-------+--------+------+--------------+--------+-------------------+
+| year | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+-------+--------+------+--------------+--------+-------------------+
+| Total | -1 | 0 | 0B | 0B | | |
++-------+-------+--------+------+--------------+--------+-------------------+
+[localhost:21000] > alter table no_stats_partitioned add partition (year=2013);
+[localhost:21000] > compute stats no_stats_partitioned;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+[localhost:21000] > alter table no_stats_partitioned add partition (year=2014);
+[localhost:21000] > show partitions no_stats_partitioned;
++-------+-------+--------+------+--------------+--------+-------------------+
+| year | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+-------+--------+------+--------------+--------+-------------------+
+| 2013 | 0 | 0 | 0B | NOT CACHED | TEXT | false |
+| 2014 | -1 | 0 | 0B | NOT CACHED | TEXT | false |
+| Total | 0 | 0 | 0B | 0B | | |
++-------+-------+--------+------+--------------+--------+-------------------+
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because the default <code class="ph codeph">COMPUTE STATS</code> statement creates and updates
+ statistics for all partitions in a table, if you expect to frequently add new
+ partitions, use the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax instead, which
+ lets you compute stats for a single specified partition, or only for those partitions
+ that do not already have incremental stats.
+ </div>
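+ <p class="p">
+ Continuing the example above, the newly added partition could be covered
+ with a single-partition operation:
+ </p>
+<pre class="pre codeblock"><code>-- Compute stats for just the newly added partition.
+COMPUTE INCREMENTAL STATS no_stats_partitioned PARTITION (year=2014);
+</code></pre>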
+
+ <p class="p">
+ If checking each individual table is impractical, because there are many tables or
+ because views hide the underlying base tables, you can instead check for missing
+ statistics for a particular query. Use the <code class="ph codeph">EXPLAIN</code> statement to preview query
+ efficiency before actually running the query. Use the query profile output available
+ through the <code class="ph codeph">PROFILE</code> command in <span class="keyword cmdname">impala-shell</span> or the
+ web UI to verify query execution and timing after running the query. Both the
+ <code class="ph codeph">EXPLAIN</code> plan and the <code class="ph codeph">PROFILE</code> output display a warning
+ if any tables or partitions involved in the query do not have statistics.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table no_stats (x int);
+[localhost:21000] > explain select count(*) from no_stats;
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=10.00MB VCores=1 |
+| WARNING: The following tables are missing relevant table and/or column statistics. |
+| incremental_stats.no_stats |
+| |
+| 03:AGGREGATE [FINALIZE] |
+| | output: count:merge(*) |
+| | |
+| 02:EXCHANGE [UNPARTITIONED] |
+| | |
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+| 00:SCAN HDFS [incremental_stats.no_stats] |
+| partitions=1/1 files=0 size=0B |
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ Because Impala uses the <dfn class="term">partition pruning</dfn> technique where possible to
+ evaluate only the relevant partitions, on a partitioned table with statistics for some
+ partitions but not others, whether the <code class="ph codeph">EXPLAIN</code> statement shows
+ the warning depends on which partitions the query actually reads. For example, different
+ queries against the same table might or might not produce the warning:
+ </p>
+
+<pre class="pre codeblock"><code>-- No warning because all the partitions for the year 2012 have stats.
+EXPLAIN SELECT ... FROM t1 WHERE year = 2012;
+
+-- Missing stats warning because one or more partitions in this range
+-- do not have stats.
+EXPLAIN SELECT ... FROM t1 WHERE year BETWEEN 2006 AND 2009;
+</code></pre>
+
+ <p class="p">
+ To confirm if any partitions at all in the table are missing statistics, you might
+ explain a query that scans the entire table, such as <code class="ph codeph">SELECT COUNT(*) FROM
+ <var class="keyword varname">table_name</var></code>.
+ </p>
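+ <p class="p">
+ For example, assuming a partitioned table named <code class="ph codeph">t1</code> as above:
+ </p>
+<pre class="pre codeblock"><code>-- Scans all partitions, so the missing-stats warning appears
+-- if any partition lacks statistics.
+EXPLAIN SELECT COUNT(*) FROM t1;
+</code></pre>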
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="perf_stats__concept_s3c_4gl_mdb">
+
+ <h2 class="title topictitle2" id="ariaid-title12">Manually Setting Table and Column Statistics with ALTER TABLE</h2>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="concept_s3c_4gl_mdb__concept_wpt_pgl_mdb">
+
+ <h3 class="title topictitle3" id="ariaid-title13">Setting Table Statistics</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The most crucial piece of data in all the statistics is the number of rows in the
+ table (for an unpartitioned or partitioned table) and for each partition (for a
+ partitioned table). The <code class="ph codeph">COMPUTE STATS</code> statement always gathers
+ statistics about all columns, as well as overall table statistics. If it is not
+ practical to do a full <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL
+ STATS</code> operation after adding a partition or inserting data, or if you can see
+ that Impala would produce a more efficient plan if the number of rows was different,
+ you can manually set the number of rows through an <code class="ph codeph">ALTER TABLE</code>
+ statement:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Set total number of rows. Applies to both unpartitioned and partitioned tables.
+alter table <var class="keyword varname">table_name</var> set tblproperties('numRows'='<var class="keyword varname">new_value</var>', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+
+-- Set total number of rows for a specific partition. Applies to partitioned tables only.
+-- You must specify all the partition key columns in the PARTITION clause.
+alter table <var class="keyword varname">table_name</var> partition (<var class="keyword varname">keycol1</var>=<var class="keyword varname">val1</var>,<var class="keyword varname">keycol2</var>=<var class="keyword varname">val2</var>...) set tblproperties('numRows'='<var class="keyword varname">new_value</var>', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+</code></pre>
+
+ <p class="p">
+ This statement avoids re-scanning any data files. (The requirement to include the
+ <code class="ph codeph">STATS_GENERATED_VIA_STATS_TASK</code> property is relatively new, as a
+ result of the issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/HIVE-8648" target="_blank">HIVE-8648</a>
+ for the Hive metastore.)
+ </p>
+
+<pre class="pre codeblock"><code>create table analysis_data stored as parquet as select * from raw_data;
+Inserted 1000000000 rows in 181.98s
+compute stats analysis_data;
+insert into analysis_data select * from smaller_table_we_forgot_before;
+Inserted 1000000 rows in 15.32s
+-- Now there are 1001000000 rows. We can update this single data point in the stats.
+alter table analysis_data set tblproperties('numRows'='1001000000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+
+ <p class="p">
+ For a partitioned table, update both the per-partition number of rows and the number
+ of rows for the whole table:
+ </p>
+
+<pre class="pre codeblock"><code>-- If the table originally contained 1 million rows, and we add another partition with 30 thousand rows,
+-- change the numRows property for the partition and the overall table.
+alter table partitioned_data partition(year=2009, month=4) set tblproperties ('numRows'='30000', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+
+ <p class="p">
+ In practice, the <code class="ph codeph">COMPUTE STATS</code> statement, or <code class="ph codeph">COMPUTE
+ INCREMENTAL STATS</code> for a partitioned table, should be fast and convenient
+ enough that this technique is only useful for the very largest partitioned tables.
+ Because the column statistics might be left in a stale state, do not use this
+ technique as a replacement for <code class="ph codeph">COMPUTE STATS</code>. Only use this technique
+ if all other means of collecting statistics are impractical, or as a low-overhead
+ operation that you run in between periodic <code class="ph codeph">COMPUTE STATS</code> or
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operations.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="concept_s3c_4gl_mdb__concept_asb_vgl_mdb">
+
+ <h3 class="title topictitle3" id="ariaid-title14">Setting Column Statistics</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, you can also use the <code class="ph codeph">SET
+ COLUMN STATS</code> clause of <code class="ph codeph">ALTER TABLE</code> to manually set or change
+ column statistics. Only use this technique in cases where it is impractical to run
+ <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ frequently enough to keep up with data changes for a huge table.
+ </p>
+
+ <div class="p">
+ You specify a case-insensitive symbolic name for the kind of statistics:
+ <code class="ph codeph">numDVs</code>, <code class="ph codeph">numNulls</code>, <code class="ph codeph">avgSize</code>, <code class="ph codeph">maxSize</code>.
+ The key names and values are both quoted. This operation applies to an entire table,
+ not a specific partition. For example:
+<pre class="pre codeblock"><code>
+create table t1 (x int, s string);
+insert into t1 values (1, 'one'), (2, 'two'), (2, 'deux');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x | INT | -1 | -1 | 4 | 4 |
+| s | STRING | -1 | -1 | -1 | -1 |
++--------+--------+------------------+--------+----------+----------+
+alter table t1 set column stats x ('numDVs'='2','numNulls'='0');
+alter table t1 set column stats s ('numdvs'='3','maxsize'='4');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x | INT | 2 | 0 | 4 | 4 |
+| s | STRING | 3 | -1 | 4 | -1 |
++--------+--------+------------------+--------+----------+----------+
+</code></pre>
+ </div>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="perf_stats__perf_stats_examples">
+
+ <h2 class="title topictitle2" id="ariaid-title15">Examples of Using Table and Column Statistics with Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following examples walk through a sequence of <code class="ph codeph">SHOW TABLE STATS</code>,
+ <code class="ph codeph">SHOW COLUMN STATS</code>, <code class="ph codeph">ALTER TABLE</code>, and
+ <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT</code> statements to illustrate various
+ aspects of how Impala uses statistics to help optimize queries.
+ </p>
+
+ <p class="p">
+ This example shows table and column statistics for the <code class="ph codeph">STORE</code> table
+ used in the <a class="xref" href="http://www.tpc.org/tpcds/" target="_blank">TPC-DS
+ benchmark for decision support</a> systems. It is a tiny table holding data for 12
+ stores. Initially, before any statistics are gathered by a <code class="ph codeph">COMPUTE
+ STATS</code> statement, most of the numeric fields show placeholder values of -1,
+ indicating that the figures are unknown. The figures that are filled in are values that
+ are easily countable or deducible at the physical level, such as the number of files,
+ total data size of the files, and the maximum and average sizes for data types that have
+ a constant size such as <code class="ph codeph">INT</code>, <code class="ph codeph">FLOAT</code>, and
+ <code class="ph codeph">TIMESTAMP</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > show table stats store;
++-------+--------+--------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+--------+--------+
+| -1 | 1 | 3.08KB | TEXT |
++-------+--------+--------+--------+
+Returned 1 row(s) in 0.03s
+[localhost:21000] > show column stats store;
++--------------------+-----------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------------------+-----------+------------------+--------+----------+----------+
+| s_store_sk | INT | -1 | -1 | 4 | 4 |
+| s_store_id | STRING | -1 | -1 | -1 | -1 |
+| s_rec_start_date | TIMESTAMP | -1 | -1 | 16 | 16 |
+| s_rec_end_date | TIMESTAMP | -1 | -1 | 16 | 16 |
+| s_closed_date_sk | INT | -1 | -1 | 4 | 4 |
+| s_store_name | STRING | -1 | -1 | -1 | -1 |
+| s_number_employees | INT | -1 | -1 | 4 | 4 |
+| s_floor_space | INT | -1 | -1 | 4 | 4 |
+| s_hours | STRING | -1 | -1 | -1 | -1 |
+| s_manager | STRING | -1 | -1 | -1 | -1 |
+| s_market_id | INT | -1 | -1 | 4 | 4 |
+| s_geography_class | STRING | -1 | -1 | -1 | -1 |
+| s_market_desc | STRING | -1 | -1 | -1 | -1 |
+| s_market_manager | STRING | -1 | -1 | -1 | -1 |
+| s_division_id | INT | -1 | -1 | 4 | 4 |
+| s_division_name | STRING | -1 | -1 | -1 | -1 |
+| s_company_id | INT | -1 | -1 | 4 | 4 |
+| s_company_name | STRING | -1 | -1 | -1 | -1 |
+| s_street_number | STRING | -1 | -1 | -1 | -1 |
+| s_street_name | STRING | -1 | -1 | -1 | -1 |
+| s_street_type | STRING | -1 | -1 | -1 | -1 |
+| s_suite_number | STRING | -1 | -1 | -1 | -1 |
+| s_city | STRING | -1 | -1 | -1 | -1 |
+| s_county | STRING | -1 | -1 | -1 | -1 |
+| s_state | STRING | -1 | -1 | -1 | -1 |
+| s_zip | STRING | -1 | -1 | -1 | -1 |
+| s_country | STRING | -1 | -1 | -1 | -1 |
+| s_gmt_offset | FLOAT | -1 | -1 | 4 | 4 |
+| s_tax_percentage | FLOAT | -1 | -1 | 4 | 4 |
++--------------------+-----------+------------------+--------+----------+----------+
+Returned 29 row(s) in 0.04s</code></pre>
+
+ <p class="p">
+ With the Hive <code class="ph codeph">ANALYZE TABLE</code> statement for column statistics, you had to
+ specify each column for which to gather statistics. The Impala <code class="ph codeph">COMPUTE
+ STATS</code> statement automatically gathers statistics for all columns, because it
+ reads through the entire table relatively quickly and can efficiently compute the values
+ for all the columns. This example shows how after running the <code class="ph codeph">COMPUTE
+ STATS</code> statement, statistics are filled in for both the table and all its
+ columns:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > compute stats store;
++------------------------------------------+
+| summary |
++------------------------------------------+
+| Updated 1 partition(s) and 29 column(s). |
++------------------------------------------+
+Returned 1 row(s) in 1.88s
+[localhost:21000] > show table stats store;
++-------+--------+--------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+--------+--------+
+| 12 | 1 | 3.08KB | TEXT |
++-------+--------+--------+--------+
+Returned 1 row(s) in 0.02s
+[localhost:21000] > show column stats store;
++--------------------+-----------+------------------+--------+----------+-------------------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------------------+-----------+------------------+--------+----------+-------------------+
+| s_store_sk | INT | 12 | -1 | 4 | 4 |
+| s_store_id | STRING | 6 | -1 | 16 | 16 |
+| s_rec_start_date | TIMESTAMP | 4 | -1 | 16 | 16 |
+| s_rec_end_date | TIMESTAMP | 3 | -1 | 16 | 16 |
+| s_closed_date_sk | INT | 3 | -1 | 4 | 4 |
+| s_store_name | STRING | 8 | -1 | 5 | 4.25 |
+| s_number_employees | INT | 9 | -1 | 4 | 4 |
+| s_floor_space | INT | 10 | -1 | 4 | 4 |
+| s_hours | STRING | 2 | -1 | 8 | 7.083300113677979 |
+| s_manager | STRING | 7 | -1 | 15 | 12 |
+| s_market_id | INT | 7 | -1 | 4 | 4 |
+| s_geography_class | STRING | 1 | -1 | 7 | 7 |
+| s_market_desc | STRING | 10 | -1 | 94 | 55.5 |
+| s_market_manager | STRING | 7 | -1 | 16 | 14 |
+| s_division_id | INT | 1 | -1 | 4 | 4 |
+| s_division_name | STRING | 1 | -1 | 7 | 7 |
+| s_company_id | INT | 1 | -1 | 4 | 4 |
+| s_company_name | STRING | 1 | -1 | 7 | 7 |
+| s_street_number | STRING | 9 | -1 | 3 | 2.833300113677979 |
+| s_street_name | STRING | 12 | -1 | 11 | 6.583300113677979 |
+| s_street_type | STRING | 8 | -1 | 9 | 4.833300113677979 |
+| s_suite_number | STRING | 11 | -1 | 9 | 8.25 |
+| s_city | STRING | 2 | -1 | 8 | 6.5 |
+| s_county | STRING | 1 | -1 | 17 | 17 |
+| s_state | STRING | 1 | -1 | 2 | 2 |
+| s_zip | STRING | 2 | -1 | 5 | 5 |
+| s_country | STRING | 1 | -1 | 13 | 13 |
+| s_gmt_offset | FLOAT | 1 | -1 | 4 | 4 |
+| s_tax_percentage | FLOAT | 5 | -1 | 4 | 4 |
++--------------------+-----------+------------------+--------+----------+-------------------+
+Returned 29 row(s) in 0.04s</code></pre>
+
+ <p class="p">
+ The following example shows how statistics are represented for a partitioned table. In
+ this case, we have set up a table to hold the world's most trivial census data, a single
+ <code class="ph codeph">STRING</code> field, partitioned by a <code class="ph codeph">YEAR</code> column. The table
+ statistics include a separate entry for each partition, plus final totals for the
+ numeric fields. The column statistics include some easily deducible facts for the
+ partitioning column, such as the number of distinct values (the number of partition
+ subdirectories).
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > describe census;
++------+----------+---------+
+| name | type | comment |
++------+----------+---------+
+| name | string | |
+| year | smallint | |
++------+----------+---------+
+Returned 2 row(s) in 0.02s
+[localhost:21000] > show table stats census;
++-------+-------+--------+------+---------+
+| year | #Rows | #Files | Size | Format |
++-------+-------+--------+------+---------+
+| 2000 | -1 | 0 | 0B | TEXT |
+| 2004 | -1 | 0 | 0B | TEXT |
+| 2008 | -1 | 0 | 0B | TEXT |
+| 2010 | -1 | 0 | 0B | TEXT |
+| 2011 | 0 | 1 | 22B | TEXT |
+| 2012 | -1 | 1 | 22B | TEXT |
+| 2013 | -1 | 1 | 231B | PARQUET |
+| Total | 0 | 3 | 275B | |
++-------+-------+--------+------+---------+
+Returned 8 row(s) in 0.02s
+[localhost:21000] > show column stats census;
++--------+----------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+----------+------------------+--------+----------+----------+
+| name | STRING | -1 | -1 | -1 | -1 |
+| year | SMALLINT | 7 | -1 | 2 | 2 |
++--------+----------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.02s</code></pre>
+
+ <p class="p">
+ The following example shows how the statistics are filled in by a <code class="ph codeph">COMPUTE
+ STATS</code> statement in Impala.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > compute stats census;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 3 partition(s) and 1 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 2.16s
+[localhost:21000] > show table stats census;
++-------+-------+--------+------+---------+
+| year | #Rows | #Files | Size | Format |
++-------+-------+--------+------+---------+
+| 2000 | -1 | 0 | 0B | TEXT |
+| 2004 | -1 | 0 | 0B | TEXT |
+| 2008 | -1 | 0 | 0B | TEXT |
+| 2010 | -1 | 0 | 0B | TEXT |
+| 2011 | 4 | 1 | 22B | TEXT |
+| 2012 | 4 | 1 | 22B | TEXT |
+| 2013 | 1 | 1 | 231B | PARQUET |
+| Total | 9 | 3 | 275B | |
++-------+-------+--------+------+---------+
+Returned 8 row(s) in 0.02s
+[localhost:21000] > show column stats census;
++--------+----------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+----------+------------------+--------+----------+----------+
+| name | STRING | 4 | -1 | 5 | 4.5 |
+| year | SMALLINT | 7 | -1 | 2 | 2 |
++--------+----------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.02s</code></pre>
+
+ <p class="p">
+ For examples showing how some queries work differently when statistics are available,
+ see <a class="xref" href="impala_perf_joins.html#perf_joins_examples">Examples of Join Order Optimization</a>. You can see how Impala
+ executes a query differently in each case by observing the <code class="ph codeph">EXPLAIN</code>
+      output before and after collecting statistics. Measure the query times before and after,
+      and compare the throughput numbers in the corresponding <code class="ph codeph">SUMMARY</code> or
+      <code class="ph codeph">PROFILE</code> output, to verify how much the improved plan speeds up
+      performance.
+ </p>
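+
+      <p class="p">
+        For example, you could compare the plans for a simple aggregation query on the
+        <code class="ph codeph">census</code> table before and after collecting statistics.
+        (This is an illustrative sketch rather than captured output.)
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > explain select count(*) from census group by year;
+[localhost:21000] > compute stats census;
+[localhost:21000] > explain select count(*) from census group by year;</code></pre>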
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_testing.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_testing.html b/docs/build3x/html/topics/impala_perf_testing.html
new file mode 100644
index 0000000..fee319a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_testing.html
@@ -0,0 +1,152 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="performance_testing"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Testing Impala Performance</title></head><body id="performance_testing"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Testing Impala Performance</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+      Test to ensure that Impala is configured for optimal performance. If you installed Impala with cluster
+      management software, complete the procedures described in this topic to verify that Impala is set up
+      correctly.
+ </p>
+
+ <section class="section" id="performance_testing__checking_config_performance"><h2 class="title sectiontitle">Checking Impala Configuration Values</h2>
+
+
+
+ <p class="p">
+ You can inspect Impala configuration values by connecting to your Impala server using a browser.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To check Impala configuration values:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Use a browser to connect to one of the hosts running <code class="ph codeph">impalad</code> in your environment.
+ Connect using an address of the form
+ <code class="ph codeph">http://<var class="keyword varname">hostname</var>:<var class="keyword varname">port</var>/varz</code>.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In the preceding example, replace <code class="ph codeph">hostname</code> and <code class="ph codeph">port</code> with the name and
+ port of your Impala server. The default port is 25000.
+ </div>
+ </li>
+
+ <li class="li">
+ Review the configured values.
+ <p class="p">
+ For example, to check that your system is configured to use block locality tracking information, you
+ would check that the value for <code class="ph codeph">dfs.datanode.hdfs-blocks-metadata.enabled</code> is
+ <code class="ph codeph">true</code>.
+ </p>
+ </li>
+ </ol>
+
+ <p class="p" id="performance_testing__p_31">
+ <strong class="ph b">To check data locality:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Execute a query on a dataset that is available across multiple nodes. For example, for a table named
+ <code class="ph codeph">MyTable</code> that has a reasonable chance of being spread across multiple DataNodes:
+<pre class="pre codeblock"><code>[impalad-host:21000] > SELECT COUNT(*) FROM MyTable;</code></pre>
+ </li>
+
+ <li class="li">
+ After the query completes, review the contents of the Impala logs. You should find a recent message
+ similar to the following:
+<pre class="pre codeblock"><code>Total remote scan volume = 0</code></pre>
+ </li>
+ </ol>
+
+ <p class="p">
+ The presence of remote scans may indicate <code class="ph codeph">impalad</code> is not running on the correct nodes.
+      This can happen either because some DataNodes are not running <code class="ph codeph">impalad</code>, or
+      because the <code class="ph codeph">impalad</code> instance that coordinates the query is unable to contact
+      one or more of the other <code class="ph codeph">impalad</code> instances.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To understand the causes of this issue:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Connect to the debugging web server. By default, this server runs on port 25000. This page lists all
+ <code class="ph codeph">impalad</code> instances running in your cluster. If there are fewer instances than you expect,
+ this often indicates some DataNodes are not running <code class="ph codeph">impalad</code>. Ensure
+ <code class="ph codeph">impalad</code> is started on all DataNodes.
+ </li>
+
+ <li class="li">
+
+ If you are using multi-homed hosts, ensure that the Impala daemon's hostname resolves to the interface on
+ which <code class="ph codeph">impalad</code> is running. The hostname Impala is using is displayed when
+ <code class="ph codeph">impalad</code> starts. To explicitly set the hostname, use the <code class="ph codeph">--hostname</code> flag.
+ </li>
+
+ <li class="li">
+ Check that <code class="ph codeph">statestored</code> is running as expected. Review the contents of the state store
+ log to ensure all instances of <code class="ph codeph">impalad</code> are listed as having connected to the state
+ store.
+ </li>
+ </ol>
+ </section>
+
+ <section class="section" id="performance_testing__checking_config_logs"><h2 class="title sectiontitle">Reviewing Impala Logs</h2>
+
+
+
+ <p class="p">
+ You can review the contents of the Impala logs for signs that short-circuit reads or block location
+ tracking are not functioning. Before checking logs, execute a simple query against a small HDFS dataset.
+ Completing a query task generates log messages using current settings. Information on starting Impala and
+ executing queries can be found in <a class="xref" href="impala_processes.html#processes">Starting Impala</a> and
+ <a class="xref" href="impala_impala_shell.html#impala_shell">Using the Impala Shell (impala-shell Command)</a>. Information on logging can be found in
+ <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>. Log messages and their interpretations are as follows:
+ </p>
+
+ <table class="table"><caption></caption><colgroup><col style="width:75%"><col style="width:25%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="performance_testing__checking_config_logs__entry__1">
+ Log Message
+ </th>
+ <th class="entry nocellnorowborder" id="performance_testing__checking_config_logs__entry__2">
+ Interpretation
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__1 ">
+ <div class="p">
+<pre class="pre">Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata
+</pre>
+ </div>
+ </td>
+ <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__2 ">
+ <p class="p">
+ Tracking block locality is not enabled.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__1 ">
+ <div class="p">
+<pre class="pre">Unable to load native-hadoop library for your platform... using builtin-java classes where applicable</pre>
+ </div>
+ </td>
+ <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__2 ">
+ <p class="p">
+ Native checksumming is not enabled.
+ </p>
+ </td>
+ </tr>
+ </tbody></table>
+ </section>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_performance.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_performance.html b/docs/build3x/html/topics/impala_performance.html
new file mode 100644
index 0000000..bc87821
--- /dev/null
+++ b/docs/build3x/html/topics/impala_performance.html
@@ -0,0 +1,116 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_cookbook.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_joins.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_stats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_benchmarking.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_resources.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filtering.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_hdfs_caching.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_testing.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain_plan.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_skew.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="performance"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Tuning Impala for Performance</title></head><body id="performance"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Tuning Impala for Performance</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following sections explain the factors affecting the performance of Impala features, and procedures for
+ tuning, monitoring, and benchmarking Impala queries and other SQL operations.
+ </p>
+
+ <p class="p">
+ This section also describes techniques for maximizing Impala scalability. Scalability is tied to performance:
+ it means that performance remains high as the system workload increases. For example, reducing the disk I/O
+ performed by a query can speed up an individual query, and at the same time improve scalability by making it
+ practical to run more queries simultaneously. Sometimes, an optimization technique improves scalability more
+ than performance. For example, reducing memory usage for a query might not change the query performance much,
+ but might improve scalability by allowing more Impala queries or other kinds of jobs to run at the same time
+ without running out of memory.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Before starting any performance tuning or benchmarking, make sure your system is configured with all the
+ recommended minimum hardware requirements from <a class="xref" href="impala_prereqs.html#prereqs_hardware">Hardware Requirements</a> and
+ software settings from <a class="xref" href="impala_config_performance.html#config_performance">Post-Installation Configuration for Impala</a>.
+ </p>
+ </div>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>. This technique physically divides the data based on
+ the different values in frequently queried columns, allowing queries to skip reading a large percentage of
+ the data in a table.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>. Joins are the main class of queries that you can tune at
+ the SQL level, as opposed to changing physical factors such as the file format or the hardware
+ configuration. The related topics <a class="xref" href="impala_perf_stats.html#perf_column_stats">Overview of Column Statistics</a> and
+ <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a> are also important primarily for join performance.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a> and
+ <a class="xref" href="impala_perf_stats.html#perf_column_stats">Overview of Column Statistics</a>. Gathering table and column statistics, using the
+ <code class="ph codeph">COMPUTE STATS</code> statement, helps Impala automatically optimize the performance for join
+ queries, without requiring changes to SQL query statements. (This process is greatly simplified in Impala
+ 1.2.2 and higher, because the <code class="ph codeph">COMPUTE STATS</code> statement gathers both kinds of statistics in
+ one operation, and does not require any setup and configuration as was previously necessary for the
+ <code class="ph codeph">ANALYZE TABLE</code> statement in Hive.)
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_perf_testing.html#performance_testing">Testing Impala Performance</a>. Do some post-setup testing to ensure Impala is
+ using optimal settings for performance, before conducting any benchmark tests.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_perf_benchmarking.html#perf_benchmarks">Benchmarking Impala Queries</a>. The configuration and sample data that you use
+      for initial experiments with Impala are often not appropriate for doing performance tests.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_perf_resources.html#mem_limits">Controlling Impala Resource Usage</a>. The more memory Impala can utilize, the better query
+ performance you can expect. In a cluster running other kinds of workloads as well, you must make tradeoffs
+ to make sure all Hadoop components have enough memory to perform well, so you might cap the memory that
+ Impala can use.
+ </li>
+
+
+
+ <li class="li">
+ <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a>. Queries against data stored in the Amazon Simple Storage Service (S3)
+ have different performance characteristics than when the data is stored in HDFS.
+ </li>
+ </ul>
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ A good source of tips related to scalability and performance tuning is the
+ <a class="xref" href="http://www.slideshare.net/cloudera/the-impala-cookbook-42530186" target="_blank">Impala Cookbook</a>
+ presentation. These slides are updated periodically as new features come out and new benchmarks are performed.
+ </p>
+
+ </div>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_perf_cookbook.html">Impala Performance Guidelines and Best Practices</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_joins.html">Performance Considerations for Join Queries</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_stats.html">Table and Column Statistics</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_benchmarking.html">Benchmarking Impala Queries</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_resources.html">Controlling Impala Resource Usage</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_hdfs_caching.html">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_testing.html">Testing Impala Performance</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain_plan.html">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_skew.html">Detecting and Correcting HDFS Block Skew Conditions</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_langref_unsupported.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_langref_unsupported.html b/docs/build3x/html/topics/impala_langref_unsupported.html
new file mode 100644
index 0000000..769bf86
--- /dev/null
+++ b/docs/build3x/html/topics/impala_langref_unsupported.html
@@ -0,0 +1,337 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="langref_hiveql_delta"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SQL Differences Between Impala and Hive</title></head><body id="langref_hiveql_delta"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SQL Differences Between Impala and Hive</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ Impala's SQL syntax follows the SQL-92 standard, and includes many industry extensions in areas such as
+ built-in functions. See <a class="xref" href="impala_porting.html#porting">Porting SQL from Other Database Systems to Impala</a> for a general discussion of adapting SQL
+ code from a variety of database systems to Impala.
+ </p>
+
+ <p class="p">
+ Because Impala and Hive share the same metastore database and their tables are often used interchangeably,
+ the following section covers differences between Impala and Hive in detail.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="langref_hiveql_delta__langref_hiveql_unsupported">
+
+ <h2 class="title topictitle2" id="ariaid-title2">HiveQL Features not Available in Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The current release of Impala does not support the following SQL features that you might be familiar with
+ from HiveQL:
+ </p>
+
+
+
+ <ul class="ul">
+
+
+ <li class="li">
+ Extensibility mechanisms such as <code class="ph codeph">TRANSFORM</code>, custom file formats, or custom SerDes.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">DATE</code> data type.
+ </li>
+
+ <li class="li">
+ XML and JSON functions.
+ </li>
+
+ <li class="li">
+ Certain aggregate functions from HiveQL: <code class="ph codeph">covar_pop</code>, <code class="ph codeph">covar_samp</code>,
+ <code class="ph codeph">corr</code>, <code class="ph codeph">percentile</code>, <code class="ph codeph">percentile_approx</code>,
+ <code class="ph codeph">histogram_numeric</code>, <code class="ph codeph">collect_set</code>; Impala supports the set of aggregate
+ functions listed in <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a> and analytic
+ functions listed in <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>.
+ </li>
+
+ <li class="li">
+ Sampling.
+ </li>
+
+ <li class="li">
+ Lateral views. In <span class="keyword">Impala 2.3</span> and higher, Impala supports queries on complex types
+ (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>), using join notation
+ rather than the <code class="ph codeph">EXPLODE()</code> keyword.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types.
+ </li>
+
+ <li class="li">
+ Multiple <code class="ph codeph">DISTINCT</code> clauses per query, although Impala includes some workarounds for this
+ limitation.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ By default, Impala only allows a single <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">columns</var>)</code>
+ expression in each query.
+ </p>
+ <p class="p">
+ If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by
+ specifying <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>; a query can contain multiple instances of
+ <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>. To make Impala automatically rewrite
+ <code class="ph codeph">COUNT(DISTINCT)</code> expressions to <code class="ph codeph">NDV()</code>, enable the
+ <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option.
+ </p>
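+        <p class="p">
+          For example (an illustrative sketch; the table and column names are hypothetical):
+        </p>
+<pre class="pre codeblock"><code>select ndv(col1), ndv(col2) from t1;
+set APPX_COUNT_DISTINCT=true;
+select count(distinct col1), count(distinct col2) from t1;</code></pre>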
+ <p class="p">
+ To produce the same result as multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions, you can use the
+ following technique for queries involving a single table:
+ </p>
+<pre class="pre codeblock"><code>select v1.c1 result1, v2.c1 result2 from
+ (select count(distinct col1) as c1 from t1) v1
+ cross join
+ (select count(distinct col2) as c1 from t1) v2;
+</code></pre>
+ <p class="p">
+ Because <code class="ph codeph">CROSS JOIN</code> is an expensive operation, prefer to use the <code class="ph codeph">NDV()</code>
+ technique wherever practical.
+ </p>
+ </div>
+ </li>
+ </ul>
+
+ <div class="p">
+ User-defined functions (UDFs) are supported starting in Impala 1.2. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>
+ for full details on Impala UDFs.
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently
+ support user-defined table generating functions (UDTFs).
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Only Impala-supported column types are supported in Java-based UDFs.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The Hive <code class="ph codeph">current_user()</code> function cannot be
+ called from a Java UDF through Impala.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ Impala does not currently support these HiveQL statements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">ANALYZE TABLE</code> (the Impala equivalent is <code class="ph codeph">COMPUTE STATS</code>)
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DESCRIBE COLUMN</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DESCRIBE DATABASE</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">EXPORT TABLE</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">IMPORT TABLE</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SHOW TABLE EXTENDED</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SHOW INDEXES</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SHOW COLUMNS</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">INSERT OVERWRITE DIRECTORY</code>; use <code class="ph codeph">INSERT OVERWRITE <var class="keyword varname">table_name</var></code>
+ or <code class="ph codeph">CREATE TABLE AS SELECT</code> to materialize query results into the HDFS directory associated
+ with an Impala table.
+ </li>
+ </ul>
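+
+    <p class="p">
+      For example, a sketch of materializing query results into a table rather than a
+      directory (the table names are hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>create table query_results stored as parquet as
+  select c1, c2 from t1 where c2 is not null;</code></pre>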
+ <p class="p">
+ Impala respects the <code class="ph codeph">serialization.null.format</code> table
+ property only for TEXT tables and ignores the property for Parquet and
+ other formats. Hive respects the <code class="ph codeph">serialization.null.format</code>
+ property for Parquet and other formats and converts matching values
+ to NULL during the scan. See <a class="xref" href="impala_txtfile.html">Using Text Data Files with Impala Tables</a> for
+ using the table property in Impala.
+ </p>
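+    <p class="p">
+      For example, a sketch of setting the property on a text table from Impala (the
+      table name is hypothetical):
+    </p>
+<pre class="pre codeblock"><code>alter table t1 set tblproperties('serialization.null.format'='null');</code></pre>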
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="langref_hiveql_delta__langref_hiveql_semantics">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Semantic Differences Between Impala and HiveQL Features</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section covers instances where Impala and Hive have similar functionality, sometimes including the
+ same syntax, but there are differences in the runtime semantics of those features.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security:</strong>
+ </p>
+
+ <p class="p">
+ Impala utilizes the <a class="xref" href="http://sentry.apache.org/" target="_blank">Apache
+        Sentry</a> authorization framework, which provides fine-grained role-based access control
+ to protect data against unauthorized access or tampering.
+ </p>
+
+ <p class="p">
+ The Hive component now includes Sentry-enabled <code class="ph codeph">GRANT</code>,
+ <code class="ph codeph">REVOKE</code>, and <code class="ph codeph">CREATE/DROP ROLE</code> statements. Earlier Hive releases had a
+ privilege system with <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements that were primarily
+ intended to prevent accidental deletion of data, rather than a security mechanism to protect against
+ malicious users.
+ </p>
+
+ <p class="p">
+ Impala can make use of privileges set up through Hive <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements.
+ Impala has its own <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Impala 2.0 and higher.
+ See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for the details of authorization in Impala, including
+ how to switch from the original policy file-based privilege model to the Sentry service using privileges
+ stored in the metastore database.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">SQL statements and clauses:</strong>
+ </p>
+
+ <p class="p">
+        The semantics of Impala SQL statements vary from HiveQL in some cases where Impala and Hive
+        use similar SQL statement and clause names:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Impala uses different syntax and names for query hints, <code class="ph codeph">[SHUFFLE]</code> and
+ <code class="ph codeph">[NOSHUFFLE]</code> rather than <code class="ph codeph">MapJoin</code> or <code class="ph codeph">StreamJoin</code>. See
+ <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for the Impala details.
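+          <p class="p">
+            For example, an Impala join hint looks like the following (a sketch with
+            hypothetical table names):
+          </p>
+<pre class="pre codeblock"><code>select t1.id from t1 join [shuffle] t2 on t1.id = t2.id;</code></pre>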
+ </li>
+
+ <li class="li">
+ Impala does not expose MapReduce specific features of <code class="ph codeph">SORT BY</code>, <code class="ph codeph">DISTRIBUTE
+ BY</code>, or <code class="ph codeph">CLUSTER BY</code>.
+ </li>
+
+ <li class="li">
+ Impala does not require queries to include a <code class="ph codeph">FROM</code> clause.
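+          <p class="p">
+            For example:
+          </p>
+<pre class="pre codeblock"><code>select 2 + 2;
+select now();</code></pre>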
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Data types:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Impala supports a limited set of implicit casts. This can help avoid undesired results from unexpected
+ casting behavior.
+ <ul class="ul">
+ <li class="li">
+ Impala does not implicitly cast between string and numeric or Boolean types. Always use
+ <code class="ph codeph">CAST()</code> for these conversions.
+ </li>
+
+ <li class="li">
+ Impala does perform implicit casts among the numeric types, when going from a smaller or less precise
+ type to a larger or more precise one. For example, Impala will implicitly convert a
+ <code class="ph codeph">SMALLINT</code> to a <code class="ph codeph">BIGINT</code> or <code class="ph codeph">FLOAT</code>, but to convert from
+ <code class="ph codeph">DOUBLE</code> to <code class="ph codeph">FLOAT</code> or <code class="ph codeph">INT</code> to <code class="ph codeph">TINYINT</code>
+ requires a call to <code class="ph codeph">CAST()</code> in the query.
+ </li>
+
+ <li class="li">
+ Impala does perform implicit casts from string to timestamp. Impala has a restricted set of literal
+ formats for the <code class="ph codeph">TIMESTAMP</code> data type and the <code class="ph codeph">from_unixtime()</code> format
+ string; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details.
+ </li>
+ </ul>
+ <p class="p">
+ See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for full details on implicit and explicit casting for
+ all types, and <a class="xref" href="impala_conversion_functions.html#conversion_functions">Impala Type Conversion Functions</a> for details about
+ the <code class="ph codeph">CAST()</code> function.
+ </p>
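+          <p class="p">
+            For example, the following casts must be explicit (a sketch; the table and
+            column names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>select cast(s1 as int) + 1 from t1; -- string to numeric needs CAST()
+select cast(pi() as float);        -- DOUBLE to FLOAT needs CAST()</code></pre>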
+ </li>
+
+ <li class="li">
+ Impala does not store or interpret timestamps using the local timezone, to avoid undesired results from
+ unexpected time zone issues. Timestamps are stored and interpreted relative to UTC. This difference can
+ produce different results for some calls to similarly named date/time functions between Impala and Hive.
+ See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details about the Impala
+ functions. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for a discussion of how Impala handles
+ time zones, and configuration options you can use to make Impala match the Hive behavior more closely
+ when dealing with Parquet-encoded <code class="ph codeph">TIMESTAMP</code> data or when converting between
+ the local time zone and UTC.
+ </li>
+
+ <li class="li">
+ The Impala <code class="ph codeph">TIMESTAMP</code> type can represent dates ranging from 1400-01-01 to 9999-12-31.
+ This is different from the Hive date range, which is 0000-01-01 to 9999-12-31.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala does not return column overflows as <code class="ph codeph">NULL</code>, so that users can distinguish
+ between <code class="ph codeph">NULL</code> data and overflow conditions, as they can with traditional
+ database systems. On overflow, Impala returns the largest or smallest value in the range for the type. For example,
+ valid values for a <code class="ph codeph">tinyint</code> range from -128 to 127. In Impala, a <code class="ph codeph">tinyint</code>
+ with a value of -200 returns -128 rather than <code class="ph codeph">NULL</code>, and a <code class="ph codeph">tinyint</code> with a
+ value of 200 returns 127.
+ </p>
+ </li>
+
+ </ul>
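The overflow behavior described above can be illustrated with a small sketch. The helper below is hypothetical (it is not Impala code); it clamps an out-of-range value to the `TINYINT` bounds the same way Impala does.

```python
# Illustrative sketch (not Impala source code): clamp an out-of-range
# value to the bounds of Impala's TINYINT type, mirroring the overflow
# behavior described above (-200 -> -128, 200 -> 127).

TINYINT_MIN, TINYINT_MAX = -128, 127

def clamp_tinyint(value: int) -> int:
    """Return the value Impala would report for a TINYINT overflow."""
    return max(TINYINT_MIN, min(TINYINT_MAX, value))

print(clamp_tinyint(-200))  # -128
print(clamp_tinyint(200))   # 127
print(clamp_tinyint(42))    # 42
```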
+
+ <p class="p">
+ <strong class="ph b">Miscellaneous features:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Impala does not provide virtual columns.
+ </li>
+
+ <li class="li">
+ Impala does not expose locking.
+ </li>
+
+ <li class="li">
+ Impala does not expose some configuration properties.
+ </li>
+ </ul>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ldap.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ldap.html b/docs/build3x/html/topics/impala_ldap.html
new file mode 100644
index 0000000..7729e93
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ldap.html
@@ -0,0 +1,294 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ldap"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Enabling LDAP Authentication for Impala</title></head><body id="ldap"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Enabling LDAP Authentication for Impala</h1>
+
+
+ <div class="body conbody">
+
+
+
+ <p class="p"> Authentication is the process of allowing only specified named users to
+ access the server (in this case, the Impala server). This feature is
+ crucial for any production deployment, to prevent misuse, tampering, or
+ excessive load on the server. Impala uses LDAP for authentication,
+ verifying the credentials of each user who connects through
+ <span class="keyword cmdname">impala-shell</span>, Hue, a Business Intelligence tool, JDBC
+ or ODBC application, and so on. </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files
+ owned by the same user (typically <code class="ph codeph">impala</code>). To implement user-level access to different
+ databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>.
+ </div>
+
+ <p class="p">
+ An alternative form of authentication you can use is Kerberos, described in
+ <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="ldap__ldap_prereqs">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Requirements for Using Impala with LDAP</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Authentication against LDAP servers is available in Impala 1.2.2 and higher. Impala 1.4.0 adds support for
+ secure LDAP authentication through SSL and TLS.
+ </p>
+
+ <p class="p">
+ The Impala LDAP support lets you use Impala with systems such as Active Directory that use LDAP behind the
+ scenes.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="ldap__ldap_client_server">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Client-Server Considerations for LDAP</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Only connections from clients to Impala can be authenticated through LDAP.
+ </p>
+
+ <p class="p"> You must use the Kerberos authentication mechanism for connections
+ between internal Impala components, such as between the
+ <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">statestored</span>, and
+ <span class="keyword cmdname">catalogd</span> daemons. See <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> on how to set up Kerberos for
+ Impala. </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="ldap__ldap_config">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Server-Side LDAP Setup</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These requirements apply on the server side when configuring and starting Impala:
+ </p>
+
+ <p class="p">
+ To enable LDAP authentication, set the following startup options for <span class="keyword cmdname">impalad</span>:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">--enable_ldap_auth</code> enables LDAP-based authentication between the client and Impala.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ldap_uri</code> sets the URI of the LDAP server to use. Typically, the URI is prefixed with
+ <code class="ph codeph">ldap://</code>. In Impala 1.4.0 and higher, you can specify secure SSL-based LDAP transport by
+ using the prefix <code class="ph codeph">ldaps://</code>. The URI can optionally specify the port, for example:
+ <code class="ph codeph">ldap://ldap_server.example.com:389</code> or
+ <code class="ph codeph">ldaps://ldap_server.example.com:636</code>. (389 and 636 are the default ports for non-SSL and
+ SSL LDAP connections, respectively.)
+ </li>
+
+
+
+ <li class="li">
+ For <code class="ph codeph">ldaps://</code> connections secured by SSL,
+ <code class="ph codeph">--ldap_ca_certificate="<var class="keyword varname">/path/to/certificate/pem</var>"</code> specifies the
+ location of the certificate in standard <code class="ph codeph">.PEM</code> format. Store this certificate on the local
+ filesystem, in a location that only the <code class="ph codeph">impala</code> user and other trusted users can read.
+ </li>
+
+
+ </ul>
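The port defaults described above can be sketched in a few lines. The helper below is a hypothetical illustration of how an application might resolve the effective port of an <code class="ph codeph">--ldap_uri</code> value; it mimics the documented defaults (389 for <code class="ph codeph">ldap://</code>, 636 for <code class="ph codeph">ldaps://</code>) and is not Impala's actual parsing logic.

```python
from urllib.parse import urlparse

# Illustrative sketch: resolve the effective port of an --ldap_uri value,
# applying the defaults noted above (389 for ldap://, 636 for ldaps://).
# Hypothetical helper; not Impala's actual URI handling.
DEFAULT_PORTS = {"ldap": 389, "ldaps": 636}

def effective_ldap_port(uri: str) -> int:
    parsed = urlparse(uri)
    if parsed.port is not None:
        return parsed.port          # explicit port in the URI wins
    return DEFAULT_PORTS[parsed.scheme]

print(effective_ldap_port("ldap://ldap_server.example.com"))        # 389
print(effective_ldap_port("ldaps://ldap_server.example.com"))       # 636
print(effective_ldap_port("ldap://ldap_server.example.com:10389"))  # 10389
```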
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="ldap__ldap_bind_strings">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Support for Custom Bind Strings</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When Impala connects to LDAP it issues a bind call to the LDAP server to authenticate as the connected
+ user. Impala clients, including the Impala shell, provide the short name of the user to Impala. This is
+ necessary so that Impala can use Sentry for role-based access, which uses short names.
+ </p>
+
+ <p class="p">
+ However, LDAP servers often require more complex, structured usernames for authentication. Impala supports
+ three ways of transforming the short name (for example, <code class="ph codeph">'henry'</code>) to a more complicated
+ string. If necessary, specify one of the following configuration options
+ when starting the <span class="keyword cmdname">impalad</span> daemon on each DataNode:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">--ldap_domain</code>: Replaces the username with a string
+ <code class="ph codeph"><var class="keyword varname">username</var>@<var class="keyword varname">ldap_domain</var></code>.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ldap_baseDN</code>: Replaces the username with a <span class="q">"distinguished name"</span> (DN) of the form:
+ <code class="ph codeph">uid=<var class="keyword varname">userid</var>,ldap_baseDN</code>. (This is equivalent to a Hive option).
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ldap_bind_pattern</code>: This is the most general option, and replaces the username with the
+ string <var class="keyword varname">ldap_bind_pattern</var> where all instances of the string <code class="ph codeph">#UID</code> are
+ replaced with <var class="keyword varname">userid</var>. For example, an <code class="ph codeph">ldap_bind_pattern</code> of
+ <code class="ph codeph">"user=#UID,OU=foo,CN=bar"</code> with a username of <code class="ph codeph">henry</code> will construct a
+ bind name of <code class="ph codeph">"user=henry,OU=foo,CN=bar"</code>.
+ </li>
+ </ul>
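The three transformations above can be sketched as simple string substitutions. The function names below are hypothetical; Impala performs these substitutions internally based on the <span class="keyword cmdname">impalad</span> startup flags.

```python
# Illustrative sketch of the three short-name transformations described
# above. Hypothetical helpers, not Impala code.

def apply_ldap_domain(username: str, ldap_domain: str) -> str:
    # --ldap_domain: username@ldap_domain
    return f"{username}@{ldap_domain}"

def apply_ldap_base_dn(username: str, base_dn: str) -> str:
    # --ldap_baseDN: uid=userid,ldap_baseDN
    return f"uid={username},{base_dn}"

def apply_bind_pattern(username: str, pattern: str) -> str:
    # --ldap_bind_pattern: replace every #UID with the userid
    return pattern.replace("#UID", username)

print(apply_ldap_domain("henry", "example.com"))
# henry@example.com
print(apply_ldap_base_dn("henry", "ou=people,dc=example,dc=com"))
# uid=henry,ou=people,dc=example,dc=com
print(apply_bind_pattern("henry", "user=#UID,OU=foo,CN=bar"))
# user=henry,OU=foo,CN=bar
```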
+
+ <p class="p">
+ These options are mutually exclusive; Impala does not start if more than one of these options is specified.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="ldap__ldap_security">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Secure LDAP Connections</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To avoid sending credentials over the wire in cleartext, you must configure a secure connection between
+ both the client and Impala, and between Impala and the LDAP server. The secure connection could use SSL or
+ TLS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Secure LDAP connections through SSL:</strong>
+ </p>
+
+ <p class="p">
+ For SSL-enabled LDAP connections, specify a prefix of <code class="ph codeph">ldaps://</code> instead of
+ <code class="ph codeph">ldap://</code>. Also, the default port for SSL-enabled LDAP connections is 636 instead of 389.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Secure LDAP connections through TLS:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="http://en.wikipedia.org/wiki/Transport_Layer_Security" target="_blank">TLS</a>,
+ the successor to the SSL protocol, is supported by most modern LDAP servers. Unlike SSL connections, TLS
+ connections can be made on the same server port as non-TLS connections. To secure all connections using
+ TLS, specify the following flags as startup options to the <span class="keyword cmdname">impalad</span> daemon:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">--ldap_tls</code> tells Impala to start a TLS connection to the LDAP server, and to fail
+ authentication if it cannot be done.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ldap_ca_certificate="<var class="keyword varname">/path/to/certificate/pem</var>"</code> specifies the
+ location of the certificate in standard <code class="ph codeph">.PEM</code> format. Store this certificate on the local
+ filesystem, in a location that only the <code class="ph codeph">impala</code> user and other trusted users can read.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="ldap__ldap_impala_shell">
+
+ <h2 class="title topictitle2" id="ariaid-title7">LDAP Authentication for impala-shell Interpreter</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To connect to Impala using LDAP authentication, you specify command-line options to the
+ <span class="keyword cmdname">impala-shell</span> command interpreter and enter the password when prompted:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">-l</code> enables LDAP authentication.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">-u</code> sets the user. For Active Directory, the user is the short username, not the full
+ LDAP distinguished name. If your LDAP settings include a search base, use the
+ <code class="ph codeph">--ldap_bind_pattern</code> option on the <span class="keyword cmdname">impalad</span> daemon to translate the short user
+ name from <span class="keyword cmdname">impala-shell</span> automatically into the fully qualified name.
+
+ </li>
+
+ <li class="li">
+ <span class="keyword cmdname">impala-shell</span> automatically prompts for the password.
+ </li>
+ </ul>
+
+ <p class="p">
+ For the full list of available <span class="keyword cmdname">impala-shell</span> options, see
+ <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">LDAP authentication for JDBC applications:</strong> See <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a> for the
+ format to use with the JDBC connection string for servers using LDAP authentication.
+ </p>
+ </div>
+ </article>
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="ldap__ldap_impala_hue">
+ <h2 class="title topictitle2" id="ariaid-title8">Enabling LDAP for Impala in Hue</h2>
+
+ <div class="body conbody">
+ <section class="section" id="ldap_impala_hue__ldap_impala_hue_cmdline"><h3 class="title sectiontitle">Enabling LDAP for Impala in Hue Using the Command Line</h3>
+
+ <div class="p">LDAP authentication for the Impala app in Hue can be enabled by
+ setting the following properties under the <code class="ph codeph">[impala]</code>
+ section in <code class="ph codeph">hue.ini</code>. <table class="table" id="ldap_impala_hue__ldap_impala_hue_configs"><caption></caption><colgroup><col style="width:33.33333333333333%"><col style="width:66.66666666666666%"></colgroup><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder"><code class="ph codeph">auth_username</code></td>
+ <td class="entry nocellnorowborder">LDAP username of Hue user to be authenticated.</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder"><code class="ph codeph">auth_password</code></td>
+ <td class="entry nocellnorowborder">
+ <p class="p">LDAP password of Hue user to be authenticated.</p>
+ </td>
+ </tr>
+ </tbody></table>These login details are only used by Impala to authenticate to
+ LDAP. The Impala service trusts Hue to have already validated the user
+ being impersonated, rather than simply passing on the credentials.</div>
+ </section>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="ldap__ldap_delegation">
+ <h2 class="title topictitle2" id="ariaid-title9">Enabling Impala Delegation for LDAP Users</h2>
+ <div class="body conbody">
+ <p class="p">
+ See <a class="xref" href="impala_delegation.html#delegation">Configuring Impala Delegation for Hue and BI Tools</a> for details about the delegation feature
+ that lets certain users submit queries using the credentials of other users.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="ldap__ldap_restrictions">
+
+ <h2 class="title topictitle2" id="ariaid-title10">LDAP Restrictions for Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The LDAP support is preliminary. It has currently been tested only against Active Directory.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_limit.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_limit.html b/docs/build3x/html/topics/impala_limit.html
new file mode 100644
index 0000000..22dc7a5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_limit.html
@@ -0,0 +1,168 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LIMIT Clause</title></head><body id="limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">LIMIT Clause</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">LIMIT</code> clause in a <code class="ph codeph">SELECT</code> query sets a maximum number of rows for the
+ result set. Pre-selecting the maximum size of the result set helps Impala to optimize memory usage while
+ processing a distributed query.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LIMIT <var class="keyword varname">constant_integer_expression</var></code></pre>
+
+ <p class="p">
+ The argument to the <code class="ph codeph">LIMIT</code> clause must evaluate to a constant value. It can be a numeric
+ literal, or another kind of numeric expression involving operators, casts, and function return values. You
+ cannot refer to a column or use a subquery.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This clause is useful in contexts such as:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ To return exactly N items from a top-N query, such as the 10 highest-rated items in a shopping category or
+ the 50 hostnames that refer the most traffic to a web site.
+ </li>
+
+ <li class="li">
+ To demonstrate some sample values from a table or a particular query. (To display some arbitrary items, use
+ a query with no <code class="ph codeph">ORDER BY</code> clause. An <code class="ph codeph">ORDER BY</code> clause causes additional
+ memory and/or disk usage during the query.)
+ </li>
+
+ <li class="li">
+ To keep queries from returning huge result sets by accident if a table is larger than expected, or a
+ <code class="ph codeph">WHERE</code> clause matches more rows than expected.
+ </li>
+ </ul>
+
+ <p class="p">
+ Originally, the value for the <code class="ph codeph">LIMIT</code> clause had to be a numeric literal. In Impala 1.2.1 and
+ higher, it can be a numeric expression.
+ </p>
+
+ <p class="p">
+ Prior to Impala 1.4.0, Impala required any query including an
+ <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_order_by.html#order_by">ORDER BY</a></code> clause to also use a
+ <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_limit.html#limit">LIMIT</a></code> clause. In Impala 1.4.0 and
+ higher, the <code class="ph codeph">LIMIT</code> clause is optional for <code class="ph codeph">ORDER BY</code> queries. In cases where
+ sorting a huge result set requires enough memory to exceed the Impala memory limit for a particular node,
+ Impala automatically uses a temporary disk work area to perform the sort operation.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_order_by.html#order_by">ORDER BY Clause</a> for details.
+ </p>
+
+ <p class="p">
+ In Impala 1.2.1 and higher, you can combine a <code class="ph codeph">LIMIT</code> clause with an <code class="ph codeph">OFFSET</code>
+ clause to produce a small result set that is different from a top-N query, for example, to return items 11
+ through 20. This technique can be used to simulate <span class="q">"paged"</span> results. Because Impala queries typically
+ involve substantial amounts of I/O, use this technique only for compatibility in cases where you cannot
+ rewrite the application logic. For best performance and scalability, wherever practical, query as many
+ items as you expect to need, cache them on the application side, and display small groups of results to
+ users using application logic.
+ </p>
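The paging technique above can be sketched as a small query-building helper. The function below is hypothetical application-side code, not part of Impala; with 1-based page numbers and a page size of 10, page 2 requests items 11 through 20.

```python
# Illustrative sketch (hypothetical helper, not part of Impala): build
# "paged" queries with LIMIT and OFFSET as described above.

def paged_query(base_query: str, page: int, page_size: int) -> str:
    """Append LIMIT/OFFSET for 1-based page numbers."""
    offset = (page - 1) * page_size
    return f"{base_query} LIMIT {page_size} OFFSET {offset}"

print(paged_query("SELECT x FROM t1 ORDER BY x", 1, 10))
# SELECT x FROM t1 ORDER BY x LIMIT 10 OFFSET 0
print(paged_query("SELECT x FROM t1 ORDER BY x", 2, 10))
# SELECT x FROM t1 ORDER BY x LIMIT 10 OFFSET 10
```

Note that the base query includes an <code class="ph codeph">ORDER BY</code> clause; without a deterministic ordering, consecutive pages are not guaranteed to be consistent.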
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ Correlated subqueries used in <code class="ph codeph">EXISTS</code> and <code class="ph codeph">IN</code> operators cannot include a
+ <code class="ph codeph">LIMIT</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how the <code class="ph codeph">LIMIT</code> clause caps the size of the result set, with the
+ limit being applied after any other clauses such as <code class="ph codeph">WHERE</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create database limits;
+[localhost:21000] > use limits;
+[localhost:21000] > create table numbers (x int);
+[localhost:21000] > insert into numbers values (1), (3), (4), (5), (2);
+Inserted 5 rows in 1.34s
+[localhost:21000] > select x from numbers limit 100;
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 4 |
+| 5 |
+| 2 |
++---+
+Returned 5 row(s) in 0.26s
+[localhost:21000] > select x from numbers limit 3;
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 4 |
++---+
+Returned 3 row(s) in 0.27s
+[localhost:21000] > select x from numbers where x > 2 limit 2;
++---+
+| x |
++---+
+| 3 |
+| 4 |
++---+
+Returned 2 row(s) in 0.27s</code></pre>
+
+ <p class="p">
+ For top-N and bottom-N queries, you use the <code class="ph codeph">ORDER BY</code> and <code class="ph codeph">LIMIT</code> clauses
+ together:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x as "Top 3" from numbers order by x desc limit 3;
++-------+
+| top 3 |
++-------+
+| 5 |
+| 4 |
+| 3 |
++-------+
+[localhost:21000] > select x as "Bottom 3" from numbers order by x limit 3;
++----------+
+| bottom 3 |
++----------+
+| 1 |
+| 2 |
+| 3 |
++----------+
+</code></pre>
+
+ <p class="p">
+ You can use constant values besides integer literals as the <code class="ph codeph">LIMIT</code> argument:
+ </p>
+
+<pre class="pre codeblock"><code>-- Other expressions that yield constant integer values work too.
+SELECT x FROM t1 LIMIT 1e6; -- Limit is one million.
+SELECT x FROM t1 LIMIT length('hello world'); -- Limit is 11.
+SELECT x FROM t1 LIMIT 2+2; -- Limit is 4.
+SELECT x FROM t1 LIMIT cast(truncate(9.9) AS INT); -- Limit is 9.
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_lineage.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_lineage.html b/docs/build3x/html/topics/impala_lineage.html
new file mode 100644
index 0000000..12b3794
--- /dev/null
+++ b/docs/build3x/html/topics/impala_lineage.html
@@ -0,0 +1,91 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="lineage"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Viewing Lineage Information for Impala Data</title></head><body id="lineage"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Viewing Lineage Information for Impala Data</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ <dfn class="term">Lineage</dfn> is a feature that helps you track where data originated, and how
+ data propagates through the system through SQL statements such as
+ <code class="ph codeph">SELECT</code>, <code class="ph codeph">INSERT</code>, and <code class="ph codeph">CREATE
+ TABLE AS SELECT</code>.
+ </p>
+ <p class="p">
+ This type of tracking is important in high-security configurations, especially in
+ highly regulated industries such as healthcare, pharmaceuticals, financial services, and
+ intelligence. For such sensitive data, it is important to know all
+ the places in the system that contain that data or other data derived from it; to verify who has accessed
+ that data; and to be able to double-check that the data used to make a decision was processed correctly and
+ not tampered with.
+ </p>
+
+ <section class="section" id="lineage__column_lineage"><h2 class="title sectiontitle">Column Lineage</h2>
+
+
+
+ <p class="p">
+ <dfn class="term">Column lineage</dfn> tracks information in fine detail, at the level of
+ particular columns rather than entire tables.
+ </p>
+
+ <p class="p">
+ For example, if you have a table with information derived from web logs, you might copy that data into
+ other tables as part of the ETL process. The ETL operations might involve transformations through
+ expressions and function calls, and rearranging the columns into more or fewer tables
+ (<dfn class="term">normalizing</dfn> or <dfn class="term">denormalizing</dfn> the data). Then for reporting, you might issue
+ queries against multiple tables and views. In this example, column lineage helps you determine that data
+ that entered the system as <code class="ph codeph">RAW_LOGS.FIELD1</code> was then turned into
+ <code class="ph codeph">WEBSITE_REPORTS.IP_ADDRESS</code> through an <code class="ph codeph">INSERT ... SELECT</code> statement. Or,
+ conversely, you could start with a reporting query against a view, and trace the origin of the data in a
+ field such as <code class="ph codeph">TOP_10_VISITORS.USER_ID</code> back to the underlying table and even further back
+ to the point where the data was first loaded into Impala.
+ </p>
+
+ <p class="p">
+ When you have tables where you need to track or control access to sensitive information at the column
+ level, see <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to implement column-level
+ security. You set up authorization using the Sentry framework, create views that refer to specific sets of
+ columns, and then assign authorization privileges to those views rather than the underlying tables.
+ </p>
+
+ </section>
+
+ <section class="section" id="lineage__lineage_data"><h2 class="title sectiontitle">Lineage Data for Impala</h2>
+
+
+
+ <p class="p">
+ The lineage feature is enabled by default. When lineage logging is enabled, the serialized column lineage
+ graph is computed for each query and stored in a specialized log file in JSON format.
+ </p>
+
+ <p class="p">
+ Impala records queries in the lineage log if they complete successfully, or fail due to authorization
+ errors. For write operations such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+ the statement is recorded in the lineage log only if it completes successfully. Therefore, the lineage
+ feature tracks data that was accessed by successful queries, and data whose access was attempted by
+ queries that were blocked due to authorization failures. These kinds of queries represent data
+ that really was accessed, or where the attempted access could indicate malicious activity.
+ </p>
+
+ <p class="p">
+ Impala does not record in the lineage log queries that fail due to syntax errors or that fail or are
+ cancelled before they reach the stage of requesting rows from the result set.
+ </p>
+
+ <p class="p">
+ To enable or disable this feature, set or remove the <code class="ph codeph">-lineage_event_log_dir</code>
+ configuration option for the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
+
+ </section>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_literals.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_literals.html b/docs/build3x/html/topics/impala_literals.html
new file mode 100644
index 0000000..b9cfe57
--- /dev/null
+++ b/docs/build3x/html/topics/impala_literals.html
@@ -0,0 +1,424 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="literals"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Literals</title></head><body id="literals"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Literals</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Each of the Impala data types has corresponding notation for literal values of that type. You specify literal
+ values in SQL statements, such as in the <code class="ph codeph">SELECT</code> list or <code class="ph codeph">WHERE</code> clause of a
+ query, or as an argument to a function call. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for a complete
+ list of types, ranges, and conversion rules.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="literals__numeric_literals">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Numeric Literals</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ To write literals for the integer types (<code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>,
+ <code class="ph codeph">INT</code>, and <code class="ph codeph">BIGINT</code>), use a sequence of digits with optional leading zeros.
+ </p>
+
+ <p class="p">
+ To write literals for the floating-point types (<code class="ph codeph">DECIMAL</code>,
+ <code class="ph codeph">FLOAT</code>, and <code class="ph codeph">DOUBLE</code>), use a sequence of digits with an optional decimal
+ point (<code class="ph codeph">.</code> character). To preserve accuracy during arithmetic expressions, Impala interprets
+ floating-point literals as the <code class="ph codeph">DECIMAL</code> type with the smallest appropriate precision and
+ scale, until required by the context to convert the result to <code class="ph codeph">FLOAT</code> or
+ <code class="ph codeph">DOUBLE</code>.
+ </p>
+
+ <p class="p">
+ Integer values are promoted to floating-point when necessary, based on the context.
+ </p>
+
+ <p class="p">
+ You can also use exponential notation by including an <code class="ph codeph">e</code> character. For example,
+ <code class="ph codeph">1e6</code> is 1 times 10 to the power of 6 (1 million). A number in exponential notation is
+ always interpreted as floating-point.
+ </p>
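+ <p class="p">
+ For example, you can check the type that Impala assigns to a literal with the
+ <code class="ph codeph">typeof()</code> function (available in <span class="keyword">Impala 2.3</span>
+ and higher). The output shown here is illustrative:
+ </p>
+
+<pre class="pre codeblock"><code>-- An exponential literal is always floating-point, while the
+-- same value written out as digits gets the smallest integer type that fits.
+select typeof(1e6) as exp_type, typeof(1000000) as int_type;
++----------+----------+
+| exp_type | int_type |
++----------+----------+
+| DOUBLE   | INT      |
++----------+----------+
+</code></pre>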
+
+ <p class="p">
+ When Impala encounters a numeric literal, it considers the type to be the <span class="q">"smallest"</span> that can
+ accurately represent the value. The type is promoted to larger or more accurate types if necessary, based
+ on subsequent parts of an expression.
+ </p>
+ <p class="p">
+ For example, the column types that Impala assigns in the following
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> statements show how it interprets the
+ corresponding numeric literals:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table ten as select 10 as x;
++-------------------+
+| summary |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc ten;
++------+---------+---------+
+| name | type | comment |
++------+---------+---------+
+| x | tinyint | |
++------+---------+---------+
+
+[localhost:21000] > create table four_k as select 4096 as x;
++-------------------+
+| summary |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc four_k;
++------+----------+---------+
+| name | type | comment |
++------+----------+---------+
+| x | smallint | |
++------+----------+---------+
+
+[localhost:21000] > create table one_point_five as select 1.5 as x;
++-------------------+
+| summary |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc one_point_five;
++------+--------------+---------+
+| name | type | comment |
++------+--------------+---------+
+| x | decimal(2,1) | |
++------+--------------+---------+
+
+[localhost:21000] > create table one_point_three_three_three as select 1.333 as x;
++-------------------+
+| summary |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc one_point_three_three_three;
++------+--------------+---------+
+| name | type | comment |
++------+--------------+---------+
+| x | decimal(4,3) | |
++------+--------------+---------+
+</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="literals__string_literals">
+
+ <h2 class="title topictitle2" id="ariaid-title3">String Literals</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ String literals are enclosed in either single or double quotation marks. You can even use both
+ kinds of quotation marks for different literals within the same statement.
+ </p>
+
+ <p class="p">
+ Quoted literals are considered to be of type <code class="ph codeph">STRING</code>. To use quoted literals in contexts
+ requiring a <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> value, <code class="ph codeph">CAST()</code> the literal to
+ a <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> of the appropriate length.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Escaping special characters:</strong>
+ </p>
+
+ <p class="p">
+ To encode special characters within a string literal, precede them with the backslash (<code class="ph codeph">\</code>)
+ escape character:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">\t</code> represents a tab.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\n</code> represents a newline or linefeed. This might cause extra line breaks in
+ <span class="keyword cmdname">impala-shell</span> output.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\r</code> represents a carriage return. This might cause unusual formatting (making it appear
+ that some content is overwritten) in <span class="keyword cmdname">impala-shell</span> output.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\b</code> represents a backspace. This might cause unusual formatting (making it appear that
+ some content is overwritten) in <span class="keyword cmdname">impala-shell</span> output.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\0</code> represents an ASCII <code class="ph codeph">nul</code> character (not the same as a SQL
+ <code class="ph codeph">NULL</code>). This might not be visible in <span class="keyword cmdname">impala-shell</span> output.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\Z</code> represents a DOS end-of-file character. This might not be visible in
+ <span class="keyword cmdname">impala-shell</span> output.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\%</code> and <code class="ph codeph">\_</code> can be used to escape wildcard characters within the string
+ passed to the <code class="ph codeph">LIKE</code> operator.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\</code> followed by 3 octal digits represents the ASCII code of a single character; for
+ example, <code class="ph codeph">\101</code> is ASCII 65, the character <code class="ph codeph">A</code>.
+ </li>
+
+ <li class="li">
+ Use two consecutive backslashes (<code class="ph codeph">\\</code>) to prevent the backslash from being interpreted as
+ an escape character.
+ </li>
+
+ <li class="li">
+ Use the backslash to escape single or double quotation mark characters within a string literal, if the
+ literal is enclosed by the same type of quotation mark.
+ </li>
+
+ <li class="li">
+ If the character following the <code class="ph codeph">\</code> does not represent the start of a recognized escape
+ sequence, the character is passed through unchanged.
+ </li>
+ </ul>
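+ <p class="p">
+ For example, the following query (a sketch; the column aliases are arbitrary) shows a tab
+ escape, octal escapes for the characters <code class="ph codeph">A</code>,
+ <code class="ph codeph">B</code>, and <code class="ph codeph">C</code>, and a doubled backslash:
+ </p>
+
+<pre class="pre codeblock"><code>select 'a\tb' as tab_between, '\101\102\103' as octal_abc, 'c:\\temp' as doubled_backslash;
+</code></pre>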
+
+ <p class="p">
+ <strong class="ph b">Quotes within quotes:</strong>
+ </p>
+
+ <p class="p">
+ To include a single quotation character within a string value, enclose the literal with either single or
+ double quotation marks, and optionally escape the single quote as a <code class="ph codeph">\'</code> sequence. Earlier
+ releases required escaping a single quote inside double quotes. Continue using escape sequences in this
+ case if you also need to run your SQL code on older versions of Impala.
+ </p>
+
+ <p class="p">
+ To include a double quotation character within a string value, enclose the literal with single quotation
+ marks; no escaping is necessary in this case. Alternatively, enclose the literal with double quotation marks and
+ escape the double quote as a <code class="ph codeph">\"</code> sequence.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select "What\'s happening?" as single_within_double,
+ > 'I\'m not sure.' as single_within_single,
+ > "Homer wrote \"The Iliad\"." as double_within_double,
+ > 'Homer also wrote "The Odyssey".' as double_within_single;
++----------------------+----------------------+--------------------------+---------------------------------+
+| single_within_double | single_within_single | double_within_double | double_within_single |
++----------------------+----------------------+--------------------------+---------------------------------+
+| What's happening? | I'm not sure. | Homer wrote "The Iliad". | Homer also wrote "The Odyssey". |
++----------------------+----------------------+--------------------------+---------------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Field terminator character in CREATE TABLE:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">CREATE TABLE</code> clauses <code class="ph codeph">FIELDS TERMINATED BY</code>, <code class="ph codeph">ESCAPED
+ BY</code>, and <code class="ph codeph">LINES TERMINATED BY</code> have special rules for the string literal used for
+ their argument, because they all require a single character. You can use a regular character surrounded by
+ single or double quotation marks, an octal sequence such as <code class="ph codeph">'\054'</code> (representing a comma),
+ or an integer in the range '-127'..'128' (with quotation marks but no backslash), which is interpreted as a
+ single-byte ASCII character. Negative values are subtracted from 256; for example, <code class="ph codeph">FIELDS
+ TERMINATED BY '-2'</code> sets the field delimiter to ASCII code 254, the <span class="q">"Icelandic Thorn"</span>
+ character used as a delimiter by some data formats.
+ </div>
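+ <p class="p">
+ For example, the following hypothetical <code class="ph codeph">CREATE TABLE</code> statements
+ show the three ways of specifying a single-character field delimiter: a regular quoted
+ character, an octal sequence (here, a comma), and an integer (here, -2, interpreted as ASCII
+ code 254):
+ </p>
+
+<pre class="pre codeblock"><code>create table pipe_delimited (c1 int, c2 string)
+ row format delimited fields terminated by '|';
+
+create table comma_delimited (c1 int, c2 string)
+ row format delimited fields terminated by '\054';
+
+create table thorn_delimited (c1 int, c2 string)
+ row format delimited fields terminated by '-2';
+</code></pre>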
+
+ <p class="p">
+ <strong class="ph b">impala-shell considerations:</strong>
+ </p>
+
+ <p class="p">
+ When dealing with output that includes non-ASCII or non-printable characters such as linefeeds and
+ backspaces, use the <span class="keyword cmdname">impala-shell</span> options to save to a file, turn off pretty printing, or
+ both rather than relying on how the output appears visually. See
+ <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a> for a list of <span class="keyword cmdname">impala-shell</span>
+ options.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="literals__boolean_literals">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Boolean Literals</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For <code class="ph codeph">BOOLEAN</code> values, the literals are <code class="ph codeph">TRUE</code> and <code class="ph codeph">FALSE</code>,
+ with no quotation marks and case-insensitive.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>select true;
+select * from t1 where assertion = false;
+select case bool_col when true then 'yes' when false then 'no' else 'null' end from t1;</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="literals__timestamp_literals">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Timestamp Literals</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala automatically converts <code class="ph codeph">STRING</code> literals of the
+ correct format into <code class="ph codeph">TIMESTAMP</code> values. Timestamp values
+ are accepted in the format <code class="ph codeph">"yyyy-MM-dd HH:mm:ss.SSSSSS"</code>,
+ and can consist of just the date, or just the time, with or without the
+ fractional second portion. For example, you can specify <code class="ph codeph">TIMESTAMP</code>
+ values such as <code class="ph codeph">'1966-07-30'</code>, <code class="ph codeph">'08:30:00'</code>,
+ or <code class="ph codeph">'1985-09-25 17:45:30.005'</code>.
+ </p>
+
+ <p class="p">
+ You can also use <code class="ph codeph">INTERVAL</code> expressions to add or subtract from timestamp literal values,
+ such as <code class="ph codeph">CAST('1966-07-30' AS TIMESTAMP) + INTERVAL 5 YEARS + INTERVAL 3 DAYS</code>. See
+ <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details.
+ </p>
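+ <p class="p">
+ For example, the <code class="ph codeph">INTERVAL</code> expression shown above can be evaluated
+ directly in a query:
+ </p>
+
+<pre class="pre codeblock"><code>select cast('1966-07-30' as timestamp) + interval 5 years + interval 3 days as ts;
++---------------------+
+| ts                  |
++---------------------+
+| 1971-08-02 00:00:00 |
++---------------------+
+</code></pre>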
+
+ <p class="p">
+ Depending on your data pipeline, you might receive date and time data as text, in notation that does not
+ exactly match the format for Impala <code class="ph codeph">TIMESTAMP</code> literals.
+ See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for functions that can convert
+ between a variety of string literals (including different field order, separators, and timezone notation)
+ and equivalent <code class="ph codeph">TIMESTAMP</code> or numeric values.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="literals__null">
+
+ <h2 class="title topictitle2" id="ariaid-title6">NULL</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The notion of <code class="ph codeph">NULL</code> values is familiar from all kinds of database systems, but each SQL
+ dialect can have its own behavior and restrictions on <code class="ph codeph">NULL</code> values. For Big Data
+ processing, the precise semantics of <code class="ph codeph">NULL</code> values are significant: any misunderstanding
+ could lead to inaccurate results or misformatted data that could be time-consuming to correct for large
+ data sets.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">NULL</code> is a different value than an empty string. The empty string is represented by a
+ string literal with nothing inside, <code class="ph codeph">""</code> or <code class="ph codeph">''</code>.
+ </li>
+
+ <li class="li">
+ In a delimited text file, the <code class="ph codeph">NULL</code> value is represented by the special token
+ <code class="ph codeph">\N</code>.
+ </li>
+
+ <li class="li">
+ When Impala inserts data into a partitioned table, and the value of one of the partitioning columns is
+ <code class="ph codeph">NULL</code> or the empty string, the data is placed in a special partition that holds only
+ these two kinds of values. When these values are returned in a query, the result is <code class="ph codeph">NULL</code>
+ whether the value was originally <code class="ph codeph">NULL</code> or an empty string. This behavior is compatible
+ with the way Hive treats <code class="ph codeph">NULL</code> values in partitioned tables. Hive does not allow empty
+ strings as partition keys, and it returns a string value such as
+ <code class="ph codeph">__HIVE_DEFAULT_PARTITION__</code> instead of <code class="ph codeph">NULL</code> when such values are
+ returned from a query. For example:
+<pre class="pre codeblock"><code>create table t1 (i int) partitioned by (x int, y string);
+-- Select an INT column from another table, with all rows going into a special HDFS subdirectory
+-- named __HIVE_DEFAULT_PARTITION__. Depending on whether one or both of the partitioning keys
+-- are null, this special directory name occurs at different levels of the physical data directory
+-- for the table.
+insert into t1 partition(x=NULL, y=NULL) select c1 from some_other_table;
+insert into t1 partition(x, y=NULL) select c1, c2 from some_other_table;
+insert into t1 partition(x=NULL, y) select c1, c3 from some_other_table;</code></pre>
+ </li>
+
+ <li class="li">
+ There is no <code class="ph codeph">NOT NULL</code> clause when defining a column to prevent <code class="ph codeph">NULL</code>
+ values in that column.
+ </li>
+
+ <li class="li">
+ There is no <code class="ph codeph">DEFAULT</code> clause to specify a non-<code class="ph codeph">NULL</code> default value.
+ </li>
+
+ <li class="li">
+ If an <code class="ph codeph">INSERT</code> operation mentions some columns but not others, the unmentioned columns
+ contain <code class="ph codeph">NULL</code> for all inserted rows.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In Impala 1.2.1 and higher, all <code class="ph codeph">NULL</code> values come at the end of the result set for
+ <code class="ph codeph">ORDER BY ... ASC</code> queries, and at the beginning of the result set for <code class="ph codeph">ORDER BY ...
+ DESC</code> queries. In effect, <code class="ph codeph">NULL</code> is considered greater than all other values for
+ sorting purposes. The original Impala behavior always put <code class="ph codeph">NULL</code> values at the end, even for
+ <code class="ph codeph">ORDER BY ... DESC</code> queries. The new behavior in Impala 1.2.1 makes Impala more compatible
+ with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting
+ behavior for <code class="ph codeph">NULL</code> by adding the clause <code class="ph codeph">NULLS FIRST</code> or <code class="ph codeph">NULLS
+ LAST</code> at the end of the <code class="ph codeph">ORDER BY</code> clause.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+
+ Because the <code class="ph codeph">NULLS FIRST</code> and <code class="ph codeph">NULLS LAST</code> keywords are not currently
+ available in Hive queries, any views you create using those keywords will not be available through
+ Hive.
+ </div>
+ </li>
+
+ <li class="li">
+ In all other contexts besides sorting with <code class="ph codeph">ORDER BY</code>, comparing a <code class="ph codeph">NULL</code>
+ to anything else returns <code class="ph codeph">NULL</code>, making the comparison meaningless. For example,
+ <code class="ph codeph">10 > NULL</code> produces <code class="ph codeph">NULL</code>, <code class="ph codeph">10 < NULL</code> also produces
+ <code class="ph codeph">NULL</code>, <code class="ph codeph">5 BETWEEN 1 AND NULL</code> produces <code class="ph codeph">NULL</code>, and so on.
+ </li>
+ </ul>
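+ <p class="p">
+ For example (the table and column names here are hypothetical), the following queries
+ illustrate <code class="ph codeph">NULL</code> comparison semantics and the
+ <code class="ph codeph">NULLS FIRST</code> and <code class="ph codeph">NULLS LAST</code> clauses:
+ </p>
+
+<pre class="pre codeblock"><code>-- Each of these comparisons evaluates to NULL, not TRUE or FALSE.
+select 10 > null as gt, null = null as eq, 5 between 1 and null as btw;
+
+-- Override the default placement of NULL values in sorted results.
+select x from t1 order by x asc nulls first;
+select x from t1 order by x desc nulls last;
+</code></pre>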
+
+ <p class="p">
+ Several built-in functions serve as shorthand for evaluating expressions and returning
+ <code class="ph codeph">NULL</code>, 0, or some other substitution value depending on the expression result:
+ <code class="ph codeph">ifnull()</code>, <code class="ph codeph">isnull()</code>, <code class="ph codeph">nvl()</code>, <code class="ph codeph">nullif()</code>,
+ <code class="ph codeph">nullifzero()</code>, and <code class="ph codeph">zeroifnull()</code>. See
+ <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+ </p>
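+ <p class="p">
+ For example, the following query (output shown is illustrative) demonstrates a few of these
+ shorthand functions:
+ </p>
+
+<pre class="pre codeblock"><code>select zeroifnull(null) as z, nullifzero(0) as n, isnull(null, 'fallback') as f;
++---+------+----------+
+| z | n    | f        |
++---+------+----------+
+| 0 | NULL | fallback |
++---+------+----------+
+</code></pre>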
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Columns in Kudu tables have an attribute that specifies whether or not they can contain
+ <code class="ph codeph">NULL</code> values. A column with a <code class="ph codeph">NULL</code> attribute can contain
+ nulls. A column with a <code class="ph codeph">NOT NULL</code> attribute cannot contain any nulls, and
+ an <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, or <code class="ph codeph">UPSERT</code> statement
+ will skip any row that attempts to store a null in a column designated as <code class="ph codeph">NOT NULL</code>.
+ Kudu tables default to the <code class="ph codeph">NULL</code> setting for each column, except columns that
+ are part of the primary key.
+ </p>
+ <p class="p">
+ In addition to columns with the <code class="ph codeph">NOT NULL</code> attribute, Kudu tables also have
+ restrictions on <code class="ph codeph">NULL</code> values in columns that are part of the primary key for
+ a table. No column that is part of the primary key in a Kudu table can contain any
+ <code class="ph codeph">NULL</code> values.
+ </p>
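+ <p class="p">
+ For example, a Kudu table definition might look like the following sketch (the table name,
+ columns, and partitioning scheme are hypothetical, and the statement requires a cluster with
+ Kudu available):
+ </p>
+
+<pre class="pre codeblock"><code>create table kudu_example
+(
+ id bigint primary key, -- primary key columns are implicitly NOT NULL
+ name string not null, -- rows with a NULL name are skipped by INSERT/UPDATE/UPSERT
+ nickname string null  -- NULL allowed; also the default for non-key columns
+)
+partition by hash (id) partitions 3
+stored as kudu;
+</code></pre>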
+
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_live_progress.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_live_progress.html b/docs/build3x/html/topics/impala_live_progress.html
new file mode 100644
index 0000000..bce7807
--- /dev/null
+++ b/docs/build3x/html/topics/impala_live_progress.html
@@ -0,0 +1,131 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="live_progress"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</title></head><body id="live_progress"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">LIVE_PROGRESS Query Option (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ For queries submitted through the <span class="keyword cmdname">impala-shell</span> command,
+ displays an interactive progress bar showing roughly what percentage of
+ processing has been completed. When the query finishes, the progress bar is erased
+ from the <span class="keyword cmdname">impala-shell</span> console output.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Command-line equivalent:</strong>
+ </p>
+ <p class="p">
+ You can enable this query option within <span class="keyword cmdname">impala-shell</span>
+ by starting the shell with the <code class="ph codeph">--live_progress</code>
+ command-line option.
+ You can still turn this setting off and on again within the shell through the
+ <code class="ph codeph">SET</code> command.
+ </p>
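+ <p class="p">
+ For example, you might start the shell with the option enabled and later turn it off for a
+ particular session:
+ </p>
+
+<pre class="pre codeblock"><code>$ impala-shell --live_progress
+[localhost:21000] > set live_progress=false;
+LIVE_PROGRESS set to false
+</code></pre>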
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The output from this query option is printed to standard error. The output is only displayed in interactive mode,
+ that is, not when the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options are used.
+ </p>
+ <p class="p">
+ For a more detailed way of tracking the progress of an interactive query through
+ all phases of processing, see <a class="xref" href="impala_live_summary.html#live_summary">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+ <p class="p">
+ Because the percentage complete figure is calculated using the number of
+ issued and completed <span class="q">"scan ranges"</span>, which occur while reading the table
+ data, the progress bar might reach 100% before the query is entirely finished.
+ For example, the query might do work to perform aggregations after all the
+ table data has been read. If many of your queries fall into this category,
+ consider using the <code class="ph codeph">LIVE_SUMMARY</code> option instead for
+ more granular progress reporting.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ currently do not produce any output during <code class="ph codeph">COMPUTE STATS</code> operations.
+ </p>
+ <div class="p">
+ Because the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ are available only within the <span class="keyword cmdname">impala-shell</span> interpreter:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You cannot change these query options through the SQL <code class="ph codeph">SET</code>
+ statement using the JDBC or ODBC interfaces. The <code class="ph codeph">SET</code>
+ command in <span class="keyword cmdname">impala-shell</span> recognizes these names as
+ shell-only options.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Be careful when using <span class="keyword cmdname">impala-shell</span> on a pre-<span class="keyword">Impala 2.3</span>
+ system to connect to a system running <span class="keyword">Impala 2.3</span> or higher.
+ The older <span class="keyword cmdname">impala-shell</span> does not recognize these
+ query option names. Upgrade <span class="keyword cmdname">impala-shell</span> on the
+ systems where you intend to use these query options.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Likewise, the <span class="keyword cmdname">impala-shell</span> command relies on
+ some information only available in <span class="keyword">Impala 2.3</span> and higher
+ to prepare live progress reports and query summaries. The
+ <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code>
+ query options have no effect when <span class="keyword cmdname">impala-shell</span> connects
+ to a cluster running an older version of Impala.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > set live_progress=true;
+LIVE_PROGRESS set to true
+[localhost:21000] > select count(*) from customer;
++----------+
+| count(*) |
++----------+
+| 150000 |
++----------+
+[localhost:21000] > select count(*) from customer t1 cross join customer t2;
+[################################### ] 50%
+[######################################################################] 100%
+
+
+</code></pre>
+
+ <p class="p">
+ To see how the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ work in real time, see <a class="xref" href="https://asciinema.org/a/1rv7qippo0fe7h5k1b6k4nexk" target="_blank">this animated demo</a>.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_describe.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_describe.html b/docs/build3x/html/topics/impala_describe.html
new file mode 100644
index 0000000..5c4edf9
--- /dev/null
+++ b/docs/build3x/html/topics/impala_describe.html
@@ -0,0 +1,817 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="describe"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DESCRIBE Statement</title></head><body id="describe"><main role="main"><article role="article" aria-labelledby="describe__desc">
+
+ <h1 class="title topictitle1" id="describe__desc">DESCRIBE Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">DESCRIBE</code> statement displays metadata about a table, such as the column names and their
+ data types.
+ <span class="ph">In <span class="keyword">Impala 2.3</span> and higher, you can specify the name of a complex type column, which takes
+ the form of a dotted path. The path might include multiple components in the case of a nested type definition.</span>
+ <span class="ph">In <span class="keyword">Impala 2.5</span> and higher, the <code class="ph codeph">DESCRIBE DATABASE</code> form can display
+ information about a database.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DESCRIBE [DATABASE] [FORMATTED|EXTENDED] <var class="keyword varname">object_name</var>
+
+object_name ::=
+ [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>[.<var class="keyword varname">complex_col_name</var> ...]
+ | <var class="keyword varname">db_name</var>
+</code></pre>
+
+ <p class="p">
+ You can use the abbreviation <code class="ph codeph">DESC</code> for the <code class="ph codeph">DESCRIBE</code> statement.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">DESCRIBE FORMATTED</code> variation displays additional information, in a format familiar to
+ users of Apache Hive. The extra information includes low-level details such as whether the table is internal
+ or external, when it was created, the file format, the location of the data in HDFS, whether the object is a
+ table or a view, and (for views) the text of the query from the view definition.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">Compressed</code> field is not a reliable indicator of whether the table contains compressed
+ data. It typically always shows <code class="ph codeph">No</code>, because the compression settings only apply during the
+ session that loads data and are not stored persistently with the table metadata.
+ </div>
+
+<p class="p">
+ <strong class="ph b">Describing databases:</strong>
+</p>
+
+<p class="p">
+ By default, the <code class="ph codeph">DESCRIBE</code> output for a database includes the location
+ and the comment, which can be set by the <code class="ph codeph">LOCATION</code> and <code class="ph codeph">COMMENT</code>
+ clauses on the <code class="ph codeph">CREATE DATABASE</code> statement.
+</p>
+
+<p class="p">
+ The additional information displayed by the <code class="ph codeph">FORMATTED</code> or <code class="ph codeph">EXTENDED</code>
+ keyword includes the HDFS user ID that is considered the owner of the database, and any
+ optional database properties. The properties could be specified by the <code class="ph codeph">WITH DBPROPERTIES</code>
+ clause if the database is created using a Hive <code class="ph codeph">CREATE DATABASE</code> statement.
+ Impala currently does not set or do any special processing based on those properties.
+</p>
+
+<p class="p">
+The following examples show the variations in syntax and output for
+describing databases. This feature is available in <span class="keyword">Impala 2.5</span>
+and higher.
+</p>
+
+<pre class="pre codeblock"><code>
+describe database default;
++---------+----------------------+-----------------------+
+| name | location | comment |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
++---------+----------------------+-----------------------+
+
+describe database formatted default;
++---------+----------------------+-----------------------+
+| name | location | comment |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
+| Owner: | | |
+| | public | ROLE |
++---------+----------------------+-----------------------+
+
+describe database extended default;
++---------+----------------------+-----------------------+
+| name | location | comment |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
+| Owner: | | |
+| | public | ROLE |
++---------+----------------------+-----------------------+
+</code></pre>
+
+<p class="p">
+ <strong class="ph b">Describing tables:</strong>
+</p>
+
+<p class="p">
+ If the <code class="ph codeph">DATABASE</code> keyword is omitted, the default
+ for the <code class="ph codeph">DESCRIBE</code> statement is to refer to a table.
+</p>
+ <p class="p">
+ If you have the <code class="ph codeph">SELECT</code> privilege on a subset of the table
+ columns and no other relevant table/database/server-level privileges,
+ <code class="ph codeph">DESCRIBE</code> returns the data from the columns you have
+ access to.
+ </p>
+
+ <p class="p">
+ If you have the <code class="ph codeph">SELECT</code> privilege on a subset of the table
+ columns and no other relevant table/database/server-level privileges,
+ <code class="ph codeph">DESCRIBE FORMATTED/EXTENDED</code> does not return
+ the <code class="ph codeph">LOCATION</code> field. The <code class="ph codeph">LOCATION</code> data
+ is shown if you have any privilege on the table, the containing database,
+ or the server.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- By default, the table is assumed to be in the current database.
+describe my_table;
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | int | |
+| s | string | |
++------+--------+---------+
+
+-- Use a fully qualified table name to specify a table in any database.
+describe my_database.my_table;
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | int | |
+| s | string | |
++------+--------+---------+
+
+-- The formatted or extended output includes additional useful information.
+-- The LOCATION field is especially useful to know for DDL statements and HDFS commands
+-- during ETL jobs. (The LOCATION includes a full hdfs:// URL, omitted here for readability.)
+describe formatted my_table;
++------------------------------+----------------------------------------------+----------------------+
+| name | type | comment |
++------------------------------+----------------------------------------------+----------------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | NULL |
+| s | string | NULL |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | my_database | NULL |
+| Owner: | jrussell | NULL |
+| CreateTime: | Fri Mar 18 15:58:00 PDT 2016 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | /user/hive/warehouse/my_database.db/my_table | NULL |
+| Table Type: | MANAGED_TABLE | NULL |
+| Table Parameters: | NULL | NULL |
+| | transient_lastDdlTime | 1458341880 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | org. ... .LazySimpleSerDe | NULL |
+| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
+| OutputFormat: | org. ... .HiveIgnoreKeyTextOutputFormat | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
++------------------------------+----------------------------------------------+----------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because the column definitions for complex types can become long, particularly when such types are nested,
+ the <code class="ph codeph">DESCRIBE</code> statement uses special formatting for complex type columns to make the output readable.
+ </p>
+
+ <p class="p">
+ For the <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> types available in
+ <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">DESCRIBE</code> output is formatted to avoid
+ excessively long lines for multiple fields within a <code class="ph codeph">STRUCT</code>, or a nested sequence of
+ complex types.
+ </p>
+
+ <p class="p">
+ You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+ to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+ column and visualize its structure as if it were a table.
+ For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+ <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+ If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+ and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+ you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+ An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+ A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+ representing a column in the table.
+ A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+ <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+ </p>
+
+ <p class="p">
+ For example, here is the <code class="ph codeph">DESCRIBE</code> output for a table containing a single top-level column
+ of each complex type:
+ </p>
+
+<pre class="pre codeblock"><code>create table t1 (x int, a array<int>, s struct<f1: string, f2: bigint>, m map<string,int>) stored as parquet;
+
+describe t1;
++------+-----------------+---------+
+| name | type | comment |
++------+-----------------+---------+
+| x | int | |
+| a | array<int> | |
+| s | struct< | |
+| | f1:string, | |
+| | f2:bigint | |
+| | > | |
+| m | map<string,int> | |
++------+-----------------+---------+
+
+</code></pre>
+
+ <p class="p">
+ Here are examples showing how to <span class="q">"drill down"</span> into the layouts of complex types, including
+ using multi-part names to examine the definitions of nested types.
+ The <code class="ph codeph">< ></code> delimiters identify the columns with complex types;
+ these are the columns where you can descend another level to see the parts that make up
+ the complex type.
+ This technique helps you to understand the multi-part names you use as table references in queries
+ involving complex types, and the corresponding column names you refer to in the <code class="ph codeph">SELECT</code> list.
+ These tables are from the <span class="q">"nested TPC-H"</span> schema, shown in detail in
+ <a class="xref" href="impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">REGION</code> table contains an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+ elements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The first <code class="ph codeph">DESCRIBE</code> specifies the table name, to display the definition
+ of each top-level column.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The second <code class="ph codeph">DESCRIBE</code> specifies the name of a complex
+ column, <code class="ph codeph">REGION.R_NATIONS</code>, showing that when you include the name of an <code class="ph codeph">ARRAY</code>
+ column in a <code class="ph codeph">FROM</code> clause, that table reference acts like a two-column table with
+ columns <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The final <code class="ph codeph">DESCRIBE</code> specifies the fully qualified name of the <code class="ph codeph">ITEM</code> field,
+ to display the layout of its underlying <code class="ph codeph">STRUCT</code> type in table format, with the fields
+ mapped to column names.
+ </p>
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>
+-- #1: The overall layout of the entire table.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- #2: The ARRAY column within the table.
+describe region.r_nations;
++------+-------------------------+---------+
+| name | type | comment |
++------+-------------------------+---------+
+| item | struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | > | |
+| pos | bigint | |
++------+-------------------------+---------+
+
+-- #3: The STRUCT that makes up each ARRAY element.
+-- The fields of the STRUCT act like columns of a table.
+describe region.r_nations.item;
++-------------+----------+---------+
+| name | type | comment |
++-------------+----------+---------+
+| n_nationkey | smallint | |
+| n_name | string | |
+| n_comment | string | |
++-------------+----------+---------+
+
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">CUSTOMER</code> table contains an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+ elements, where one field in the <code class="ph codeph">STRUCT</code> is another <code class="ph codeph">ARRAY</code> of
+ <code class="ph codeph">STRUCT</code> elements:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Again, the initial <code class="ph codeph">DESCRIBE</code> specifies only the table name.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The second <code class="ph codeph">DESCRIBE</code> specifies the qualified name of the complex
+ column, <code class="ph codeph">CUSTOMER.C_ORDERS</code>, showing how an <code class="ph codeph">ARRAY</code>
+ is represented as a two-column table with columns <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The third <code class="ph codeph">DESCRIBE</code> specifies the qualified name of the <code class="ph codeph">ITEM</code>
+ of the <code class="ph codeph">ARRAY</code> column, to see the structure of the nested <code class="ph codeph">ARRAY</code>.
+          Again, it has two parts, <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>. Because the
+ <code class="ph codeph">ARRAY</code> contains a <code class="ph codeph">STRUCT</code>, the layout of the <code class="ph codeph">STRUCT</code>
+ is shown.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The fourth and fifth <code class="ph codeph">DESCRIBE</code> statements drill down into a <code class="ph codeph">STRUCT</code> field that
+ is itself a complex type, an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>.
+ The <code class="ph codeph">ITEM</code> portion of the qualified name is only required when the <code class="ph codeph">ARRAY</code>
+ elements are anonymous. The fields of the <code class="ph codeph">STRUCT</code> give names to any other complex types
+ nested inside the <code class="ph codeph">STRUCT</code>. Therefore, the <code class="ph codeph">DESCRIBE</code> parameters
+ <code class="ph codeph">CUSTOMER.C_ORDERS.ITEM.O_LINEITEMS</code> and <code class="ph codeph">CUSTOMER.C_ORDERS.O_LINEITEMS</code>
+          are equivalent. (For brevity, you can leave out the <code class="ph codeph">ITEM</code> portion
+          of a qualified name when it is not required.)
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The final <code class="ph codeph">DESCRIBE</code> shows the layout of the deeply nested <code class="ph codeph">STRUCT</code> type.
+ Because there are no more complex types nested inside this <code class="ph codeph">STRUCT</code>, this is as far
+ as you can drill down into the layout for this table.
+ </p>
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>-- #1: The overall layout of the entire table.
+describe customer;
++--------------+------------------------------------+
+| name | type |
++--------------+------------------------------------+
+| c_custkey | bigint |
+... more scalar columns ...
+| c_orders | array<struct< |
+| | o_orderkey:bigint, |
+| | o_orderstatus:string, |
+| | o_totalprice:decimal(12,2), |
+| | o_orderdate:string, |
+| | o_orderpriority:string, |
+| | o_clerk:string, |
+| | o_shippriority:int, |
+| | o_comment:string, |
+| | o_lineitems:array<struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+| | l_linenumber:int, |
+| | l_quantity:decimal(12,2), |
+| | l_extendedprice:decimal(12,2), |
+| | l_discount:decimal(12,2), |
+| | l_tax:decimal(12,2), |
+| | l_returnflag:string, |
+| | l_linestatus:string, |
+| | l_shipdate:string, |
+| | l_commitdate:string, |
+| | l_receiptdate:string, |
+| | l_shipinstruct:string, |
+| | l_shipmode:string, |
+| | l_comment:string |
+| | >> |
+| | >> |
++--------------+------------------------------------+
+
+-- #2: The ARRAY column within the table.
+describe customer.c_orders;
++------+------------------------------------+
+| name | type |
++------+------------------------------------+
+| item | struct< |
+| | o_orderkey:bigint, |
+| | o_orderstatus:string, |
+... more struct fields ...
+| | o_lineitems:array<struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+... more nested struct fields ...
+| | l_comment:string |
+| | >> |
+| | > |
+| pos | bigint |
++------+------------------------------------+
+
+-- #3: The STRUCT that makes up each ARRAY element.
+-- The fields of the STRUCT act like columns of a table.
+describe customer.c_orders.item;
++-----------------+----------------------------------+
+| name | type |
++-----------------+----------------------------------+
+| o_orderkey | bigint |
+| o_orderstatus | string |
+| o_totalprice | decimal(12,2) |
+| o_orderdate | string |
+| o_orderpriority | string |
+| o_clerk | string |
+| o_shippriority | int |
+| o_comment | string |
+| o_lineitems | array<struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+... more struct fields ...
+| | l_comment:string |
+| | >> |
++-----------------+----------------------------------+
+
+-- #4: The ARRAY nested inside the STRUCT elements of the first ARRAY.
+describe customer.c_orders.item.o_lineitems;
++------+----------------------------------+
+| name | type |
++------+----------------------------------+
+| item | struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+... more struct fields ...
+| | l_comment:string |
+| | > |
+| pos | bigint |
++------+----------------------------------+
+
+-- #5: Shorter form of the previous DESCRIBE. Omits the .ITEM portion of the name
+-- because O_LINEITEMS and other field names provide a way to refer to things
+-- inside the ARRAY element.
+describe customer.c_orders.o_lineitems;
++------+----------------------------------+
+| name | type |
++------+----------------------------------+
+| item | struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+... more struct fields ...
+| | l_comment:string |
+| | > |
+| pos | bigint |
++------+----------------------------------+
+
+-- #6: The STRUCT representing ARRAY elements nested inside
+-- another ARRAY of STRUCTs. The lack of any complex types
+-- in this output means this is as far as DESCRIBE can
+-- descend into the table layout.
+describe customer.c_orders.o_lineitems.item;
++-----------------+---------------+
+| name | type |
++-----------------+---------------+
+| l_partkey | bigint |
+| l_suppkey | bigint |
+... more scalar columns ...
+| l_comment | string |
++-----------------+---------------+
+
+</code></pre>
+
+<p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+<p class="p">
+ After the <span class="keyword cmdname">impalad</span> daemons are restarted, the first query against a table can take longer
+ than subsequent queries, because the metadata for the table is loaded before the query is processed. This
+ one-time delay for each table can cause misleading results in benchmark tests or cause unnecessary concern.
+ To <span class="q">"warm up"</span> the Impala metadata cache, you can issue a <code class="ph codeph">DESCRIBE</code> statement in advance
+ for each table you intend to access later.
+</p>
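+
+<p class="p">
+  For example, a startup script could issue a <code class="ph codeph">DESCRIBE</code> statement
+  for each heavily used table (the table names here are illustrative):
+</p>
+
+<pre class="pre codeblock"><code>
+-- Issued once after a restart, before any benchmark or production queries,
+-- so that table metadata is already cached when real queries arrive.
+describe sales_fact;
+describe customer_dim;
+describe product_dim;
+</code></pre>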
+
+<p class="p">
+ When you are dealing with data files stored in HDFS, sometimes it is important to know details such as the
+ path of the data files for an Impala table, and the hostname for the namenode. You can get this information
+ from the <code class="ph codeph">DESCRIBE FORMATTED</code> output. You specify HDFS URIs or path specifications with
+ statements such as <code class="ph codeph">LOAD DATA</code> and the <code class="ph codeph">LOCATION</code> clause of <code class="ph codeph">CREATE
+ TABLE</code> or <code class="ph codeph">ALTER TABLE</code>. You might also use HDFS URIs or paths with Linux commands
+ such as <span class="keyword cmdname">hadoop</span> and <span class="keyword cmdname">hdfs</span> to copy, rename, and so on, data files in HDFS.
+</p>
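+
+<p class="p">
+  For example, after noting the <code class="ph codeph">Location:</code> value in the
+  <code class="ph codeph">DESCRIBE FORMATTED</code> output, you might reference HDFS paths in
+  statements such as the following (the paths are illustrative):
+</p>
+
+<pre class="pre codeblock"><code>
+-- Find the table's data directory in the Location: field of the output.
+describe formatted my_table;
+
+-- Move staged data files into the table, using an HDFS path.
+load data inpath '/user/etl/staging/batch1' into table my_table;
+</code></pre>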
+
+<p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
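+
+<p class="p">
+  For example, within a load-balanced <span class="keyword cmdname">impala-shell</span> session:
+</p>
+
+<pre class="pre codeblock"><code>
+set SYNC_DDL=1;
+-- This DDL statement now waits until all Impala nodes have
+-- received the new metadata before returning.
+create table t3 (x int);
+</code></pre>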
+
+<p class="p">
+ Each table can also have associated table statistics and column statistics. To see these categories of
+ information, use the <code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code> and <code class="ph codeph">SHOW COLUMN
+ STATS <var class="keyword varname">table_name</var></code> statements.
+
+ See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+</p>
+
+<div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+ STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+ table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+ are very large, used in join queries, or both.
+ </div>
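+
+<p class="p">
+  For example, a typical ETL sequence refreshes statistics immediately after loading
+  new data, then confirms them (the table names are illustrative):
+</p>
+
+<pre class="pre codeblock"><code>
+insert into sales_fact select * from staging_sales;
+compute stats sales_fact;
+show table stats sales_fact;
+show column stats sales_fact;
+</code></pre>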
+
+<p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<p class="p">
+ The following example shows the results of both a standard <code class="ph codeph">DESCRIBE</code> and <code class="ph codeph">DESCRIBE
+ FORMATTED</code> for different kinds of schema objects:
+</p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">DESCRIBE</code> for a table or a view returns the name, type, and comment for each of the
+ columns. For a view, if the column value is computed by an expression, the column name is automatically
+ generated as <code class="ph codeph">_c0</code>, <code class="ph codeph">_c1</code>, and so on depending on the ordinal number of the
+ column.
+ </li>
+
+ <li class="li">
+ A table created with no special format or storage clauses is designated as a <code class="ph codeph">MANAGED_TABLE</code>
+ (an <span class="q">"internal table"</span> in Impala terminology). Its data files are stored in an HDFS directory under the
+ default Hive data directory. By default, it uses Text data format.
+ </li>
+
+ <li class="li">
+ A view is designated as <code class="ph codeph">VIRTUAL_VIEW</code> in <code class="ph codeph">DESCRIBE FORMATTED</code> output. Some
+ of its properties are <code class="ph codeph">NULL</code> or blank because they are inherited from the base table. The
+ text of the query that defines the view is part of the <code class="ph codeph">DESCRIBE FORMATTED</code> output.
+ </li>
+
+ <li class="li">
+ A table with additional clauses in the <code class="ph codeph">CREATE TABLE</code> statement has differences in
+ <code class="ph codeph">DESCRIBE FORMATTED</code> output. The output for <code class="ph codeph">T2</code> includes the
+ <code class="ph codeph">EXTERNAL_TABLE</code> keyword because of the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax, and
+ different <code class="ph codeph">InputFormat</code> and <code class="ph codeph">OutputFormat</code> fields to reflect the Parquet file
+ format.
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (x int, y int, s string);
+Query: create table t1 (x int, y int, s string)
+[localhost:21000] > describe t1;
+Query: describe t1
+Query finished, fetching results ...
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | int | |
+| y | int | |
+| s | string | |
++------+--------+---------+
+Returned 3 row(s) in 0.13s
+[localhost:21000] > describe formatted t1;
+Query: describe formatted t1
+Query finished, fetching results ...
++------------------------------+--------------------------------------------+------------+
+| name | type | comment |
++------------------------------+--------------------------------------------+------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | None |
+| y | int | None |
+| s | string | None |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | describe_formatted | NULL |
+| Owner: | doc_demo | NULL |
+| CreateTime: | Mon Jul 22 17:03:16 EDT 2013 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | hdfs://127.0.0.1:8020/user/hive/warehouse/ | |
+| | describe_formatted.db/t1 | NULL |
+| Table Type: | MANAGED_TABLE | NULL |
+| Table Parameters: | NULL | NULL |
+| | transient_lastDdlTime | 1374526996 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | org.apache.hadoop.hive.serde2.lazy. | |
+| | LazySimpleSerDe | NULL |
+| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
+| OutputFormat: | org.apache.hadoop.hive.ql.io. | |
+| | HiveIgnoreKeyTextOutputFormat | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
++------------------------------+--------------------------------------------+------------+
+Returned 26 row(s) in 0.03s
+[localhost:21000] > create view v1 as select x, upper(s) from t1;
+Query: create view v1 as select x, upper(s) from t1
+[localhost:21000] > describe v1;
+Query: describe v1
+Query finished, fetching results ...
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | int | |
+| _c1 | string | |
++------+--------+---------+
+Returned 2 row(s) in 0.10s
+[localhost:21000] > describe formatted v1;
+Query: describe formatted v1
+Query finished, fetching results ...
++------------------------------+------------------------------+----------------------+
+| name | type | comment |
++------------------------------+------------------------------+----------------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | None |
+| _c1 | string | None |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | describe_formatted | NULL |
+| Owner: | doc_demo | NULL |
+| CreateTime: | Mon Jul 22 16:56:38 EDT 2013 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Table Type: | VIRTUAL_VIEW | NULL |
+| Table Parameters: | NULL | NULL |
+| | transient_lastDdlTime | 1374526598 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | null | NULL |
+| InputFormat: | null | NULL |
+| OutputFormat: | null | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
+| | NULL | NULL |
+| # View Information | NULL | NULL |
+| View Original Text: | SELECT x, upper(s) FROM t1 | NULL |
+| View Expanded Text: | SELECT x, upper(s) FROM t1 | NULL |
++------------------------------+------------------------------+----------------------+
+Returned 28 row(s) in 0.03s
+[localhost:21000] > create external table t2 (x int, y int, s string) stored as parquet location '/user/doc_demo/sample_data';
+[localhost:21000] > describe formatted t2;
+Query: describe formatted t2
+Query finished, fetching results ...
++------------------------------+----------------------------------------------------+------------+
+| name | type | comment |
++------------------------------+----------------------------------------------------+------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | None |
+| y | int | None |
+| s | string | None |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | describe_formatted | NULL |
+| Owner: | doc_demo | NULL |
+| CreateTime: | Mon Jul 22 17:01:47 EDT 2013 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | hdfs://127.0.0.1:8020/user/doc_demo/sample_data | NULL |
+| Table Type: | EXTERNAL_TABLE | NULL |
+| Table Parameters: | NULL | NULL |
+| | EXTERNAL | TRUE |
+| | transient_lastDdlTime | 1374526907 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
+| InputFormat: | org.apache.impala.hive.serde.ParquetInputFormat | NULL |
+| OutputFormat: | org.apache.impala.hive.serde.ParquetOutputFormat | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
++------------------------------+----------------------------------------------------+------------+
+Returned 27 row(s) in 0.17s</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and execute
+ permissions for all directories that are part of the table.
+      (A table could span multiple HDFS directories if it is partitioned.
+ The directories could be widely scattered because a partition can reside
+ in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ The information displayed for Kudu tables includes the additional attributes
+ that are only applicable for Kudu tables:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Whether or not the column is part of the primary key. Every Kudu table
+ has a <code class="ph codeph">true</code> value here for at least one column. There
+ could be multiple <code class="ph codeph">true</code> values, for tables with
+ composite primary keys.
+ </li>
+ <li class="li">
+ Whether or not the column is nullable. Specified by the <code class="ph codeph">NULL</code>
+ or <code class="ph codeph">NOT NULL</code> attributes on the <code class="ph codeph">CREATE TABLE</code> statement.
+ Columns that are part of the primary key are automatically non-nullable.
+ </li>
+ <li class="li">
+ The default value, if any, for the column. Specified by the <code class="ph codeph">DEFAULT</code>
+ attribute on the <code class="ph codeph">CREATE TABLE</code> statement. If the default value is
+      <code class="ph codeph">NULL</code>, that is not indicated in this column. It is implied when
+      <code class="ph codeph">nullable</code> is true and no other default value is specified.
+ </li>
+ <li class="li">
+ The encoding used for values in the column. Specified by the <code class="ph codeph">ENCODING</code>
+ attribute on the <code class="ph codeph">CREATE TABLE</code> statement.
+ </li>
+ <li class="li">
+ The compression used for values in the column. Specified by the <code class="ph codeph">COMPRESSION</code>
+ attribute on the <code class="ph codeph">CREATE TABLE</code> statement.
+ </li>
+ <li class="li">
+ The block size (in bytes) used for the underlying Kudu storage layer for the column.
+ Specified by the <code class="ph codeph">BLOCK_SIZE</code> attribute on the <code class="ph codeph">CREATE TABLE</code>
+ statement.
+ </li>
+ </ul>
+
+ <p class="p">
+ The following example shows <code class="ph codeph">DESCRIBE</code> output for a simple Kudu table, with
+ a single-column primary key and all column attributes left with their default values:
+ </p>
+
+<pre class="pre codeblock"><code>
+describe million_rows;
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| name | type | comment | primary_key | nullable | default_value | encoding | compression | block_size |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| id | string | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| s | string | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows <code class="ph codeph">DESCRIBE</code> output for a Kudu table with a
+ two-column primary key, and Kudu-specific attributes applied to some columns:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table kudu_describe_example
+(
+ c1 int, c2 int,
+ c3 string, c4 string not null, c5 string default 'n/a', c6 string default '',
+ c7 bigint not null, c8 bigint null default null, c9 bigint default -1 encoding bit_shuffle,
+ primary key(c1,c2)
+)
+partition by hash (c1, c2) partitions 10 stored as kudu;
+
+describe kudu_describe_example;
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| name | type | comment | primary_key | nullable | default_value | encoding | compression | block_size |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| c1 | int | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c2 | int | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c3 | string | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c4 | string | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c5 | string | | false | true | n/a | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c6 | string | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c7 | bigint | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c8 | bigint | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c9 | bigint | | false | true | -1 | BIT_SHUFFLE | DEFAULT_COMPRESSION | 0 |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>, <a class="xref" href="impala_show.html#show_create_table">SHOW CREATE TABLE Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_development.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_development.html b/docs/build3x/html/topics/impala_development.html
new file mode 100644
index 0000000..5b11207
--- /dev/null
+++ b/docs/build3x/html/topics/impala_development.html
@@ -0,0 +1,197 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_dev"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Developing Impala Applications</title></head><body id="intro_dev"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Developing Impala Applications</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The core development language with Impala is SQL. You can also use Java or other languages to interact with
+ Impala through the standard JDBC and ODBC interfaces used by many business intelligence tools. For
+ specialized kinds of analysis, you can supplement the SQL built-in functions by writing
+ <a class="xref" href="impala_udf.html#udfs">user-defined functions (UDFs)</a> in C++ or Java.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_dev__intro_sql">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of the Impala SQL Dialect</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+      The Impala SQL dialect is highly compatible with the SQL syntax used in the Apache Hive component (HiveQL). As
+      such, it is familiar to users who are already accustomed to running SQL queries on the Hadoop
+      infrastructure. Currently, Impala SQL supports a subset of HiveQL statements, data types, and built-in
+ functions. Impala also includes additional built-in functions for common industry features, to simplify
+ porting SQL from non-Hadoop systems.
+ </p>
+
+ <p class="p">
+ For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect
+ might seem familiar:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_select.html#select">SELECT statement</a> includes familiar clauses such as <code class="ph codeph">WHERE</code>,
+ <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">ORDER BY</code>, and <code class="ph codeph">WITH</code>.
+ You will find familiar notions such as
+ <a class="xref" href="impala_joins.html#joins">joins</a>, <a class="xref" href="impala_functions.html#builtins">built-in
+ functions</a> for processing strings, numbers, and dates,
+ <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">aggregate functions</a>,
+ <a class="xref" href="impala_subqueries.html#subqueries">subqueries</a>, and
+ <a class="xref" href="impala_operators.html#comparison_operators">comparison operators</a>
+ such as <code class="ph codeph">IN()</code> and <code class="ph codeph">BETWEEN</code>.
+ The <code class="ph codeph">SELECT</code> statement is the place where SQL standards compliance is most important.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ From the data warehousing world, you will recognize the notion of
+ <a class="xref" href="impala_partitioning.html#partitioning">partitioned tables</a>.
+ One or more columns serve as partition keys, and the data is physically arranged so that
+ queries that refer to the partition key columns in the <code class="ph codeph">WHERE</code> clause
+          can skip partitions that do not match the filter conditions. For example, if you have 10
+          years' worth of data and use a clause such as <code class="ph codeph">WHERE year = 2015</code>,
+ <code class="ph codeph">WHERE year > 2010</code>, or <code class="ph codeph">WHERE year IN (2014, 2015)</code>,
+ Impala skips all the data for non-matching years, greatly reducing the amount of I/O
+ for the query.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In Impala 1.2 and higher, <a class="xref" href="impala_udf.html#udfs">UDFs</a> let you perform custom comparisons
+ and transformation logic during <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT...SELECT</code> statements.
+ </p>
+ </li>
+ </ul>
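+
+    <p class="p">
+      As an illustration of partition pruning (the table and column names here are hypothetical),
+      a query that filters on the partition key column reads only the matching partitions:
+    </p>
+
+<pre class="pre codeblock"><code>create table sales (id bigint, amount decimal(9,2))
+  partitioned by (year int);
+
+-- Only the partitions for 2014 and 2015 are scanned;
+-- data files for all other years are skipped entirely.
+select count(*) from sales where year in (2014, 2015);</code></pre>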
+
+ <p class="p">
+ For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect
+ might require some learning and practice for you to become proficient in the Hadoop environment:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala SQL is focused on queries and includes relatively little DML. There is no <code class="ph codeph">UPDATE</code>
+ or <code class="ph codeph">DELETE</code> statement. Stale data is typically discarded (by <code class="ph codeph">DROP TABLE</code>
+ or <code class="ph codeph">ALTER TABLE ... DROP PARTITION</code> statements) or replaced (by <code class="ph codeph">INSERT
+ OVERWRITE</code> statements).
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ All data creation is done by <code class="ph codeph">INSERT</code> statements, which typically insert data in bulk by
+ querying from other tables. There are two variations, <code class="ph codeph">INSERT INTO</code> which appends to the
+ existing data, and <code class="ph codeph">INSERT OVERWRITE</code> which replaces the entire contents of a table or
+ partition (similar to <code class="ph codeph">TRUNCATE TABLE</code> followed by a new <code class="ph codeph">INSERT</code>).
+          Although there is an <code class="ph codeph">INSERT ... VALUES</code> syntax to insert a small number of rows in
+          a single statement, it is far more efficient to use <code class="ph codeph">INSERT ... SELECT</code> to copy
+          and transform large amounts of data from one table to another in a single operation.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You often construct Impala table definitions and data files in some other environment, and then attach
+ Impala so that it can run real-time queries. The same data files and table metadata are shared with other
+ components of the Hadoop ecosystem. In particular, Impala can access tables created by Hive or data
+ inserted by Hive, and Hive can access tables and data produced by Impala. Many other Hadoop components
+          can write files in formats such as Parquet and Avro, which can then be queried by Impala.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Because Hadoop and Impala are focused on data warehouse-style operations on large data sets, Impala SQL
+ includes some idioms that you might find in the import utilities for traditional database systems. For
+ example, you can create a table that reads comma-separated or tab-separated text files, specifying the
+ separator in the <code class="ph codeph">CREATE TABLE</code> statement. You can create <strong class="ph b">external tables</strong> that read
+ existing data files but do not move or transform them.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Because Impala reads large quantities of data that might not be perfectly tidy and predictable, it does
+ not require length constraints on string data types. For example, you can define a database column as
+ <code class="ph codeph">STRING</code> with unlimited length, rather than <code class="ph codeph">CHAR(1)</code> or
+ <code class="ph codeph">VARCHAR(64)</code>. <span class="ph">(Although in Impala 2.0 and later, you can also use
+ length-constrained <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> types.)</span>
+ </p>
+ </li>
+
+ </ul>
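+
+    <p class="p">
+      The following statements sketch these idioms, using hypothetical table names and an
+      HDFS path chosen for illustration:
+    </p>
+
+<pre class="pre codeblock"><code>-- Append rows copied from another table.
+insert into t2 select * from t1 where c1 &gt; 100;
+
+-- Replace the entire contents of the destination table.
+insert overwrite t2 select * from t1;
+
+-- Query existing tab-separated data files in place, without moving them.
+create external table logs (ts timestamp, msg string)
+  row format delimited fields terminated by '\t'
+  location '/user/etl/logs';</code></pre>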
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong> <a class="xref" href="impala_langref.html#langref">Impala SQL Language Reference</a>, especially
+ <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a> and <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>
+ </p>
+ </div>
+ </article>
+
+
+
+
+
+
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_dev__intro_apis">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala Programming Interfaces</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can connect and submit requests to the Impala daemons through:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph"><a class="xref" href="impala_impala_shell.html#impala_shell">impala-shell</a></code> interactive
+ command interpreter.
+ </li>
+
+ <li class="li">
+ The <a class="xref" href="http://gethue.com/" target="_blank">Hue</a> web-based user interface.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_jdbc.html#impala_jdbc">JDBC</a>.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_odbc.html#impala_odbc">ODBC</a>.
+ </li>
+ </ul>
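+
+    <p class="p">
+      For example, an interactive session might start by connecting <code class="ph codeph">impala-shell</code>
+      to an <code class="ph codeph">impalad</code> daemon (the host name here is a placeholder; 21000 is the
+      default port for <code class="ph codeph">impala-shell</code> connections):
+    </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i impala_host:21000
+[impala_host:21000] &gt; select version();</code></pre>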
+
+ <p class="p">
+      With these options, you can use Impala in heterogeneous environments, with JDBC or ODBC applications
+      running on non-Linux platforms. You can also use Impala in combination with various Business Intelligence
+      tools that use the JDBC and ODBC interfaces.
+ </p>
+
+ <p class="p">
+ Each <code class="ph codeph">impalad</code> daemon process, running on separate nodes in a cluster, listens to
+ <a class="xref" href="impala_ports.html#ports">several ports</a> for incoming requests. Requests from
+ <code class="ph codeph">impala-shell</code> and Hue are routed to the <code class="ph codeph">impalad</code> daemons through the same
+ port. The <code class="ph codeph">impalad</code> daemons listen on separate ports for JDBC and ODBC requests.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disable_codegen.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disable_codegen.html b/docs/build3x/html/topics/impala_disable_codegen.html
new file mode 100644
index 0000000..3fae1e7
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disable_codegen.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_codegen"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_CODEGEN Query Option</title></head><body id="disable_codegen"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISABLE_CODEGEN Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ This is a debug option, intended for diagnosing and working around issues that cause crashes. If a query
+ fails with an <span class="q">"illegal instruction"</span> or other hardware-specific message, try setting
+ <code class="ph codeph">DISABLE_CODEGEN=true</code> and running the query again. If the query succeeds only when the
+ <code class="ph codeph">DISABLE_CODEGEN</code> option is turned on, submit the problem to <span class="keyword">the appropriate support channel</span> and include that
+ detail in the problem report. Do not otherwise run with this setting turned on, because it results in lower
+ overall performance.
+ </p>
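+
+    <p class="p">
+      For example, the diagnostic sequence might look like the following
+      (the query itself is a placeholder for whichever statement failed):
+    </p>
+
+<pre class="pre codeblock"><code>set disable_codegen=true;
+-- Rerun the query that failed with an "illegal instruction" or similar error.
+select c1, count(*) from t1 group by c1;
+-- Restore the default after the diagnosis.
+set disable_codegen=false;</code></pre>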
+
+ <p class="p">
+ Because the code generation phase adds a small amount of overhead for each query, you might turn on the
+ <code class="ph codeph">DISABLE_CODEGEN</code> option to achieve maximum throughput when running many short-lived queries
+ against small tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disable_row_runtime_filtering.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disable_row_runtime_filtering.html b/docs/build3x/html/topics/impala_disable_row_runtime_filtering.html
new file mode 100644
index 0000000..80d84f5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disable_row_runtime_filtering.html
@@ -0,0 +1,90 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_row_runtime_filtering"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</title></head><body id="disable_row_runtime_filtering"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISABLE_ROW_RUNTIME_FILTERING Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">DISABLE_ROW_RUNTIME_FILTERING</code> query option
+ reduces the scope of the runtime filtering feature. Queries still dynamically prune
+ partitions, but do not apply the filtering logic to individual rows within partitions.
+ </p>
+
+ <p class="p">
+      This option only applies to queries against Parquet tables. For other file formats, Impala
+      only prunes at the level of partitions, not individual rows.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+      Impala automatically evaluates whether the per-row filters are effective
+      at reducing the amount of intermediate data. Therefore,
+ this option is typically only needed for the rare case where Impala
+ cannot accurately determine how effective the per-row filtering is
+ for a query.
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ Because this setting only improves query performance in very specific
+ circumstances, depending on the query characteristics and data distribution,
+ only use it when you determine through benchmarking that it improves
+ performance of specific expensive queries.
+ Consider setting this query option immediately before the expensive query and
+ unsetting it immediately afterward.
+ </p>
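+
+    <p class="p">
+      For example (the query shown is a placeholder for the expensive query identified
+      through benchmarking):
+    </p>
+
+<pre class="pre codeblock"><code>set disable_row_runtime_filtering=true;
+select count(*) from big_partitioned t1 join big_table t2 on t1.id = t2.id;
+set disable_row_runtime_filtering=false;</code></pre>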
+
+ <p class="p">
+ <strong class="ph b">File format considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option only applies to queries against HDFS-based tables
+ using the Parquet file format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ When applied to a query involving a Kudu table, this option turns off
+ all runtime filtering for the Kudu table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disable_streaming_preaggregations.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disable_streaming_preaggregations.html b/docs/build3x/html/topics/impala_disable_streaming_preaggregations.html
new file mode 100644
index 0000000..bf1f9bc
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disable_streaming_preaggregations.html
@@ -0,0 +1,50 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_streaming_preaggregations"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</title></head><body id="disable_streaming_preaggregations"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISABLE_STREAMING_PREAGGREGATIONS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Turns off the <span class="q">"streaming preaggregation"</span> optimization that is available in <span class="keyword">Impala 2.5</span>
+ and higher. This optimization reduces unnecessary work performed by queries that perform aggregation
+ operations on columns with few or no duplicate values, for example <code class="ph codeph">DISTINCT <var class="keyword varname">id_column</var></code>
+ or <code class="ph codeph">GROUP BY <var class="keyword varname">unique_column</var></code>. If the optimization causes regressions in
+ existing queries that use aggregation functions, you can turn it off as needed by setting this query option.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In <span class="keyword">Impala 2.5.0</span>, only the value 1 enables the option, and the value
+ <code class="ph codeph">true</code> is not recognized. This limitation is
+ tracked by the issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3334" target="_blank">IMPALA-3334</a>,
+ which shows the releases where the problem is fixed.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Typically, queries that would require enabling this option involve very large numbers of
+ aggregated values, such as a billion or more distinct keys being processed on each
+ worker node.
+ </p>
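+
+    <p class="p">
+      For example, for a query aggregating a column with a very large number of distinct
+      values (the table and column names are hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>-- In Impala 2.5.0, use the value 1 rather than true; see the note above.
+set disable_streaming_preaggregations=1;
+select count(distinct visitor_id) from huge_table;</code></pre>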
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disable_unsafe_spills.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disable_unsafe_spills.html b/docs/build3x/html/topics/impala_disable_unsafe_spills.html
new file mode 100644
index 0000000..63f1c1b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disable_unsafe_spills.html
@@ -0,0 +1,50 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_unsafe_spills"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</title></head><body id="disable_unsafe_spills"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISABLE_UNSAFE_SPILLS Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Enable this option if you prefer to have queries fail when they exceed the Impala memory limit, rather than
+ write temporary data to disk.
+ </p>
+
+ <p class="p">
+ Queries that <span class="q">"spill"</span> to disk typically complete successfully, when in earlier Impala releases they would have failed.
+      However, queries with exorbitant memory requirements due to missing statistics or inefficient join clauses could
+      become so slow as a result that you would rather have them cancelled automatically, and instead reduce the memory
+      usage through standard Impala tuning techniques.
+ </p>
+
+ <p class="p">
+ This option prevents only <span class="q">"unsafe"</span> spill operations, meaning that one or more tables are missing
+ statistics or the query does not include a hint to set the most efficient mechanism for a join or
+ <code class="ph codeph">INSERT ... SELECT</code> into a partitioned table. These are the tables most likely to result in
+ suboptimal execution plans that could cause unnecessary spilling. Therefore, leaving this option enabled is a
+ good way to find tables on which to run the <code class="ph codeph">COMPUTE STATS</code> statement.
+ </p>
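+
+    <p class="p">
+      For example, if a memory-intensive join is cancelled because a table involved lacks
+      statistics, computing those statistics makes subsequent spills <span class="q">"safe"</span>
+      (the table names are hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>set disable_unsafe_spills=true;
+-- If t1 has no statistics and this query exceeds the memory limit,
+-- it is cancelled rather than being allowed to spill.
+select t1.c1, count(*) from t1 join t2 on t1.id = t2.id group by t1.c1;
+
+compute stats t1;  -- Afterward, a spill by this query is no longer "unsafe".</code></pre>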
+
+ <p class="p">
+ See <a class="xref" href="impala_scalability.html#spill_to_disk">SQL Operations that Spill to Disk</a> for information about the <span class="q">"spill to disk"</span>
+ feature for queries processing large result sets with joins, <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP
+ BY</code>, <code class="ph codeph">DISTINCT</code>, aggregation functions, or analytic functions.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disk_space.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disk_space.html b/docs/build3x/html/topics/impala_disk_space.html
new file mode 100644
index 0000000..560be2b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disk_space.html
@@ -0,0 +1,133 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disk_space"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Managing Disk Space for Impala Data</title></head><body id="disk_space"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Managing Disk Space for Impala Data</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Although Impala typically works with many large files in an HDFS storage system with plenty of capacity,
+ there are times when you might perform some file cleanup to reclaim space, or advise developers on techniques
+ to minimize space consumption and file duplication.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Use compact binary file formats where practical. Numeric and time-based data in particular can be stored
+ in more compact form in binary data files. Depending on the file format, various compression and encoding
+ features can reduce file size even further. You can specify the <code class="ph codeph">STORED AS</code> clause as part
+ of the <code class="ph codeph">CREATE TABLE</code> statement, or <code class="ph codeph">ALTER TABLE</code> with the <code class="ph codeph">SET
+ FILEFORMAT</code> clause for an existing table or partition within a partitioned table. See
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details about file formats, especially
+ <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>. See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You manage underlying data files differently depending on whether the corresponding Impala table is
+ defined as an <a class="xref" href="impala_tables.html#internal_tables">internal</a> or
+ <a class="xref" href="impala_tables.html#external_tables">external</a> table:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Use the <code class="ph codeph">DESCRIBE FORMATTED</code> statement to check if a particular table is internal
+ (managed by Impala) or external, and to see the physical location of the data files in HDFS. See
+ <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a> for details.
+ </li>
+
+ <li class="li">
+ For Impala-managed (<span class="q">"internal"</span>) tables, use <code class="ph codeph">DROP TABLE</code> statements to remove
+ data files. See <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> for details.
+ </li>
+
+ <li class="li">
+ For tables not managed by Impala (<span class="q">"external"</span> tables), use appropriate HDFS-related commands such
+ as <code class="ph codeph">hadoop fs</code>, <code class="ph codeph">hdfs dfs</code>, or <code class="ph codeph">distcp</code>, to create, move,
+ copy, or delete files within HDFS directories that are accessible by the <code class="ph codeph">impala</code> user.
+ Issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> statement after adding or removing any
+ files from the data directory of an external table. See <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for
+ details.
+ </li>
+
+ <li class="li">
+ Use external tables to reference HDFS data files in their original location. With this technique, you
+ avoid copying the files, and you can map more than one Impala table to the same set of data files. When
+ you drop the Impala table, the data files are left undisturbed. See
+ <a class="xref" href="impala_tables.html#external_tables">External Tables</a> for details.
+ </li>
+
+ <li class="li">
+ Use the <code class="ph codeph">LOAD DATA</code> statement to move HDFS files into the data directory for an Impala
+ table from inside Impala, without the need to specify the HDFS path of the destination directory. This
+ technique works for both internal and external tables. See
+ <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> for details.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Make sure that the HDFS trashcan is configured correctly. When you remove files from HDFS, the space
+ might not be reclaimed for use by other files until sometime later, when the trashcan is emptied. See
+ <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> for details. See
+ <a class="xref" href="impala_prereqs.html#prereqs_account">User Account Requirements</a> for permissions needed for the HDFS trashcan to operate
+ correctly.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Drop all tables in a database before dropping the database itself. See
+ <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Clean up temporary files after failed <code class="ph codeph">INSERT</code> statements. If an <code class="ph codeph">INSERT</code>
+ statement encounters an error, and you see a directory named <span class="ph filepath">.impala_insert_staging</span>
+ or <span class="ph filepath">_impala_insert_staging</span> left behind in the data directory for the table, it might
+ contain temporary data files taking up space in HDFS. You might be able to salvage these data files, for
+ example if they are complete but could not be moved into place due to a permission error. Or, you might
+ delete those files through commands such as <code class="ph codeph">hadoop fs</code> or <code class="ph codeph">hdfs dfs</code>, to
+ reclaim space before re-trying the <code class="ph codeph">INSERT</code>. Issue <code class="ph codeph">DESCRIBE FORMATTED
+ <var class="keyword varname">table_name</var></code> to see the HDFS path where you can check for temporary files.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ By default, intermediate files used during large sort, join, aggregation, or analytic function operations
+ are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span>. These files are removed when the
+ operation finishes. (Multiple concurrent queries can perform operations that use the <span class="q">"spill to disk"</span>
+ technique, without any name conflicts for these temporary files.) You can specify a different location by
+ starting the <span class="keyword cmdname">impalad</span> daemon with the
+ <code class="ph codeph">--scratch_dirs="<var class="keyword varname">path_to_directory</var>"</code> configuration option.
+ You can specify a single directory, or a comma-separated list of directories. The scratch directories must
+ be on the local filesystem, not in HDFS. You might specify different directory paths for different hosts,
+ depending on the capacity and speed
+ of the available storage devices. In <span class="keyword">Impala 2.3</span> or higher, Impala successfully starts (with a warning
+ written to the log) if it cannot create or read and write files in one of the scratch directories.
+ If there is less than 1 GB free on the filesystem where that directory resides,
+ Impala still runs, but writes a warning message to its log. If Impala encounters an error reading or writing
+ files in a scratch directory during a query, Impala logs the error and the query fails.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use the Amazon Simple Storage Service (S3) as a place to offload
+ data to reduce the volume of local storage, Impala 2.2.0 and higher
+ can query the data directly from S3.
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details.
+ </p>
+ </li>
+ </ul>
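+ <p class="p">
+ As an illustration of the <code class="ph codeph">--scratch_dirs</code> option described above, the
+ <span class="keyword cmdname">impalad</span> daemon might be started with a flag such as the following.
+ This is only a sketch; the directory paths are hypothetical and all other startup options are omitted:
+ </p>
+<pre class="pre codeblock"><code># Hypothetical paths: spread scratch space across two local drives.
+impalad --scratch_dirs="/data1/impala-scratch,/data2/impala-scratch"
+</code></pre>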
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_operators.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_operators.html b/docs/build3x/html/topics/impala_operators.html
new file mode 100644
index 0000000..e03240b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_operators.html
@@ -0,0 +1,2042 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="
Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="operators"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SQL Operators</title></head><body id="operators"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SQL Operators</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ SQL operators are a class of comparison functions that are widely used within the <code class="ph codeph">WHERE</code> clauses of
+ <code class="ph codeph">SELECT</code> statements.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="operators__arithmetic_operators">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Arithmetic Operators</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The arithmetic operators use expressions with a left-hand argument, the operator, and then (in most cases) a right-hand argument.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">left_hand_arg</var> <var class="keyword varname">binary_operator</var> <var class="keyword varname">right_hand_arg</var>
+<var class="keyword varname">unary_operator</var> <var class="keyword varname">single_arg</var>
+</code></pre>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">+</code> and <code class="ph codeph">-</code>: Can be used either as unary or binary operators.
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ With unary notation, such as <code class="ph codeph">+5</code>, <code class="ph codeph">-2.5</code>, or <code class="ph codeph">-<var class="keyword varname">col_name</var></code>,
+ they multiply their single numeric argument by <code class="ph codeph">+1</code> or <code class="ph codeph">-1</code>. Therefore, unary
+ <code class="ph codeph">+</code> returns its argument unchanged, while unary <code class="ph codeph">-</code> flips the sign of its argument. Although
+ you can double up these operators in expressions such as <code class="ph codeph">++5</code> (always positive) or <code class="ph codeph">-+2</code> or
+ <code class="ph codeph">+-2</code> (both always negative), you cannot double the unary minus operator because <code class="ph codeph">--</code> is
+ interpreted as the start of a comment. (You can use a double unary minus operator if you separate the <code class="ph codeph">-</code>
+ characters, for example with a space or parentheses.)
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ With binary notation, such as <code class="ph codeph">2+2</code>, <code class="ph codeph">5-2.5</code>, or <code class="ph codeph"><var class="keyword varname">col1</var> +
+ <var class="keyword varname">col2</var></code>, they add or subtract respectively the right-hand argument to (or from) the left-hand
+ argument. Both arguments must be of numeric types.
+ </p>
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">*</code> and <code class="ph codeph">/</code>: Multiplication and division respectively. Both arguments must be of numeric types.
+ </p>
+
+ <p class="p">
+ When multiplying, the shorter argument is promoted if necessary (such as <code class="ph codeph">SMALLINT</code> to <code class="ph codeph">INT</code> or
+ <code class="ph codeph">BIGINT</code>, or <code class="ph codeph">FLOAT</code> to <code class="ph codeph">DOUBLE</code>), and then the result is promoted again to the
+ next larger type. Thus, multiplying a <code class="ph codeph">TINYINT</code> and an <code class="ph codeph">INT</code> produces a <code class="ph codeph">BIGINT</code>
+ result. Multiplying a <code class="ph codeph">FLOAT</code> and a <code class="ph codeph">FLOAT</code> produces a <code class="ph codeph">DOUBLE</code> result. Multiplying
+ a <code class="ph codeph">FLOAT</code> and a <code class="ph codeph">DOUBLE</code> or a <code class="ph codeph">DOUBLE</code> and a <code class="ph codeph">DOUBLE</code> produces a
+ <code class="ph codeph">DECIMAL(38,17)</code>, because <code class="ph codeph">DECIMAL</code> values can represent much larger and more precise values than
+ <code class="ph codeph">DOUBLE</code>.
+ </p>
+
+ <p class="p">
+ When dividing, Impala always treats the arguments and result as <code class="ph codeph">DOUBLE</code> values to avoid losing precision. If you
+ need to insert the results of a division operation into a <code class="ph codeph">FLOAT</code> column, use the <code class="ph codeph">CAST()</code>
+ function to convert the result to the correct type.
+ </p>
+ </li>
+
+ <li class="li" id="arithmetic_operators__div">
+ <p class="p">
+ <code class="ph codeph">DIV</code>: Integer division. Arguments are not promoted to a floating-point type, and any fractional result
+ is discarded. For example, <code class="ph codeph">13 DIV 7</code> returns 1, <code class="ph codeph">14 DIV 7</code> returns 2, and
+ <code class="ph codeph">15 DIV 7</code> returns 2. This operator is the same as the <code class="ph codeph">QUOTIENT()</code> function.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">%</code>: Modulo operator. Returns the remainder of the left-hand argument divided by the right-hand argument. Both
+ arguments must be of one of the integer types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">&</code>, <code class="ph codeph">|</code>, <code class="ph codeph">~</code>, and <code class="ph codeph">^</code>: Bitwise operators that return the
+ logical AND, logical OR, <code class="ph codeph">NOT</code>, or logical XOR (exclusive OR) of their argument values. Both arguments must be of
+ one of the integer types. If the arguments are of different types, the argument with the smaller type is implicitly extended to
+ match the argument with the larger type.
+ </p>
+ </li>
+ </ul>
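+ <p class="p">
+ The following examples, a minimal sketch using arbitrary literal values, illustrate several of
+ these operators:
+ </p>
+
+<pre class="pre codeblock"><code>-- Unary minus can be doubled when the - characters are separated;
+-- a bare -- would begin a comment instead.
+select -(-5);                     -- returns 5
+
+-- DIV performs integer division and discards any fraction,
+-- while / always produces a DOUBLE result.
+select 15 div 7, 15 % 7, 15 / 7;  -- returns 2, 1, and approximately 2.142857
+
+-- The bitwise operators require integer arguments.
+select 6 & 3, 6 | 3, 6 ^ 3, ~6;   -- returns 2, 7, 5, and -7
+</code></pre>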
+
+ <p class="p">
+ You can chain a sequence of arithmetic expressions, optionally grouping them with parentheses.
+ </p>
+
+ <p class="p">
+ The arithmetic operators generally do not have equivalent calling conventions using functional notation. For example, prior to
+ <span class="keyword">Impala 2.2</span>, there is no <code class="ph codeph">MOD()</code> function equivalent to the <code class="ph codeph">%</code> modulo operator.
+ Conversely, there are some arithmetic functions that do not have a corresponding operator. For example, for exponentiation you use
+ the <code class="ph codeph">POW()</code> function, but there is no <code class="ph codeph">**</code> exponentiation operator. See
+ <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a> for the arithmetic functions you can use.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric aggregates such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+aggregates are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name | count | sum | avg | minimum | maximum | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 |
+| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 |
+| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 |
+| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 |
+| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+ <p class="p">
+ The following example shows how to do an arithmetic operation using a numeric field of a <code class="ph codeph">STRUCT</code> type that is an
+ item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it can be
+ used in an arithmetic expression, such as multiplying by 10:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey * 10
+ from region, region.r_nations as nation
+where nation.item.n_nationkey < 5;
++-------------+-------------+------------------------------+
+| r_name | item.n_name | nation.item.n_nationkey * 10 |
++-------------+-------------+------------------------------+
+| AMERICA | CANADA | 30 |
+| AMERICA | BRAZIL | 20 |
+| AMERICA | ARGENTINA | 10 |
+| MIDDLE EAST | EGYPT | 40 |
+| AFRICA | ALGERIA | 0 |
++-------------+-------------+------------------------------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="operators__between">
+
+ <h2 class="title topictitle2" id="ariaid-title3">BETWEEN Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ In a <code class="ph codeph">WHERE</code> clause, compares an expression to both a lower and upper bound. The comparison is successful if the
+ expression is greater than or equal to the lower bound, and less than or equal to the upper bound. If the bound values are switched,
+ so that the lower bound is greater than the upper bound, the operator does not match any values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression</var> BETWEEN <var class="keyword varname">lower_bound</var> AND <var class="keyword varname">upper_bound</var></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Data types:</strong> Typically used with numeric data types. Works with any data type, although not very practical for
+ <code class="ph codeph">BOOLEAN</code> values. (<code class="ph codeph">BETWEEN false AND true</code> will match all <code class="ph codeph">BOOLEAN</code> values.) Use
+ <code class="ph codeph">CAST()</code> if necessary to ensure the lower and upper bound values are compatible types. Call string or date/time
+ functions if necessary to extract or transform the relevant portion to compare, especially if the value can be transformed into a
+ number.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Be careful when using short string operands. A longer string that starts with the upper bound value will not be included, because it
+ is considered greater than the upper bound. For example, <code class="ph codeph">BETWEEN 'A' and 'M'</code> would not match the string value
+ <code class="ph codeph">'Midway'</code>. Use functions such as <code class="ph codeph">upper()</code>, <code class="ph codeph">lower()</code>, <code class="ph codeph">substr()</code>,
+ <code class="ph codeph">trim()</code>, and so on if necessary to ensure the comparison works as expected.
+ </p>
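+ <p class="p">
+ The caveat above can be checked directly with literal values, as in this minimal sketch:
+ </p>
+
+<pre class="pre codeblock"><code>-- 'Midway' sorts after 'M', so it falls outside the upper bound.
+select 'Midway' between 'A' and 'M';              -- returns false
+
+-- Testing only the first letter includes the whole 'M' range.
+select substr('Midway',1,1) between 'A' and 'M';  -- returns true
+</code></pre>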
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- Retrieve data for January through June, inclusive.
+select c1 from t1 where month <strong class="ph b">between 1 and 6</strong>;
+
+-- Retrieve data for names beginning with 'A' through 'M' inclusive.
+-- Only test the first letter to ensure all the values starting with 'M' are matched.
+-- Do a case-insensitive comparison to match names with various capitalization conventions.
+select last_name from customers where upper(substr(last_name,1,1)) <strong class="ph b">between 'A' and 'M'</strong>;
+
+-- Retrieve data for only the first week of each month.
+select count(distinct visitor_id) from web_traffic where dayofmonth(when_viewed) <strong class="ph b">between 1 and 7</strong>;</code></pre>
+
+ <p class="p">
+ The following example shows how to do a <code class="ph codeph">BETWEEN</code> comparison using a numeric field of a <code class="ph codeph">STRUCT</code> type
+ that is an item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it
+ can be used in a comparison operator:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+from region, region.r_nations as nation
+where nation.item.n_nationkey between 3 and 5;
++-------------+-------------+------------------+
+| r_name | item.n_name | item.n_nationkey |
++-------------+-------------+------------------+
+| AMERICA | CANADA | 3 |
+| MIDDLE EAST | EGYPT | 4 |
+| AFRICA | ETHIOPIA | 5 |
++-------------+-------------+------------------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="operators__comparison_operators">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Comparison Operators</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports the familiar comparison operators for checking equality and sort order for the column data types:
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">left_hand_expression</var> <var class="keyword varname">comparison_operator</var> <var class="keyword varname">right_hand_expression</var></code></pre>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">=</code>, <code class="ph codeph">!=</code>, <code class="ph codeph"><></code>: apply to all types.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph"><</code>, <code class="ph codeph"><=</code>, <code class="ph codeph">></code>, <code class="ph codeph">>=</code>: apply to all types; for
+ <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">TRUE</code> is considered greater than <code class="ph codeph">FALSE</code>.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Alternatives:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">IN</code> and <code class="ph codeph">BETWEEN</code> operators provide shorthand notation for expressing combinations of equality,
+ less than, and greater than comparisons with a single operator.
+ </p>
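+ <p class="p">
+ For example, an <code class="ph codeph">IN</code> test such as the following (a sketch using a hypothetical
+ table <code class="ph codeph">t1</code>) is shorthand for a chain of equality comparisons combined with
+ <code class="ph codeph">OR</code>:
+ </p>
+
+<pre class="pre codeblock"><code>select x from t1 where x in (2, 4, 6);
+-- Equivalent to:
+select x from t1 where x = 2 or x = 4 or x = 6;
+</code></pre>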
+
+ <p class="p">
+ Because comparing any value to <code class="ph codeph">NULL</code> produces <code class="ph codeph">NULL</code> rather than <code class="ph codeph">TRUE</code> or
+ <code class="ph codeph">FALSE</code>, use the <code class="ph codeph">IS NULL</code> and <code class="ph codeph">IS NOT NULL</code> operators to check if a value is
+ <code class="ph codeph">NULL</code> or not.
+ </p>
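+ <p class="p">
+ The following sketch illustrates this <code class="ph codeph">NULL</code> behavior with literal values:
+ </p>
+
+<pre class="pre codeblock"><code>select null = null;   -- returns NULL, not true
+select 1 != null;     -- returns NULL
+select null is null;  -- returns true
+</code></pre>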
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+ <p class="p">
+ The following example shows how to do a comparison using a numeric field of a <code class="ph codeph">STRUCT</code> type that is an
+ item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it can be
+ used with a comparison operator such as <code class="ph codeph"><</code>:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+from region, region.r_nations as nation
+where nation.item.n_nationkey < 5;
++-------------+-------------+------------------+
+| r_name | item.n_name | item.n_nationkey |
++-------------+-------------+------------------+
+| AMERICA | CANADA | 3 |
+| AMERICA | BRAZIL | 2 |
+| AMERICA | ARGENTINA | 1 |
+| MIDDLE EAST | EGYPT | 4 |
+| AFRICA | ALGERIA | 0 |
++-------------+-------------+------------------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="operators__exists">
+
+ <h2 class="title topictitle2" id="ariaid-title5">EXISTS Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ The <code class="ph codeph">EXISTS</code> operator tests whether a subquery returns any results. You typically use it to find values from one
+ table that have corresponding values in another table.
+ </p>
+
+ <p class="p">
+ The converse, <code class="ph codeph">NOT EXISTS</code>, helps to find all the values from one table that do not have any corresponding values in
+ another table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>EXISTS (<var class="keyword varname">subquery</var>)
+NOT EXISTS (<var class="keyword varname">subquery</var>)
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The subquery can refer to a different table than the outer query block, or the same table. For example, you might use
+ <code class="ph codeph">EXISTS</code> or <code class="ph codeph">NOT EXISTS</code> to check the existence of parent/child relationships between two columns of
+ the same table.
+ </p>
+
+ <p class="p">
+ You can also use operators and function calls within the subquery to test for relationships other than strict
+ equality. For example, you might use a call to <code class="ph codeph">COUNT()</code> in the subquery to check whether the number of matching
+ values is higher or lower than some limit. You might call a UDF in the subquery to check whether values in one table match a
+ hashed representation of those same values in a different table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong>
+ </p>
+
+ <p class="p">
+ If the subquery returns any value at all (even <code class="ph codeph">NULL</code>), <code class="ph codeph">EXISTS</code> returns <code class="ph codeph">TRUE</code> and
+ <code class="ph codeph">NOT EXISTS</code> returns <code class="ph codeph">FALSE</code>.
+ </p>
+
+ <p class="p">
+ The following example shows how even when the subquery returns only <code class="ph codeph">NULL</code> values, <code class="ph codeph">EXISTS</code> still
+ returns <code class="ph codeph">TRUE</code> and thus matches all the rows from the table in the outer query block.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table all_nulls (x int);
+[localhost:21000] > insert into all_nulls values (null), (null), (null);
+[localhost:21000] > select y from t2 where exists (select x from all_nulls);
++---+
+| y |
++---+
+| 2 |
+| 4 |
+| 6 |
++---+
+</code></pre>
+
+ <p class="p">
+ However, if the table in the subquery is empty and so the subquery returns an empty result set, <code class="ph codeph">EXISTS</code> returns
+ <code class="ph codeph">FALSE</code>:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table empty (x int);
+[localhost:21000] > select y from t2 where exists (select x from empty);
+[localhost:21000] >
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ Correlated subqueries used in <code class="ph codeph">EXISTS</code> and <code class="ph codeph">IN</code> operators cannot include a
+ <code class="ph codeph">LIMIT</code> clause.
+ </p>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.6</span>,
+ the <code class="ph codeph">NOT EXISTS</code> operator required a correlated subquery.
+ In <span class="keyword">Impala 2.6</span> and higher, <code class="ph codeph">NOT EXISTS</code> works with
+ uncorrelated queries also.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="p">
+
+
+ The following examples refer to these simple tables containing small sets of integers or strings:
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (x int);
+[localhost:21000] > insert into t1 values (1), (2), (3), (4), (5), (6);
+
+[localhost:21000] > create table t2 (y int);
+[localhost:21000] > insert into t2 values (2), (4), (6);
+
+[localhost:21000] > create table t3 (z int);
+[localhost:21000] > insert into t3 values (1), (3), (5);
+
+[localhost:21000] > create table month_names (m string);
+[localhost:21000] > insert into month_names values
+ > ('January'), ('February'), ('March'),
+ > ('April'), ('May'), ('June'), ('July'),
+ > ('August'), ('September'), ('October'),
+ > ('November'), ('December');
+</code></pre>
+ </div>
+
+ <p class="p">
+ The following example shows a correlated subquery that finds all the values in one table that exist in another table. For each value
+ <code class="ph codeph">X</code> from <code class="ph codeph">T1</code>, the query checks if the <code class="ph codeph">Y</code> column of <code class="ph codeph">T2</code> contains an
+ identical value, and the <code class="ph codeph">EXISTS</code> operator returns <code class="ph codeph">TRUE</code> or <code class="ph codeph">FALSE</code> as appropriate in
+ each case.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x from t1 where exists (select y from t2 where t1.x = y);
++---+
+| x |
++---+
+| 2 |
+| 4 |
+| 6 |
++---+
+</code></pre>
+
+ <p class="p">
+        An uncorrelated query is less interesting in this case. Because the subquery always matches some rows, making the <code class="ph codeph">EXISTS</code> test <code class="ph codeph">TRUE</code>, all rows from
+        <code class="ph codeph">T1</code> are returned. If the table contents were changed so that the subquery did not match any rows, none of the rows
+ from <code class="ph codeph">T1</code> would be returned.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x from t1 where exists (select y from t2 where y > 5);
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
+| 6 |
++---+
+</code></pre>
+
+ <p class="p">
+ The following example shows how an uncorrelated subquery can test for the existence of some condition within a table. By using
+ <code class="ph codeph">LIMIT 1</code> or an aggregate function, the query returns a single result or no result based on whether the subquery
+ matches any rows. Here, we know that <code class="ph codeph">T1</code> and <code class="ph codeph">T2</code> contain some even numbers, but <code class="ph codeph">T3</code>
+ does not.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select "contains an even number" from t1 where exists (select x from t1 where x % 2 = 0) limit 1;
++---------------------------+
+| 'contains an even number' |
++---------------------------+
+| contains an even number |
++---------------------------+
+[localhost:21000] > select "contains an even number" as assertion from t1 where exists (select x from t1 where x % 2 = 0) limit 1;
++-------------------------+
+| assertion |
++-------------------------+
+| contains an even number |
++-------------------------+
+[localhost:21000] > select "contains an even number" as assertion from t2 where exists (select x from t2 where y % 2 = 0) limit 1;
+ERROR: AnalysisException: couldn't resolve column reference: 'x'
+[localhost:21000] > select "contains an even number" as assertion from t2 where exists (select y from t2 where y % 2 = 0) limit 1;
++-------------------------+
+| assertion |
++-------------------------+
+| contains an even number |
++-------------------------+
+[localhost:21000] > select "contains an even number" as assertion from t3 where exists (select z from t3 where z % 2 = 0) limit 1;
+[localhost:21000] >
+</code></pre>
+
+ <p class="p">
+ The following example finds numbers in one table that are 1 greater than numbers from another table. The <code class="ph codeph">EXISTS</code>
+ notation is simpler than an equivalent <code class="ph codeph">CROSS JOIN</code> between the tables. (The example then also illustrates how the
+ same test could be performed using an <code class="ph codeph">IN</code> operator.)
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x from t1 where exists (select y from t2 where x = y + 1);
++---+
+| x |
++---+
+| 3 |
+| 5 |
++---+
+[localhost:21000] > select x from t1 where x in (select y + 1 from t2);
++---+
+| x |
++---+
+| 3 |
+| 5 |
++---+
+</code></pre>
+
+ <p class="p">
+ The following example finds values from one table that do not exist in another table.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x from t1 where not exists (select y from t2 where x = y);
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 5 |
++---+
+</code></pre>
+
+ <p class="p">
+ The following example uses the <code class="ph codeph">NOT EXISTS</code> operator to find all the leaf nodes in tree-structured data. This
+ simplified <span class="q">"tree of life"</span> has multiple levels (class, order, family, and so on), with each item pointing upward through a
+ <code class="ph codeph">PARENT</code> pointer. The example runs an outer query and a subquery on the same table, returning only those items whose
+ <code class="ph codeph">ID</code> value is <em class="ph i">not</em> referenced by the <code class="ph codeph">PARENT</code> of any other item.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table tree (id int, parent int, name string);
+[localhost:21000] > insert overwrite tree values
+ > (0, null, "animals"),
+ > (1, 0, "placentals"),
+ > (2, 0, "marsupials"),
+ > (3, 1, "bats"),
+ > (4, 1, "cats"),
+ > (5, 2, "kangaroos"),
+ > (6, 4, "lions"),
+ > (7, 4, "tigers"),
+ > (8, 5, "red kangaroo"),
+ > (9, 2, "wallabies");
+[localhost:21000] > select name as "leaf node" from tree one
+ > where not exists (select parent from tree two where one.id = two.parent);
++--------------+
+| leaf node |
++--------------+
+| bats |
+| lions |
+| tigers |
+| red kangaroo |
+| wallabies |
++--------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_subqueries.html#subqueries">Subqueries in Impala SELECT Statements</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="operators__ilike">
+
+ <h2 class="title topictitle2" id="ariaid-title6">ILIKE Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ A case-insensitive comparison operator for <code class="ph codeph">STRING</code> data, with basic wildcard capability using <code class="ph codeph">_</code> to match a single
+ character and <code class="ph codeph">%</code> to match multiple characters. The argument expression must match the entire string value.
+ Typically, it is more efficient to put any <code class="ph codeph">%</code> wildcard match at the end of the string.
+ </p>
+
+ <p class="p">
+ This operator, available in <span class="keyword">Impala 2.5</span> and higher, is the equivalent of the <code class="ph codeph">LIKE</code> operator,
+ but with case-insensitive comparisons.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">string_expression</var> ILIKE <var class="keyword varname">wildcard_expression</var>
+<var class="keyword varname">string_expression</var> NOT ILIKE <var class="keyword varname">wildcard_expression</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ In the following examples, strings that are the same except for differences in uppercase
+ and lowercase match successfully with <code class="ph codeph">ILIKE</code>, but do not match
+ with <code class="ph codeph">LIKE</code>:
+ </p>
+
+<pre class="pre codeblock"><code>select 'fooBar' ilike 'FOOBAR';
++-------------------------+
+| 'fooBar' ilike 'FOOBAR' |
++-------------------------+
+| true |
++-------------------------+
+
+select 'fooBar' like 'FOOBAR';
++------------------------+
+| 'fooBar' like 'FOOBAR' |
++------------------------+
+| false |
++------------------------+
+
+select 'FOOBAR' ilike 'f%';
++---------------------+
+| 'FOOBAR' ilike 'f%' |
++---------------------+
+| true |
++---------------------+
+
+select 'FOOBAR' like 'f%';
++--------------------+
+| 'FOOBAR' like 'f%' |
++--------------------+
+| false |
++--------------------+
+
+select 'ABCXYZ' not ilike 'ab_xyz';
++-----------------------------+
+| not 'ABCXYZ' ilike 'ab_xyz' |
++-----------------------------+
+| false |
++-----------------------------+
+
+select 'ABCXYZ' not like 'ab_xyz';
++----------------------------+
+| not 'ABCXYZ' like 'ab_xyz' |
++----------------------------+
+| true |
++----------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ For case-sensitive comparisons, see <a class="xref" href="impala_operators.html#like">LIKE Operator</a>.
+ For a more general kind of search operator using regular expressions, see <a class="xref" href="impala_operators.html#regexp">REGEXP Operator</a>
+ or its case-insensitive counterpart <a class="xref" href="impala_operators.html#iregexp">IREGEXP Operator</a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="operators__in">
+
+ <h2 class="title topictitle2" id="ariaid-title7">IN Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ The <code class="ph codeph">IN</code> operator compares an argument value to a set of values, and returns <code class="ph codeph">TRUE</code> if the argument
+ matches any value in the set. The <code class="ph codeph">NOT IN</code> operator reverses the comparison, and checks if the argument value is not
+ part of a set of values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression</var> IN (<var class="keyword varname">expression</var> [, <var class="keyword varname">expression</var>])
+<var class="keyword varname">expression</var> IN (<var class="keyword varname">subquery</var>)
+
+<var class="keyword varname">expression</var> NOT IN (<var class="keyword varname">expression</var> [, <var class="keyword varname">expression</var>])
+<var class="keyword varname">expression</var> NOT IN (<var class="keyword varname">subquery</var>)
+</code></pre>
+
+ <p class="p">
+ The left-hand expression and the set of comparison values must be of compatible types.
+ </p>
+
+ <p class="p">
+ The left-hand expression must consist only of a single value, not a tuple. Although the left-hand expression is typically a column
+ name, it could also be some other value. For example, the <code class="ph codeph">WHERE</code> clauses <code class="ph codeph">WHERE id IN (5)</code> and
+ <code class="ph codeph">WHERE 5 IN (id)</code> produce the same results.
+ </p>
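+
+      <p class="p">
+        For example, using the <code class="ph codeph">T1</code> table shown in the <code class="ph codeph">EXISTS</code> examples
+        earlier on this page, both of the following queries return the same single row:
+      </p>
+
+<pre class="pre codeblock"><code>-- Both forms return the row where X is 5.
+select x from t1 where x in (5);
+select x from t1 where 5 in (x);
+</code></pre>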
+
+ <p class="p">
+ The set of values to check against can be specified as constants, function calls, column names, or other expressions in the query
+ text. The maximum number of expressions in the <code class="ph codeph">IN</code> list is 9999. (The maximum number of elements of
+ a single expression is 10,000 items, and the <code class="ph codeph">IN</code> operator itself counts as one.)
+ </p>
+
+ <p class="p">
+ In Impala 2.0 and higher, the set of values can also be generated by a subquery. <code class="ph codeph">IN</code> can evaluate an unlimited
+ number of results using a subquery.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Any expression using the <code class="ph codeph">IN</code> operator could be rewritten as a series of equality tests connected with
+ <code class="ph codeph">OR</code>, but the <code class="ph codeph">IN</code> syntax is often clearer, more concise, and easier for Impala to optimize. For
+ example, with partitioned tables, queries frequently use <code class="ph codeph">IN</code> clauses to filter data by comparing the partition key
+ columns to specific values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong>
+ </p>
+
+ <p class="p">
+ If there really is a matching non-null value, <code class="ph codeph">IN</code> returns <code class="ph codeph">TRUE</code>:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select 1 in (1,null,2,3);
++----------------------+
+| 1 in (1, null, 2, 3) |
++----------------------+
+| true |
++----------------------+
+[localhost:21000] > select 1 not in (1,null,2,3);
++--------------------------+
+| 1 not in (1, null, 2, 3) |
++--------------------------+
+| false |
++--------------------------+
+</code></pre>
+
+ <p class="p">
+ If the searched value is not found in the comparison values, and the comparison values include <code class="ph codeph">NULL</code>, the result is
+ <code class="ph codeph">NULL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select 5 in (1,null,2,3);
++----------------------+
+| 5 in (1, null, 2, 3) |
++----------------------+
+| NULL |
++----------------------+
+[localhost:21000] > select 5 not in (1,null,2,3);
++--------------------------+
+| 5 not in (1, null, 2, 3) |
++--------------------------+
+| NULL |
++--------------------------+
+[localhost:21000] > select 1 in (null);
++-------------+
+| 1 in (null) |
++-------------+
+| NULL |
++-------------+
+[localhost:21000] > select 1 not in (null);
++-----------------+
+| 1 not in (null) |
++-----------------+
+| NULL |
++-----------------+
+</code></pre>
+
+ <p class="p">
+ If the left-hand argument is <code class="ph codeph">NULL</code>, <code class="ph codeph">IN</code> always returns <code class="ph codeph">NULL</code>. This rule applies even
+ if the comparison values include <code class="ph codeph">NULL</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select null in (1,2,3);
++-------------------+
+| null in (1, 2, 3) |
++-------------------+
+| NULL |
++-------------------+
+[localhost:21000] > select null not in (1,2,3);
++-----------------------+
+| null not in (1, 2, 3) |
++-----------------------+
+| NULL |
++-----------------------+
+[localhost:21000] > select null in (null);
++----------------+
+| null in (null) |
++----------------+
+| NULL |
++----------------+
+[localhost:21000] > select null not in (null);
++--------------------+
+| null not in (null) |
++--------------------+
+| NULL |
++--------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Available in earlier Impala releases, but new capabilities were added in
+        <span class="keyword">Impala 2.0.0</span>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+ <p class="p">
+ The following example shows how to do an arithmetic operation using a numeric field of a <code class="ph codeph">STRUCT</code> type that is an
+ item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it can be
+ used in an arithmetic expression, such as multiplying by 10:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+from region, region.r_nations as nation
+where nation.item.n_nationkey in (1,3,5);
++---------+-------------+------------------+
+| r_name | item.n_name | item.n_nationkey |
++---------+-------------+------------------+
+| AMERICA | CANADA | 3 |
+| AMERICA | ARGENTINA | 1 |
+| AFRICA | ETHIOPIA | 5 |
++---------+-------------+------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ Correlated subqueries used in <code class="ph codeph">EXISTS</code> and <code class="ph codeph">IN</code> operators cannot include a
+ <code class="ph codeph">LIMIT</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- Using IN is concise and self-documenting.
+SELECT * FROM t1 WHERE c1 IN (1,2,10);
+-- Equivalent to series of = comparisons ORed together.
+SELECT * FROM t1 WHERE c1 = 1 OR c1 = 2 OR c1 = 10;
+
+SELECT c1 AS "starts with vowel" FROM t2 WHERE upper(substr(c1,1,1)) IN ('A','E','I','O','U');
+
+SELECT COUNT(DISTINCT(visitor_id)) FROM web_traffic WHERE month IN ('January','June','July');</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_subqueries.html#subqueries">Subqueries in Impala SELECT Statements</a>
+ </p>
+
+ </div>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="operators__iregexp">
+
+ <h2 class="title topictitle2" id="ariaid-title8">IREGEXP Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Tests whether a value matches a regular expression, using case-insensitive string comparisons.
+ Uses the POSIX regular expression syntax where <code class="ph codeph">^</code> and
+ <code class="ph codeph">$</code> match the beginning and end of the string, <code class="ph codeph">.</code> represents any single character, <code class="ph codeph">*</code>
+ represents a sequence of zero or more items, <code class="ph codeph">+</code> represents a sequence of one or more items, <code class="ph codeph">?</code>
+ produces a non-greedy match, and so on.
+ </p>
+
+ <p class="p">
+ This operator, available in <span class="keyword">Impala 2.5</span> and higher, is the equivalent of the <code class="ph codeph">REGEXP</code> operator,
+ but with case-insensitive comparisons.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">string_expression</var> IREGEXP <var class="keyword varname">regular_expression</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+        The regular expression must match the entire value, not just occur somewhere inside it. Use <code class="ph codeph">.*</code> at the beginning,
+        the end, or both if you only need to match a pattern somewhere in the middle. Thus, the explicit <code class="ph codeph">^</code> and <code class="ph codeph">$</code>
+        anchors are often redundant, although you might already have them in expression strings that you reuse from elsewhere.
+ </p>
+
+
+
+ <p class="p">
+ The <code class="ph codeph">|</code> symbol is the alternation operator, typically used within <code class="ph codeph">()</code> to match different sequences.
+ The <code class="ph codeph">()</code> groups do not allow backreferences. To retrieve the part of a value matched within a <code class="ph codeph">()</code>
+ section, use the <code class="ph codeph"><a class="xref" href="impala_string_functions.html#string_functions__regexp_extract">regexp_extract()</a></code>
+ built-in function. (Currently, there is not any case-insensitive equivalent for the <code class="ph codeph">regexp_extract()</code> function.)
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In Impala 1.3.1 and higher, the <code class="ph codeph">REGEXP</code> and <code class="ph codeph">RLIKE</code> operators now match a
+ regular expression string that occurs anywhere inside the target string, the same as if the regular
+ expression was enclosed on each side by <code class="ph codeph">.*</code>. See
+ <a class="xref" href="../shared/../topics/impala_operators.html#regexp">REGEXP Operator</a> for examples. Previously, these operators only
+ succeeded when the regular expression matched the entire target string. This change improves compatibility
+ with the regular expression support for popular database systems. There is no change to the behavior of the
+ <code class="ph codeph">regexp_extract()</code> and <code class="ph codeph">regexp_replace()</code> built-in functions.
+ </p>
+ </div>
+
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples demonstrate the syntax for the <code class="ph codeph">IREGEXP</code> operator.
+ </p>
+
+<pre class="pre codeblock"><code>select 'abcABCaabbcc' iregexp '^[a-c]+$';
++-----------------------------------+
+| 'abcABCaabbcc' iregexp '^[a-c]+$' |
++-----------------------------------+
+| true                              |
++-----------------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_operators.html#regexp">REGEXP Operator</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="is_distinct_from__is_distinct" id="operators__is_distinct_from">
+
+ <h2 class="title topictitle2" id="is_distinct_from__is_distinct">IS DISTINCT FROM Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ The <code class="ph codeph">IS DISTINCT FROM</code> operator, and its converse the <code class="ph codeph">IS NOT DISTINCT FROM</code> operator, test whether or
+ not values are identical. <code class="ph codeph">IS NOT DISTINCT FROM</code> is similar to the <code class="ph codeph">=</code> operator, and <code class="ph codeph">IS
+ DISTINCT FROM</code> is similar to the <code class="ph codeph">!=</code> operator, except that <code class="ph codeph">NULL</code> values are treated as
+ identical. Therefore, <code class="ph codeph">IS NOT DISTINCT FROM</code> returns <code class="ph codeph">true</code> rather than <code class="ph codeph">NULL</code>, and
+ <code class="ph codeph">IS DISTINCT FROM</code> returns <code class="ph codeph">false</code> rather than <code class="ph codeph">NULL</code>, when comparing two
+ <code class="ph codeph">NULL</code> values. If one of the values being compared is <code class="ph codeph">NULL</code> and the other is not, <code class="ph codeph">IS DISTINCT
+ FROM</code> returns <code class="ph codeph">true</code> and <code class="ph codeph">IS NOT DISTINCT FROM</code> returns <code class="ph codeph">false</code>, again instead
+ of returning <code class="ph codeph">NULL</code> in both cases.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression1</var> IS DISTINCT FROM <var class="keyword varname">expression2</var>
+
+<var class="keyword varname">expression1</var> IS NOT DISTINCT FROM <var class="keyword varname">expression2</var>
+<var class="keyword varname">expression1</var> <=> <var class="keyword varname">expression2</var>
+</code></pre>
+
+ <p class="p">
+ The operator <code class="ph codeph"><=></code> is an alias for <code class="ph codeph">IS NOT DISTINCT FROM</code>.
+ It is typically used as a <code class="ph codeph">NULL</code>-safe equality operator in join queries.
+ That is, <code class="ph codeph">A <=> B</code> is true if <code class="ph codeph">A</code> equals <code class="ph codeph">B</code>
+ or if both <code class="ph codeph">A</code> and <code class="ph codeph">B</code> are <code class="ph codeph">NULL</code>.
+ </p>
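+
+      <p class="p">
+        For example, the following query shows how <code class="ph codeph"><=></code> returns
+        <code class="ph codeph">true</code> or <code class="ph codeph">false</code> even when one or both operands
+        are <code class="ph codeph">NULL</code>, where <code class="ph codeph">=</code> would return <code class="ph codeph">NULL</code>:
+      </p>
+
+<pre class="pre codeblock"><code>-- Returns true, false, and true respectively.
+select 1 <=> 1, 1 <=> null, null <=> null;
+</code></pre>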
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This operator provides concise notation for comparing two values and always producing a <code class="ph codeph">true</code> or
+        <code class="ph codeph">false</code> result, without treating <code class="ph codeph">NULL</code> as a special case. Otherwise, unambiguously distinguishing
+        between two values requires a compound expression involving <code class="ph codeph">IS [NOT] NULL</code> tests of both operands in addition to the
+ <code class="ph codeph">=</code> or <code class="ph codeph">!=</code> operator.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph"><=></code> operator, used like an equality operator in a join query,
+ is more efficient than the equivalent clause: <code class="ph codeph">A = B OR (A IS NULL AND B IS NULL)</code>.
+ The <code class="ph codeph"><=></code> operator can use a hash join, while the <code class="ph codeph">OR</code> expression
+ cannot.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show how <code class="ph codeph">IS DISTINCT FROM</code> gives output similar to
+ the <code class="ph codeph">!=</code> operator, and <code class="ph codeph">IS NOT DISTINCT FROM</code> gives output
+ similar to the <code class="ph codeph">=</code> operator. The exception is when the expression involves
+ a <code class="ph codeph">NULL</code> value on one side or both sides, where <code class="ph codeph">!=</code> and
+ <code class="ph codeph">=</code> return <code class="ph codeph">NULL</code> but the <code class="ph codeph">IS [NOT] DISTINCT FROM</code>
+ operators still return <code class="ph codeph">true</code> or <code class="ph codeph">false</code>.
+ </p>
+
+<pre class="pre codeblock"><code>
+select 1 is distinct from 0, 1 != 0;
++----------------------+--------+
+| 1 is distinct from 0 | 1 != 0 |
++----------------------+--------+
+| true | true |
++----------------------+--------+
+
+select 1 is distinct from 1, 1 != 1;
++----------------------+--------+
+| 1 is distinct from 1 | 1 != 1 |
++----------------------+--------+
+| false | false |
++----------------------+--------+
+
+select 1 is distinct from null, 1 != null;
++-------------------------+-----------+
+| 1 is distinct from null | 1 != null |
++-------------------------+-----------+
+| true | NULL |
++-------------------------+-----------+
+
+select null is distinct from null, null != null;
++----------------------------+--------------+
+| null is distinct from null | null != null |
++----------------------------+--------------+
+| false | NULL |
++----------------------------+--------------+
+
+select 1 is not distinct from 0, 1 = 0;
++--------------------------+-------+
+| 1 is not distinct from 0 | 1 = 0 |
++--------------------------+-------+
+| false | false |
++--------------------------+-------+
+
+select 1 is not distinct from 1, 1 = 1;
++--------------------------+-------+
+| 1 is not distinct from 1 | 1 = 1 |
++--------------------------+-------+
+| true | true |
++--------------------------+-------+
+
+select 1 is not distinct from null, 1 = null;
++-----------------------------+----------+
+| 1 is not distinct from null | 1 = null |
++-----------------------------+----------+
+| false | NULL |
++-----------------------------+----------+
+
+select null is not distinct from null, null = null;
++--------------------------------+-------------+
+| null is not distinct from null | null = null |
++--------------------------------+-------------+
+| true | NULL |
++--------------------------------+-------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how <code class="ph codeph">IS DISTINCT FROM</code> considers
+ <code class="ph codeph">CHAR</code> values to be the same (not distinct from each other)
+ if they only differ in the number of trailing spaces. Therefore, sometimes
+ the result of an <code class="ph codeph">IS [NOT] DISTINCT FROM</code> operator differs
+ depending on whether the values are <code class="ph codeph">STRING</code>/<code class="ph codeph">VARCHAR</code>
+ or <code class="ph codeph">CHAR</code>.
+ </p>
+
+<pre class="pre codeblock"><code>
+select
+ 'x' is distinct from 'x ' as string_with_trailing_spaces,
+ cast('x' as char(5)) is distinct from cast('x ' as char(5)) as char_with_trailing_spaces;
++-----------------------------+---------------------------+
+| string_with_trailing_spaces | char_with_trailing_spaces |
++-----------------------------+---------------------------+
+| true | false |
++-----------------------------+---------------------------+
+</code></pre>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="operators__is_null">
+
+ <h2 class="title topictitle2" id="ariaid-title10">IS NULL Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+
+ The <code class="ph codeph">IS NULL</code> operator, and its converse the <code class="ph codeph">IS NOT NULL</code> operator, test whether a specified value is
+ <code class="ph codeph"><a class="xref" href="impala_literals.html#null">NULL</a></code>. Because using <code class="ph codeph">NULL</code> with any of the other
+ comparison operators such as <code class="ph codeph">=</code> or <code class="ph codeph">!=</code> also returns <code class="ph codeph">NULL</code> rather than
+ <code class="ph codeph">TRUE</code> or <code class="ph codeph">FALSE</code>, you use a special-purpose comparison operator to check for this special condition.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use
+ the operators <code class="ph codeph">IS UNKNOWN</code> and
+ <code class="ph codeph">IS NOT UNKNOWN</code> as synonyms for
+ <code class="ph codeph">IS NULL</code> and <code class="ph codeph">IS NOT NULL</code>,
+ respectively.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression</var> IS NULL
+<var class="keyword varname">expression</var> IS NOT NULL
+
+<span class="ph"><var class="keyword varname">expression</var> IS UNKNOWN</span>
+<span class="ph"><var class="keyword varname">expression</var> IS NOT UNKNOWN</span>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ In many cases, <code class="ph codeph">NULL</code> values indicate some incorrect or incomplete processing during data ingestion or conversion.
+ You might check whether any values in a column are <code class="ph codeph">NULL</code>, and if so take some followup action to fill them in.
+ </p>
+
+ <p class="p">
+ With sparse data, often represented in <span class="q">"wide"</span> tables, it is common for most values to be <code class="ph codeph">NULL</code> with only an
+ occasional non-<code class="ph codeph">NULL</code> value. In those cases, you can use the <code class="ph codeph">IS NOT NULL</code> operator to identify the
+ rows containing any data at all for a particular column, regardless of the actual value.
+ </p>
+
+ <p class="p">
+ With a well-designed database schema, effective use of <code class="ph codeph">NULL</code> values and <code class="ph codeph">IS NULL</code> and <code class="ph codeph">IS NOT
+ NULL</code> operators can save having to design custom logic around special values such as 0, -1, <code class="ph codeph">'N/A'</code>, empty
+ string, and so on. <code class="ph codeph">NULL</code> lets you distinguish between a value that is known to be 0, false, or empty, and a truly
+ unknown value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">IS [NOT] UNKNOWN</code> operator, as with the <code class="ph codeph">IS [NOT] NULL</code>
+ operator, is not applicable to complex type columns (<code class="ph codeph">STRUCT</code>,
+ <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>). Using a complex type column with this
+ operator causes a query error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- If this value is non-zero, something is wrong.
+select count(*) from employees where employee_id is null;
+
+-- With data from disparate sources, some fields might be blank.
+-- Not necessarily an error condition.
+select count(*) from census where household_income is null;
+
+-- Sometimes we expect fields to be null, and followup action
+-- is needed when they are not.
+select count(*) from web_traffic where weird_http_code is not null;</code></pre>
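+
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, the <code class="ph codeph">IS UNKNOWN</code>
+ synonym can be used the same way. For example, against the same
+ <code class="ph codeph">employees</code> table as above:
+ </p>
+
+<pre class="pre codeblock"><code>-- Equivalent to: where employee_id is null
+select count(*) from employees where employee_id is unknown;</code></pre>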
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="operators__is_true">
+
+ <h2 class="title topictitle2" id="ariaid-title11">IS TRUE Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+ This variation of the <code class="ph codeph">IS</code> operator tests for truth
+ or falsity, with right-hand arguments <code class="ph codeph">[NOT] TRUE</code>,
+ <code class="ph codeph">[NOT] FALSE</code>, and <code class="ph codeph">[NOT] UNKNOWN</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression</var> IS TRUE
+<var class="keyword varname">expression</var> IS NOT TRUE
+
+<var class="keyword varname">expression</var> IS FALSE
+<var class="keyword varname">expression</var> IS NOT FALSE
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">IS TRUE</code> and <code class="ph codeph">IS FALSE</code> forms are
+ similar to doing equality comparisons with the Boolean values
+ <code class="ph codeph">TRUE</code> and <code class="ph codeph">FALSE</code>, except that
+ <code class="ph codeph">IS TRUE</code> and <code class="ph codeph">IS FALSE</code>
+ always return either <code class="ph codeph">TRUE</code> or <code class="ph codeph">FALSE</code>,
+ even if the left-hand side expression returns <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ These operators let you simplify Boolean comparisons that must also
+ check for <code class="ph codeph">NULL</code>, for example
+ <code class="ph codeph">X != 10 AND X IS NOT NULL</code> is equivalent to
+ <code class="ph codeph">(X != 10) IS TRUE</code>.
+ </p>
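+
+ <p class="p">
+ For example, assuming a table <code class="ph codeph">t1</code> with a numeric column
+ <code class="ph codeph">x</code>, these two queries return the same count:
+ </p>
+
+<pre class="pre codeblock"><code>-- Rows where x is non-NULL and not equal to 10.
+select count(*) from t1 where x != 10 and x is not null;
+
+-- Same result: IS TRUE turns the NULL produced by 'NULL != 10' into false.
+select count(*) from t1 where (x != 10) is true;</code></pre>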
+
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use
+ the operators <code class="ph codeph">IS [NOT] TRUE</code> and
+ <code class="ph codeph">IS [NOT] FALSE</code> as equivalents for the built-in
+ functions <code class="ph codeph">istrue()</code>, <code class="ph codeph">isnottrue()</code>,
+ <code class="ph codeph">isfalse()</code>, and <code class="ph codeph">isnotfalse()</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">IS [NOT] TRUE</code> and <code class="ph codeph">IS [NOT] FALSE</code> operators are not
+ applicable to complex type columns (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or
+ <code class="ph codeph">MAP</code>). Using a complex type column with these operators causes a query error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.11.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>
+select assertion, b, b is true, b is false, b is unknown
+ from boolean_test;
++-------------+-------+-----------+------------+--------------+
+| assertion   | b     | b is true | b is false | b is unknown |
++-------------+-------+-----------+------------+--------------+
+| 2 + 2 = 4   | true  | true      | false      | false        |
+| 2 + 2 = 5   | false | false     | true       | false        |
+| 1 = null    | NULL  | false     | false      | true         |
+| null = null | NULL  | false     | false      | true         |
++-------------+-------+-----------+------------+--------------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="operators__like">
+
+ <h2 class="title topictitle2" id="ariaid-title12">LIKE Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ A comparison operator for <code class="ph codeph">STRING</code> data, with basic wildcard capability using the underscore
+ (<code class="ph codeph">_</code>) to match a single character and the percent sign (<code class="ph codeph">%</code>) to match multiple
+ characters. The argument expression must match the entire string value.
+ Typically, it is more efficient to put any <code class="ph codeph">%</code> wildcard match at the end of the string.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">string_expression</var> LIKE <var class="keyword varname">wildcard_expression</var>
+<var class="keyword varname">string_expression</var> NOT LIKE <var class="keyword varname">wildcard_expression</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>select distinct c_last_name from customer where c_last_name like 'Mc%' or c_last_name like 'Mac%';
+select count(c_last_name) from customer where c_last_name like 'M%';
+select c_email_address from customer where c_email_address like '%.edu';
+
+-- We can find 4-letter names beginning with 'M' by calling functions...
+select distinct c_last_name from customer where length(c_last_name) = 4 and substr(c_last_name,1,1) = 'M';
+-- ...or in a more readable way by matching M followed by exactly 3 characters.
+select distinct c_last_name from customer where c_last_name like 'M___';</code></pre>
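+
+ <p class="p">
+ The <code class="ph codeph">NOT LIKE</code> form inverts the match. For example, using the
+ same <code class="ph codeph">customer</code> table:
+ </p>
+
+<pre class="pre codeblock"><code>-- Exclude addresses ending in '.edu'.
+select count(*) from customer where c_email_address not like '%.edu';</code></pre>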
+
+ <p class="p">
+ For case-insensitive comparisons, see <a class="xref" href="impala_operators.html#ilike">ILIKE Operator</a>.
+ For a more general kind of search operator using regular expressions, see <a class="xref" href="impala_operators.html#regexp">REGEXP Operator</a>
+ or its case-insensitive counterpart <a class="xref" href="impala_operators.html#iregexp">IREGEXP Operator</a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="operators__logical_operators">
+
+ <h2 class="title topictitle2" id="ariaid-title13">Logical Operators</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Logical operators return a <code class="ph codeph">BOOLEAN</code> value, based on a binary or unary logical operation between arguments that are
+ also Booleans. Typically, the argument expressions use <a class="xref" href="impala_operators.html#comparison_operators">comparison
+ operators</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">boolean_expression</var> <var class="keyword varname">binary_logical_operator</var> <var class="keyword varname">boolean_expression</var>
+<var class="keyword varname">unary_logical_operator</var> <var class="keyword varname">boolean_expression</var>
+</code></pre>
+
+ <p class="p">
+ The Impala logical operators are:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">AND</code>: A binary operator that returns <code class="ph codeph">true</code> if its left-hand and right-hand arguments both evaluate
+ to <code class="ph codeph">true</code>, <code class="ph codeph">NULL</code> if either argument is <code class="ph codeph">NULL</code>, and <code class="ph codeph">false</code> otherwise.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">OR</code>: A binary operator that returns <code class="ph codeph">true</code> if either of its left-hand and right-hand arguments
+ evaluate to <code class="ph codeph">true</code>, <code class="ph codeph">NULL</code> if one argument is <code class="ph codeph">NULL</code> and the other is either
+ <code class="ph codeph">NULL</code> or <code class="ph codeph">false</code>, and <code class="ph codeph">false</code> otherwise.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">NOT</code>: A unary operator that flips the state of a Boolean expression from <code class="ph codeph">true</code> to
+ <code class="ph codeph">false</code>, or <code class="ph codeph">false</code> to <code class="ph codeph">true</code>. If the argument expression is <code class="ph codeph">NULL</code>,
+ the result remains <code class="ph codeph">NULL</code>. (When <code class="ph codeph">NOT</code> is used this way as a unary logical operator, it works
+ differently than the <code class="ph codeph">IS NOT NULL</code> comparison operator, which returns <code class="ph codeph">true</code> when applied to a
+ <code class="ph codeph">NULL</code>.)
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+ <p class="p">
+ The following example shows how to do an arithmetic operation using a numeric field of a <code class="ph codeph">STRUCT</code> type that is an
+ item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it can be
+ used in an arithmetic expression, such as multiplying by 10:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+ from region, region.r_nations as nation
+where
+ nation.item.n_nationkey between 3 and 5
+ or nation.item.n_nationkey > 15;
++-------------+----------------+------------------+
+| r_name | item.n_name | item.n_nationkey |
++-------------+----------------+------------------+
+| EUROPE | UNITED KINGDOM | 23 |
+| EUROPE | RUSSIA | 22 |
+| EUROPE | ROMANIA | 19 |
+| ASIA | VIETNAM | 21 |
+| ASIA | CHINA | 18 |
+| AMERICA | UNITED STATES | 24 |
+| AMERICA | PERU | 17 |
+| AMERICA | CANADA | 3 |
+| MIDDLE EAST | SAUDI ARABIA | 20 |
+| MIDDLE EAST | EGYPT | 4 |
+| AFRICA | MOZAMBIQUE | 16 |
+| AFRICA | ETHIOPIA | 5 |
++-------------+----------------+------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ These examples demonstrate the <code class="ph codeph">AND</code> operator:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select true and true;
++---------------+
+| true and true |
++---------------+
+| true |
++---------------+
+[localhost:21000] > select true and false;
++----------------+
+| true and false |
++----------------+
+| false |
++----------------+
+[localhost:21000] > select false and false;
++-----------------+
+| false and false |
++-----------------+
+| false |
++-----------------+
+[localhost:21000] > select true and null;
++---------------+
+| true and null |
++---------------+
+| NULL |
++---------------+
+[localhost:21000] > select (10 > 2) and (6 != 9);
++-----------------------+
+| (10 > 2) and (6 != 9) |
++-----------------------+
+| true |
++-----------------------+
+</code></pre>
+
+ <p class="p">
+ These examples demonstrate the <code class="ph codeph">OR</code> operator:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select true or true;
++--------------+
+| true or true |
++--------------+
+| true |
++--------------+
+[localhost:21000] > select true or false;
++---------------+
+| true or false |
++---------------+
+| true |
++---------------+
+[localhost:21000] > select false or false;
++----------------+
+| false or false |
++----------------+
+| false |
++----------------+
+[localhost:21000] > select true or null;
++--------------+
+| true or null |
++--------------+
+| true |
++--------------+
+[localhost:21000] > select null or true;
++--------------+
+| null or true |
++--------------+
+| true |
++--------------+
+[localhost:21000] > select false or null;
++---------------+
+| false or null |
++---------------+
+| NULL |
++---------------+
+[localhost:21000] > select (1 = 1) or ('hello' = 'world');
++--------------------------------+
+| (1 = 1) or ('hello' = 'world') |
++--------------------------------+
+| true |
++--------------------------------+
+[localhost:21000] > select (2 + 2 != 4) or (-1 > 0);
++--------------------------+
+| (2 + 2 != 4) or (-1 > 0) |
++--------------------------+
+| false |
++--------------------------+
+</code></pre>
+
+ <p class="p">
+ These examples demonstrate the <code class="ph codeph">NOT</code> operator:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select not true;
++----------+
+| not true |
++----------+
+| false |
++----------+
+[localhost:21000] > select not false;
++-----------+
+| not false |
++-----------+
+| true |
++-----------+
+[localhost:21000] > select not null;
++----------+
+| not null |
++----------+
+| NULL |
++----------+
+[localhost:21000] > select not (1=1);
++-------------+
+| not (1 = 1) |
++-------------+
+| false |
++-------------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="operators__regexp">
+
+ <h2 class="title topictitle2" id="ariaid-title14">REGEXP Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Tests whether a value matches a regular expression. Uses the POSIX regular expression syntax where <code class="ph codeph">^</code> and
+ <code class="ph codeph">$</code> match the beginning and end of the string, <code class="ph codeph">.</code> represents any single character, <code class="ph codeph">*</code>
+ represents a sequence of zero or more items, <code class="ph codeph">+</code> represents a sequence of one or more items, <code class="ph codeph">?</code>
+ produces a non-greedy match, and so on.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">string_ex
<TRUNCATED>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_proxy.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_proxy.html b/docs/build3x/html/topics/impala_proxy.html
new file mode 100644
index 0000000..9a2b90d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_proxy.html
@@ -0,0 +1,501 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="proxy"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala through a Proxy for High Availability</title></head><body id="proxy"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Impala through a Proxy for High Availability</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ For most clusters that have multiple users and production availability requirements, you might set up a proxy
+ server to relay requests to and from Impala.
+ </p>
+
+ <p class="p">
+ Currently, the Impala statestore mechanism does not include such proxying and load-balancing features. Set up
+ a software package of your choice to perform these functions.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon.
+ The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special
+ requirements for high availability, because problems with those daemons do not result in data loss.
+ If those daemons become unavailable due to an outage on a particular
+ host, you can stop the Impala service, delete the <span class="ph uicontrol">Impala StateStore</span> and
+ <span class="ph uicontrol">Impala Catalog Server</span> roles, add the roles on a different host, and restart the
+ Impala service.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="proxy__proxy_overview">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of Proxy Usage and Load Balancing for Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Using a load-balancing proxy server for Impala has the following advantages:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Applications connect to a single well-known host and port, rather than keeping track of the hosts where
+ the <span class="keyword cmdname">impalad</span> daemon is running.
+ </li>
+
+ <li class="li">
+ If any host running the <span class="keyword cmdname">impalad</span> daemon becomes unavailable, application connection
+ requests still succeed because you always connect to the proxy server rather than a specific host running
+ the <span class="keyword cmdname">impalad</span> daemon.
+ </li>
+
+ <li class="li">
+ The coordinator node for each Impala query potentially requires more memory and CPU cycles than the other
+ nodes that process the query. The proxy server can issue queries using round-robin scheduling, so that
+ each connection uses a different coordinator node. This load-balancing technique lets the Impala nodes
+ share this additional work, rather than concentrating it on a single machine.
+ </li>
+ </ul>
+
+ <p class="p">
+ The following setup steps are a general outline that apply to any load-balancing proxy software:
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Select and download the load-balancing proxy software or other
+ load-balancing hardware appliance. It should only need to be installed
+ and configured on a single host, typically on an edge node. Pick a
+ host other than the DataNodes where <span class="keyword cmdname">impalad</span> is
+ running, because the intention is to protect against the possibility
+ of one or more of these DataNodes becoming unavailable.
+ </li>
+
+ <li class="li">
+ Configure the load balancer (typically by editing a configuration file).
+ In particular:
+ <ul class="ul">
+ <li class="li">
+ Set up a port that the load balancer will listen on to relay
+ Impala requests back and forth. </li>
+ <li class="li">
+ See <a class="xref" href="#proxy_balancing">Choosing the Load-Balancing Algorithm</a> for load
+ balancing algorithm options.
+ </li>
+ <li class="li">
+ For Kerberized clusters, follow the instructions in <a class="xref" href="impala_proxy.html#proxy_kerberos">Special Proxy Considerations for Clusters Using Kerberos</a>.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ If you are using Hue or JDBC-based applications, you typically set
+ up load balancing for both ports 21000 and 21050, because these client
+ applications connect through port 21050 while the
+ <span class="keyword cmdname">impala-shell</span> command connects through port
+ 21000. See <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> for when to use port
+ 21000, 21050, or another value depending on what type of connections
+ you are load balancing.
+ </li>
+
+ <li class="li">
+ Run the load-balancing proxy server, pointing it at the configuration file that you set up.
+ </li>
+
+ <li class="li">
+ For any scripts, jobs, or configuration settings for applications
+ that formerly connected to a specific DataNode to run Impala SQL
+ statements, change the connection information (such as the
+ <code class="ph codeph">-i</code> option in <span class="keyword cmdname">impala-shell</span>) to
+ point to the load balancer instead.
+ </li>
+ </ol>
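+
+ <p class="p">
+ For example, a connection that formerly named a specific host would instead
+ name the load balancer. The host names below are placeholders:
+ </p>
+
+<pre class="pre codeblock"><code># Before: connect to one specific impalad host.
+impala-shell -i impalad-1.mydomain.com:21000
+
+# After: connect through the load-balancing proxy.
+impala-shell -i loadbalancer-1.mydomain.com:21000</code></pre>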
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The following sections use the HAProxy software as a representative example of a load balancer
+ that you can use with Impala.
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="proxy__proxy_balancing">
+ <h2 class="title topictitle2" id="ariaid-title3">Choosing the Load-Balancing Algorithm</h2>
+ <div class="body conbody">
+ <p class="p">
+ Load-balancing software offers a number of algorithms to distribute requests.
+ Each algorithm has its own characteristics that make it suitable in some situations
+ but not others.
+ </p>
+
+ <dl class="dl">
+
+ <dt class="dt dlterm">Leastconn</dt>
+ <dd class="dd">
+ Connects sessions to the coordinator with the fewest connections,
+ to balance the load evenly. Typically used for workloads consisting
+ of many independent, short-running queries. In configurations with
+ only a few client machines, this setting can avoid having all
+ requests go to only a small set of coordinators.
+ </dd>
+ <dd class="dd ddexpand">
+ Recommended for Impala with F5.
+ </dd>
+
+
+ <dt class="dt dlterm">Source IP Persistence</dt>
+ <dd class="dd">
+ <p class="p">
+ Sessions from the same IP address always go to the same
+ coordinator. A good choice for Impala workloads containing a mix
+ of queries and DDL statements, such as <code class="ph codeph">CREATE TABLE</code>
+ and <code class="ph codeph">ALTER TABLE</code>. Because the metadata changes from
+ a DDL statement take time to propagate across the cluster, prefer
+ Source IP Persistence in this case. If you are unable
+ to choose Source IP Persistence, run the DDL and subsequent queries
+ that depend on the results of the DDL through the same session,
+ for example by running <code class="ph codeph">impala-shell -f <var class="keyword varname">script_file</var></code>
+ to submit several statements through a single session.
+ </p>
+ </dd>
+ <dd class="dd ddexpand">
+ <p class="p">
+ Required for setting up high availability with Hue.
+ </p>
+ </dd>
+
+
+ <dt class="dt dlterm">Round-robin</dt>
+ <dd class="dd">
+ <p class="p">
+ Distributes connections to all coordinator nodes.
+ Typically not recommended for Impala.
+ </p>
+ </dd>
+
+ </dl>
+
+ <p class="p">
+ You might need to perform benchmarks and load testing to determine
+ which setting is optimal for your use case. Always set up two
+ load-balancing algorithms: Source IP Persistence for Hue, and Leastconn
+ for other clients.
+ </p>
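+
+ <p class="p">
+ As an illustration, with HAProxy (the representative load balancer used in the
+ following sections) the two algorithms could be configured as separate listeners.
+ The host names and ports below are placeholders, not recommended values:
+ </p>
+
+<pre class="pre codeblock"><code># Sketch of an HAProxy configuration fragment using both algorithms.
+# Replace host names and ports with values appropriate for your cluster.
+listen impala-shell
+    bind 0.0.0.0:21000
+    balance leastconn        # short-running queries from most clients
+    server impalad1 host1.example.com:21000 check
+    server impalad2 host2.example.com:21000 check
+
+listen impala-hue
+    bind 0.0.0.0:21051
+    balance source           # Source IP Persistence for Hue sessions
+    server impalad1 host1.example.com:21050 check
+    server impalad2 host2.example.com:21050 check</code></pre>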
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="proxy__proxy_kerberos">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Special Proxy Considerations for Clusters Using Kerberos</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ In a cluster using Kerberos, applications check host credentials to
+ verify that the host they are connecting to is the same one that is
+ actually processing the request, to prevent man-in-the-middle attacks.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and lower
+ versions, once you enable a proxy server in a Kerberized cluster, users
+ will not be able to connect to individual impala daemons directly from
+ impala-shell.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.12</span> and higher,
+ if you enable a proxy server in a Kerberized cluster, users have an
+ option to connect to Impala daemons directly from
+ <span class="keyword cmdname">impala-shell</span> using the <code class="ph codeph">-b</code> /
+ <code class="ph codeph">--kerberos_host_fqdn</code> option when you start
+ <span class="keyword cmdname">impala-shell</span>. This option can be used for testing or
+ troubleshooting purposes, but is not recommended for live production
+ environments as it defeats the purpose of a load balancer/proxy.
+ </p>
+
+ <div class="p">
+ Example:
+<pre class="pre codeblock"><code>
+impala-shell -i impalad-1.mydomain.com -k -b loadbalancer-1.mydomain.com
+</code></pre>
+ </div>
+
+ <div class="p">
+ Alternatively, use the equivalent fully qualified options:
+<pre class="pre codeblock"><code>impala-shell --impalad=impalad-1.mydomain.com:21000 --kerberos --kerberos_host_fqdn=loadbalancer-1.mydomain.com</code></pre>
+ </div>
+ <p class="p">
+ See <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a> for
+ information about the option.
+ </p>
+
+ <p class="p">
+ To clarify that the load-balancing proxy server is legitimate, perform
+ these extra Kerberos setup steps:
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ This section assumes you are starting with a Kerberos-enabled cluster. See
+ <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for instructions for setting up Impala with Kerberos. See
+ <span class="xref">the documentation for your Apache Hadoop distribution</span> for general steps to set up Kerberos.
+ </li>
+
+ <li class="li">
+ Choose the host you will use for the proxy server. Based on the Kerberos setup procedure, it should
+ already have an entry <code class="ph codeph">impala/<var class="keyword varname">proxy_host</var>@<var class="keyword varname">realm</var></code> in
+ its keytab. If not, go back over the initial Kerberos configuration steps for the keytab on each host
+ running the <span class="keyword cmdname">impalad</span> daemon.
+ </li>
+
+ <li class="li">
+ Copy the keytab file from the proxy host to all other hosts in the cluster that run the
+ <span class="keyword cmdname">impalad</span> daemon. (For optimal performance, <span class="keyword cmdname">impalad</span> should be running
+ on all DataNodes in the cluster.) Put the keytab file in a secure location on each of these other hosts.
+ </li>
+
+ <li class="li">
+ Add an entry <code class="ph codeph">impala/<var class="keyword varname">actual_hostname</var>@<var class="keyword varname">realm</var></code> to the keytab on each
+ host running the <span class="keyword cmdname">impalad</span> daemon.
+ </li>
+
+ <li class="li">
+
+ For each impalad node, merge the existing keytab with the proxy’s keytab using
+ <span class="keyword cmdname">ktutil</span>, producing a new keytab file. For example:
+ <pre class="pre codeblock"><code>$ ktutil
+ ktutil: read_kt proxy.keytab
+ ktutil: read_kt impala.keytab
+ ktutil: write_kt proxy_impala.keytab
+ ktutil: quit</code></pre>
+
+ </li>
+
+ <li class="li">
+
+ To verify that the keytabs are merged, run the command:
+<pre class="pre codeblock"><code>
+klist -k <var class="keyword varname">keytabfile</var>
+</code></pre>
+ which lists the credentials for both <code class="ph codeph">principal</code> and <code class="ph codeph">be_principal</code> on
+ all nodes.
+ </li>
+
+
+ <li class="li">
+
+ Make sure that the <code class="ph codeph">impala</code> user has permission to read this merged keytab file.
+
+ </li>
+
+ <li class="li">
+ Change the following configuration settings for each host in the cluster that participates
+ in the load balancing:
+ <ul class="ul">
+ <li class="li">
+ In the <span class="keyword cmdname">impalad</span> option definition, add:
+<pre class="pre codeblock"><code>
+--principal=impala/<em class="ph i">proxy_host@realm</em>
+--be_principal=impala/<em class="ph i">actual_host@realm</em>
+--keytab_file=<em class="ph i">path_to_merged_keytab</em>
+</code></pre>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Every host has a different <code class="ph codeph">--be_principal</code> value because the actual
+ hostname differs on each host.
+
+ Specify the fully qualified domain name (FQDN) for the proxy host, not the IP
+ address. Use the exact FQDN as returned by a reverse DNS lookup for the associated
+ IP address.
+
+ </div>
+ </li>
+
+ <li class="li">
+ Modify the startup options. See <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for the procedure to modify the startup
+ options.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ Restart Impala to make the changes take effect. Restart the <span class="keyword cmdname">impalad</span> daemons on all
+ hosts in the cluster, as well as the <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span>
+ daemons.
+ </li>
+
+ </ol>
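<p class="p">
The keytab verification in the steps above can also be scripted. The following sketch checks the
output of <code class="ph codeph">klist -k</code> with a plain substring match per principal; the
function name, host names, and realm are hypothetical, and the sample output is only illustrative of
the typical <span class="keyword cmdname">klist</span> layout:
</p>

```python
def keytab_has_principals(klist_output, principals):
    # True if every expected principal appears in the output of
    # `klist -k keytabfile`. Column layout varies across Kerberos
    # versions, so only a substring check per principal is done.
    return all(p in klist_output for p in principals)

# Illustrative klist-style output (hypothetical host and realm names):
sample_output = """Keytab name: FILE:proxy_impala.keytab
KVNO Principal
---- --------------------------------------------------
   2 impala/proxy.example.com@EXAMPLE.COM
   2 impala/node1.example.com@EXAMPLE.COM
"""

# Both the shared proxy principal (--principal) and this host's own
# principal (--be_principal) must be present in the merged keytab.
assert keytab_has_principals(sample_output, [
    "impala/proxy.example.com@EXAMPLE.COM",
    "impala/node1.example.com@EXAMPLE.COM",
])
```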
+
+
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="proxy__tut_proxy">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Example of Configuring HAProxy Load Balancer for Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If you are not already using a load-balancing proxy, you can experiment with
+ <a class="xref" href="http://haproxy.1wt.eu/" target="_blank">HAProxy</a>, a free, open source load
+ balancer. This example shows how you might install and configure that load balancer on a Red Hat Enterprise
+ Linux system.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Install the load balancer: <code class="ph codeph">yum install haproxy</code>
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Set up the configuration file: <span class="ph filepath">/etc/haproxy/haproxy.cfg</span>. See the following section
+ for a sample configuration file.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Run the load balancer (on a single host, preferably one not running <span class="keyword cmdname">impalad</span>):
+ </p>
+<pre class="pre codeblock"><code>/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In <span class="keyword cmdname">impala-shell</span>, JDBC applications, or ODBC applications, connect to the listener
+ port of the proxy host, rather than port 21000 or 21050 on a host actually running <span class="keyword cmdname">impalad</span>.
+ The sample configuration file sets haproxy to listen on port 25003, so you would send all
+ requests to <code class="ph codeph"><var class="keyword varname">haproxy_host</var>:25003</code>.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ This is the sample <span class="ph filepath">haproxy.cfg</span> used in this example:
+ </p>
+
+<pre class="pre codeblock"><code>global
+ # To have these messages end up in /var/log/haproxy.log you will
+ # need to:
+ #
+ # 1) configure syslog to accept network log events. This is done
+ # by adding the '-r' option to the SYSLOGD_OPTIONS in
+ # /etc/sysconfig/syslog
+ #
+ # 2) configure local2 events to go to the /var/log/haproxy.log
+ # file. A line like the following can be added to
+ # /etc/sysconfig/syslog
+ #
+ # local2.* /var/log/haproxy.log
+ #
+ log 127.0.0.1 local0
+ log 127.0.0.1 local1 notice
+ chroot /var/lib/haproxy
+ pidfile /var/run/haproxy.pid
+ maxconn 4000
+ user haproxy
+ group haproxy
+ daemon
+
+ # turn on stats unix socket
+ #stats socket /var/lib/haproxy/stats
+
+#---------------------------------------------------------------------
+# common defaults that all the 'listen' and 'backend' sections will
+# use if not designated in their block
+#
+# You might need to adjust timing values to prevent timeouts.
+#
+# The timeout values should be dependent on how you use the cluster
+# and how long your queries run.
+#---------------------------------------------------------------------
+defaults
+ mode http
+ log global
+ option httplog
+ option dontlognull
+ option http-server-close
+ option forwardfor except 127.0.0.0/8
+ option redispatch
+ retries 3
+ maxconn 3000
+ timeout connect 5000
+ timeout client 3600s
+ timeout server 3600s
+
+#
+# This sets up the admin page for HAProxy at port 25002.
+#
+listen stats :25002
+ balance
+ mode http
+ stats enable
+ stats auth <var class="keyword varname">username</var>:<var class="keyword varname">password</var>
+
+# This is the setup for Impala. Impala clients connect to load_balancer_host:25003.
+# HAProxy will balance connections among the list of servers listed below.
+# Each impalad listed below listens on port 21000 for Beeswax (impala-shell) or the original ODBC driver.
+# For the JDBC or ODBC version 2.x driver, use port 21050 instead of 21000.
+listen impala :25003
+ mode tcp
+ option tcplog
+ balance leastconn
+
+ server <var class="keyword varname">symbolic_name_1</var> impala-host-1.example.com:21000
+ server <var class="keyword varname">symbolic_name_2</var> impala-host-2.example.com:21000
+ server <var class="keyword varname">symbolic_name_3</var> impala-host-3.example.com:21000
+ server <var class="keyword varname">symbolic_name_4</var> impala-host-4.example.com:21000
+
+# Setup for Hue or other JDBC-enabled applications.
+# In particular, Hue requires sticky sessions.
+# The application connects to load_balancer_host:21051, and HAProxy balances
+# connections to the associated hosts, where Impala listens for JDBC
+# requests on port 21050.
+listen impalajdbc :21051
+ mode tcp
+ option tcplog
+ balance source
+ server <var class="keyword varname">symbolic_name_5</var> impala-host-1.example.com:21050 check
+ server <var class="keyword varname">symbolic_name_6</var> impala-host-2.example.com:21050 check
+ server <var class="keyword varname">symbolic_name_7</var> impala-host-3.example.com:21050 check
+ server <var class="keyword varname">symbolic_name_8</var> impala-host-4.example.com:21050 check
+</code></pre>
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ Hue requires the <code class="ph codeph">check</code> option at the end of each server line in
+ the above file so that HAProxy can detect an unreachable <span class="keyword cmdname">impalad</span>
+ server and fail over successfully. Without the TCP check, you might hit
+ an error when the <span class="keyword cmdname">impalad</span> daemon to which Hue tries to
+ connect is down.
+ </div>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If your JDBC or ODBC application connects to Impala through a load balancer such as
+ <code class="ph codeph">haproxy</code>, be cautious about reusing the connections. If the load balancer has set up
+ connection timeout values, either check the connection frequently so that it never sits idle longer than
+ the load balancer timeout value, or check the connection validity before using it and create a new one if
+ the connection has been closed.
+ </div>
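<p class="p">
One defensive pattern for the connection-reuse caveat above is to track when a pooled connection was
last used and reopen it once it has sat idle longer than the load balancer allows. This is a minimal
Python sketch; the class name and the <code class="ph codeph">connect_fn</code> callback are
illustrative, and any DB-API-style connection object would work in place of what the callback returns:
</p>

```python
import time

class IdleAwareConnection:
    """Reopen a pooled connection that the load balancer may have
    closed after it sat idle too long (sketch; names are illustrative)."""

    def __init__(self, connect_fn, lb_timeout_s):
        self._connect_fn = connect_fn      # called to (re)open a connection
        self._lb_timeout_s = lb_timeout_s  # load balancer idle timeout, seconds
        self._conn = connect_fn()
        self._last_used = time.monotonic()

    def get(self):
        # If idle longer than the balancer allows, assume the old
        # socket is dead and open a fresh connection.
        if time.monotonic() - self._last_used > self._lb_timeout_s:
            self._conn = self._connect_fn()
        self._last_used = time.monotonic()
        return self._conn
```

<p class="p">
The alternative mentioned in the note, checking the connection frequently so it never sits idle past
the timeout, amounts to calling <code class="ph codeph">get()</code> on a schedule shorter than
<code class="ph codeph">lb_timeout_s</code>.
</p>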
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_query_options.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_query_options.html b/docs/build3x/html/topics/impala_query_options.html
new file mode 100644
index 0000000..40b0c8e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_query_options.html
@@ -0,0 +1,55 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_abort_on_error.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_allow_unsupported_formats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_appx_count_distinct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_batch_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_buffer_pool_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_compression_codec.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_compute_stats_min_sample_size.html"><meta name="DC.Relation" scheme="UR
I" content="../topics/impala_debug_action.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_decimal_v2.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_default_join_distribution_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_default_spillable_buffer_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_codegen.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_row_runtime_filtering.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_streaming_preaggregations.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_unsafe_spills.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_exec_single_node_rows_threshold.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_exec_time_limit_s.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain_level.html"><meta name=
"DC.Relation" scheme="URI" content="../topics/impala_hbase_cache_blocks.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hbase_caching.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_live_progress.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_live_summary.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_errors.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_row_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_num_runtime_filters.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_scan_range_length.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mem_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_min_spillable_buffer_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mt_dop.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_n
um_nodes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_num_scanner_threads.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_optimize_partition_key_scans.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_compression_codec.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_annotate_strings_utf8.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_array_resolution.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_fallback_schema_resolution.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_file_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prefetch_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_timeout_s.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_request_pool.html"><meta name="DC.Relation" scheme="URI" content="../topics
/impala_replica_preference.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_bloom_filter_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_max_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_min_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_wait_time_ms.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_s3_skip_insert_staging.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schedule_random_replica.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_scratch_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_shuffle_distinct_exprs.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_support_start_over.html"><meta name="DC.Relation" scheme="URI" c
ontent="../topics/impala_sync_ddl.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="query_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Query Options for the SET Statement</title></head><body id="query_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Query Options for the SET Statement</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can specify the following options using the <code class="ph codeph">SET</code> statement, and those settings affect all
+ queries issued from that session.
+ </p>
+
+ <p class="p">
+ Some query options are useful in day-to-day operations for improving usability, performance, or flexibility.
+ </p>
+
+ <p class="p">
+ Other query options control special-purpose aspects of Impala operation and are intended primarily for
+ advanced debugging or troubleshooting.
+ </p>
+
+ <p class="p">
+ Options with Boolean parameters can be set to <code class="ph codeph">1</code> or <code class="ph codeph">true</code> to enable them,
+ or <code class="ph codeph">0</code> or <code class="ph codeph">false</code> to turn them off.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In Impala 2.0 and later, you can set query options directly through the JDBC and ODBC interfaces by using the
+ <code class="ph codeph">SET</code> statement. Formerly, <code class="ph codeph">SET</code> was only available as a command within the
+ <span class="keyword cmdname">impala-shell</span> interpreter.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and later, you can set query options for an <span class="keyword cmdname">impala-shell</span> session
+ by specifying one or more command-line arguments of the form
+ <code class="ph codeph">--query_option=<var class="keyword varname">option</var>=<var class="keyword varname">value</var></code>.
+ See <a class="xref" href="impala_shell_options.html">impala-shell Configuration Options</a> for details.
+ </p>
+ </div>
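<p class="p">
Because each option is just a <code class="ph codeph">SET</code> statement, applying several of them
at the start of a session is easy to wrap in a helper. This sketch assumes a DB-API-style cursor with
an <code class="ph codeph">execute</code> method; the helper name is made up for illustration:
</p>

```python
def apply_query_options(cursor, options):
    # Issue one SET statement per query option at session start (sketch).
    # `options` maps option names to values, e.g.
    # {"query_timeout_s": 300, "sync_ddl": "true"}.
    for name, value in options.items():
        cursor.execute("SET {}={}".format(name.upper(), value))
```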
+
+
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_set.html#set">SET Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_abort_on_error.html">ABORT_ON_ERROR Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_allow_unsupported_formats.html">ALLOW_UNSUPPORTED_FORMATS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_appx_count_distinct.html">APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_batch_size.html">BATCH_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_compression_codec.html">COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a hr
ef="../topics/impala_compute_stats_min_sample_size.html">COMPUTE_STATS_MIN_SAMPLE_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_debug_action.html">DEBUG_ACTION Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_decimal_v2.html">DECIMAL_V2 Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_default_join_distribution_mode.html">DEFAULT_JOIN_DISTRIBUTION_MODE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_codegen.html">DISABLE_CODEGEN Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_row_runtime_filtering.html">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or hi
gher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_streaming_preaggregations.html">DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_unsafe_spills.html">DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_exec_single_node_rows_threshold.html">EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (Impala 2.1 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_exec_time_limit_s.html">EXEC_TIME_LIMIT_S Query Option (Impala 2.12 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain_level.html">EXPLAIN_LEVEL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hbase_cache_blocks.html">HBASE_CAC
HE_BLOCKS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hbase_caching.html">HBASE_CACHING Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_live_progress.html">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_live_summary.html">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_errors.html">MAX_ERRORS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_row_size.html">MAX_ROW_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_scan_r
ange_length.html">MAX_SCAN_RANGE_LENGTH Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mem_limit.html">MEM_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mt_dop.html">MT_DOP Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_num_nodes.html">NUM_NODES Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_num_scanner_threads.html">NUM_SCANNER_THREADS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_co
mpression_codec.html">PARQUET_COMPRESSION_CODEC Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_annotate_strings_utf8.html">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_array_resolution.html">PARQUET_ARRAY_RESOLUTION Query Option (Impala 2.9 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_fallback_schema_resolution.html">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_file_size.html">PARQUET_FILE_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_prefetch_mode.html">PREFETCH_MODE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href
="../topics/impala_query_timeout_s.html">QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_request_pool.html">REQUEST_POOL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_replica_preference.html">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_bloom_filter_size.html">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_max_size.html">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_min_size.html">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a
href="../topics/impala_runtime_filter_mode.html">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_wait_time_ms.html">RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_s3_skip_insert_staging.html">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schedule_random_replica.html">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_scratch_limit.html">SCRATCH_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_shuffle_distinct_exprs.html">SHUFFLE_DISTINCT_EXPRS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a hr
ef="../topics/impala_support_start_over.html">SUPPORT_START_OVER Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_sync_ddl.html">SYNC_DDL Query Option</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_set.html">SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_query_timeout_s.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_query_timeout_s.html b/docs/build3x/html/topics/impala_query_timeout_s.html
new file mode 100644
index 0000000..d6c11e6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_query_timeout_s.html
@@ -0,0 +1,62 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="query_timeout_s"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</title></head><body id="query_timeout_s"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">QUERY_TIMEOUT_S Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Sets the idle query timeout value for the session, in seconds. Queries that sit idle for longer than the
+ timeout value are automatically cancelled. If the system administrator specified the
+ <code class="ph codeph">--idle_query_timeout</code> startup option, <code class="ph codeph">QUERY_TIMEOUT_S</code> must be smaller than
+ or equal to the <code class="ph codeph">--idle_query_timeout</code> value.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The timeout clock for queries and sessions only starts ticking when the query or session is idle.
+ For queries, this means the query has results ready but is waiting for a client to fetch the data. A
+ query can run for an arbitrary time without triggering a timeout, because the query is computing results
+ rather than sitting idle waiting for the results to be fetched. The timeout period is intended to prevent
+ unclosed queries from consuming resources and taking up slots in the admission count of running queries,
+ potentially preventing other queries from starting.
+ </p>
+ <p class="p">
+ For sessions, this means that no query has been submitted for some period of time.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SET QUERY_TIMEOUT_S=<var class="keyword varname">seconds</var>;</code></pre>
+
+
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (no timeout if <code class="ph codeph">--idle_query_timeout</code> not in effect; otherwise, use
+ <code class="ph codeph">--idle_query_timeout</code> value)
+ </p>
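<p class="p">
The interaction between the session option and the server-wide flag described above can be summarized
as a small rule: 0 defers to the server setting, and a nonzero session value can only tighten it. A
sketch of that logic (the function name is invented for illustration):
</p>

```python
def effective_query_timeout(query_timeout_s, idle_query_timeout=0):
    # Resolve the idle timeout that applies to a query, in seconds.
    # query_timeout_s    -- session-level QUERY_TIMEOUT_S (0 = unset)
    # idle_query_timeout -- server-wide --idle_query_timeout flag (0 = unset)
    # Returns 0 when no timeout applies at all.
    if query_timeout_s == 0:
        return idle_query_timeout      # fall back to the server flag
    if idle_query_timeout == 0:
        return query_timeout_s         # no server-side cap
    # The session value cannot exceed the server flag.
    return min(query_timeout_s, idle_query_timeout)
```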
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_timeouts.html#timeouts">Setting Timeout Periods for Daemons, Queries, and Sessions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_rcfile.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_rcfile.html b/docs/build3x/html/topics/impala_rcfile.html
new file mode 100644
index 0000000..72b1bd8
--- /dev/null
+++ b/docs/build3x/html/topics/impala_rcfile.html
@@ -0,0 +1,246 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="rcfile"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the RCFile File Format with Impala Tables</title></head><body id="rcfile"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using the RCFile File Format with Impala Tables</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports using RCFile data files.
+ </p>
+
+ <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">RCFile Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="rcfile__entry__1">
+ File Type
+ </th>
+ <th class="entry nocellnorowborder" id="rcfile__entry__2">
+ Format
+ </th>
+ <th class="entry nocellnorowborder" id="rcfile__entry__3">
+ Compression Codecs
+ </th>
+ <th class="entry nocellnorowborder" id="rcfile__entry__4">
+ Impala Can CREATE?
+ </th>
+ <th class="entry nocellnorowborder" id="rcfile__entry__5">
+ Impala Can INSERT?
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="rcfile__entry__1 ">
+ <a class="xref" href="impala_rcfile.html#rcfile">RCFile</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="rcfile__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="rcfile__entry__3 ">
+ Snappy, gzip, deflate, bzip2
+ </td>
+ <td class="entry nocellnorowborder" headers="rcfile__entry__4 ">
+ Yes.
+ </td>
+ <td class="entry nocellnorowborder" headers="rcfile__entry__5 ">
+ No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+ <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+ </td>
+
+ </tr>
+ </tbody></table>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="rcfile__rcfile_create">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Creating RCFile Tables and Loading Data</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If you do not have an existing data file to use, begin by creating one in the appropriate format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To create an RCFile table:</strong>
+ </p>
+
+ <p class="p">
+ In the <code class="ph codeph">impala-shell</code> interpreter, issue a command similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>create table rcfile_table (<var class="keyword varname">column_specs</var>) stored as rcfile;</code></pre>
+
+      <p class="p">
+        Because Impala can query some kinds of tables that it cannot currently write to, for tables in
+        certain file formats you might create the table in Impala and then use the Hive shell to load the data. See
+        <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. After loading data into a table through
+        Hive or another mechanism outside of Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+        statement the next time you connect to the Impala node, before querying the table, so that Impala recognizes
+        the new data.
+      </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ See <a class="xref" href="impala_known_issues.html#known_issues">Known Issues and Workarounds in Impala</a> for potential compatibility issues with
+ RCFile tables created in Hive 0.12, due to a change in the default RCFile SerDe for Hive.
+ </div>
+
+ <p class="p">
+ For example, here is how you might create some RCFile tables in Impala (by specifying the columns
+ explicitly, or cloning the structure of another table), load data through Hive, and query them through
+ Impala:
+ </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost
+[localhost:21000] > create table rcfile_table (x int) stored as rcfile;
+[localhost:21000] > create table rcfile_clone like some_other_table stored as rcfile;
+[localhost:21000] > quit;
+
+$ hive
+hive> insert into table rcfile_table select x from some_other_table;
+3 Rows loaded to rcfile_table
+Time taken: 19.015 seconds
+hive> quit;
+
+$ impala-shell -i localhost
+[localhost:21000] > select * from rcfile_table;
+Returned 0 row(s) in 0.23s
+[localhost:21000] > -- Make Impala recognize the data loaded through Hive;
+[localhost:21000] > refresh rcfile_table;
+[localhost:21000] > select * from rcfile_table;
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
++---+
+Returned 3 row(s) in 0.23s</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ Although you can create tables in this file format using
+ the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+ and <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher,
+ currently, Impala can query these types only in Parquet tables.
+ <span class="ph">
+ The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types.
+ Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher.
+ </span>
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="rcfile__rcfile_compression">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Enabling Compression for RCFile Tables</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You may want to enable compression on existing tables. Enabling compression provides performance gains in
+ most cases and is supported for RCFile tables. For example, to enable Snappy compression, you would specify
+ the following additional settings when loading data through the Hive shell:
+ </p>
+
+<pre class="pre codeblock"><code>hive> SET hive.exec.compress.output=true;
+hive> SET mapred.max.split.size=256000000;
+hive> SET mapred.output.compression.type=BLOCK;
+hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive> INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre>
+
+ <p class="p">
+ If you are converting partitioned tables, you must complete additional steps. In such a case, specify
+ additional settings similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>hive> CREATE TABLE <var class="keyword varname">new_table</var> (<var class="keyword varname">your_cols</var>) PARTITIONED BY (<var class="keyword varname">partition_cols</var>) STORED AS <var class="keyword varname">new_format</var>;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> SET hive.exec.dynamic.partition=true;
+hive> INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> PARTITION(<var class="keyword varname">comma_separated_partition_cols</var>) SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre>
+
+      <p class="p">
+        Remember that Hive does not require you to specify a source format for the data. Consider the case of
+        converting a table to a Snappy-compressed RCFile. Combining the components outlined previously to
+        complete this table conversion, you would specify settings similar to the following:
+      </p>
+
+<pre class="pre codeblock"><code>hive> CREATE TABLE tbl_rc (int_col INT, string_col STRING) STORED AS RCFILE;
+hive> SET hive.exec.compress.output=true;
+hive> SET mapred.max.split.size=256000000;
+hive> SET mapred.output.compression.type=BLOCK;
+hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> SET hive.exec.dynamic.partition=true;
+hive> INSERT OVERWRITE TABLE tbl_rc SELECT * FROM tbl;</code></pre>
+
+ <p class="p">
+ To complete a similar process for a table that includes partitions, you would specify settings similar to
+ the following:
+ </p>
+
+<pre class="pre codeblock"><code>hive> CREATE TABLE tbl_rc (int_col INT, string_col STRING) PARTITIONED BY (year INT) STORED AS RCFILE;
+hive> SET hive.exec.compress.output=true;
+hive> SET mapred.max.split.size=256000000;
+hive> SET mapred.output.compression.type=BLOCK;
+hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> SET hive.exec.dynamic.partition=true;
+hive> INSERT OVERWRITE TABLE tbl_rc PARTITION(year) SELECT * FROM tbl;</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The compression type is specified in the following command:
+ </p>
+<pre class="pre codeblock"><code>SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;</code></pre>
+ <p class="p">
+ You could elect to specify alternative codecs such as <code class="ph codeph">GzipCodec</code> here.
+ </p>
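+      <p class="p">
+        For example, to switch to gzip compression instead of Snappy, you would change only the codec
+        setting (a sketch using the standard Hadoop codec class name):
+      </p>
+<pre class="pre codeblock"><code>hive&gt; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;</code></pre>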
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="rcfile__rcfile_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala RCFile Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In general, expect query performance with RCFile tables to be
+ faster than with tables using text data, but slower than with
+ Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+ for information about using the Parquet file format for
+ high-performance analytic queries.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+ For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+ Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+ in the <span class="ph filepath">core-site.xml</span> configuration file determines
+ how Impala divides the I/O work of reading the data files. This configuration
+ setting is specified in bytes. By default, this
+ value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+ as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+ Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+ Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 268435456 (256 MB) to match the row group size produced by Impala.
+ </p>
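+      <p class="p">
+        As an illustration, the corresponding property in <span class="ph filepath">core-site.xml</span> might
+        look like the following for the 128 MB case (a sketch; adjust the value to match how your data files
+        were written):
+      </p>
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;fs.s3a.block.size&lt;/name&gt;
+  &lt;value&gt;134217728&lt;/value&gt;
+&lt;/property&gt;</code></pre>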
+
+ </div>
+ </article>
+
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_real.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_real.html b/docs/build3x/html/topics/impala_real.html
new file mode 100644
index 0000000..5e772c2
--- /dev/null
+++ b/docs/build3x/html/topics/impala_real.html
@@ -0,0 +1,39 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="real"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REAL Data Type</title></head><body id="real"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">REAL Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ An alias for the <code class="ph codeph">DOUBLE</code> data type. See <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+    <p class="p">
+      These examples show how you can use the type names <code class="ph codeph">REAL</code> and <code class="ph codeph">DOUBLE</code>
+      interchangeably; behind the scenes, Impala always treats them as <code class="ph codeph">DOUBLE</code>.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table r1 (x real);
+[localhost:21000] > describe r1;
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | double | |
++------+--------+---------+
+[localhost:21000] > insert into r1 values (1.5), (cast (2.2 as double));
+[localhost:21000] > select cast (1e6 as real);
++---------------------------+
+| cast(1000000.0 as double) |
++---------------------------+
+| 1000000 |
++---------------------------+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_refresh.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_refresh.html b/docs/build3x/html/topics/impala_refresh.html
new file mode 100644
index 0000000..5359668
--- /dev/null
+++ b/docs/build3x/html/topics/impala_refresh.html
@@ -0,0 +1,408 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="refresh"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REFRESH Statement</title></head><body id="refresh"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">REFRESH Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ To accurately respond to queries, the Impala node that acts as the coordinator (the node to which you are
+ connected through <span class="keyword cmdname">impala-shell</span>, JDBC, or ODBC) must have current metadata about those
+ databases and tables that are referenced in Impala queries. If you are not familiar with the way Impala uses
+ metadata and how it shares the same metastore database as Hive, see
+ <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a> for background information.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>REFRESH [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">key_col1</var>=<var class="keyword varname">val1</var> [, <var class="keyword varname">key_col2</var>=<var class="keyword varname">val2</var>...])]
+<span class="ph">REFRESH FUNCTIONS <var class="keyword varname">db_name</var></span>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Use the <code class="ph codeph">REFRESH</code> statement to load the latest metastore metadata and block location data for
+ a particular table in these scenarios:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ After loading new data files into the HDFS data directory for the table. (Once you have set up an ETL
+ pipeline to bring data into Impala on a regular basis, this is typically the most frequent reason why
+ metadata needs to be refreshed.)
+ </li>
+
+ <li class="li">
+ After issuing <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or other
+ table-modifying SQL statement in Hive.
+ </li>
+ </ul>
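+    <p class="p">
+      For example, after manually copying a new data file into a table's HDFS data directory, the sequence
+      might look like the following (the file name and path are hypothetical):
+    </p>
+<pre class="pre codeblock"><code>$ hdfs dfs -put new_data.csv /user/hive/warehouse/t1/
+$ impala-shell -i localhost
+[localhost:21000] > refresh t1;
+[localhost:21000] > select count(*) from t1;</code></pre>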
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, the syntax <code class="ph codeph">ALTER TABLE <var class="keyword varname">table_name</var> RECOVER PARTITIONS</code>
+ is a faster alternative to <code class="ph codeph">REFRESH</code> when the only change to the table data is the addition of
+ new partition directories through Hive or manual HDFS operations.
+ See <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for details.
+ </p>
+ </div>
+
+ <p class="p">
+ You only need to issue the <code class="ph codeph">REFRESH</code> statement on the node to which you connect to issue
+ queries. The coordinator node divides the work among all the Impala nodes in a cluster, and sends read
+ requests for the correct HDFS blocks without relying on the metadata on the other nodes.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">REFRESH</code> reloads the metadata for the table from the metastore database, and does an
+ incremental reload of the low-level block location data to account for any new data files added to the HDFS
+ data directory for the table. It is a low-overhead, single-table operation, specifically tuned for the common
+ scenario where new data files are added to HDFS.
+ </p>
+
+ <p class="p">
+ Only the metadata for the specified table is flushed. The table must already exist and be known to Impala,
+ either because the <code class="ph codeph">CREATE TABLE</code> statement was run in Impala rather than Hive, or because a
+ previous <code class="ph codeph">INVALIDATE METADATA</code> statement caused Impala to reload its entire metadata catalog.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The catalog service broadcasts any changed metadata as a result of Impala
+ <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">INSERT</code> and <code class="ph codeph">LOAD DATA</code> statements to all
+ Impala nodes. Thus, the <code class="ph codeph">REFRESH</code> statement is only required if you load data through Hive
+ or by manipulating data files in HDFS directly. See <a class="xref" href="impala_components.html#intro_catalogd">The Impala Catalog Service</a> for
+ more information on the catalog service.
+ </p>
+ <p class="p">
+ Another way to avoid inconsistency across nodes is to enable the
+ <code class="ph codeph">SYNC_DDL</code> query option before performing a DDL statement or an <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">LOAD DATA</code>.
+ </p>
+ <p class="p">
+ The table name is a required parameter. To flush the metadata for all tables, use the
+ <code class="ph codeph"><a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA</a></code>
+ command.
+ </p>
+ <p class="p">
+ Because <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> only works for tables that the current
+ Impala node is already aware of, when you create a new table in the Hive shell, enter
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">new_table</var></code> before you can see the new table in
+ <span class="keyword cmdname">impala-shell</span>. Once the table is known by Impala, you can issue <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> after you add data files for that table.
+ </p>
+ </div>
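+    <p class="p">
+      For example, for a table first created through Hive, the sequence might look like the following
+      (the table name is hypothetical):
+    </p>
+<pre class="pre codeblock"><code>hive> create table new_table (x int);
+hive> quit;
+
+$ impala-shell -i localhost
+[localhost:21000] > invalidate metadata new_table;
+[localhost:21000] > -- Later, after data files are added for the table outside Impala:
+[localhost:21000] > refresh new_table;</code></pre>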
+
+ <p class="p">
+ <code class="ph codeph">INVALIDATE METADATA</code> and <code class="ph codeph">REFRESH</code> are counterparts: <code class="ph codeph">INVALIDATE
+ METADATA</code> waits to reload the metadata when needed for a subsequent query, but reloads all the
+ metadata for the table, which can be an expensive operation, especially for large tables with many
+ partitions. <code class="ph codeph">REFRESH</code> reloads the metadata immediately, but only loads the block location
+ data for newly added data files, making it a less expensive operation overall. If data was altered in some
+ more extensive way, such as being reorganized by the HDFS balancer, use <code class="ph codeph">INVALIDATE
+ METADATA</code> to avoid a performance penalty from reduced local reads. If you used Impala version 1.0,
+ the <code class="ph codeph">INVALIDATE METADATA</code> statement works just like the Impala 1.0 <code class="ph codeph">REFRESH</code>
+ statement did, while the Impala 1.1 <code class="ph codeph">REFRESH</code> is optimized for the common use case of adding
+ new data files to an existing table, thus the table name argument is now required.
+ </p>
+
+ <p class="p">
+ A metadata update for an <code class="ph codeph">impalad</code> instance <strong class="ph b">is</strong> required if:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ A metadata change occurs.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">and</strong> the change is made through Hive.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">and</strong> the change is made to a metastore database to which clients such as the Impala shell or ODBC directly
+ connect.
+ </li>
+ </ul>
+
+ <p class="p">
+ A metadata update for an Impala node is <strong class="ph b">not</strong> required after you run <code class="ph codeph">ALTER TABLE</code>,
+ <code class="ph codeph">INSERT</code>, or other table-modifying statement in Impala rather than Hive. Impala handles the
+ metadata synchronization automatically through the catalog service.
+ </p>
+
+ <p class="p">
+ Database and table metadata is typically modified by:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Hive - through <code class="ph codeph">ALTER</code>, <code class="ph codeph">CREATE</code>, <code class="ph codeph">DROP</code> or
+ <code class="ph codeph">INSERT</code> operations.
+ </li>
+
+ <li class="li">
+ Impalad - through <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, and <code class="ph codeph">INSERT</code>
+ operations. <span class="ph">Such changes are propagated to all Impala nodes by the
+ Impala catalog service.</span>
+ </li>
+ </ul>
+
+ <p class="p">
+ <code class="ph codeph">REFRESH</code> causes the metadata for that table to be immediately reloaded. For a huge table,
+ that process could take a noticeable amount of time; but doing the refresh up front avoids an unpredictable
+ delay later, for example if the next reference to the table is during a benchmark test.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Refreshing a single partition:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.7</span> and higher, the <code class="ph codeph">REFRESH</code> statement can apply to a single partition at a time,
+ rather than the whole table. Include the optional <code class="ph codeph">PARTITION (<var class="keyword varname">partition_spec</var>)</code>
+ clause and specify values for each of the partition key columns.
+ </p>
+
+ <p class="p">
+ The following examples show how to make Impala aware of data added to a single partition, after data is loaded into
+ a partition's data directory using some mechanism outside Impala, such as Hive or Spark. The partition can be one that
+ Impala created and is already aware of, or a new partition created through Hive.
+ </p>
+
+<pre class="pre codeblock"><code>
+impala> create table p (x int) partitioned by (y int);
+impala> insert into p (x,y) values (1,2), (2,2), (2,1);
+impala> show partitions p;
++-------+-------+--------+------+...
+| y | #Rows | #Files | Size |...
++-------+-------+--------+------+...
+| 1 | -1 | 1 | 2B |...
+| 2 | -1 | 1 | 4B |...
+| Total | -1 | 2 | 6B |...
++-------+-------+--------+------+...
+
+-- ... Data is inserted into one of the partitions by some external mechanism ...
+beeline> insert into p partition (y = 1) values(1000);
+
+impala> refresh p partition (y=1);
+impala> select x from p where y=1;
++------+
+| x |
++------+
+| 2 | <- Original data created by Impala
+| 1000 | <- Additional data inserted through Beeline
++------+
+
+</code></pre>
+
+ <p class="p">
+ The same applies for tables with more than one partition key column.
+ The <code class="ph codeph">PARTITION</code> clause of the <code class="ph codeph">REFRESH</code>
+ statement must include all the partition key columns.
+ </p>
+
+<pre class="pre codeblock"><code>
+impala> create table p2 (x int) partitioned by (y int, z int);
+impala> insert into p2 (x,y,z) values (0,0,0), (1,2,3), (2,2,3);
+impala> show partitions p2;
++-------+---+-------+--------+------+...
+| y | z | #Rows | #Files | Size |...
++-------+---+-------+--------+------+...
+| 0 | 0 | -1 | 1 | 2B |...
+| 2 | 3 | -1 | 1 | 4B |...
+| Total | | -1 | 2 | 6B |...
++-------+---+-------+--------+------+...
+
+-- ... Data is inserted into one of the partitions by some external mechanism ...
+beeline> insert into p2 partition (y = 2, z = 3) values(1000);
+
+impala> refresh p2 partition (y=2, z=3);
+impala> select x from p2 where y=2 and z = 3;
++------+
+| x |
++------+
+| 1 | <- Original data created by Impala
+| 2 | <- Original data created by Impala
+| 1000 | <- Additional data inserted through Beeline
++------+
+
+</code></pre>
+
+ <p class="p">
+ The following examples show how specifying a nonexistent partition does not cause any error,
+ and the order of the partition key columns does not have to match the column order in the table.
+ The partition spec must include all the partition key columns; specifying an incomplete set of
+ columns does cause an error.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Partition doesn't exist.
+refresh p2 partition (y=0, z=3);
+refresh p2 partition (y=0, z=-1);
+-- Key columns specified in a different order than the table definition.
+refresh p2 partition (z=1, y=0);
+-- Incomplete partition spec causes an error.
+refresh p2 partition (y=0);
+ERROR: AnalysisException: Items in partition spec must exactly match the partition columns in the table definition: default.p2 (1 vs 2)
+
+</code></pre>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
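+    <p class="p">
+      For example, you might enable the option before a DDL statement as follows (a sketch with a
+      hypothetical table name):
+    </p>
+<pre class="pre codeblock"><code>[impalad-host:21000] > set SYNC_DDL=1;
+[impalad-host:21000] > create table t3 (x int);
+-- The CREATE TABLE statement returns only after all Impala nodes have received the new metadata.</code></pre>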
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how you might use the <code class="ph codeph">REFRESH</code> statement after manually adding
+ new HDFS data files to the Impala data directory for a table:
+ </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] > refresh t1;
+[impalad-host:21000] > refresh t2;
+[impalad-host:21000] > select * from t1;
+...
+[impalad-host:21000] > select * from t2;
+... </code></pre>
+
+ <p class="p">
+ For more examples of using <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> with a
+ combination of Impala and Hive operations, see <a class="xref" href="impala_tutorial.html#tutorial_impala_hive">Switching Back and Forth Between Impala and Hive</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related impala-shell options:</strong>
+ </p>
+
+ <p class="p">
+ The <span class="keyword cmdname">impala-shell</span> option <code class="ph codeph">-r</code> issues an <code class="ph codeph">INVALIDATE METADATA</code> statement
+ when starting up the shell, effectively performing a <code class="ph codeph">REFRESH</code> of all tables.
+ Due to the expense of reloading the metadata for all tables, the <span class="keyword cmdname">impala-shell</span> <code class="ph codeph">-r</code>
+ option is not recommended for day-to-day use in a production environment. (This option was mainly intended as a workaround
+ for synchronization issues in very old Impala versions.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have execute
+ permissions for all the relevant directories holding table data.
+ (A table could have data spread across multiple directories,
+ or in unexpected paths, if it uses partitioning or
+ specifies a <code class="ph codeph">LOCATION</code> attribute for
+ individual partitions or the entire table.)
+ Issues with permissions might not cause an immediate error for this statement,
+ but subsequent statements such as <code class="ph codeph">SELECT</code>
+ or <code class="ph codeph">SHOW TABLE STATS</code> could fail.
+ </p>
+ <p class="p">
+ All HDFS and Sentry permissions and privileges are the same whether you refresh the entire table
+ or a single partition.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> command checks HDFS permissions of the underlying data files and directories,
+ caching this information so that a statement can be cancelled immediately if for example the
+ <code class="ph codeph">impala</code> user does not have permission to write to the data directory for the table. Impala
+ reports any lack of write permissions as an <code class="ph codeph">INFO</code> message in the log file, in case that
+ represents an oversight. If you change HDFS permissions to make data readable or writeable by the Impala
+ user, issue another <code class="ph codeph">REFRESH</code> to make Impala aware of the change.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+ STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+ table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+ are very large, used in join queries, or both.
+ </div>
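+    <p class="p">
+      For example, after loading data through Hive, the follow-up steps in Impala might look like this
+      (the table name is hypothetical):
+    </p>
+<pre class="pre codeblock"><code>[impalad-host:21000] > refresh sales_data;
+[impalad-host:21000] > compute stats sales_data;
+[impalad-host:21000] > show table stats sales_data;</code></pre>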
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements also cache metadata
+ for tables where the data resides in the Amazon Simple Storage Service (S3).
+ In particular, issue a <code class="ph codeph">REFRESH</code> for a table after adding or removing files
+ in the associated S3 data directory.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about working with S3 tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Much of the metadata for Kudu tables is handled by the underlying
+ storage layer. Kudu tables have less reliance on the metastore
+ database, and require less metadata caching on the Impala side.
+ For example, information about partitions in Kudu tables is managed
+ by Kudu, and Impala does not cache any block locality metadata
+ for Kudu tables.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code>
+ statements are needed less frequently for Kudu tables than for
+ HDFS-backed tables. Neither statement is needed when data is
+ added to, removed, or updated in a Kudu table, even if the changes
+ are made directly to Kudu through a client program using the Kudu API.
+ Run <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+ for a Kudu table only after making a change to the Kudu table schema,
+ such as adding or dropping a column, by a mechanism other than
+ Impala.
+ </p>
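+    <p class="p">
+      For example, if a column is added to a Kudu table through a client program using the Kudu API,
+      you might run the following in <span class="keyword cmdname">impala-shell</span> (the table name is
+      hypothetical):
+    </p>
+<pre class="pre codeblock"><code>[impalad-host:21000] > refresh kudu_table;</code></pre>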
+
+ <p class="p">
+ <strong class="ph b">UDF considerations:</strong>
+ </p>
+ <div class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, you can refresh the user-defined functions (UDFs)
+ that Impala recognizes, at the database level, by running the <code class="ph codeph">REFRESH FUNCTIONS</code>
+ statement with the database name as an argument. Java-based UDFs can be added to the metastore
+ database through Hive <code class="ph codeph">CREATE FUNCTION</code> statements, and made visible to Impala
+ by subsequently running <code class="ph codeph">REFRESH FUNCTIONS</code>. For example:
+
+<pre class="pre codeblock"><code>CREATE DATABASE shared_udfs;
+USE shared_udfs;
+...use CREATE FUNCTION statements in Hive to create some Java-based UDFs
+ that Impala is not initially aware of...
+REFRESH FUNCTIONS shared_udfs;
+SELECT udf_created_by_hive(c1) FROM ...
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a>,
+ <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_release_notes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_release_notes.html b/docs/build3x/html/topics/impala_release_notes.html
new file mode 100644
index 0000000..86359c9
--- /dev/null
+++ b/docs/build3x/html/topics/impala_release_notes.html
@@ -0,0 +1,26 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_new_features.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_incompatible_changes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_known_issues.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_fixed_issues.html"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_release_notes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Release Notes</title></head><body id="impala_release_notes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Release Notes</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ These release notes provide information on the <a class="xref" href="impala_new_features.html#new_features">new
+ features</a> and <a class="xref" href="impala_known_issues.html#known_issues">known issues and limitations</a> for
+ Impala versions up to <span class="ph">Impala 3.0.x</span>. For users
+ upgrading from earlier Impala releases, or using Impala in combination with specific versions of other
+ software, <a class="xref" href="impala_incompatible_changes.html#incompatible_changes">Incompatible Changes and Limitations in Apache Impala</a> lists any changes to
+ file formats, SQL syntax, or software dependencies to take into account.
+ </p>
+
+ <p class="p">
+ After reviewing these release notes, see
+ <a class="xref" href="impala_concepts.html">Impala Concepts and Architecture</a> for more information about using Impala.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_new_features.html">New Features in Apache Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_incompatible_changes.html">Incompatible Changes and Limitations in Apache Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_known_issues.html">Known Issues and Workarounds in Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_fixed_issues.html">Fixed Issues in Apache Impala</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_relnotes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_relnotes.html b/docs/build3x/html/topics/impala_relnotes.html
new file mode 100644
index 0000000..f9b8d62
--- /dev/null
+++ b/docs/build3x/html/topics/impala_relnotes.html
@@ -0,0 +1,26 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="relnotes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Release Notes</title></head><body id="relnotes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Release Notes</h1>
+
+
+ <div class="body conbody" id="relnotes__relnotes_intro">
+
+ <p class="p">
+ These release notes provide information on the <a class="xref" href="impala_new_features.html#new_features">new
+ features</a> and <a class="xref" href="impala_known_issues.html#known_issues">known issues and limitations</a> for
+ Impala versions up to <span class="ph">Impala 3.0.x</span>. For users
+ upgrading from earlier Impala releases, or using Impala in combination with specific versions of other
+ software, <a class="xref" href="impala_incompatible_changes.html#incompatible_changes">Incompatible Changes and Limitations in Apache Impala</a> lists any changes to
+ file formats, SQL syntax, or software dependencies to take into account.
+ </p>
+
+ <p class="p">
+ After reviewing these release notes, see
+ <a class="xref" href="impala_concepts.html">Impala Concepts and Architecture</a> for more information about using Impala.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_count.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_count.html b/docs/build3x/html/topics/impala_count.html
new file mode 100644
index 0000000..a451013
--- /dev/null
+++ b/docs/build3x/html/topics/impala_count.html
@@ -0,0 +1,353 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="count"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COUNT Function</title></head><body id="count"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">COUNT Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns the number of rows, or the number of non-<code class="ph codeph">NULL</code> rows.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>COUNT([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre>
+
+ <p class="p">
+ Depending on the argument, <code class="ph codeph">COUNT()</code> considers rows that meet certain conditions:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The notation <code class="ph codeph">COUNT(*)</code> includes <code class="ph codeph">NULL</code> values in the total.
+ </li>
+
+ <li class="li">
+ The notation <code class="ph codeph">COUNT(<var class="keyword varname">column_name</var>)</code> only considers rows where the column
+ contains a non-<code class="ph codeph">NULL</code> value.
+ </li>
+
+ <li class="li">
+ You can also combine <code class="ph codeph">COUNT</code> with the <code class="ph codeph">DISTINCT</code> operator to eliminate
+ duplicates before counting, and to count the combinations of values across multiple columns.
+ </li>
+ </ul>
+
+ <p class="p">
+ When the query contains a <code class="ph codeph">GROUP BY</code> clause, <code class="ph codeph">COUNT()</code> returns one value for each combination of
+ grouping values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BIGINT</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+ <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+ query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+ for the kinds of queries that this option applies to, and slight differences in how partitions are
+ evaluated when this query option is enabled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations   | array&lt;struct&lt;           |         |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name | count | sum | avg | minimum | maximum | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 |
+| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 |
+| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 |
+| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 |
+| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- How many rows total are in the table, regardless of NULL values?
+select count(*) from t1;
+-- How many rows are in the table with non-NULL values for a column?
+select count(c1) from t1;
+-- Count the rows that meet certain conditions.
+-- Again, * includes NULLs, so COUNT(*) might be greater than COUNT(col).
+select count(*) from t1 where x > 10;
+select count(c1) from t1 where x > 10;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Combine COUNT and DISTINCT to find the number of unique values.
+-- Must use column names rather than * with COUNT(DISTINCT ...) syntax.
+-- Rows with NULL values are not counted.
+select count(distinct c1) from t1;
+-- Rows with a NULL value in _either_ column are not counted.
+select count(distinct c1, c2) from t1;
+-- Return more than one result.
+select month, year, count(distinct visitor_id) from web_stats group by month, year;
+</code></pre>
+
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">COUNT()</code> in an analytic context. They use a table
+ containing integers from 1 to 10. Notice how the <code class="ph codeph">COUNT()</code> result is reported for each input row,
+ in contrast to a query with a <code class="ph codeph">GROUP BY</code> clause, which condenses the result set to one row per group.
+<pre class="pre codeblock"><code>select x, property, count(x) over (partition by property) as count from int_t where property in ('odd','even');
++----+----------+-------+
+| x | property | count |
++----+----------+-------+
+| 2 | even | 5 |
+| 4 | even | 5 |
+| 6 | even | 5 |
+| 8 | even | 5 |
+| 10 | even | 5 |
+| 1 | odd | 5 |
+| 3 | odd | 5 |
+| 5 | odd | 5 |
+| 7 | odd | 5 |
+| 9 | odd | 5 |
++----+----------+-------+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">COUNT()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to produce a running count of all the even values,
+then a running count of all the odd values. The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+therefore all of these examples produce the same results:
+<pre class="pre codeblock"><code>select x, property,
+ count(x) over (partition by property <strong class="ph b">order by x</strong>) as 'cumulative count'
+ from int_t where property in ('odd','even');
++----+----------+------------------+
+| x | property | cumulative count |
++----+----------+------------------+
+| 2 | even | 1 |
+| 4 | even | 2 |
+| 6 | even | 3 |
+| 8 | even | 4 |
+| 10 | even | 5 |
+| 1 | odd | 1 |
+| 3 | odd | 2 |
+| 5 | odd | 3 |
+| 7 | odd | 4 |
+| 9 | odd | 5 |
++----+----------+------------------+
+
+select x, property,
+ count(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">range between unbounded preceding and current row</strong>
+  ) as 'cumulative count'
+from int_t where property in ('odd','even');
++----+----------+------------------+
+| x | property | cumulative count |
++----+----------+------------------+
+| 2 | even | 1 |
+| 4 | even | 2 |
+| 6 | even | 3 |
+| 8 | even | 4 |
+| 10 | even | 5 |
+| 1 | odd | 1 |
+| 3 | odd | 2 |
+| 5 | odd | 3 |
+| 7 | odd | 4 |
+| 9 | odd | 5 |
++----+----------+------------------+
+
+select x, property,
+ count(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">rows between unbounded preceding and current row</strong>
+  ) as 'cumulative count'
+ from int_t where property in ('odd','even');
++----+----------+------------------+
+| x | property | cumulative count |
++----+----------+------------------+
+| 2 | even | 1 |
+| 4 | even | 2 |
+| 6 | even | 3 |
+| 8 | even | 4 |
+| 10 | even | 5 |
+| 1 | odd | 1 |
+| 3 | odd | 2 |
+| 5 | odd | 3 |
+| 7 | odd | 4 |
+| 9 | odd | 5 |
++----+----------+------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running count taking into account 1 row before
+and 1 row after the current row, within the same partition (all the even values or all the odd values).
+Therefore, the count is consistently 3 for rows in the middle of the window, and 2 for
+rows at either end of the partition, where there is no preceding or no following row.
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code>
+clause:
+<pre class="pre codeblock"><code>select x, property,
+ count(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">rows between 1 preceding and 1 following</strong>
+ ) as 'moving total'
+ from int_t where property in ('odd','even');
++----+----------+--------------+
+| x | property | moving total |
++----+----------+--------------+
+| 2 | even | 2 |
+| 4 | even | 3 |
+| 6 | even | 3 |
+| 8 | even | 3 |
+| 10 | even | 2 |
+| 1 | odd | 2 |
+| 3 | odd | 3 |
+| 5 | odd | 3 |
+| 7 | odd | 3 |
+| 9 | odd | 2 |
++----+----------+--------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+ count(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">range between 1 preceding and 1 following</strong>
+ ) as 'moving total'
+from int_t where property in ('odd','even');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+ </div>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ By default, Impala only allows a single <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">columns</var>)</code>
+ expression in each query.
+ </p>
+ <p class="p">
+ If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by
+ specifying <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>; a query can contain multiple instances of
+ <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>. To make Impala automatically rewrite
+ <code class="ph codeph">COUNT(DISTINCT)</code> expressions to <code class="ph codeph">NDV()</code>, enable the
+ <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option.
+ </p>
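+    <p class="p">
+      For example (the table and column names are illustrative), <code class="ph codeph">NDV()</code>
+      lets a single query estimate the number of distinct values in several columns:
+    </p>
+<pre class="pre codeblock"><code>-- Estimates rather than exact counts; multiple NDV() calls are allowed.
+select ndv(col1) as approx_c1, ndv(col2) as approx_c2 from t1;
+
+-- Alternatively, let Impala rewrite COUNT(DISTINCT) to NDV() automatically:
+set appx_count_distinct=true;
+select count(distinct col1), count(distinct col2) from t1;
+</code></pre>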
+ <p class="p">
+ To produce the same result as multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions, you can use the
+ following technique for queries involving a single table:
+ </p>
+<pre class="pre codeblock"><code>select v1.c1 result1, v2.c1 result2 from
+ (select count(distinct col1) as c1 from t1) v1
+ cross join
+ (select count(distinct col2) as c1 from t1) v2;
+</code></pre>
+ <p class="p">
+ Because <code class="ph codeph">CROSS JOIN</code> is an expensive operation, prefer to use the <code class="ph codeph">NDV()</code>
+ technique wherever practical.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_create_database.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_create_database.html b/docs/build3x/html/topics/impala_create_database.html
new file mode 100644
index 0000000..14cd785
--- /dev/null
+++ b/docs/build3x/html/topics/impala_create_database.html
@@ -0,0 +1,209 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_database"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE DATABASE Statement</title></head><body id="create_database"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">CREATE DATABASE Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Creates a new database.
+ </p>
+
+ <p class="p">
+ In Impala, a database is both:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ A logical construct for grouping together related tables, views, and functions within their own namespace.
+ You might use a separate database for each application, set of related tables, or round of experimentation.
+ </li>
+
+ <li class="li">
+ A physical construct represented by a directory tree in HDFS. Tables (internal tables), partitions, and
+ data files are all located under this directory. You can perform HDFS-level operations such as backing it up and measuring space usage,
+ or remove it with a <code class="ph codeph">DROP DATABASE</code> statement.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] <var class="keyword varname">database_name</var> [COMMENT '<var class="keyword varname">database_comment</var>']
+ [LOCATION <var class="keyword varname">hdfs_path</var>];</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ A database is physically represented as a directory in HDFS, with a filename extension <code class="ph codeph">.db</code>,
+ under the main Impala data directory. If the associated HDFS directory does not exist, it is created for you.
+ All databases and their associated directories are top-level objects, with no physical or logical nesting.
+ </p>
+
+ <p class="p">
+ After creating a database, to make it the current database within an <span class="keyword cmdname">impala-shell</span> session,
+ use the <code class="ph codeph">USE</code> statement. You can refer to tables in the current database without prepending
+ any qualifier to their names.
+ </p>
+
+ <p class="p">
+ When you first connect to Impala through <span class="keyword cmdname">impala-shell</span>, the database you start in (before
+ issuing any <code class="ph codeph">CREATE DATABASE</code> or <code class="ph codeph">USE</code> statements) is named
+ <code class="ph codeph">default</code>.
+ </p>
+
+ <div class="p">
+ Impala includes another predefined database, <code class="ph codeph">_impala_builtins</code>, that serves as the location
+ for the <a class="xref" href="../shared/../topics/impala_functions.html#builtins">built-in functions</a>. To see the built-in
+ functions, use a statement like the following:
+<pre class="pre codeblock"><code>show functions in _impala_builtins;
+show functions in _impala_builtins like '*<var class="keyword varname">substring</var>*';
+</code></pre>
+ </div>
+
+ <p class="p">
+ After creating a database, your <span class="keyword cmdname">impala-shell</span> session or another
+ <span class="keyword cmdname">impala-shell</span> connected to the same node can immediately access that database. To access
+ the database through the Impala daemon on a different node, issue the <code class="ph codeph">INVALIDATE METADATA</code>
+ statement first while connected to that other node.
+ </p>
+
+ <p class="p">
+ Setting the <code class="ph codeph">LOCATION</code> attribute for a new database is a way to work with sets of files in an
+ HDFS directory structure outside the default Impala data directory, as opposed to setting the
+ <code class="ph codeph">LOCATION</code> attribute for each individual table.
+ </p>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Hive considerations:</strong>
+ </p>
+
+ <p class="p">
+ When you create a database in Impala, the database can also be used by Hive.
+ When you create a database in Hive, issue an <code class="ph codeph">INVALIDATE METADATA</code>
+ statement in Impala to make Impala permanently aware of the new database.
+ </p>
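+
+    <p class="p">
+      For example (the database name is illustrative), after creating a database in Hive:
+    </p>
+
+<pre class="pre codeblock"><code>-- In Hive:
+create database hive_created_db;
+
+-- Then in impala-shell:
+invalidate metadata;
+show databases like 'hive_created_db';
+</code></pre>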
+
+ <p class="p">
+ The <code class="ph codeph">SHOW DATABASES</code> statement lists all databases, or the databases whose name
+ matches a wildcard pattern. <span class="ph">In <span class="keyword">Impala 2.5</span> and higher, the
+ <code class="ph codeph">SHOW DATABASES</code> output includes a second column that displays the associated
+ comment, if any, for each database.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+
+ <p class="p">
+ To specify that any tables created within a database reside on the Amazon S3 system,
+ you can include an <code class="ph codeph">s3a://</code> prefix on the <code class="ph codeph">LOCATION</code>
+ attribute. In <span class="keyword">Impala 2.6</span> and higher, Impala automatically creates any
+ required folders as the databases, tables, and partitions are created, and removes
+ them when they are dropped.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+ <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+ <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+ as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+ Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have write
+ permission for the parent HDFS directory under which the database
+ is located.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <pre class="pre codeblock"><code>create database first_db;
+use first_db;
+create table t1 (x int);
+
+create database second_db;
+use second_db;
+-- Each database has its own namespace for tables.
+-- You can reuse the same table names in each database.
+create table t1 (s string);
+
+create database temp;
+
+-- You can either USE a database after creating it,
+-- or qualify all references to the table name with the name of the database.
+-- Here, tables T2 and T3 are both created in the TEMP database.
+
+create table temp.t2 (x int, y int);
+use temp;
+create table t3 (s string);
+
+-- You cannot drop a database while it is selected by the USE statement.
+drop database temp;
+<em class="ph i">ERROR: AnalysisException: Cannot drop current default database: temp</em>
+
+-- The always-available database 'default' is a convenient one to USE
+-- before dropping a database you created.
+use default;
+
+-- Before dropping a database, first drop all the tables inside it,
+<span class="ph">-- or in <span class="keyword">Impala 2.3</span> and higher use the CASCADE clause.</span>
+drop database temp;
+ERROR: ImpalaRuntimeException: Error making 'dropDatabase' RPC to Hive Metastore:
+CAUSED BY: InvalidOperationException: Database temp is not empty
+show tables in temp;
++------+
+| name |
++------+
+| t3 |
++------+
+
+<span class="ph">-- <span class="keyword">Impala 2.3</span> and higher:</span>
+<span class="ph">drop database temp cascade;</span>
+
+-- Earlier releases:
+drop table temp.t3;
+drop database temp;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_databases.html#databases">Overview of Impala Databases</a>, <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>,
+ <a class="xref" href="impala_use.html#use">USE Statement</a>, <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>,
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_create_function.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_create_function.html b/docs/build3x/html/topics/impala_create_function.html
new file mode 100644
index 0000000..9b25620
--- /dev/null
+++ b/docs/build3x/html/topics/impala_create_function.html
@@ -0,0 +1,502 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_function"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE FUNCTION Statement</title></head><body id="create_function"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">CREATE FUNCTION Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Creates a user-defined function (UDF), which you can use to implement custom logic during
+ <code class="ph codeph">SELECT</code> or <code class="ph codeph">INSERT</code> operations.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ The syntax is different depending on whether you create a scalar UDF, which is called once for each row and
+ implemented by a single function, or a user-defined aggregate function (UDA), which is implemented by
+ multiple functions that compute intermediate results across sets of rows.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, the syntax is also different for creating or dropping scalar Java-based UDFs.
+ The statements for Java UDFs use a new syntax, without any argument types or return type specified. Java-based UDFs
+ created using the new syntax persist across restarts of the Impala catalog server, and can be shared transparently
+ between Impala and Hive.
+ </p>
+
+ <p class="p">
+ To create a persistent scalar C++ UDF with <code class="ph codeph">CREATE FUNCTION</code>:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE FUNCTION [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var>([<var class="keyword varname">arg_type</var>[, <var class="keyword varname">arg_type</var>...])
+ RETURNS <var class="keyword varname">return_type</var>
+ LOCATION '<var class="keyword varname">hdfs_path_to_dot_so</var>'
+ SYMBOL='<var class="keyword varname">symbol_name</var>'</code></pre>
+
+ <div class="p">
+ To create a persistent Java UDF with <code class="ph codeph">CREATE FUNCTION</code>:
+<pre class="pre codeblock"><code>CREATE FUNCTION [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var>
+ LOCATION '<var class="keyword varname">hdfs_path_to_jar</var>'
+ SYMBOL='<var class="keyword varname">class_name</var>'</code></pre>
+ </div>
+
+
+
+ <p class="p">
+ To create a persistent UDA, which must be written in C++, issue a <code class="ph codeph">CREATE AGGREGATE FUNCTION</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE [AGGREGATE] FUNCTION [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var>([<var class="keyword varname">arg_type</var>[, <var class="keyword varname">arg_type</var>...])
+ RETURNS <var class="keyword varname">return_type</var>
+ LOCATION '<var class="keyword varname">hdfs_path</var>'
+ [INIT_FN='<var class="keyword varname">function</var>']
+ UPDATE_FN='<var class="keyword varname">function</var>'
+ MERGE_FN='<var class="keyword varname">function</var>'
+ [PREPARE_FN='<var class="keyword varname">function</var>']
+ [CLOSE_FN='<var class="keyword varname">function</var>']
+ <span class="ph">[SERIALIZE_FN='<var class="keyword varname">function</var>']</span>
+ [FINALIZE_FN='<var class="keyword varname">function</var>']
+ <span class="ph">[INTERMEDIATE <var class="keyword varname">type_spec</var>]</span></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Varargs notation:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Variable-length argument lists are supported for C++ UDFs, but currently not for Java UDFs.
+ </p>
+ </div>
+
+ <p class="p">
+ If the underlying implementation of your function accepts a variable number of arguments:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The variable arguments must go last in the argument list.
+ </li>
+
+ <li class="li">
+ The variable arguments must all be of the same type.
+ </li>
+
+ <li class="li">
+ You must include at least one instance of the variable arguments in every function call invoked from SQL.
+ </li>
+
+ <li class="li">
+ You designate the variable portion of the argument list in the <code class="ph codeph">CREATE FUNCTION</code> statement
+ by including <code class="ph codeph">...</code> immediately after the type name of the first variable argument. For
+ example, to create a function that accepts an <code class="ph codeph">INT</code> argument, followed by a
+ <code class="ph codeph">BOOLEAN</code>, followed by one or more <code class="ph codeph">STRING</code> arguments, your <code class="ph codeph">CREATE
+ FUNCTION</code> statement would look like:
+<pre class="pre codeblock"><code>CREATE FUNCTION <var class="keyword varname">func_name</var> (INT, BOOLEAN, STRING ...)
+ RETURNS <var class="keyword varname">type</var> LOCATION '<var class="keyword varname">path</var>' SYMBOL='<var class="keyword varname">entry_point</var>';
+</code></pre>
+ </li>
+ </ul>
+
+ <p class="p">
+ See <a class="xref" href="impala_udf.html#udf_varargs">Variable-Length Argument Lists</a> for how to code a C++ UDF to accept
+ variable-length argument lists.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Scalar and aggregate functions:</strong>
+ </p>
+
+ <p class="p">
+ The simplest kind of user-defined function returns a single scalar value each time it is called, typically
+ once for each row in the result set. This general kind of function is what is usually meant by UDF.
+ User-defined aggregate functions (UDAs) are a specialized kind of UDF that produce a single value based on
+ the contents of multiple rows. You usually use UDAs in combination with a <code class="ph codeph">GROUP BY</code> clause to
+ condense a large result set into a smaller one, or even a single row summarizing column values across an
+ entire table.
+ </p>
+
+ <p class="p">
+ You create UDAs by using the <code class="ph codeph">CREATE AGGREGATE FUNCTION</code> syntax. The clauses
+ <code class="ph codeph">INIT_FN</code>, <code class="ph codeph">UPDATE_FN</code>, <code class="ph codeph">MERGE_FN</code>,
+ <span class="ph"><code class="ph codeph">SERIALIZE_FN</code>,</span> <code class="ph codeph">FINALIZE_FN</code>, and
+ <code class="ph codeph">INTERMEDIATE</code> only apply when you create a UDA rather than a scalar UDF.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">*_FN</code> clauses specify functions to call at different phases of function processing.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <strong class="ph b">Initialize:</strong> The function you specify with the <code class="ph codeph">INIT_FN</code> clause does any initial
+ setup, such as initializing member variables in internal data structures. This function is often a stub for
+ simple UDAs. You can omit this clause and a default (no-op) function will be used.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">Update:</strong> The function you specify with the <code class="ph codeph">UPDATE_FN</code> clause is called once for each
+ row in the original result set, that is, before any <code class="ph codeph">GROUP BY</code> clause is applied. A separate
+ instance of the function is called for each different value returned by the <code class="ph codeph">GROUP BY</code>
+ clause. The final argument passed to this function is a pointer, to which you write an updated value based
+ on its original value and the value of the first argument.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">Merge:</strong> The function you specify with the <code class="ph codeph">MERGE_FN</code> clause is called an arbitrary
+ number of times, to combine intermediate values produced by different nodes or different threads as Impala
+ reads and processes data files in parallel. The final argument passed to this function is a pointer, to
+ which you write an updated value based on its original value and the value of the first argument.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">Serialize:</strong> The function you specify with the <code class="ph codeph">SERIALIZE_FN</code> clause frees memory
+ allocated to intermediate results. It is required if any memory was allocated by the Allocate function in
+ the Init, Update, or Merge functions, or if the intermediate type contains any pointers. See
+ <span class="xref">the UDA code samples</span> for details.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">Finalize:</strong> The function you specify with the <code class="ph codeph">FINALIZE_FN</code> clause does any required
+ teardown for resources acquired by your UDF, such as freeing memory, closing file handles if you explicitly
+ opened any files, and so on. This function is often a stub for simple UDAs. You can omit this clause and a
+ default (no-op) function will be used. It is required in UDAs where the final return type differs from
+ the intermediate type, or if any memory was allocated by the Allocate function in the Init, Update, or
+ Merge functions. See <span class="xref">the UDA code samples</span> for details.
+ </li>
+ </ul>
+
+ <p class="p">
+ If you use a consistent naming convention for each of the underlying functions, Impala can automatically
+ determine the names based on the first such clause, so the others are optional.
+ </p>
+
+
+
+ <p class="p">
+ For end-to-end examples of UDAs, see <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ Currently, Impala UDFs cannot accept arguments or return values of the Impala complex types
+ (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ You can write Impala UDFs in either C++ or Java. C++ UDFs are the recommended format for
+ high performance, because they run as native code. Java-based UDFs are compatible between Impala and Hive, and are
+ best suited to reusing existing Hive UDFs. (Impala can run Java-based Hive UDFs but not Hive UDAs.)
+ </li>
+
+ <li class="li">
+ <span class="keyword">Impala 2.5</span> introduces UDF improvements to persistence for both C++ and Java UDFs,
+ and better compatibility between Impala and Hive for Java UDFs.
+ See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details.
+ </li>
+
+ <li class="li">
+ The body of the UDF is represented by a <code class="ph codeph">.so</code> or <code class="ph codeph">.jar</code> file, which you store
+ in HDFS and the <code class="ph codeph">CREATE FUNCTION</code> statement distributes to each Impala node.
+ </li>
+
+ <li class="li">
+ Impala calls the underlying code during SQL statement evaluation, as many times as needed to process all
+ the rows from the result set. All UDFs are assumed to be deterministic, that is, to always return the same
+ result when passed the same argument values. Impala might or might not skip some invocations of a UDF if
+ the result value is already known from a previous call. Therefore, do not rely on the UDF being called a
+ specific number of times, and do not return different result values based on some external factor such as
+ the current time, a random number function, or an external data source that could be updated while an
+ Impala query is in progress.
+ </li>
+
+ <li class="li">
+ The names of the function arguments in the UDF are not significant, only their number, positions, and data
+ types.
+ </li>
+
+ <li class="li">
+ You can overload the same function name by creating multiple versions of the function, each with a
+ different argument signature. For security reasons, you cannot make a UDF with the same name as any
+ built-in function.
+ </li>
+
+ <li class="li">
+ In the UDF code, you represent the function return result as a <code class="ph codeph">struct</code>. This
+ <code class="ph codeph">struct</code> contains 2 fields. The first field is a <code class="ph codeph">boolean</code> representing
+ whether the value is <code class="ph codeph">NULL</code> or not. (When this field is <code class="ph codeph">true</code>, the return
+ value is interpreted as <code class="ph codeph">NULL</code>.) The second field is the same type as the specified function
+ return type, and holds the return value when the function returns something other than
+ <code class="ph codeph">NULL</code>.
+ </li>
+
+ <li class="li">
+ In the UDF code, you represent the function arguments as an initial pointer to a UDF context structure,
+ followed by references to zero or more <code class="ph codeph">struct</code>s, corresponding to each of the arguments.
+ Each <code class="ph codeph">struct</code> has the same 2 fields as with the return value, a <code class="ph codeph">boolean</code>
+ field representing whether the argument is <code class="ph codeph">NULL</code>, and a field of the appropriate type
+ holding any non-<code class="ph codeph">NULL</code> argument value.
+ </li>
+
+ <li class="li">
+ For sample code and build instructions for UDFs,
+ see <span class="xref">the sample UDFs in the Impala github repo</span>.
+ </li>
+
+ <li class="li">
+ Because the file representing the body of the UDF is stored in HDFS, it is automatically available to all
+ the Impala nodes. You do not need to manually copy any UDF-related files between servers.
+ </li>
+
+ <li class="li">
+ Because Impala currently does not have any <code class="ph codeph">ALTER FUNCTION</code> statement, if you need to rename
+ a function, move it to a different database, or change its signature or other properties, issue a
+ <code class="ph codeph">DROP FUNCTION</code> statement for the original function followed by a <code class="ph codeph">CREATE
+ FUNCTION</code> with the desired properties.
+ </li>
+
+ <li class="li">
+ Because each UDF is associated with a particular database, either issue a <code class="ph codeph">USE</code> statement
+ before doing any <code class="ph codeph">CREATE FUNCTION</code> statements, or specify the name of the function as
+ <code class="ph codeph"><var class="keyword varname">db_name</var>.<var class="keyword varname">function_name</var></code>.
+ </li>
+ </ul>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <p class="p">
+ Impala can run UDFs that were created through Hive, as long as they refer to Impala-compatible data types
+ (not composite or nested column types). Hive can run Java-based UDFs that were created through Impala, but
+ not Impala UDFs written in C++.
+ </p>
+
+ <p class="p">
+ The Hive <code class="ph codeph">current_user()</code> function cannot be
+ called from a Java UDF through Impala.
+ </p>
+
+ <p class="p"><strong class="ph b">Persistence:</strong></p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
+ Java UDFs are also persisted, if they were created with the new <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs,
+ where the Java function argument and return types are omitted.
+ Java-based UDFs created with the old <code class="ph codeph">CREATE FUNCTION</code> syntax do not persist across restarts
+ because they are held in the memory of the <span class="keyword cmdname">catalogd</span> daemon.
+ Until you re-create such Java UDFs using the new <code class="ph codeph">CREATE FUNCTION</code> syntax,
+ you must reload those Java-based UDFs by running the original <code class="ph codeph">CREATE FUNCTION</code> statements again each time
+ you restart the <span class="keyword cmdname">catalogd</span> daemon.
+ Prior to <span class="keyword">Impala 2.5</span>, the requirement to reload functions after a restart applied to both C++ and Java functions.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ For additional examples of all kinds of user-defined functions, see <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>.
+ </p>
+
+ <p class="p">
+ The following example shows how to take a Java jar file and make all the functions inside one of its classes
+ into UDFs under a single (overloaded) function name in Impala. Each <code class="ph codeph">CREATE FUNCTION</code> or
+ <code class="ph codeph">DROP FUNCTION</code> statement applies to all the overloaded Java functions with the same name.
+ This example uses the signatureless syntax for <code class="ph codeph">CREATE FUNCTION</code> and <code class="ph codeph">DROP FUNCTION</code>,
+ which is available in <span class="keyword">Impala 2.5</span> and higher.
+ </p>
+ <p class="p">
+ At the start, the jar file is in the local filesystem. Then it is copied into HDFS, so that it is
+ available for Impala to reference through the <code class="ph codeph">CREATE FUNCTION</code> statement and
+ queries that refer to the Impala function name.
+ </p>
+<pre class="pre codeblock"><code>
+$ jar -tvf udf-examples.jar
+ 0 Mon Feb 22 04:06:50 PST 2016 META-INF/
+ 122 Mon Feb 22 04:06:48 PST 2016 META-INF/MANIFEST.MF
+ 0 Mon Feb 22 04:06:46 PST 2016 org/
+ 0 Mon Feb 22 04:06:46 PST 2016 org/apache/
+ 0 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/
+ 2460 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/IncompatibleUdfTest.class
+ 541 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/TestUdfException.class
+ 3438 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/JavaUdfTest.class
+ 5872 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/TestUdf.class
+...
+$ hdfs dfs -put udf-examples.jar /user/impala/udfs
+$ hdfs dfs -ls /user/impala/udfs
+Found 2 items
+-rw-r--r-- 3 jrussell supergroup 853 2015-10-09 14:05 /user/impala/udfs/hello_world.jar
+-rw-r--r-- 3 jrussell supergroup 7366 2016-06-08 14:25 /user/impala/udfs/udf-examples.jar
+</code></pre>
+ <p class="p">
+ In <span class="keyword cmdname">impala-shell</span>, the <code class="ph codeph">CREATE FUNCTION</code> refers to the HDFS path of the jar file
+ and the fully qualified class name inside the jar. Each of the functions inside the class becomes an
+ Impala function, each one overloaded under the specified Impala function name.
+ </p>
+<pre class="pre codeblock"><code>
+[localhost:21000] > create function testudf location '/user/impala/udfs/udf-examples.jar' symbol='org.apache.impala.TestUdf';
+[localhost:21000] > show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT | testudf(BIGINT) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN, BOOLEAN) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN, BOOLEAN, BOOLEAN) | JAVA | true |
+| DOUBLE | testudf(DOUBLE) | JAVA | true |
+| DOUBLE | testudf(DOUBLE, DOUBLE) | JAVA | true |
+| DOUBLE | testudf(DOUBLE, DOUBLE, DOUBLE) | JAVA | true |
+| FLOAT | testudf(FLOAT) | JAVA | true |
+| FLOAT | testudf(FLOAT, FLOAT) | JAVA | true |
+| FLOAT | testudf(FLOAT, FLOAT, FLOAT) | JAVA | true |
+| INT | testudf(INT) | JAVA | true |
+| DOUBLE | testudf(INT, DOUBLE) | JAVA | true |
+| INT | testudf(INT, INT) | JAVA | true |
+| INT | testudf(INT, INT, INT) | JAVA | true |
+| SMALLINT | testudf(SMALLINT) | JAVA | true |
+| SMALLINT | testudf(SMALLINT, SMALLINT) | JAVA | true |
+| SMALLINT | testudf(SMALLINT, SMALLINT, SMALLINT) | JAVA | true |
+| STRING | testudf(STRING) | JAVA | true |
+| STRING | testudf(STRING, STRING) | JAVA | true |
+| STRING | testudf(STRING, STRING, STRING) | JAVA | true |
+| TINYINT | testudf(TINYINT) | JAVA | true |
++-------------+---------------------------------------+-------------+---------------+
+</code></pre>
+ <p class="p">
+ These are all simple functions that return their single argument, or
+ combine their multiple arguments by summing, concatenating, and so on. Impala determines which
+ overloaded function to use based on the number and types of the arguments.
+ </p>
+<pre class="pre codeblock"><code>
+insert into bigint_x values (1), (2), (4), (3);
+select testudf(x) from bigint_x;
++-----------------+
+| udfs.testudf(x) |
++-----------------+
+| 1 |
+| 2 |
+| 4 |
+| 3 |
++-----------------+
+
+insert into int_x values (1), (2), (4), (3);
+select testudf(x, x+1, x*x) from int_x;
++-------------------------------+
+| udfs.testudf(x, x + 1, x * x) |
++-------------------------------+
+| 4 |
+| 9 |
+| 25 |
+| 16 |
++-------------------------------+
+
+select testudf(x) from string_x;
++-----------------+
+| udfs.testudf(x) |
++-----------------+
+| one |
+| two |
+| four |
+| three |
++-----------------+
+select testudf(x,x) from string_x;
++--------------------+
+| udfs.testudf(x, x) |
++--------------------+
+| oneone |
+| twotwo |
+| fourfour |
+| threethree |
++--------------------+
+</code></pre>
+
+ <p class="p">
+ The previous example used the same Impala function name as the name of the class.
+ This example shows how the Impala function name is independent of the underlying
+ Java class or function names. A second <code class="ph codeph">CREATE FUNCTION</code> statement
+ results in a set of overloaded functions all named <code class="ph codeph">my_func</code>,
+ to go along with the overloaded functions all named <code class="ph codeph">testudf</code>.
+ </p>
+<pre class="pre codeblock"><code>
+create function my_func location '/user/impala/udfs/udf-examples.jar'
+ symbol='org.apache.impala.TestUdf';
+
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT | my_func(BIGINT) | JAVA | true |
+| BOOLEAN | my_func(BOOLEAN) | JAVA | true |
+| BOOLEAN | my_func(BOOLEAN, BOOLEAN) | JAVA | true |
+...
+| BIGINT | testudf(BIGINT) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN, BOOLEAN) | JAVA | true |
+...
+</code></pre>
+ <p class="p">
+ The corresponding <code class="ph codeph">DROP FUNCTION</code> statement with no signature
+ drops all the overloaded functions with that name.
+ </p>
+<pre class="pre codeblock"><code>
+drop function my_func;
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT | testudf(BIGINT) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN, BOOLEAN) | JAVA | true |
+...
+</code></pre>
+ <p class="p">
+ The signatureless <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs ensures that
+ the functions shown in this example remain available after the Impala service
+ (specifically, the Catalog Server) is restarted.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for more background information, usage instructions, and examples for
+ Impala UDFs; <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_create_role.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_create_role.html b/docs/build3x/html/topics/impala_create_role.html
new file mode 100644
index 0000000..2930c3a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_create_role.html
@@ -0,0 +1,70 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_role"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE ROLE Statement (Impala 2.0 or higher only)</title></head><body id="create_role"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">CREATE ROLE Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ The <code class="ph codeph">CREATE ROLE</code> statement creates a role to which privileges can be granted. Privileges can
+ be granted to roles, which can then be assigned to users. A user that has been assigned a role will only be
+ able to exercise the privileges of that role. Only users that have administrative privileges can create or drop
+ roles. By default, the <code class="ph codeph">hive</code>, <code class="ph codeph">impala</code>, and <code class="ph codeph">hue</code> users have
+ administrative privileges in Sentry.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE ROLE <var class="keyword varname">role_name</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Required privileges:</strong>
+ </p>
+
+ <p class="p">
+ Only administrative users (those with <code class="ph codeph">ALL</code> privileges on the server, defined in the Sentry
+ policy file) can use this statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <p class="p">
+ Impala makes use of any roles and privileges specified by the <code class="ph codeph">GRANT</code> and
+ <code class="ph codeph">REVOKE</code> statements in Hive, and Hive makes use of any roles and privileges specified by the
+ <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Impala. The Impala <code class="ph codeph">GRANT</code>
+ and <code class="ph codeph">REVOKE</code> statements for privileges do not require the <code class="ph codeph">ROLE</code> keyword to be
+ repeated before each role name, unlike the equivalent Hive statements.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
[45/51] [partial] impala git commit: [DOCS] Impala doc site update for 3.0
Posted by mi...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_avro.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_avro.html b/docs/build3x/html/topics/impala_avro.html
new file mode 100644
index 0000000..2c6c196
--- /dev/null
+++ b/docs/build3x/html/topics/impala_avro.html
@@ -0,0 +1,565 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta
name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="avro"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the Avro File Format with Impala Tables</title></head><body id="avro"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using the Avro File Format with Impala Tables</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports using tables whose data files use the Avro file format. Impala can query Avro
+ tables, and in Impala 1.4.0 and higher can create them, but currently cannot insert data into them. For
+ insert operations, use Hive, then switch back to Impala to run queries.
+ </p>
+
+ <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">Avro Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="avro__entry__1">
+ File Type
+ </th>
+ <th class="entry nocellnorowborder" id="avro__entry__2">
+ Format
+ </th>
+ <th class="entry nocellnorowborder" id="avro__entry__3">
+ Compression Codecs
+ </th>
+ <th class="entry nocellnorowborder" id="avro__entry__4">
+ Impala Can CREATE?
+ </th>
+ <th class="entry nocellnorowborder" id="avro__entry__5">
+ Impala Can INSERT?
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="avro__entry__1 ">
+ <a class="xref" href="impala_avro.html#avro">Avro</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="avro__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="avro__entry__3 ">
+ Snappy, gzip, deflate, bzip2
+ </td>
+ <td class="entry nocellnorowborder" headers="avro__entry__4 ">
+ Yes, in Impala 1.4.0 and higher. Before that, create the table using Hive.
+ </td>
+ <td class="entry nocellnorowborder" headers="avro__entry__5 ">
+ No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+ <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+ </td>
+
+ </tr>
+ </tbody></table>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="avro__avro_create_table">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Creating Avro Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To create a new table using the Avro file format, issue the <code class="ph codeph">CREATE TABLE</code> statement through
+ Impala with the <code class="ph codeph">STORED AS AVRO</code> clause, or through Hive. If you create the table through
+ Impala, you must include column definitions that match the fields specified in the Avro schema. With Hive,
+ you can omit the columns and just specify the Avro schema.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">CREATE TABLE</code> for Avro tables can include
+ SQL-style column definitions rather than specifying Avro notation through the <code class="ph codeph">TBLPROPERTIES</code>
+ clause. Impala issues warning messages if there are any mismatches between the types specified in the
+ SQL column definitions and the underlying types; for example, any <code class="ph codeph">TINYINT</code> or
+ <code class="ph codeph">SMALLINT</code> columns are treated as <code class="ph codeph">INT</code> in the underlying Avro files,
+ and therefore are displayed as <code class="ph codeph">INT</code> in any <code class="ph codeph">DESCRIBE</code> or
+ <code class="ph codeph">SHOW CREATE TABLE</code> output.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Currently, Avro tables cannot contain <code class="ph codeph">TIMESTAMP</code> columns. If you need to store date and
+ time values in Avro tables, as a workaround you can use a <code class="ph codeph">STRING</code> representation of the
+ values, convert the values to <code class="ph codeph">BIGINT</code> with the <code class="ph codeph">UNIX_TIMESTAMP()</code> function,
+ or create separate numeric columns for individual date and time fields using the <code class="ph codeph">EXTRACT()</code>
+ function.
+ </p>
+ </div>
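+
+      <p class="p">
+        As one possible sketch of this workaround (the table and column names below are illustrative, not
+        part of any standard schema), a date/time value can be kept as both a <code class="ph codeph">STRING</code>
+        and a <code class="ph codeph">BIGINT</code> number of seconds:
+      </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical example of the TIMESTAMP workaround; names are illustrative.
+CREATE TABLE avro_events (event_time_str STRING, event_time_sec BIGINT) STORED AS AVRO;
+-- Populate through Hive, producing both representations from a TIMESTAMP source column:
+--   INSERT INTO avro_events SELECT CAST(ts AS STRING), UNIX_TIMESTAMP(ts) FROM source_table;
+</code></pre>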
+
+
+
+ <p class="p">
+ The following examples demonstrate creating an Avro table in Impala, using either an inline column
+ specification or one taken from a JSON file stored in HDFS:
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > CREATE TABLE avro_only_sql_columns
+ > (
+ > id INT,
+ > bool_col BOOLEAN,
+ > tinyint_col TINYINT, /* Gets promoted to INT */
+ > smallint_col SMALLINT, /* Gets promoted to INT */
+ > int_col INT,
+ > bigint_col BIGINT,
+ > float_col FLOAT,
+ > double_col DOUBLE,
+ > date_string_col STRING,
+ > string_col STRING
+ > )
+ > STORED AS AVRO;
+
+[localhost:21000] > CREATE TABLE impala_avro_table
+ > (bool_col BOOLEAN, int_col INT, long_col BIGINT, float_col FLOAT, double_col DOUBLE, string_col STRING, nullable_int INT)
+ > STORED AS AVRO
+ > TBLPROPERTIES ('avro.schema.literal'='{
+ > "name": "my_record",
+ > "type": "record",
+ > "fields": [
+ > {"name":"bool_col", "type":"boolean"},
+ > {"name":"int_col", "type":"int"},
+ > {"name":"long_col", "type":"long"},
+ > {"name":"float_col", "type":"float"},
+ > {"name":"double_col", "type":"double"},
+ > {"name":"string_col", "type":"string"},
+ > {"name": "nullable_int", "type": ["null", "int"]}]}');
+
+[localhost:21000] > CREATE TABLE avro_examples_of_all_types (
+ > id INT,
+ > bool_col BOOLEAN,
+ > tinyint_col TINYINT,
+ > smallint_col SMALLINT,
+ > int_col INT,
+ > bigint_col BIGINT,
+ > float_col FLOAT,
+ > double_col DOUBLE,
+ > date_string_col STRING,
+ > string_col STRING
+ > )
+ > STORED AS AVRO
+ > TBLPROPERTIES ('avro.schema.url'='hdfs://localhost:8020/avro_schemas/alltypes.json');
+
+</code></pre>
+
+ <p class="p">
+ The following example demonstrates creating an Avro table in Hive:
+ </p>
+
+<pre class="pre codeblock"><code>
+hive> CREATE TABLE hive_avro_table
+ > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+ > STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+ > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+ > TBLPROPERTIES ('avro.schema.literal'='{
+ > "name": "my_record",
+ > "type": "record",
+ > "fields": [
+ > {"name":"bool_col", "type":"boolean"},
+ > {"name":"int_col", "type":"int"},
+ > {"name":"long_col", "type":"long"},
+ > {"name":"float_col", "type":"float"},
+ > {"name":"double_col", "type":"double"},
+ > {"name":"string_col", "type":"string"},
+ > {"name": "nullable_int", "type": ["null", "int"]}]}');
+
+</code></pre>
+
+ <p class="p">
+ Each field of the record becomes a column of the table. Note that any other information, such as the record
+ name, is ignored.
+ </p>
+
+
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ For nullable Avro columns, make sure to put the <code class="ph codeph">"null"</code> entry before the actual type name.
+ In Impala, all columns are nullable; Impala currently does not have a <code class="ph codeph">NOT NULL</code> clause. Any
+ non-nullable property is only enforced on the Avro side.
+ </div>
+
+ <p class="p">
+ Most column types map directly from Avro to Impala under the same names. These are the exceptions and
+ special cases to consider:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">DECIMAL</code> type is defined in Avro as a <code class="ph codeph">BYTE</code> type with the
+ <code class="ph codeph">logicalType</code> property set to <code class="ph codeph">"decimal"</code> and a specified precision and
+ scale.
+ </li>
+
+ <li class="li">
+ The Avro <code class="ph codeph">long</code> type maps to <code class="ph codeph">BIGINT</code> in Impala.
+ </li>
+ </ul>
+
+ <p class="p">
+ If you create the table through Hive, switch back to <span class="keyword cmdname">impala-shell</span> and issue an
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> statement. Then you can run queries for
+ that table through <span class="keyword cmdname">impala-shell</span>.
+ </p>
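+
+      <p class="p">
+        For example, assuming a table named <code class="ph codeph">hive_avro_table</code> was just created in Hive:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; INVALIDATE METADATA hive_avro_table;
+[localhost:21000] &gt; SELECT COUNT(*) FROM hive_avro_table;
+</code></pre>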
+
+ <div class="p">
+ In rare instances, a mismatch could occur between the Avro schema and the column definitions in the
+ metastore database. In <span class="keyword">Impala 2.3</span> and higher, Impala checks for such inconsistencies during
+ a <code class="ph codeph">CREATE TABLE</code> statement and each time it loads the metadata for a table (for example,
+ after <code class="ph codeph">INVALIDATE METADATA</code>). Impala uses the following rules to determine how to treat
+ mismatching columns, a process known as <dfn class="term">schema reconciliation</dfn>:
+ <ul class="ul">
+ <li class="li">
+ If there is a mismatch in the number of columns, Impala uses the column
+ definitions from the Avro schema.
+ </li>
+ <li class="li">
+ If there is a mismatch in column name or type, Impala uses the column definition from the Avro schema.
+ Because a <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> column in Impala maps to an Avro <code class="ph codeph">STRING</code>,
+ this case is not considered a mismatch and the column is preserved as <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code>
+          in the reconciled schema. <span class="ph">Prior to <span class="keyword">Impala 2.7</span>, the column
+          name and comment for such <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> columns were also taken from the SQL column definition.
+ In <span class="keyword">Impala 2.7</span> and higher, the column name and comment from the Avro schema file take precedence for such columns,
+ and only the <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> type is preserved from the SQL column definition.</span>
+ </li>
+ <li class="li">
+ An Impala <code class="ph codeph">TIMESTAMP</code> column definition maps to an Avro <code class="ph codeph">STRING</code> and is presented as a <code class="ph codeph">STRING</code>
+ in the reconciled schema, because Avro has no binary <code class="ph codeph">TIMESTAMP</code> representation.
+ As a result, no Avro table can have a <code class="ph codeph">TIMESTAMP</code> column; this restriction is the same as
+ in earlier Impala releases.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ Although you can create tables in this file format using
+ the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+ and <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher,
+ currently, Impala can query these types only in Parquet tables.
+ <span class="ph">
+ The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types.
+ Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher.
+ </span>
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="avro__avro_map_table">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Using a Hive-Created Avro Table in Impala</h2>
+
+ <div class="body conbody">
+
+ <div class="p">
+ If you have an Avro table created through Hive, you can use it in Impala as long as it contains only
+ Impala-compatible data types. It cannot contain:
+ <ul class="ul">
+ <li class="li">
+ Complex types: <code class="ph codeph">array</code>, <code class="ph codeph">map</code>, <code class="ph codeph">record</code>,
+ <code class="ph codeph">struct</code>, <code class="ph codeph">union</code> other than
+ <code class="ph codeph">[<var class="keyword varname">supported_type</var>,null]</code> or
+ <code class="ph codeph">[null,<var class="keyword varname">supported_type</var>]</code>
+ </li>
+
+ <li class="li">
+ The Avro-specific types <code class="ph codeph">enum</code>, <code class="ph codeph">bytes</code>, and <code class="ph codeph">fixed</code>
+ </li>
+
+ <li class="li">
+ Any scalar type other than those listed in <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a>
+ </li>
+ </ul>
+ Because Impala and Hive share the same metastore database, Impala can directly access the table definitions
+ and data for tables that were created in Hive.
+ </div>
+
+ <p class="p">
+ If you create an Avro table in Hive, issue an <code class="ph codeph">INVALIDATE METADATA</code> the next time you
+ connect to Impala through <span class="keyword cmdname">impala-shell</span>. This is a one-time operation to make Impala
+ aware of the new table. You can issue the statement while connected to any Impala node, and the catalog
+ service broadcasts the change to all other Impala nodes.
+ </p>
+
+ <p class="p">
+ If you load new data into an Avro table through Hive, either through a Hive <code class="ph codeph">LOAD DATA</code> or
+ <code class="ph codeph">INSERT</code> statement, or by manually copying or moving files into the data directory for the
+ table, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> statement the next time you connect
+ to Impala through <span class="keyword cmdname">impala-shell</span>. You can issue the statement while connected to any
+ Impala node, and the catalog service broadcasts the change to all other Impala nodes. If you issue the
+ <code class="ph codeph">LOAD DATA</code> statement through Impala, you do not need a <code class="ph codeph">REFRESH</code> afterward.
+ </p>
+
+ <p class="p">
+ Impala only supports fields of type <code class="ph codeph">boolean</code>, <code class="ph codeph">int</code>, <code class="ph codeph">long</code>,
+ <code class="ph codeph">float</code>, <code class="ph codeph">double</code>, and <code class="ph codeph">string</code>, or unions of these types with
+ null; for example, <code class="ph codeph">["string", "null"]</code>. Unions with <code class="ph codeph">null</code> essentially
+ create a nullable type.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="avro__avro_json">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Specifying the Avro Schema through JSON</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ While you can embed a schema directly in your <code class="ph codeph">CREATE TABLE</code> statement, as shown above,
+ column width restrictions in the Hive metastore limit the length of schema you can specify. If you
+ encounter problems with long schema literals, try storing your schema as a <code class="ph codeph">JSON</code> file in
+ HDFS instead. Specify your schema in HDFS using table properties similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>tblproperties ('avro.schema.url'='hdfs://your-name-node:port/path/to/schema.json');</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="avro__avro_load_data">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Loading Data into an Avro Table</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Currently, Impala cannot write Avro data files. Therefore, an Avro table cannot be used as the destination
+ of an Impala <code class="ph codeph">INSERT</code> statement or <code class="ph codeph">CREATE TABLE AS SELECT</code>.
+ </p>
+
+ <p class="p">
+ To copy data from another table, issue any <code class="ph codeph">INSERT</code> statements through Hive. For information
+ about loading data into Avro tables through Hive, see
+ <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/AvroSerDe" target="_blank">Avro
+ page on the Hive wiki</a>.
+ </p>
+
+ <p class="p">
+ If you already have data files in Avro format, you can also issue <code class="ph codeph">LOAD DATA</code> in either
+ Impala or Hive. Impala can move existing Avro data files into an Avro table, it just cannot create new
+ Avro data files.
+ </p>
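+
+      <p class="p">
+        For example, assuming an existing Avro data file at a hypothetical HDFS staging path:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; LOAD DATA INPATH '/user/hive/staging/sample.avro' INTO TABLE avro_table;
+</code></pre>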
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="avro__avro_compression">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Enabling Compression for Avro Tables</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ To enable compression for Avro tables, specify settings in the Hive shell to enable compression and to
+ specify a codec, then issue a <code class="ph codeph">CREATE TABLE</code> statement as in the preceding examples. Impala
+ supports the <code class="ph codeph">snappy</code> and <code class="ph codeph">deflate</code> codecs for Avro tables.
+ </p>
+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>hive> set hive.exec.compress.output=true;
+hive> set avro.output.codec=snappy;</code></pre>
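+
+      <p class="p">
+        With those settings in effect, subsequent Hive <code class="ph codeph">INSERT</code> statements into an
+        Avro table write Snappy-compressed data files. The following sketch assumes a Hive version that
+        supports the <code class="ph codeph">STORED AS AVRO</code> shorthand; the table and column names are
+        illustrative:
+      </p>
+
+<pre class="pre codeblock"><code>hive&gt; CREATE TABLE compressed_avro_table (id INT, s STRING) STORED AS AVRO;
+hive&gt; INSERT INTO compressed_avro_table SELECT id, string_col FROM some_source_table;
+</code></pre>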
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="avro__avro_schema_evolution">
+
+ <h2 class="title topictitle2" id="ariaid-title7">How Impala Handles Avro Schema Evolution</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Starting in Impala 1.1, Impala can deal with Avro data files that employ <dfn class="term">schema evolution</dfn>,
+ where different data files within the same table use slightly different type definitions. (You would
+ perform the schema evolution operation by issuing an <code class="ph codeph">ALTER TABLE</code> statement in the Hive
+      shell.) The old and new types for any changed columns must be compatible; for example, a column might start
+ as an <code class="ph codeph">int</code> and later change to a <code class="ph codeph">bigint</code> or <code class="ph codeph">float</code>.
+ </p>
+
+ <p class="p">
+ As with any other tables where the definitions are changed or data is added outside of the current
+ <span class="keyword cmdname">impalad</span> node, ensure that Impala loads the latest metadata for the table if the Avro
+ schema is modified through Hive. Issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> statement. <code class="ph codeph">REFRESH</code>
+      reloads the metadata immediately, while <code class="ph codeph">INVALIDATE METADATA</code> reloads it the next time
+ the table is accessed.
+ </p>
+
+ <p class="p">
+ When Avro data files or columns are not consulted during a query, Impala does not check for consistency.
+ Thus, if you issue <code class="ph codeph">SELECT c1, c2 FROM t1</code>, Impala does not return any error if the column
+ <code class="ph codeph">c3</code> changed in an incompatible way. If a query retrieves data from some partitions but not
+ others, Impala does not check the data files for the unused partitions.
+ </p>
+
+ <p class="p">
+ In the Hive DDL statements, you can specify an <code class="ph codeph">avro.schema.literal</code> table property (if the
+ schema definition is short) or an <code class="ph codeph">avro.schema.url</code> property (if the schema definition is
+ long, or to allow convenient editing for the definition).
+ </p>
+
+ <p class="p">
+ For example, running the following SQL code in the Hive shell creates a table using the Avro file format
+ and puts some sample data into it:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE avro_table (a string, b string)
+ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+TBLPROPERTIES (
+ 'avro.schema.literal'='{
+ "type": "record",
+ "name": "my_record",
+ "fields": [
+ {"name": "a", "type": "int"},
+ {"name": "b", "type": "string"}
+ ]}');
+
+INSERT OVERWRITE TABLE avro_table SELECT 1, "avro" FROM functional.alltypes LIMIT 1;
+</code></pre>
+
+ <p class="p">
+ Once the Avro table is created and contains data, you can query it through the
+ <span class="keyword cmdname">impala-shell</span> command:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select * from avro_table;
++---+------+
+| a | b |
++---+------+
+| 1 | avro |
++---+------+
+</code></pre>
+
+ <p class="p">
+ Now in the Hive shell, you change the type of a column and add a new column with a default value:
+ </p>
+
+<pre class="pre codeblock"><code>-- Promote column "a" from INT to FLOAT (no need to update Avro schema)
+ALTER TABLE avro_table CHANGE A A FLOAT;
+
+-- Add column "c" with default
+ALTER TABLE avro_table ADD COLUMNS (c int);
+ALTER TABLE avro_table SET TBLPROPERTIES (
+ 'avro.schema.literal'='{
+ "type": "record",
+ "name": "my_record",
+ "fields": [
+ {"name": "a", "type": "int"},
+ {"name": "b", "type": "string"},
+ {"name": "c", "type": "int", "default": 10}
+ ]}');
+</code></pre>
+
+ <p class="p">
+ Once again in <span class="keyword cmdname">impala-shell</span>, you can query the Avro table based on its latest schema
+ definition. Because the table metadata was changed outside of Impala, you issue a <code class="ph codeph">REFRESH</code>
+ statement first so that Impala has up-to-date metadata for the table.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > refresh avro_table;
+[localhost:21000] > select * from avro_table;
++---+------+----+
+| a | b | c |
++---+------+----+
+| 1 | avro | 10 |
++---+------+----+
+</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="avro__avro_data_types">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Data Type Considerations for Avro Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Avro format defines a set of data types whose names differ from the names of the corresponding Impala
+ data types. If you are preparing Avro files using other Hadoop components such as Pig or MapReduce, you
+ might need to work with the type names defined by Avro. The following figure lists the Avro-defined types
+ and the equivalent types in Impala.
+ </p>
+
+<pre class="pre codeblock"><code>Primitive Types (Avro -> Impala)
+--------------------------------
+STRING -> STRING
+STRING -> CHAR
+STRING -> VARCHAR
+INT -> INT
+BOOLEAN -> BOOLEAN
+LONG -> BIGINT
+FLOAT -> FLOAT
+DOUBLE -> DOUBLE
+
+Logical Types
+-------------
+BYTES + logicalType = "decimal" -> DECIMAL
+
+Avro Types with No Impala Equivalent
+------------------------------------
+RECORD, MAP, ARRAY, UNION, ENUM, FIXED, NULL
+
+Impala Types with No Avro Equivalent
+------------------------------------
+TIMESTAMP
+
+</code></pre>
+
+ <p class="p">
+ The Avro specification allows string values up to 2**64 bytes in length.
+ Impala queries for Avro tables use 32-bit integers to hold string lengths.
+ In <span class="keyword">Impala 2.5</span> and higher, Impala truncates <code class="ph codeph">CHAR</code>
+ and <code class="ph codeph">VARCHAR</code> values in Avro tables to (2**31)-1 bytes.
+ If a query encounters a <code class="ph codeph">STRING</code> value longer than (2**31)-1
+ bytes in an Avro table, the query fails. In earlier releases,
+ encountering such long values in an Avro table could cause a crash.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="avro__avro_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Query Performance for Impala Avro Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In general, expect query performance with Avro tables to be
+ faster than with tables using text data, but slower than with
+ Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+ for information about using the Parquet file format for
+ high-performance analytic queries.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+ For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+ Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+ in the <span class="ph filepath">core-site.xml</span> configuration file determines
+ how Impala divides the I/O work of reading the data files. This configuration
+ setting is specified in bytes. By default, this
+ value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+ as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+ Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+ Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 268435456 (256 MB) to match the row group size produced by Impala.
+ </p>
+
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_batch_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_batch_size.html b/docs/build3x/html/topics/impala_batch_size.html
new file mode 100644
index 0000000..cf89ad1
--- /dev/null
+++ b/docs/build3x/html/topics/impala_batch_size.html
@@ -0,0 +1,34 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="batch_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BATCH_SIZE Query Option</title></head><body id="batch_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">BATCH_SIZE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+      Number of rows evaluated at a time by SQL operators. If unspecified or set to 0, Impala uses a predefined
+      default size. Using a larger number improves responsiveness, especially for scan operations, at the cost of a higher memory footprint.
+ </p>
+
+ <p class="p">
+ This option is primarily for testing during Impala development, or for use under the direction of <span class="keyword">the appropriate support channel</span>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning the predefined default of 1024)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> 0-65536. The value of 0 still has the special meaning of <span class="q">"use the default"</span>,
+ so the effective range is 1-65536. The maximum applies in <span class="keyword">Impala 2.11</span> and higher.
+ </p>
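+
+  <p class="p">
+    For example, to set a larger batch size for the current session in <span class="keyword cmdname">impala-shell</span>:
+  </p>
+
+<pre class="pre codeblock"><code>set batch_size=2048;
+</code></pre>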
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_bigint.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_bigint.html b/docs/build3x/html/topics/impala_bigint.html
new file mode 100644
index 0000000..ac3d700
--- /dev/null
+++ b/docs/build3x/html/topics/impala_bigint.html
@@ -0,0 +1,138 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="bigint"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BIGINT Data Type</title></head><body id="bigint"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">BIGINT Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ An 8-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+ statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> BIGINT</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> -9223372036854775808 .. 9223372036854775807. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+ </p>
+
+ <p class="p">
+      <strong class="ph b">Conversions:</strong> Impala automatically converts <code class="ph codeph">BIGINT</code> to a floating-point type (<code class="ph codeph">FLOAT</code> or
+      <code class="ph codeph">DOUBLE</code>). Use <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>,
+ <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">STRING</code>, or <code class="ph codeph">TIMESTAMP</code>.
+ <span class="ph">
+ Casting an integer or floating-point value <code class="ph codeph">N</code> to
+ <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+ date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+ If the setting <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+ the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.
+ </span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x BIGINT);
+SELECT CAST(1000 AS BIGINT);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">BIGINT</code> is a convenient type to use for column declarations because you can use any kind of
+ integer values in <code class="ph codeph">INSERT</code> statements and they are promoted to <code class="ph codeph">BIGINT</code> where
+ necessary. However, <code class="ph codeph">BIGINT</code> also requires the most bytes of any integer type on disk and in
+ memory, meaning your queries are not as efficient and scalable as possible if you overuse this type.
+ Therefore, prefer to use the smallest integer type with sufficient range to hold all input values, and
+ <code class="ph codeph">CAST()</code> when necessary to the appropriate type.
+ </p>
+
+ <p class="p">
+ For a convenient and automated way to check the bounds of the <code class="ph codeph">BIGINT</code> type, call the
+ functions <code class="ph codeph">MIN_BIGINT()</code> and <code class="ph codeph">MAX_BIGINT()</code>.
+ </p>
+
+ <p class="p">
+ If an integer value is too large to be represented as a <code class="ph codeph">BIGINT</code>, use a
+ <code class="ph codeph">DECIMAL</code> instead with sufficient digits of precision.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+ value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric
+ type more efficiently than a <code class="ph codeph">STRING</code> representation of the value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as an 8-byte value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Sqoop considerations:</strong>
+ </p>
+
+ <p class="p"> If you use Sqoop to
+ convert RDBMS data to Parquet, be careful with interpreting any
+ resulting values from <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>,
+ or <code class="ph codeph">TIMESTAMP</code> columns. The underlying values are
+ represented as the Parquet <code class="ph codeph">INT64</code> type, which is
+ represented as <code class="ph codeph">BIGINT</code> in the Impala table. The Parquet
+ values represent the time in milliseconds, while Impala interprets
+ <code class="ph codeph">BIGINT</code> as the time in seconds. Therefore, if you have
+ a <code class="ph codeph">BIGINT</code> column in a Parquet table that was imported
+ this way from Sqoop, divide the values by 1000 when interpreting as the
+ <code class="ph codeph">TIMESTAMP</code> type.</p>
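+
+    <p class="p">
+      For example, with a hypothetical <code class="ph codeph">BIGINT</code> column that holds millisecond values
+      imported this way (the column and table names are illustrative), the conversion looks like:
+    </p>
+
+<pre class="pre codeblock"><code>-- sqoop_ms_col holds milliseconds; divide by 1000 so Impala interprets the value as seconds.
+SELECT CAST(sqoop_ms_col / 1000 AS TIMESTAMP) FROM imported_parquet_table;
+</code></pre>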
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+ <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 3.0 or higher only)</a>,
+ <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_bit_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_bit_functions.html b/docs/build3x/html/topics/impala_bit_functions.html
new file mode 100644
index 0000000..4c33b22
--- /dev/null
+++ b/docs/build3x/html/topics/impala_bit_functions.html
@@ -0,0 +1,848 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="bit_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Bit Functions</title></head><body id="bit_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Bit Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Bit manipulation functions perform bitwise operations involved in scientific processing or computer science algorithms.
+ For example, these functions include setting, clearing, or testing bits within an integer value, or changing the
+ positions of bits with or without wraparound.
+ </p>
+
+ <p class="p">
+ If a function takes two integer arguments that are required to be of the same type, the smaller argument is promoted
+ to the type of the larger one if required. For example, <code class="ph codeph">BITAND(1,4096)</code> treats both arguments as
+ <code class="ph codeph">SMALLINT</code>, because 1 can be represented as a <code class="ph codeph">TINYINT</code> but 4096 requires a <code class="ph codeph">SMALLINT</code>.
+ </p>
+
+ <p class="p">
+ Remember that all Impala integer values are signed. Therefore, when dealing with binary values where the most significant
+ bit is 1, the specified or returned values might be negative when represented in base 10.
+ </p>
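The signed interpretation can be sketched in Python. Python integers are unbounded, so the fixed width of an Impala integer type is simulated explicitly here; the function name and the 8-bit default are assumptions of this illustration:

```python
def to_signed(bit_pattern, bits=8):
    """Interpret a raw bit pattern as a signed two's-complement integer."""
    mask = (1 << bits) - 1
    value = bit_pattern & mask
    if value >= 1 << (bits - 1):  # most significant bit set -> negative
        value -= 1 << bits
    return value

print(to_signed(0b11111111))  # -1: all bits set in a signed 8-bit value
print(to_signed(0b10000000))  # -128
print(to_signed(0b01111111))  # 127
```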
+
+ <p class="p">
+ If any argument is <code class="ph codeph">NULL</code>, whether it is the input value, the bit position, or the number of shift or rotate positions,
+ the return value from any of these functions is also <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The bit functions operate on all the integral data types: <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, and
+ <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Function reference:</strong>
+ </p>
+
+ <p class="p">
+ Impala supports the following bit functions:
+ </p>
+
+
+
+ <dl class="dl">
+
+
+
+ <dt class="dt dlterm" id="bit_functions__bitand">
+ <code class="ph codeph">bitand(integer_type a, same_type b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns an integer value representing the bits that are set to 1 in both of the arguments.
+ If the arguments are of different sizes, the smaller is promoted to the type of the larger.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitand()</code> function is equivalent to the <code class="ph codeph">&</code> binary operator.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the results of ANDing integer values.
+ 255 contains all 1 bits in its lowermost 8 bits.
+ 32767 contains all 1 bits in its lowermost 15 bits.
+
+ You can use the <code class="ph codeph">bin()</code> function to check the binary representation of any
+ integer value, although the result is always represented as a 64-bit value.
+ If necessary, the smaller argument is promoted to the
+ type of the larger one.
+ </p>
+<pre class="pre codeblock"><code>select bitand(255, 32767); /* 0000000011111111 & 0111111111111111 */
++--------------------+
+| bitand(255, 32767) |
++--------------------+
+| 255 |
++--------------------+
+
+select bitand(32767, 1); /* 0111111111111111 & 0000000000000001 */
++------------------+
+| bitand(32767, 1) |
++------------------+
+| 1 |
++------------------+
+
+select bitand(32, 16); /* 00100000 & 00010000 */
++----------------+
+| bitand(32, 16) |
++----------------+
+| 0 |
++----------------+
+
+select bitand(12,5); /* 00001100 & 00000101 */
++---------------+
+| bitand(12, 5) |
++---------------+
+| 4 |
++---------------+
+
+select bitand(-1,15); /* 11111111 & 00001111 */
++----------------+
+| bitand(-1, 15) |
++----------------+
+| 15 |
++----------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__bitnot">
+ <code class="ph codeph">bitnot(integer_type a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Inverts all the bits of the input argument.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitnot()</code> function is equivalent to the <code class="ph codeph">~</code> unary operator.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ These examples illustrate what happens when you flip all the bits of an integer value.
+ The sign always changes, and because <code class="ph codeph">bitnot(x)</code> is equivalent to <code class="ph codeph">-(x + 1)</code>,
+ the decimal representations of the original and inverted values always differ by one in magnitude.
+
+ </p>
+<pre class="pre codeblock"><code>select bitnot(127); /* 01111111 -> 10000000 */
++-------------+
+| bitnot(127) |
++-------------+
+| -128 |
++-------------+
+
+select bitnot(16); /* 00010000 -> 11101111 */
++------------+
+| bitnot(16) |
++------------+
+| -17 |
++------------+
+
+select bitnot(0); /* 00000000 -> 11111111 */
++-----------+
+| bitnot(0) |
++-----------+
+| -1 |
++-----------+
+
+select bitnot(-128); /* 10000000 -> 01111111 */
++--------------+
+| bitnot(-128) |
++--------------+
+| 127 |
++--------------+
+</code></pre>
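Python's <code>~</code> operator follows the same two's-complement rule, so the results above can be reproduced directly (illustration only, not Impala code):

```python
# ~x equals -(x + 1) in two's complement, matching the bitnot() results.
for x in (127, 16, 0, -128):
    print(x, "->", ~x)
# 127 -> -128, 16 -> -17, 0 -> -1, -128 -> 127
```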
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__bitor">
+ <code class="ph codeph">bitor(integer_type a, same_type b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns an integer value representing the bits that are set to 1 in either of the arguments.
+ If the arguments are of different sizes, the smaller is promoted to the type of the larger.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitor()</code> function is equivalent to the <code class="ph codeph">|</code> binary operator.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the results of ORing integer values.
+ </p>
+<pre class="pre codeblock"><code>select bitor(1,4); /* 00000001 | 00000100 */
++-------------+
+| bitor(1, 4) |
++-------------+
+| 5 |
++-------------+
+
+select bitor(16,48); /* 00010000 | 00110000 */
++---------------+
+| bitor(16, 48) |
++---------------+
+| 48 |
++---------------+
+
+select bitor(0,7); /* 00000000 | 00000111 */
++-------------+
+| bitor(0, 7) |
++-------------+
+| 7 |
++-------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__bitxor">
+ <code class="ph codeph">bitxor(integer_type a, same_type b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns an integer value representing the bits that are set to 1 in one but not both of the arguments.
+ If the arguments are of different sizes, the smaller is promoted to the type of the larger.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitxor()</code> function is equivalent to the <code class="ph codeph">^</code> binary operator.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the results of XORing integer values.
+ XORing a non-zero value with zero returns the non-zero value.
+ XORing two identical values returns zero, because all the 1 bits from the first argument are also 1 bits in the second argument.
+ XORing different non-zero values turns off some bits and leaves others turned on, based on whether the same bit is set in both arguments.
+ </p>
+<pre class="pre codeblock"><code>select bitxor(0,15); /* 00000000 ^ 00001111 */
++---------------+
+| bitxor(0, 15) |
++---------------+
+| 15 |
++---------------+
+
+select bitxor(7,7); /* 00000111 ^ 00000111 */
++--------------+
+| bitxor(7, 7) |
++--------------+
+| 0 |
++--------------+
+
+select bitxor(8,4); /* 00001000 ^ 00000100 */
++--------------+
+| bitxor(8, 4) |
++--------------+
+| 12 |
++--------------+
+
+select bitxor(3,7); /* 00000011 ^ 00000111 */
++--------------+
+| bitxor(3, 7) |
++--------------+
+| 4 |
++--------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__countset">
+ <code class="ph codeph">countset(integer_type a [, int zero_or_one])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> By default, returns the number of 1 bits in the specified integer value.
+ If the optional second argument is set to zero, it returns the number of 0 bits instead.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In discussions of information theory, this operation is referred to as the
+ <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Hamming_weight" target="_blank">population count</a>"</span>
+ or <span class="q">"popcount"</span>.
+ </p>
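A population-count sketch in Python for a fixed-width value; the function name, the optional zero-count argument, and the 8-bit default width are assumptions of this illustration, not Impala's implementation:

```python
def countset(value, zero_or_one=1, bits=8):
    """Count 1 bits (default) or, with zero_or_one=0, count 0 bits."""
    ones = bin(value & ((1 << bits) - 1)).count("1")
    return ones if zero_or_one else bits - ones

print(countset(7))     # 3 one-bits in 00000111
print(countset(7, 0))  # 5 zero-bits in 00000111
```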
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how to count the number of 1 bits in an integer value.
+ </p>
+<pre class="pre codeblock"><code>select countset(1); /* 00000001 */
++-------------+
+| countset(1) |
++-------------+
+| 1 |
++-------------+
+
+select countset(3); /* 00000011 */
++-------------+
+| countset(3) |
++-------------+
+| 2 |
++-------------+
+
+select countset(16); /* 00010000 */
++--------------+
+| countset(16) |
++--------------+
+| 1 |
++--------------+
+
+select countset(17); /* 00010001 */
++--------------+
+| countset(17) |
++--------------+
+| 2 |
++--------------+
+
+select countset(7,1); /* 00000111 = 3 1 bits; the function counts 1 bits by default */
++----------------+
+| countset(7, 1) |
++----------------+
+| 3 |
++----------------+
+
+select countset(7,0); /* 00000111 = 5 0 bits; the second argument can only be 0 or 1 */
++----------------+
+| countset(7, 0) |
++----------------+
+| 5 |
++----------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__getbit">
+ <code class="ph codeph">getbit(integer_type a, int position)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a 0 or 1 representing the bit at a
+ specified position. The positions are numbered right to left, starting at zero.
+ The position argument cannot be negative.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ When you use a literal input value, it is treated as the smallest integer type that can hold it
+ (an 8-bit, 16-bit, and so on value). The type of the input value limits the range of valid
+ bit positions. Cast the input value to the appropriate type if you need to
+ ensure it is treated as a 64-bit, 32-bit, and so on value.
+ </p>
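The right-to-left position numbering and the width-dependent range check can be sketched like this; the explicit <code>bits</code> parameter stands in for the input's integer type and is an assumption of this illustration:

```python
def getbit(value, position, bits=8):
    """Return the bit at `position`, numbered right to left from zero."""
    if position < 0 or position >= bits:
        raise ValueError("Invalid bit position: %d" % position)
    return (value >> position) & 1

print(getbit(16, 4))       # 1: 00010000 has bit 4 set
print(getbit(-1, 3))       # 1: a signed -1 has all bits set
print(getbit(-1, 25, 32))  # 1: widening the type makes position 25 valid
```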
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how to test a specific bit within an integer value.
+ </p>
+<pre class="pre codeblock"><code>select getbit(1,0); /* 00000001 */
++--------------+
+| getbit(1, 0) |
++--------------+
+| 1 |
++--------------+
+
+select getbit(16,1); /* 00010000 */
++---------------+
+| getbit(16, 1) |
++---------------+
+| 0 |
++---------------+
+
+select getbit(16,4); /* 00010000 */
++---------------+
+| getbit(16, 4) |
++---------------+
+| 1 |
++---------------+
+
+select getbit(16,5); /* 00010000 */
++---------------+
+| getbit(16, 5) |
++---------------+
+| 0 |
++---------------+
+
+select getbit(-1,3); /* 11111111 */
++---------------+
+| getbit(-1, 3) |
++---------------+
+| 1 |
++---------------+
+
+select getbit(-1,25); /* 11111111 */
+ERROR: Invalid bit position: 25
+
+select getbit(cast(-1 as int),25); /* 11111111111111111111111111111111 */
++-----------------------------+
+| getbit(cast(-1 as int), 25) |
++-----------------------------+
+| 1 |
++-----------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__rotateleft">
+ <code class="ph codeph">rotateleft(integer_type a, int positions)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Rotates an integer value left by a specified number of bits.
+ As the most significant bit is taken out of the original value,
+ if it is a 1 bit, it is <span class="q">"rotated"</span> back to the least significant bit.
+ Therefore, the final value has the same number of 1 bits as the original value,
+ just in different positions.
+ In computer science terms, this operation is a
+ <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Circular_shift" target="_blank">circular shift</a>"</span>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Specifying a second argument of zero leaves the original value unchanged.
+ Rotating a -1 value by any number of positions still returns -1,
+ because the original value has all 1 bits and all the 1 bits are
+ preserved during rotation.
+ Similarly, rotating a 0 value by any number of positions still returns 0.
+ Rotating a value by the same number of bits as in the value returns the same value.
+ Because this is a circular operation, the number of positions is not limited
+ to the number of bits in the input value.
+ For example, rotating an 8-bit value by 1, 9, 17, and so on positions returns an
+ identical result in each case.
+ </p>
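The circular behavior described above can be sketched in Python for a fixed width; the 8-bit default and the signed reinterpretation at the end are assumptions of this illustration, not Impala's implementation:

```python
def rotateleft(value, positions, bits=8):
    """Circular left shift within a fixed width, returning a signed result."""
    mask = (1 << bits) - 1
    v = value & mask
    positions %= bits  # rotation wraps, so only positions % bits matters
    r = ((v << positions) | (v >> (bits - positions))) & mask
    return r - (1 << bits) if r >= 1 << (bits - 1) else r

print(rotateleft(1, 4))     # 16: 00000001 -> 00010000
print(rotateleft(-128, 1))  # 1:  10000000 -> 00000001
print(rotateleft(-1, 155))  # -1: all 1 bits stay all 1 bits
```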
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select rotateleft(1,4); /* 00000001 -> 00010000 */
++------------------+
+| rotateleft(1, 4) |
++------------------+
+| 16 |
++------------------+
+
+select rotateleft(-1,155); /* 11111111 -> 11111111 */
++---------------------+
+| rotateleft(-1, 155) |
++---------------------+
+| -1 |
++---------------------+
+
+select rotateleft(-128,1); /* 10000000 -> 00000001 */
++---------------------+
+| rotateleft(-128, 1) |
++---------------------+
+| 1 |
++---------------------+
+
+select rotateleft(-127,3); /* 10000001 -> 00001100 */
++---------------------+
+| rotateleft(-127, 3) |
++---------------------+
+| 12 |
++---------------------+
+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__rotateright">
+ <code class="ph codeph">rotateright(integer_type a, int positions)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Rotates an integer value right by a specified number of bits.
+ As the least significant bit is taken out of the original value,
+ if it is a 1 bit, it is <span class="q">"rotated"</span> back to the most significant bit.
+ Therefore, the final value has the same number of 1 bits as the original value,
+ just in different positions.
+ In computer science terms, this operation is a
+ <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Circular_shift" target="_blank">circular shift</a>"</span>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Specifying a second argument of zero leaves the original value unchanged.
+ Rotating a -1 value by any number of positions still returns -1,
+ because the original value has all 1 bits and all the 1 bits are
+ preserved during rotation.
+ Similarly, rotating a 0 value by any number of positions still returns 0.
+ Rotating a value by the same number of bits as in the value returns the same value.
+ Because this is a circular operation, the number of positions is not limited
+ to the number of bits in the input value.
+ For example, rotating an 8-bit value by 1, 9, 17, and so on positions returns an
+ identical result in each case.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select rotateright(16,4); /* 00010000 -> 00000001 */
++--------------------+
+| rotateright(16, 4) |
++--------------------+
+| 1 |
++--------------------+
+
+select rotateright(-1,155); /* 11111111 -> 11111111 */
++----------------------+
+| rotateright(-1, 155) |
++----------------------+
+| -1 |
++----------------------+
+
+select rotateright(-128,1); /* 10000000 -> 01000000 */
++----------------------+
+| rotateright(-128, 1) |
++----------------------+
+| 64 |
++----------------------+
+
+select rotateright(-127,3); /* 10000001 -> 00110000 */
++----------------------+
+| rotateright(-127, 3) |
++----------------------+
+| 48 |
++----------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__setbit">
+ <code class="ph codeph">setbit(integer_type a, int position [, int zero_or_one])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> By default, changes a bit at a specified position to a 1, if it is not already.
+ If the optional third argument is set to zero, the specified bit is set to 0 instead.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ If the bit at the specified position was already 1 (by default)
+ or 0 (with a third argument of zero), the return value is
+ the same as the first argument.
+ The positions are numbered right to left, starting at zero.
+ (Therefore, the return value could be different from the first argument
+ even if the position argument is zero.)
+ The position argument cannot be negative.
+ <p class="p">
+ When you use a literal input value, it is treated as the smallest integer type that can hold it
+ (an 8-bit, 16-bit, and so on value). The type of the input value limits the range of valid
+ bit positions. Cast the input value to the appropriate type if you need to
+ ensure it is treated as a 64-bit, 32-bit, and so on value.
+ </p>
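Setting and clearing a bit reduce to an OR with a mask and an AND with the inverted mask. A minimal sketch (Python integers are unbounded, so unlike Impala there is no invalid-position error here; names are assumptions of this illustration):

```python
def setbit(value, position, zero_or_one=1):
    """Set (default) or, with zero_or_one=0, clear the bit at `position`."""
    if zero_or_one:
        return value | (1 << position)
    return value & ~(1 << position)

print(setbit(7, 3))     # 15: 00000111 -> 00001111
print(setbit(7, 2, 0))  # 3:  00000111 -> 00000011
print(setbit(15, 3))    # 15: bit 3 was already set
```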
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select setbit(0,0); /* 00000000 -> 00000001 */
++--------------+
+| setbit(0, 0) |
++--------------+
+| 1 |
++--------------+
+
+select setbit(0,3); /* 00000000 -> 00001000 */
++--------------+
+| setbit(0, 3) |
++--------------+
+| 8 |
++--------------+
+
+select setbit(7,3); /* 00000111 -> 00001111 */
++--------------+
+| setbit(7, 3) |
++--------------+
+| 15 |
++--------------+
+
+select setbit(15,3); /* 00001111 -> 00001111 */
++---------------+
+| setbit(15, 3) |
++---------------+
+| 15 |
++---------------+
+
+select setbit(0,32); /* By default, 0 is a TINYINT with only 8 bits. */
+ERROR: Invalid bit position: 32
+
+select setbit(cast(0 as bigint),32); /* For BIGINT, the position can be 0..63. */
++-------------------------------+
+| setbit(cast(0 as bigint), 32) |
++-------------------------------+
+| 4294967296 |
++-------------------------------+
+
+select setbit(7,3,1); /* 00000111 -> 00001111; setting to 1 is the default */
++-----------------+
+| setbit(7, 3, 1) |
++-----------------+
+| 15 |
++-----------------+
+
+select setbit(7,2,0); /* 00000111 -> 00000011; third argument of 0 clears instead of sets */
++-----------------+
+| setbit(7, 2, 0) |
++-----------------+
+| 3 |
++-----------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__shiftleft">
+ <code class="ph codeph">shiftleft(integer_type a, int positions)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Shifts an integer value left by a specified number of bits.
+ As the most significant bit is taken out of the original value,
+ it is discarded and the least significant bit becomes 0.
+ In computer science terms, this operation is a <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Logical_shift" target="_blank">logical shift</a>"</span>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The final value has either the same number of 1 bits as the original value, or fewer.
+ Shifting an 8-bit value by 8 positions, a 16-bit value by 16 positions, and so on produces
+ a result of zero.
+ </p>
+ <p class="p">
+ Specifying a second argument of zero leaves the original value unchanged.
+ Shifting any value by 0 returns the original value.
+ Shifting any value by 1 is the same as multiplying it by 2,
+ as long as the value is small enough; larger values eventually
+ become negative when shifted, as the sign bit is set.
+ Starting with the value 1 and shifting it left by N positions gives
+ the same result as 2 to the Nth power, or <code class="ph codeph">pow(2,<var class="keyword varname">N</var>)</code>.
+ </p>
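The relationships above (doubling, powers of two, and the sign bit becoming set) can be sketched for a fixed width; the 8-bit default and signed reinterpretation are assumptions of this illustration:

```python
def shiftleft(value, positions, bits=8):
    """Logical left shift within a fixed width, returning a signed result."""
    mask = (1 << bits) - 1
    r = (value << positions) & mask
    return r - (1 << bits) if r >= 1 << (bits - 1) else r

print(shiftleft(1, 3))    # 8: same as pow(2, 3) while the result still fits
print(shiftleft(127, 1))  # -2: the sign bit becomes set
print(shiftleft(-1, 4))   # -16: 11111111 -> 11110000
```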
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select shiftleft(1,0); /* 00000001 -> 00000001 */
++-----------------+
+| shiftleft(1, 0) |
++-----------------+
+| 1 |
++-----------------+
+
+select shiftleft(1,3); /* 00000001 -> 00001000 */
++-----------------+
+| shiftleft(1, 3) |
++-----------------+
+| 8 |
++-----------------+
+
+select shiftleft(8,2); /* 00001000 -> 00100000 */
++-----------------+
+| shiftleft(8, 2) |
++-----------------+
+| 32 |
++-----------------+
+
+select shiftleft(127,1); /* 01111111 -> 11111110 */
++-------------------+
+| shiftleft(127, 1) |
++-------------------+
+| -2 |
++-------------------+
+
+select shiftleft(127,5); /* 01111111 -> 11100000 */
++-------------------+
+| shiftleft(127, 5) |
++-------------------+
+| -32 |
++-------------------+
+
+select shiftleft(-1,4); /* 11111111 -> 11110000 */
++------------------+
+| shiftleft(-1, 4) |
++------------------+
+| -16 |
++------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__shiftright">
+ <code class="ph codeph">shiftright(integer_type a, int positions)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Shifts an integer value right by a specified number of bits.
+ As the least significant bit is taken out of the original value,
+ it is discarded and the most significant bit becomes 0.
+ In computer science terms, this operation is a <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Logical_shift" target="_blank">logical shift</a>"</span>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The final value has either the same number of 1 bits as the original value, or fewer.
+ Shifting an 8-bit value by 8 positions, a 16-bit value by 16 positions, and so on produces
+ a result of zero.
+ </p>
+ <p class="p">
+ Specifying a second argument of zero leaves the original value unchanged.
+ Shifting any value by 0 returns the original value.
+ Shifting any positive value right by 1 is the same as dividing it by 2.
+ Negative values become positive when shifted right.
+ </p>
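Because this is a logical (not arithmetic) shift, masking the value to its width before shifting reproduces the behavior, including negative inputs coming back positive. A sketch under the same assumed 8-bit default as above:

```python
def shiftright(value, positions, bits=8):
    """Logical right shift: mask to the fixed width first, so negative
    inputs are treated as their unsigned bit patterns."""
    mask = (1 << bits) - 1
    return (value & mask) >> positions

print(shiftright(16, 4))  # 1:   00010000 -> 00000001
print(shiftright(-1, 1))  # 127: 11111111 -> 01111111
```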
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select shiftright(16,0); /* 00010000 -> 00010000 */
++-------------------+
+| shiftright(16, 0) |
++-------------------+
+| 16 |
++-------------------+
+
+select shiftright(16,4); /* 00010000 -> 00000001 */
++-------------------+
+| shiftright(16, 4) |
++-------------------+
+| 1 |
++-------------------+
+
+select shiftright(16,5); /* 00010000 -> 00000000 */
++-------------------+
+| shiftright(16, 5) |
++-------------------+
+| 0 |
++-------------------+
+
+select shiftright(-1,1); /* 11111111 -> 01111111 */
++-------------------+
+| shiftright(-1, 1) |
++-------------------+
+| 127 |
++-------------------+
+
+select shiftright(-1,5); /* 11111111 -> 00000111 */
++-------------------+
+| shiftright(-1, 5) |
++-------------------+
+| 7 |
++-------------------+
+</code></pre>
+ </dd>
+
+
+
+ </dl>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_boolean.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_boolean.html b/docs/build3x/html/topics/impala_boolean.html
new file mode 100644
index 0000000..afbf2e3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_boolean.html
@@ -0,0 +1,170 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="boolean"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BOOLEAN Data Type</title></head><body id="boolean"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">BOOLEAN Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements, representing a
+ single true/false choice.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> BOOLEAN</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> <code class="ph codeph">TRUE</code> or <code class="ph codeph">FALSE</code>. Do not use quotation marks around the
+ <code class="ph codeph">TRUE</code> and <code class="ph codeph">FALSE</code> literal values. You can write the literal values in
+ uppercase, lowercase, or mixed case. The values queried from a table are always returned in lowercase,
+ <code class="ph codeph">true</code> or <code class="ph codeph">false</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong> Impala does not automatically convert any other type to <code class="ph codeph">BOOLEAN</code>. All
+ conversions must use an explicit call to the <code class="ph codeph">CAST()</code> function.
+ </p>
+
+ <p class="p">
+ You can use <code class="ph codeph">CAST()</code> to convert
+
+ any integer or floating-point type to
+ <code class="ph codeph">BOOLEAN</code>: a value of 0 represents <code class="ph codeph">false</code>, and any non-zero value is converted
+ to <code class="ph codeph">true</code>.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(42 AS BOOLEAN) AS nonzero_int, CAST(99.44 AS BOOLEAN) AS nonzero_decimal,
+ CAST(000 AS BOOLEAN) AS zero_int, CAST(0.0 AS BOOLEAN) AS zero_decimal;
++-------------+-----------------+----------+--------------+
+| nonzero_int | nonzero_decimal | zero_int | zero_decimal |
++-------------+-----------------+----------+--------------+
+| true | true | false | false |
++-------------+-----------------+----------+--------------+
+</code></pre>
+
+ <p class="p">
+ When you cast the opposite way, from <code class="ph codeph">BOOLEAN</code> to a numeric type,
+ the result becomes either 1 or 0:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(true AS INT) AS true_int, CAST(true AS DOUBLE) AS true_double,
+ CAST(false AS INT) AS false_int, CAST(false AS DOUBLE) AS false_double;
++----------+-------------+-----------+--------------+
+| true_int | true_double | false_int | false_double |
++----------+-------------+-----------+--------------+
+| 1 | 1 | 0 | 0 |
++----------+-------------+-----------+--------------+
+</code></pre>
+
+ <p class="p">
+
+ You can cast <code class="ph codeph">DECIMAL</code> values to <code class="ph codeph">BOOLEAN</code>, with the same treatment of zero and
+ non-zero values as the other numeric types. You cannot cast a <code class="ph codeph">BOOLEAN</code> to a
+ <code class="ph codeph">DECIMAL</code>.
+ </p>
+
+ <p class="p">
+ You cannot cast a <code class="ph codeph">STRING</code> value to <code class="ph codeph">BOOLEAN</code>, although you can cast a
+ <code class="ph codeph">BOOLEAN</code> value to <code class="ph codeph">STRING</code>, returning <code class="ph codeph">'1'</code> for
+ <code class="ph codeph">true</code> values and <code class="ph codeph">'0'</code> for <code class="ph codeph">false</code> values.
+ </p>
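+
+    <p class="p">
+      For example, the <code class="ph codeph">BOOLEAN</code>-to-<code class="ph codeph">STRING</code> cast can be
+      checked directly (illustrative query showing the <code class="ph codeph">'1'</code> and
+      <code class="ph codeph">'0'</code> result values described above):
+    </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(true AS STRING) AS true_str, CAST(false AS STRING) AS false_str;
++----------+-----------+
+| true_str | false_str |
++----------+-----------+
+| 1        | 0         |
++----------+-----------+
+</code></pre>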
+
+ <p class="p">
+ Although you can cast a <code class="ph codeph">TIMESTAMP</code> to a <code class="ph codeph">BOOLEAN</code> or a
+ <code class="ph codeph">BOOLEAN</code> to a <code class="ph codeph">TIMESTAMP</code>, the results are unlikely to be useful. Any non-zero
+ <code class="ph codeph">TIMESTAMP</code> (that is, any value other than <code class="ph codeph">1970-01-01 00:00:00</code>) becomes
+ <code class="ph codeph">TRUE</code> when converted to <code class="ph codeph">BOOLEAN</code>, while <code class="ph codeph">1970-01-01 00:00:00</code>
+      becomes <code class="ph codeph">FALSE</code>. A value of <code class="ph codeph">FALSE</code> becomes <code class="ph codeph">1970-01-01
+      00:00:00</code> when converted to <code class="ph codeph">TIMESTAMP</code>, and <code class="ph codeph">TRUE</code> becomes one second
+      past this epoch date, that is, <code class="ph codeph">1970-01-01 00:00:01</code>.
+ </p>
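+
+    <p class="p">
+      A sketch of the <code class="ph codeph">BOOLEAN</code>-to-<code class="ph codeph">TIMESTAMP</code> direction
+      (illustrative; the displayed values assume the default UTC interpretation of
+      <code class="ph codeph">TIMESTAMP</code>):
+    </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(false AS TIMESTAMP) AS false_ts, CAST(true AS TIMESTAMP) AS true_ts;
++---------------------+---------------------+
+| false_ts            | true_ts             |
++---------------------+---------------------+
+| 1970-01-01 00:00:00 | 1970-01-01 00:00:01 |
++---------------------+---------------------+
+</code></pre>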
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong> An expression of this type produces a <code class="ph codeph">NULL</code> value if any
+ argument of the expression is <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong>
+ </p>
+
+ <p class="p">
+ Do not use a <code class="ph codeph">BOOLEAN</code> column as a partition key. Although you can create such a table,
+ subsequent operations produce errors:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table truth_table (assertion string) partitioned by (truth boolean);
+[localhost:21000] > insert into truth_table values ('Pigs can fly',false);
+ERROR: AnalysisException: INSERT into table with BOOLEAN partition column (truth) is not supported: partitioning.truth_table
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SELECT 1 < 2;
+SELECT 2 = 5;
+SELECT 100 < NULL, 100 > NULL;
+CREATE TABLE assertions (claim STRING, really BOOLEAN);
+INSERT INTO assertions VALUES
+ ("1 is less than 2", 1 < 2),
+ ("2 is the same as 5", 2 = 5),
+ ("Grass is green", true),
+ ("The moon is made of green cheese", false);
+SELECT claim FROM assertions WHERE really = TRUE;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+
+
+
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">FLOAT</code>,
+ and <code class="ph codeph">DOUBLE</code> cannot be used for primary key columns in Kudu tables.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong> <a class="xref" href="impala_literals.html#boolean_literals">Boolean Literals</a>,
+ <a class="xref" href="impala_operators.html#operators">SQL Operators</a>,
+ <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_float.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_float.html b/docs/build3x/html/topics/impala_float.html
new file mode 100644
index 0000000..53661d0
--- /dev/null
+++ b/docs/build3x/html/topics/impala_float.html
@@ -0,0 +1,153 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="float"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>FLOAT Data Type</title></head><body id="float"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">FLOAT Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A single precision floating-point data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER
+ TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> FLOAT</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> 1.40129846432481707e-45 .. 3.40282346638528860e+38, positive or negative
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Precision:</strong> 6 to 9 significant digits, depending on usage. The number of significant digits does
+ not depend on the position of the decimal point.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Representation:</strong> The values are stored in 4 bytes, using
+ <a class="xref" href="https://en.wikipedia.org/wiki/Single-precision_floating-point_format" target="_blank">IEEE 754 Single Precision Binary Floating Point</a> format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong> Impala automatically converts <code class="ph codeph">FLOAT</code> to more precise
+ <code class="ph codeph">DOUBLE</code> values, but not the other way around. You can use <code class="ph codeph">CAST()</code> to convert
+ <code class="ph codeph">FLOAT</code> values to <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>,
+ <code class="ph codeph">BIGINT</code>, <code class="ph codeph">STRING</code>, <code class="ph codeph">TIMESTAMP</code>, or <code class="ph codeph">BOOLEAN</code>.
+ You can use exponential notation in <code class="ph codeph">FLOAT</code> literals or when casting from
+ <code class="ph codeph">STRING</code>, for example <code class="ph codeph">1.0e6</code> to represent one million.
+ <span class="ph">
+ Casting an integer or floating-point value <code class="ph codeph">N</code> to
+ <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+ date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+ If the setting <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+ the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.
+ </span>
+ </p>
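+
+    <p class="p">
+      For example (illustrative queries; the <code class="ph codeph">TIMESTAMP</code> result assumes the
+      default UTC interpretation):
+    </p>
+
+<pre class="pre codeblock"><code>-- Exponential notation in a literal and in a cast from STRING.
+SELECT CAST(1.0e6 AS FLOAT) AS a_million, CAST('1.5e-2' AS FLOAT) AS small_val;
+
+-- Casting N seconds past the start of the epoch.
+SELECT CAST(100 AS TIMESTAMP) AS epoch_plus_100;
++---------------------+
+| epoch_plus_100      |
++---------------------+
+| 1970-01-01 00:01:40 |
++---------------------+
+</code></pre>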
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+
+ <p class="p">
+ Impala does not evaluate NaN (not a number) as equal to any other numeric values,
+ including other NaN values. For example, the following statement, which evaluates equality
+ between two NaN values, returns <code class="ph codeph">false</code>:
+ </p>
+
+<pre class="pre codeblock"><code>
+SELECT CAST('nan' AS FLOAT)=CAST('nan' AS FLOAT);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x FLOAT);
+SELECT CAST(1000.5 AS FLOAT);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> Because fractional values of this type are not always represented precisely, when this
+ type is used for a partition key column, the underlying HDFS directories might not be named exactly as you
+ expect. Prefer to partition on a <code class="ph codeph">DECIMAL</code> column instead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as a 4-byte value.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+
+
+ <p class="p">
+ Due to the way arithmetic on <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns uses
+ high-performance hardware instructions, and distributed queries can perform these operations in different
+ order for each query, results can vary slightly for aggregate function calls such as <code class="ph codeph">SUM()</code>
+ and <code class="ph codeph">AVG()</code> for <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns, particularly on
+ large data sets where millions or billions of values are summed or averaged. For perfect consistency and
+ repeatability, use the <code class="ph codeph">DECIMAL</code> data type for such operations instead of
+ <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+ </p>
+
+ <p class="p">
+ The inability to exactly represent certain floating-point values means that
+ <code class="ph codeph">DECIMAL</code> is sometimes a better choice than <code class="ph codeph">DOUBLE</code>
+ or <code class="ph codeph">FLOAT</code> when precision is critical, particularly when
+ transferring data from other database systems that use different representations
+ or file formats.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">FLOAT</code>,
+ and <code class="ph codeph">DOUBLE</code> cannot be used for primary key columns in Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>,
+ <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_functions.html b/docs/build3x/html/topics/impala_functions.html
new file mode 100644
index 0000000..44fa0c2
--- /dev/null
+++ b/docs/build3x/html/topics/impala_functions.html
@@ -0,0 +1,162 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_math_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_bit_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_conversion_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datetime_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_conditional_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_string_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_misc_functions.html"><meta name="DC.Relation" scheme="URI" content=
"../topics/impala_aggregate_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_analytic_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_udf.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="builtins"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Built-In Functions</title></head><body id="builtins"><main role="main"><article role="article" aria-labelledby="builtins__title_functions">
+
+ <h1 class="title topictitle1" id="builtins__title_functions">Impala Built-In Functions</h1>
+
+
+
+ <div class="body conbody">
+
+
+
+ <p class="p">
+ Impala supports several categories of built-in functions. These functions let you perform mathematical
+ calculations, string manipulation, date calculations, and other kinds of data transformations directly in
+ <code class="ph codeph">SELECT</code> statements. The built-in functions let a SQL query return results with all
+ formatting, calculating, and type conversions applied, rather than performing time-consuming postprocessing
+ in another application. By applying function calls where practical, you can make a SQL query that is as
+ convenient as an expression in a procedural programming language or a formula in a spreadsheet.
+ </p>
+
+ <p class="p">
+ The categories of functions supported by Impala are:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_conversion_functions.html#conversion_functions">Impala Type Conversion Functions</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a>
+ </li>
+
+ <li class="li">
+ Aggregation functions, explained in <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>.
+ </li>
+ </ul>
+
+ <p class="p">
+ You call any of these functions through the <code class="ph codeph">SELECT</code> statement. For most functions, you can
+ omit the <code class="ph codeph">FROM</code> clause and supply literal values for any required arguments:
+ </p>
+
+<pre class="pre codeblock"><code>select abs(-1);
++---------+
+| abs(-1) |
++---------+
+| 1 |
++---------+
+
+select concat('The rain ', 'in Spain');
++---------------------------------+
+| concat('the rain ', 'in spain') |
++---------------------------------+
+| The rain in Spain |
++---------------------------------+
+
+select power(2,5);
++-------------+
+| power(2, 5) |
++-------------+
+| 32 |
++-------------+
+</code></pre>
+
+ <p class="p">
+ When you use a <code class="ph codeph">FROM</code> clause and specify a column name as a function argument, the function is
+ applied for each item in the result set:
+ </p>
+
+
+
+<pre class="pre codeblock"><code>select concat('Country = ',country_code) from all_countries where population > 100000000;
+select round(price) as dollar_value from product_catalog where price between 0.0 and 100.0;
+</code></pre>
+
+ <p class="p">
+ Typically, if any argument to a built-in function is <code class="ph codeph">NULL</code>, the result value is also
+ <code class="ph codeph">NULL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>select cos(null);
++-----------+
+| cos(null) |
++-----------+
+| NULL |
++-----------+
+
+select power(2,null);
++----------------+
+| power(2, null) |
++----------------+
+| NULL |
++----------------+
+
+select concat('a',null,'b');
++------------------------+
+| concat('a', null, 'b') |
++------------------------+
+| NULL |
++------------------------+
+</code></pre>
+
+ <p class="p">
+ Aggregate functions are a special category with different rules. These functions calculate a return value
+ across all the items in a result set, so they require a <code class="ph codeph">FROM</code> clause in the query:
+ </p>
+
+<pre class="pre codeblock"><code>select count(product_id) from product_catalog;
+select max(height), avg(height) from census_data where age > 20;
+</code></pre>
+
+ <p class="p">
+ Aggregate functions also ignore <code class="ph codeph">NULL</code> values rather than returning a <code class="ph codeph">NULL</code>
+ result. For example, if some rows have <code class="ph codeph">NULL</code> for a particular column, those rows are
+ ignored when computing the <code class="ph codeph">AVG()</code> for that column. Likewise, specifying
+ <code class="ph codeph">COUNT(<var class="keyword varname">col_name</var>)</code> in a query counts only those rows where
+ <var class="keyword varname">col_name</var> contains a non-<code class="ph codeph">NULL</code> value.
+ </p>
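+
+    <p class="p">
+      A small demonstration of this <code class="ph codeph">NULL</code> handling (hypothetical table; here
+      <code class="ph codeph">COUNT(x)</code> returns 2, <code class="ph codeph">COUNT(*)</code> returns 3, and
+      <code class="ph codeph">AVG(x)</code> averages only the two non-<code class="ph codeph">NULL</code> values):
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE nullable_vals (x INT);
+INSERT INTO nullable_vals VALUES (1), (NULL), (3);
+SELECT COUNT(x) AS non_null_count, COUNT(*) AS all_rows, AVG(x) AS avg_non_null
+  FROM nullable_vals;
+</code></pre>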
+
+ <p class="p">
+ Analytic functions are a variation on aggregate functions. Instead of returning a single value, or an
+ identical value for each group of rows, they can compute values that vary based on a <span class="q">"window"</span> consisting
+ of other rows around them in the result set.
+ </p>
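+
+    <p class="p">
+      As a sketch, an analytic function uses an <code class="ph codeph">OVER</code> clause to define that
+      window (the <code class="ph codeph">category</code> column here is hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>SELECT name, price,
+  AVG(price) OVER (PARTITION BY category) AS avg_price_in_category
+FROM product_catalog;
+</code></pre>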
+
+ <p class="p toc"></p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_math_functions.html">Impala Mathematical Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_bit_functions.html">Impala Bit Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_conversion_functions.html">Impala Type Conversion Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_datetime_functions.html">Impala Date and Time Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_conditional_functions.html">Impala Conditional Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_string_functions.html">Impala String Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_misc_functions.html">Impala Miscellaneous Functions</
a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_analytic_functions.html">Impala Analytic Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_udf.html">Impala User-Defined Functions (UDFs)</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_functions_overview.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_functions_overview.html b/docs/build3x/html/topics/impala_functions_overview.html
new file mode 100644
index 0000000..fef454e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_functions_overview.html
@@ -0,0 +1,109 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Functions</title></head><body id="functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Functions let you apply arithmetic, string, or other computations and transformations to Impala data. You
+ typically use them in <code class="ph codeph">SELECT</code> lists and <code class="ph codeph">WHERE</code> clauses to filter and format
+ query results so that the result set is exactly what you want, with no further processing needed on the
+ application side.
+ </p>
+
+ <p class="p">
+ Scalar functions return a single result for each input row. See <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select name, population from country where continent = 'North America' order by population desc limit 4;
+[localhost:21000] > select upper(name), population from country where continent = 'North America' order by population desc limit 4;
++-------------+------------+
+| upper(name) | population |
++-------------+------------+
+| USA | 320000000 |
+| MEXICO | 122000000 |
+| CANADA | 25000000 |
+| GUATEMALA | 16000000 |
++-------------+------------+
+</code></pre>
+ <p class="p">
+ Aggregate functions combine the results from multiple rows:
+ either a single result for the entire table, or a separate result for each group of rows.
+ Aggregate functions are frequently used in combination with <code class="ph codeph">GROUP BY</code>
+ and <code class="ph codeph">HAVING</code> clauses in the <code class="ph codeph">SELECT</code> statement.
+ See <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select continent, <strong class="ph b">sum(population)</strong> as howmany from country <strong class="ph b">group by continent</strong> order by howmany desc;
++---------------+------------+
+| continent | howmany |
++---------------+------------+
+| Asia | 4298723000 |
+| Africa | 1110635000 |
+| Europe | 742452000 |
+| North America | 565265000 |
+| South America | 406740000 |
+| Oceania | 38304000 |
++---------------+------------+
+</code></pre>
+
+ <p class="p">
+ User-defined functions (UDFs) let you code your own logic. They can be either scalar or aggregate functions.
+ UDFs let you implement important business or scientific logic using high-performance code for Impala to automatically parallelize.
+ You can also use UDFs to implement convenience functions to simplify reporting or porting SQL from other database systems.
+ See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select <strong class="ph b">rot13('Hello world!')</strong> as 'Weak obfuscation';
++------------------+
+| weak obfuscation |
++------------------+
+| Uryyb jbeyq! |
++------------------+
+[localhost:21000] > select <strong class="ph b">likelihood_of_new_subatomic_particle(sensor1, sensor2, sensor3)</strong> as probability
+ > from experimental_results group by experiment;
+</code></pre>
+
+ <p class="p">
+ Each function is associated with a specific database. For example, if you issue a <code class="ph codeph">USE somedb</code>
+ statement followed by <code class="ph codeph">CREATE FUNCTION somefunc</code>, the new function is created in the
+ <code class="ph codeph">somedb</code> database, and you could refer to it through the fully qualified name
+ <code class="ph codeph">somedb.somefunc</code>. You could then issue another <code class="ph codeph">USE</code> statement
+ and create a function with the same name in a different database.
+ </p>
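+
+    <p class="p">
+      For example (a hypothetical UDF; the library path and symbol name are illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>USE somedb;
+CREATE FUNCTION somefunc(STRING) RETURNS STRING
+  LOCATION '/user/impala/udfs/libudf.so' SYMBOL='MyUdf';
+SELECT somedb.somefunc(claim) FROM assertions;
+</code></pre>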
+
+ <p class="p">
+ Impala built-in functions are associated with a special database named <code class="ph codeph">_impala_builtins</code>,
+ which lets you refer to them from any database without qualifying the name.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > show databases;
++-------------------------+
+| name |
++-------------------------+
+| <strong class="ph b">_impala_builtins</strong> |
+| analytic_functions |
+| avro_testing |
+| data_file_size |
+...
+[localhost:21000] > show functions in _impala_builtins like '*subs*';
++-------------+-----------------------------------+
+| return type | signature |
++-------------+-----------------------------------+
+| STRING | substr(STRING, BIGINT) |
+| STRING | substr(STRING, BIGINT, BIGINT) |
+| STRING | substring(STRING, BIGINT) |
+| STRING | substring(STRING, BIGINT, BIGINT) |
++-------------+-----------------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related statements:</strong> <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a>,
+ <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_grant.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_grant.html b/docs/build3x/html/topics/impala_grant.html
new file mode 100644
index 0000000..33b0a45
--- /dev/null
+++ b/docs/build3x/html/topics/impala_grant.html
@@ -0,0 +1,256 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="grant"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>GRANT Statement (Impala 2.0 or higher only)</title></head><body id="grant"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">GRANT Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The
+ <code class="ph codeph">GRANT</code> statement grants a privilege on a specified object
+ to a role or grants a role to a group.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>GRANT ROLE <var class="keyword varname">role_name</var> TO GROUP <var class="keyword varname">group_name</var>
+
+GRANT <var class="keyword varname">privilege</var> ON <var class="keyword varname">object_type</var> <var class="keyword varname">object_name</var>
+ TO [ROLE] <var class="keyword varname">roleName</var>
+ [WITH GRANT OPTION]
+
+<span class="ph" id="grant__privileges">privilege ::= ALL | ALTER | CREATE | DROP | INSERT | REFRESH | SELECT | SELECT(<var class="keyword varname">column_name</var>)</span>
+<span class="ph" id="grant__priv_objs">object_type ::= TABLE | DATABASE | SERVER | URI</span>
+</code></pre>
+
+ <p class="p">
+ Typically, the object name is an identifier. For URIs, it is a string literal.
+ </p>
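+
+  <p class="p">
+    For example (the role, group, table, and URI names are hypothetical):
+  </p>
+
+<pre class="pre codeblock"><code>GRANT ROLE analyst_role TO GROUP analysts;
+GRANT SELECT ON TABLE sales.orders TO ROLE analyst_role;
+GRANT ALL ON URI 'hdfs://nameservice1/warehouse/staging' TO ROLE etl_role;
+</code></pre>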
+
+
+
+ <p class="p">
+ <strong class="ph b">Required privileges:</strong>
+ </p>
+
+ <p class="p">
+ Only administrative users (initially, a predefined set of users
+ specified in the Sentry service configuration file) can use this
+ statement.
+ </p>
+ <p class="p">Only Sentry administrative users can grant roles to a group. </p>
+
+  <p class="p"> The <code class="ph codeph">WITH GRANT OPTION</code> clause allows members of the
+        specified role to issue <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code>
+        statements for those same privileges. Hence, if a role has the
+        <code class="ph codeph">ALL</code> privilege on a database and the <code class="ph codeph">WITH GRANT
+          OPTION</code> set, users granted that role can execute
+        <code class="ph codeph">GRANT</code>/<code class="ph codeph">REVOKE</code> statements only for that
+        database or for tables within that database. This means a user could revoke
+        the privileges of the user that granted them the <code class="ph codeph">GRANT
+          OPTION</code>. </p>
+
+ <p class="p"> Impala does not currently support revoking only the <code class="ph codeph">WITH GRANT
+ OPTION</code> from a privilege previously granted to a role. To remove
+ the <code class="ph codeph">WITH GRANT OPTION</code>, revoke the privilege and grant it
+ again without the <code class="ph codeph">WITH GRANT OPTION</code> flag. </p>
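+
+ <p class="p">
+ For example, using hypothetical database and role names, the following sequence removes the
+ grant option by revoking the privilege and granting it again without the flag:
+ </p>
+
+<pre class="pre codeblock"><code>-- Originally granted as:
+--   GRANT ALL ON DATABASE sales_db TO ROLE report_role WITH GRANT OPTION;
+REVOKE ALL ON DATABASE sales_db FROM ROLE report_role;
+GRANT ALL ON DATABASE sales_db TO ROLE report_role;
+</code></pre>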
+
+ <p class="p">
+ The ability to grant or revoke <code class="ph codeph">SELECT</code> privilege on specific columns is available
+ in <span class="keyword">Impala 2.3</span> and higher. See <span class="xref">the documentation for Apache Sentry</span> for details.
+ </p>
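+
+ <p class="p">
+ For example, the following statements sketch a typical sequence, using hypothetical role,
+ group, database, table, and column names:
+ </p>
+
+<pre class="pre codeblock"><code>GRANT ROLE analyst_role TO GROUP analysts;
+GRANT SELECT ON TABLE sales_db.orders TO ROLE analyst_role;
+GRANT SELECT(customer_id) ON TABLE sales_db.customers TO ROLE analyst_role;
+</code></pre>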
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ You can only grant the <code class="ph codeph">ALL</code> privilege to the
+ <code class="ph codeph">URI</code> object. Finer-grained privileges mentioned below on
+ a <code class="ph codeph">URI</code> are not supported.
+ </p>
+
+ <div class="p">
+ Starting in <span class="keyword">Impala 3.0</span>, finer-grained privileges
+ are enforced as below.<table class="simpletable frame-all" id="grant__simpletable_kmb_ppn_ndb"><col style="width:33.33333333333333%"><col style="width:33.33333333333333%"><col style="width:33.33333333333333%"><thead><tr class="sthead">
+ <th class="stentry" id="grant__simpletable_kmb_ppn_ndb__stentry__1">Privilege</th>
+ <th class="stentry" id="grant__simpletable_kmb_ppn_ndb__stentry__2">Scope</th>
+ <th class="stentry" id="grant__simpletable_kmb_ppn_ndb__stentry__3">SQL Allowed to Execute</th>
+ </tr></thead><tbody><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">REFRESH</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">SERVER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">INVALIDATE METADATA</code> on all tables in all
+ databases<p class="p"><code class="ph codeph">REFRESH</code> on all tables and functions
+ in all databases</p></td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">REFRESH</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">DATABASE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">INVALIDATE METADATA</code> on all tables in the
+ named database<p class="p"><code class="ph codeph">REFRESH</code> on all tables and
+ functions in the named database</p></td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">REFRESH</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">TABLE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">INVALIDATE METADATA</code> on the named
+ table<p class="p"><code class="ph codeph">REFRESH</code> on the named
+ table</p></td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">CREATE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">SERVER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">CREATE DATABASE</code> on all
+ databases<p class="p"><code class="ph codeph">CREATE TABLE</code> on all
+ tables</p></td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">CREATE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">DATABASE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">CREATE TABLE</code> on all tables in the named
+ database</td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">DROP</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">SERVER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">DROP DATABASE</code> on all databases<p class="p"><code class="ph codeph">DROP
+ TABLE</code> on all tables</p></td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">DROP</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">DATABASE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">DROP DATABASE</code> on the named
+ database<p class="p"><code class="ph codeph">DROP TABLE</code> on all tables in the
+ named database</p></td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">DROP</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">TABLE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">DROP TABLE</code> on the named table</td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">ALTER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">SERVER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">ALTER TABLE</code> on all tables</td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">ALTER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">DATABASE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">ALTER TABLE</code> on the tables in the named
+ database</td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">ALTER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">TABLE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">ALTER TABLE</code> on the named table</td>
+ </tr></tbody></table>
+ </div>
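+
+ <p class="p">
+ For example, granting the finer-grained <code class="ph codeph">REFRESH</code> privilege at the
+ database scope, using hypothetical database and role names:
+ </p>
+
+<pre class="pre codeblock"><code>GRANT REFRESH ON DATABASE sales_db TO ROLE etl_role;
+</code></pre>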
+
+ <div class="p">
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <div class="p">
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">ALTER TABLE RENAME</code> requires the
+ <code class="ph codeph">ALTER</code> privilege at the <code class="ph codeph">TABLE</code>
+ level and the <code class="ph codeph">CREATE</code> privilege at the
+ <code class="ph codeph">DATABASE</code> level.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> requires the
+ <code class="ph codeph">CREATE</code> privilege on the database that should
+ contain the new table and the <code class="ph codeph">SELECT</code> privilege on
+ the tables referenced in the query portion of the statement.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">COMPUTE STATS</code> requires the
+ <code class="ph codeph">ALTER</code> and <code class="ph codeph">SELECT</code> privileges on
+ the target table.
+ </li>
+ </ul>
+ </div>
+ </div>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <div class="p">
+ <ul class="ul">
+ <li class="li">
+ The Impala <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements are available in
+ <span class="keyword">Impala 2.0</span> and later.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 1.4</span> and later, Impala can make use of any roles and privileges specified by the
+ <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Hive, when your system is configured to
+ use the Sentry service instead of the file-based policy mechanism.
+ </li>
+
+ <li class="li">
+ The Impala <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements for privileges do not require
+ the <code class="ph codeph">ROLE</code> keyword to be repeated before each role name, unlike the equivalent Hive
+ statements.
+ </li>
+
+ <li class="li">
+ Currently, each Impala <code class="ph codeph">GRANT</code> or <code class="ph codeph">REVOKE</code> statement can only grant or
+ revoke a single privilege to or from a single role.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <div class="p">
+ Access to Kudu tables must be granted to and revoked from roles with the
+ following considerations:
+ <ul class="ul">
+ <li class="li">
+ Only users with the <code class="ph codeph">ALL</code> privilege on
+ <code class="ph codeph">SERVER</code> can create external Kudu tables.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">ALL</code> privilege on <code class="ph codeph">SERVER</code> is
+ required to specify the <code class="ph codeph">kudu.master_addresses</code>
+ property in the <code class="ph codeph">CREATE TABLE</code> statements for managed
+ tables as well as external tables.
+ </li>
+ <li class="li">
+ Access to Kudu tables is enforced at the table level and at the
+ column level.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT</code>
+ privileges are supported.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">DELETE</code>, <code class="ph codeph">UPDATE</code>, and
+ <code class="ph codeph">UPSERT</code> operations require the <code class="ph codeph">ALL</code>
+ privilege.
+ </li>
+ </ul>
+ Because non-SQL APIs can access Kudu data without going through Sentry
+ authorization, currently the Sentry support is considered preliminary
+ and subject to change.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_group_by.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_group_by.html b/docs/build3x/html/topics/impala_group_by.html
new file mode 100644
index 0000000..bcc6c1d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_group_by.html
@@ -0,0 +1,140 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="group_by"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>GROUP BY Clause</title></head><body id="group_by"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">GROUP BY Clause</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Specify the <code class="ph codeph">GROUP BY</code> clause in queries that use aggregation functions, such as
+ <code class="ph codeph"><a class="xref" href="impala_count.html#count">COUNT()</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_sum.html#sum">SUM()</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_avg.html#avg">AVG()</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_min.html#min">MIN()</a></code>, and
+ <code class="ph codeph"><a class="xref" href="impala_max.html#max">MAX()</a></code>. Specify in the
+ <code class="ph codeph"><a class="xref" href="impala_group_by.html#group_by">GROUP BY</a></code> clause the names of all the
+ columns that do not participate in the aggregation operation.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, the complex data types <code class="ph codeph">STRUCT</code>,
+ <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code> are available. These columns cannot
+ be referenced directly in the <code class="ph codeph">GROUP BY</code> clause.
+ When you query a complex type column, you use join notation to <span class="q">"unpack"</span> the elements
+ of the complex type, and within the join query you can include a <code class="ph codeph">GROUP BY</code>
+ clause to aggregate over the scalar elements from the complex type.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Zero-length strings:</strong> For purposes of clauses such as <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">GROUP
+ BY</code>, Impala considers zero-length strings (<code class="ph codeph">""</code>), <code class="ph codeph">NULL</code>, and space
+ to all be different values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ For example, the following query finds the 5 items that sold the highest total quantity (using the
+ <code class="ph codeph">SUM()</code> function), and also counts the number of sales transactions for those items
+ (using the <code class="ph codeph">COUNT()</code> function). Because the column representing the item IDs is not
+ used in any aggregation function, we specify that column in the <code class="ph codeph">GROUP BY</code> clause.
+ </p>
+
+<pre class="pre codeblock"><code>select
+ <strong class="ph b">ss_item_sk</strong> as Item,
+ <strong class="ph b">count</strong>(ss_item_sk) as Times_Purchased,
+ <strong class="ph b">sum</strong>(ss_quantity) as Total_Quantity_Purchased
+from store_sales
+ <strong class="ph b">group by ss_item_sk</strong>
+ order by sum(ss_quantity) desc
+ limit 5;
++-------+-----------------+--------------------------+
+| item | times_purchased | total_quantity_purchased |
++-------+-----------------+--------------------------+
+| 9325 | 372 | 19072 |
+| 4279 | 357 | 18501 |
+| 7507 | 371 | 18475 |
+| 5953 | 369 | 18451 |
+| 16753 | 375 | 18446 |
++-------+-----------------+--------------------------+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">HAVING</code> clause lets you filter the results of aggregate functions, because you cannot
+ refer to those expressions in the <code class="ph codeph">WHERE</code> clause. For example, to find the 5 lowest-selling
+ items that were included in at least 100 sales transactions, we could use this query:
+ </p>
+
+<pre class="pre codeblock"><code>select
+ <strong class="ph b">ss_item_sk</strong> as Item,
+ <strong class="ph b">count</strong>(ss_item_sk) as Times_Purchased,
+ <strong class="ph b">sum</strong>(ss_quantity) as Total_Quantity_Purchased
+from store_sales
+ <strong class="ph b">group by ss_item_sk</strong>
+ <strong class="ph b">having times_purchased >= 100</strong>
+ order by sum(ss_quantity)
+ limit 5;
++-------+-----------------+--------------------------+
+| item | times_purchased | total_quantity_purchased |
++-------+-----------------+--------------------------+
+| 13943 | 105 | 4087 |
+| 2992 | 101 | 4176 |
+| 4773 | 107 | 4204 |
+| 14350 | 103 | 4260 |
+| 11956 | 102 | 4275 |
++-------+-----------------+--------------------------+</code></pre>
+
+ <p class="p">
+ When performing calculations involving scientific or financial data, remember that columns with type
+ <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code> are stored as true floating-point numbers, which cannot
+ precisely represent every possible fractional value. Thus, if you include a <code class="ph codeph">FLOAT</code> or
+ <code class="ph codeph">DOUBLE</code> column in a <code class="ph codeph">GROUP BY</code> clause, the results might not precisely match
+ literal values in your query or from an original Text data file. Use rounding operations, the
+ <code class="ph codeph">BETWEEN</code> operator, or another arithmetic technique to match floating-point values that are
+ <span class="q">"near"</span> literal values you expect. For example, this query on the <code class="ph codeph">ss_wholesale_cost</code>
+ column returns cost values that are close but not identical to the original figures that were entered as
+ decimal fractions.
+ </p>
+
+<pre class="pre codeblock"><code>select ss_wholesale_cost, avg(ss_quantity * ss_sales_price) as avg_revenue_per_sale
+ from store_sales
+ group by ss_wholesale_cost
+ order by avg_revenue_per_sale desc
+ limit 5;
++-------------------+----------------------+
+| ss_wholesale_cost | avg_revenue_per_sale |
++-------------------+----------------------+
+| 96.94000244140625 | 4454.351539300434 |
+| 95.93000030517578 | 4423.119941283189 |
+| 98.37999725341797 | 4332.516490316291 |
+| 97.97000122070312 | 4330.480601655014 |
+| 98.52999877929688 | 4291.316953108634 |
++-------------------+----------------------+</code></pre>
+
+ <p class="p">
+ Notice how wholesale cost values originally entered as decimal fractions such as <code class="ph codeph">96.94</code> and
+ <code class="ph codeph">98.38</code> are slightly larger or smaller in the result set, due to precision limitations in the
+ hardware floating-point types. The imprecise representation of <code class="ph codeph">FLOAT</code> and
+ <code class="ph codeph">DOUBLE</code> values is why financial data processing systems often store currency using data
+ types, such as <code class="ph codeph">DECIMAL</code>, that are less space-efficient but avoid these rounding errors.
+ </p>
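+
+ <p class="p">
+ For example, the following sketch (against the same hypothetical table) uses the
+ <code class="ph codeph">BETWEEN</code> operator to match wholesale cost values <span class="q">"near"</span>
+ an expected figure, instead of testing for exact equality:
+ </p>
+
+<pre class="pre codeblock"><code>select ss_wholesale_cost, avg(ss_quantity * ss_sales_price) as avg_revenue_per_sale
+  from store_sales
+  where ss_wholesale_cost between 96.93 and 96.95
+  group by ss_wholesale_cost;
+</code></pre>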
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_select.html#select">SELECT Statement</a>,
+ <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_group_concat.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_group_concat.html b/docs/build3x/html/topics/impala_group_concat.html
new file mode 100644
index 0000000..3a390c0
--- /dev/null
+++ b/docs/build3x/html/topics/impala_group_concat.html
@@ -0,0 +1,141 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="group_concat"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>GROUP_CONCAT Function</title></head><body id="group_concat"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">GROUP_CONCAT Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns a single string built by concatenating the argument values from
+ each row of the result set. If the optional separator string is specified, the separator is added between
+ each pair of concatenated values. The default separator is a comma followed by a space.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>GROUP_CONCAT([ALL<span class="ph"> | DISTINCT</span>] <var class="keyword varname">expression</var> [, <var class="keyword varname">separator</var>])</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+ concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+ joins together values from different rows.
+ </p>
+
+ <p class="p">
+ By default, this function returns a single string covering the whole result set. To include other columns or values in the
+ result set, or to produce multiple concatenated strings for subsets of rows, include a <code class="ph codeph">GROUP
+ BY</code> clause in the query.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">STRING</code>
+ </p>
+
+ <p class="p">
+ This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+ </p>
+
+ <p class="p">
+ Currently, Impala returns an error if the result value grows larger than 1 GiB.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples illustrate various aspects of the <code class="ph codeph">GROUP_CONCAT()</code> function.
+ </p>
+
+ <p class="p">
+ You can call the function directly on a <code class="ph codeph">STRING</code> column. To use it with a numeric column, cast
+ the value to <code class="ph codeph">STRING</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (x int, s string);
+[localhost:21000] > insert into t1 values (1, "one"), (3, "three"), (2, "two"), (1, "one");
+[localhost:21000] > select group_concat(s) from t1;
++----------------------+
+| group_concat(s) |
++----------------------+
+| one, three, two, one |
++----------------------+
+[localhost:21000] > select group_concat(cast(x as string)) from t1;
++---------------------------------+
+| group_concat(cast(x as string)) |
++---------------------------------+
+| 1, 3, 2, 1 |
++---------------------------------+
+</code></pre>
+
+ <p class="p">
+ Specify the <code class="ph codeph">DISTINCT</code> keyword to eliminate duplicate values from
+ the concatenated result:
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > select group_concat(distinct s) from t1;
++--------------------------+
+| group_concat(distinct s) |
++--------------------------+
+| three, two, one |
++--------------------------+
+</code></pre>
+
+ <p class="p">
+ The optional separator lets you format the result in flexible ways. The separator can be an arbitrary string
+ expression, not just a single character.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select group_concat(s,"|") from t1;
++----------------------+
+| group_concat(s, '|') |
++----------------------+
+| one|three|two|one |
++----------------------+
+[localhost:21000] > select group_concat(s,'---') from t1;
++-------------------------+
+| group_concat(s, '---') |
++-------------------------+
+| one---three---two---one |
++-------------------------+
+</code></pre>
+
+ <p class="p">
+ The default separator is a comma followed by a space. To get a comma-delimited result without extra spaces,
+ specify a delimiter character that is only a comma.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select group_concat(s,',') from t1;
++----------------------+
+| group_concat(s, ',') |
++----------------------+
+| one,three,two,one |
++----------------------+
+</code></pre>
+
+ <p class="p">
+ Including a <code class="ph codeph">GROUP BY</code> clause lets you produce a different concatenated result for each group
+ in the result set. In this example, the only <code class="ph codeph">X</code> value that occurs more than once is
+ <code class="ph codeph">1</code>, so that is the only row in the result set where <code class="ph codeph">GROUP_CONCAT()</code> returns a
+ delimited value. For groups containing a single value, <code class="ph codeph">GROUP_CONCAT()</code> returns the original
+ value of its <code class="ph codeph">STRING</code> argument.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x, group_concat(s) from t1 group by x;
++---+-----------------+
+| x | group_concat(s) |
++---+-----------------+
+| 2 | two |
+| 3 | three |
+| 1 | one, one |
++---+-----------------+
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hadoop.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hadoop.html b/docs/build3x/html/topics/impala_hadoop.html
new file mode 100644
index 0000000..30c0a97
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hadoop.html
@@ -0,0 +1,138 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_hadoop"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>How Impala Fits Into the Hadoop Ecosystem</title></head><body id="intro_hadoop"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">How Impala Fits Into the Hadoop Ecosystem</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala makes use of many familiar components within the Hadoop ecosystem. Impala can interchange data with
+ other Hadoop components, as both a consumer and a producer, so it can fit in flexible ways into your ETL and
+ ELT pipelines.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_hadoop__intro_hive">
+
+ <h2 class="title topictitle2" id="ariaid-title2">How Impala Works with Hive</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ A major Impala goal is to make SQL-on-Hadoop operations fast and efficient enough to appeal to new
+ categories of users and open up Hadoop to new types of use cases. Where practical, it makes use of existing
+ Apache Hive infrastructure that many Hadoop users already have in place to perform long-running,
+ batch-oriented SQL queries.
+ </p>
+
+ <p class="p">
+ In particular, Impala keeps its table definitions in a traditional MySQL or PostgreSQL database known as
+ the <strong class="ph b">metastore</strong>, the same database where Hive keeps this type of data. Thus, Impala can access tables
+ defined or loaded by Hive, as long as all columns use Impala-supported data types, file formats, and
+ compression codecs.
+ </p>
+
+ <p class="p">
+ The initial focus on query features and performance means that Impala can read more types of data with the
+ <code class="ph codeph">SELECT</code> statement than it can write with the <code class="ph codeph">INSERT</code> statement. To query
+ data using the Avro, RCFile, or SequenceFile <a class="xref" href="impala_file_formats.html#file_formats">file
+ formats</a>, you load the data using Hive.
+ </p>
+
+ <p class="p">
+ The Impala query optimizer can also make use of <a class="xref" href="impala_perf_stats.html#perf_table_stats">table
+ statistics</a> and <a class="xref" href="impala_perf_stats.html#perf_column_stats">column statistics</a>.
+ Originally, you gathered this information with the <code class="ph codeph">ANALYZE TABLE</code> statement in Hive; in
+ Impala 1.2.2 and higher, use the Impala <code class="ph codeph"><a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE
+ STATS</a></code> statement instead. <code class="ph codeph">COMPUTE STATS</code> requires less setup, is more
+ reliable, and does not require switching back and forth between <span class="keyword cmdname">impala-shell</span>
+ and the Hive shell.
+ </p>
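+
+ <p class="p">
+ For example, with a hypothetical table name:
+ </p>
+
+<pre class="pre codeblock"><code>COMPUTE STATS sales_db.store_sales;
+</code></pre>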
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_hadoop__intro_metastore">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala Metadata and the Metastore</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ As discussed in <a class="xref" href="impala_hadoop.html#intro_hive">How Impala Works with Hive</a>, Impala maintains information about table
+ definitions in a central database known as the <strong class="ph b">metastore</strong>. Impala also tracks other metadata for the
+ low-level characteristics of data files:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The physical locations of blocks within HDFS.
+ </li>
+ </ul>
+
+ <p class="p">
+ For tables with a large volume of data and/or many partitions, retrieving all the metadata for a table can
+ be time-consuming, taking minutes in some cases. Thus, each Impala node caches all of this metadata to
+ reuse for future queries against the same table.
+ </p>
+
+ <p class="p">
+ If the table definition or the data in the table is updated, all other Impala daemons in the cluster must
+ receive the latest metadata, replacing the obsolete cached metadata, before issuing a query against that
+ table. In Impala 1.2 and higher, the metadata update is automatic, coordinated through the
+ <span class="keyword cmdname">catalogd</span> daemon, for all DDL and DML statements issued through Impala. See
+ <a class="xref" href="impala_components.html#intro_catalogd">The Impala Catalog Service</a> for details.
+ </p>
+
+ <p class="p">
+ For DDL and DML issued through Hive, or changes made manually to files in HDFS, you still use the
+ <code class="ph codeph">REFRESH</code> statement (when new data files are added to existing tables) or the
+ <code class="ph codeph">INVALIDATE METADATA</code> statement (for entirely new tables, or after dropping a table,
+ performing an HDFS rebalance operation, or deleting data files). Issuing <code class="ph codeph">INVALIDATE
+ METADATA</code> by itself retrieves metadata for all the tables tracked by the metastore. If you know
+ that only specific tables have been changed outside of Impala, you can issue <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> for each affected table to only retrieve the latest metadata for
+ those tables.
+ </p>
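+
+ <p class="p">
+ For example, after loading new data files into an existing table through Hive or HDFS commands, or
+ after creating a new table through Hive, you might issue statements such as the following in
+ <span class="keyword cmdname">impala-shell</span>. (The table names here are hypothetical, for
+ illustration only.)
+ </p>
+
+<pre class="pre codeblock"><code>-- New data files were added to an existing table outside of Impala:
+REFRESH sales_data;
+
+-- A new table was created through Hive:
+INVALIDATE METADATA new_table;
+</code></pre>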
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro_hadoop__intro_hdfs">
+
+ <h2 class="title topictitle2" id="ariaid-title4">How Impala Uses HDFS</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala uses the distributed filesystem HDFS as its primary data storage medium. Impala relies on the
+ redundancy provided by HDFS to guard against hardware or network outages on individual nodes. Impala table
+ data is physically represented as data files in HDFS, using familiar HDFS file formats and compression
+ codecs. When data files are present in the directory for a new table, Impala reads them all, regardless of
+ file name. New data is added in files with names controlled by Impala.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="intro_hadoop__intro_hbase">
+
+ <h2 class="title topictitle2" id="ariaid-title5">How Impala Uses HBase</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ HBase is an alternative to HDFS as a storage medium for Impala data. It is a database storage system built
+ on top of HDFS, without built-in SQL support. Many Hadoop users already have it configured and store large
+ (often sparse) data sets in it. By defining tables in Impala and mapping them to equivalent tables in
+ HBase, you can query the contents of the HBase tables through Impala, and even perform join queries
+ including both Impala and HBase tables. See <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a> for details.
+ </p>
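+
+ <p class="p">
+ For example, a mapping between an HBase table and an Impala-queryable table is typically defined
+ through the Hive shell with a statement such as the following. (The table, column, and column
+ family names here are hypothetical; see
+ <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>
+ for the full requirements.)
+ </p>
+
+<pre class="pre codeblock"><code>-- Issued through the Hive shell, not impala-shell:
+CREATE EXTERNAL TABLE hbase_events (id STRING, event_type STRING, payload STRING)
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:event_type,cf1:payload")
+TBLPROPERTIES ("hbase.table.name" = "events");
+</code></pre>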
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_having.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_having.html b/docs/build3x/html/topics/impala_having.html
new file mode 100644
index 0000000..dd255ab
--- /dev/null
+++ b/docs/build3x/html/topics/impala_having.html
@@ -0,0 +1,39 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="having"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>HAVING Clause</title></head><body id="having"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">HAVING Clause</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Performs a filter operation on a <code class="ph codeph">SELECT</code> query, by examining the results of aggregation
+ functions rather than testing each individual table row. Therefore, it is always used in conjunction with a
+ function such as <code class="ph codeph"><a class="xref" href="impala_count.html#count">COUNT()</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_sum.html#sum">SUM()</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_avg.html#avg">AVG()</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_min.html#min">MIN()</a></code>, or
+ <code class="ph codeph"><a class="xref" href="impala_max.html#max">MAX()</a></code>, and typically with the
+ <code class="ph codeph"><a class="xref" href="impala_group_by.html#group_by">GROUP BY</a></code> clause also.
+ </p>
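+
+ <p class="p">
+ For example, the following query (against a hypothetical <code class="ph codeph">orders</code>
+ table) returns only those customers whose aggregate order total exceeds a threshold, a test that
+ cannot go in the <code class="ph codeph">WHERE</code> clause because it applies to an aggregate
+ value rather than to individual rows:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT customer_id, SUM(amount) AS total
+  FROM orders
+  GROUP BY customer_id
+  HAVING SUM(amount) &gt; 1000
+  ORDER BY total DESC;
+</code></pre>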
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ The filter expression in the <code class="ph codeph">HAVING</code> clause cannot include a scalar subquery.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_select.html#select">SELECT Statement</a>,
+ <a class="xref" href="impala_group_by.html#group_by">GROUP BY Clause</a>,
+ <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_impala_shell.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_impala_shell.html b/docs/build3x/html/topics/impala_impala_shell.html
new file mode 100644
index 0000000..42b01e7
--- /dev/null
+++ b/docs/build3x/html/topics/impala_impala_shell.html
@@ -0,0 +1,87 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_shell_options.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_connecting.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_shell_running_commands.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_shell_commands.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_shell"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the Impala Shell (impala-shell Command)</title></head><body id=
"impala_shell"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using the Impala Shell (impala-shell Command)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You can use the Impala shell tool (<code class="ph codeph">impala-shell</code>) to set up databases and tables, insert
+ data, and issue queries. For ad hoc queries and exploration, you can submit SQL statements in an interactive
+ session. To automate your work, you can specify command-line options to process a single statement or a
+ script file. The <span class="keyword cmdname">impala-shell</span> interpreter accepts all the same SQL statements listed in
+ <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a>, plus some shell-only commands that you can use for tuning
+ performance and diagnosing problems.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">impala-shell</code> command fits into the familiar Unix toolchain:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">-q</code> option lets you issue a single query from the command line, without starting the
+ interactive interpreter. You could use this option to run <code class="ph codeph">impala-shell</code> from inside a shell
+ script or with the command invocation syntax from a Python, Perl, or other kind of script.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">-f</code> option lets you process a file containing multiple SQL statements,
+ such as a set of reports or DDL statements to create a group of tables and views.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">--var</code> option lets you pass substitution variables to the statements that
+ are executed by that <span class="keyword cmdname">impala-shell</span> session, for example the statements
+ in a script file processed by the <code class="ph codeph">-f</code> option. You encode the substitution variable
+ on the command line using the notation
+ <code class="ph codeph">--var=<var class="keyword varname">variable_name</var>=<var class="keyword varname">value</var></code>.
+ Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>.
+ This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">-o</code> option lets you save query output to a file.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">-B</code> option turns off pretty-printing, so that you can produce comma-separated,
+ tab-separated, or other delimited text files as output. (Use the <code class="ph codeph">--output_delimiter</code> option
+ to choose the delimiter character; the default is the tab character.)
+ </li>
+
+ <li class="li">
+ In non-interactive mode, query output is printed to <code class="ph codeph">stdout</code> or to the file specified by the
+ <code class="ph codeph">-o</code> option, while incidental output is printed to <code class="ph codeph">stderr</code>, so that you can
+ process just the query output as part of a Unix pipeline.
+ </li>
+
+ <li class="li">
+ In interactive mode, <code class="ph codeph">impala-shell</code> uses the <code class="ph codeph">readline</code> facility to recall
+ and edit previous commands.
+ </li>
+ </ul>
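+
+ <p class="p">
+ For example, the following commands combine several of these options. (The file names, table name,
+ and query text are hypothetical, for illustration only.)
+ </p>
+
+<pre class="pre codeblock"><code>$ # Run a single query non-interactively and save comma-delimited output to a file:
+$ impala-shell -q 'SELECT * FROM t1' -B --output_delimiter=',' -o results.csv
+
+$ # Run a script of SQL statements, substituting a table name at run time;
+$ # inside the script, refer to the variable as ${var:tablename}:
+$ impala-shell -f report_queries.sql --var=tablename=sales_2018
+</code></pre>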
+
+ <p class="p">
+ For information on installing the Impala shell, see <a class="xref" href="impala_install.html#install">Installing Impala</a>.
+ </p>
+
+ <p class="p">
+ For information about establishing a connection to a DataNode running the <code class="ph codeph">impalad</code> daemon
+ through the <code class="ph codeph">impala-shell</code> command, see <a class="xref" href="impala_connecting.html#connecting">Connecting to impalad through impala-shell</a>.
+ </p>
+
+ <p class="p">
+ For a list of the <code class="ph codeph">impala-shell</code> command-line options, see
+ <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a>. For reference information about the
+ <code class="ph codeph">impala-shell</code> interactive commands, see
+ <a class="xref" href="impala_shell_commands.html#shell_commands">impala-shell Command Reference</a>.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_shell_options.html">impala-shell Configuration Options</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_connecting.html">Connecting to impalad through impala-shell</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_shell_running_commands.html">Running Commands and SQL Statements in impala-shell</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_shell_commands.html">impala-shell Command Reference</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_incompatible_changes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_incompatible_changes.html b/docs/build3x/html/topics/impala_incompatible_changes.html
new file mode 100644
index 0000000..3d25658
--- /dev/null
+++ b/docs/build3x/html/topics/impala_incompatible_changes.html
@@ -0,0 +1,1526 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="incompatible_changes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Incompatible Changes and Limitations in Apache Impala</title></head><body id="incompatible_changes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Incompatible Changes and Limitations in Apache Impala</span></h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala version covered by this documentation library contains the following incompatible changes. These
+ are things such as file format changes, removed features, or changes to implementation, default
+ configuration, dependencies, or prerequisites that could cause issues during or after an Impala upgrade.
+ </p>
+
+ <p class="p">
+ Even newly added SQL statements or clauses can produce incompatibilities if you have databases, tables, or columns
+ whose names conflict with the new keywords. <span class="ph">See
+ <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> for the set of reserved words for the current
+ release, and the quoting techniques to avoid name conflicts.</span>
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="incompatible_changes__incompatible_changes_300x">
+ <h2 class="title topictitle2" id="ariaid-title2">Incompatible Changes Introduced in Impala 3.0.x</h2>
+ <div class="body conbody">
+ <p class="p"> For the full list of issues closed in this release, including any that
+ introduce behavior changes or incompatibilities, see the <a class="xref" href="https://impala.apache.org/docs/changelog-3.0.html" target="_blank">changelog for <span class="keyword">Impala 3.0</span></a>. </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="incompatible_changes__incompatible_changes_212x">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Incompatible Changes Introduced in Impala 2.12.x</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including any that introduce
+ behavior changes or incompatibilities, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.12.html" target="_blank">changelog for <span class="keyword">Impala 2.12</span></a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="incompatible_changes__incompatible_changes_211x">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Incompatible Changes Introduced in Impala 2.11.x</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including any that introduce
+ behavior changes or incompatibilities, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.11.html" target="_blank">changelog for <span class="keyword">Impala 2.11</span></a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="incompatible_changes__incompatible_changes_210x">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Incompatible Changes Introduced in Impala 2.10.x</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including any that introduce
+ behavior changes or incompatibilities, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.10.html" target="_blank">changelog for <span class="keyword">Impala 2.10</span></a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="incompatible_changes__incompatible_changes_29x">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Incompatible Changes Introduced in Impala 2.9.x</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including any that introduce
+ behavior changes or incompatibilities, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.9.html" target="_blank">changelog for <span class="keyword">Impala 2.9</span></a>.
+ </p>
+
+
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="incompatible_changes__incompatible_changes_28x">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Incompatible Changes Introduced in Impala 2.8.x</h2>
+
+ <div class="body conbody">
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Llama support is removed completely from Impala. Related flags (<code class="ph codeph">--enable_rm</code>)
+ and query options (such as <code class="ph codeph">V_CPU_CORES</code>) remain but do not have any effect.
+ </p>
+ <p class="p">
+ If <code class="ph codeph">--enable_rm</code> is passed to Impala, a warning is printed to the log on startup.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The syntax related to Kudu tables includes a number of new reserved words,
+ such as <code class="ph codeph">COMPRESSION</code>, <code class="ph codeph">DEFAULT</code>, and <code class="ph codeph">ENCODING</code>, that
+ might conflict with names of existing tables, columns, or other identifiers from older Impala versions.
+ See <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> for the full list of reserved words.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The DDL syntax for Kudu tables, particularly in the <code class="ph codeph">CREATE TABLE</code> statement, is different
+ from the special <code class="ph codeph">impala_next</code> fork that was previously used for accessing Kudu tables
+ from Impala:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DISTRIBUTE BY</code> clause is now <code class="ph codeph">PARTITIONED BY</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INTO <var class="keyword varname">N</var> BUCKETS</code>
+ clause is now <code class="ph codeph">PARTITIONS <var class="keyword varname">N</var></code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">SPLIT ROWS</code> clause is replaced by different syntax for specifying
+ the ranges covered by each partition.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DESCRIBE</code> output for Kudu tables includes several extra columns.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Non-primary-key columns can contain <code class="ph codeph">NULL</code> values by default. The
+ <code class="ph codeph">SHOW CREATE TABLE</code> output for these columns displays the <code class="ph codeph">NULL</code>
+ attribute. There was a period during early experimental versions of Impala + Kudu where
+ non-primary-key columns had the <code class="ph codeph">NOT NULL</code> attribute by default.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">IGNORE</code> keyword that was present in early experimental versions of Impala + Kudu
+ is no longer present. The behavior of the <code class="ph codeph">IGNORE</code> keyword is now the default:
+ DML statements continue with warnings, instead of failing with errors, if they encounter conditions
+ such as <span class="q">"primary key already exists"</span> for an <code class="ph codeph">INSERT</code> statement or
+ <span class="q">"primary key already deleted"</span> for a <code class="ph codeph">DELETE</code> statement.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The replication factor for Kudu tables must be an odd number.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A UDF compiled into an LLVM IR bitcode module (<code class="ph codeph">.bc</code>) might
+ encounter a runtime error when native code generation is turned off by
+ setting the query option <code class="ph codeph">DISABLE_CODEGEN=1</code>.
+ This issue also applies when running a built-in or native UDF with
+ more than 20 arguments.
+ See <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-4432" target="_blank">IMPALA-4432</a> for details.
+ As a workaround, either turn native code generation back on with the query option
+ <code class="ph codeph">DISABLE_CODEGEN=0</code>, or use the regular UDF compilation path
+ that does not produce an IR module.
+ </p>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="incompatible_changes__incompatible_changes_27x">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Incompatible Changes Introduced in Impala 2.7.x</h2>
+
+ <div class="body conbody">
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Bug fixes related to parsing of floating-point values (IMPALA-1731 and IMPALA-3868) can change
+ the results of casting strings that represent invalid floating-point values.
+ For example, string values beginning or ending with <code class="ph codeph">inf</code>,
+ such as <code class="ph codeph">1.23inf</code> or <code class="ph codeph">infinite</code>, are now converted to <code class="ph codeph">NULL</code>
+ when interpreted as floating-point values.
+ Formerly, they were interpreted as the special <span class="q">"infinity"</span> value when converting from string to floating-point.
+ Similarly, now only the string <code class="ph codeph">NaN</code> (case-sensitive) is interpreted as the special <span class="q">"not a number"</span>
+ value. String values containing multiple dots, such as <code class="ph codeph">3..141</code> or <code class="ph codeph">3.1.4.1</code>,
+ are now interpreted as <code class="ph codeph">NULL</code> rather than being converted to valid floating-point values.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="incompatible_changes__incompatible_changes_26x">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Incompatible Changes Introduced in Impala 2.6.x</h2>
+
+ <div class="body conbody">
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The default for the <code class="ph codeph">RUNTIME_FILTER_MODE</code>
+ query option is changed to <code class="ph codeph">GLOBAL</code> (the highest setting).
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> setting is now only used
+ as a fallback if statistics are not available; otherwise, Impala
+ uses the statistics to estimate the appropriate size to use for each filter.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Admission control and dynamic resource pools are enabled by default.
+ When upgrading from an earlier release, you must turn on these settings yourself
+ if they are not already enabled.
+ See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details
+ about admission control.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala reserves some new keywords, in preparation for support for Kudu syntax:
+ <code class="ph codeph">buckets</code>, <code class="ph codeph">delete</code>, <code class="ph codeph">distribute</code>,
+ <code class="ph codeph">hash</code>, <code class="ph codeph">ignore</code>, <code class="ph codeph">split</code>, and <code class="ph codeph">update</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ For Kerberized clusters, the Catalog service now uses
+ the Kerberos principal instead of the operating system user that runs
+ the <span class="keyword cmdname">catalogd</span> daemon.
+ This eliminates the requirement to configure a <code class="ph codeph">hadoop.user.group.static.mapping.overrides</code>
+ setting to put the OS user into the Sentry administrative group, on clusters where the principal
+ and the OS user name for this user are different.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The mechanism for interpreting <code class="ph codeph">DECIMAL</code> literals is
+ improved, no longer going through an intermediate conversion step
+ to <code class="ph codeph">DOUBLE</code>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Casting a <code class="ph codeph">DECIMAL</code> value to <code class="ph codeph">TIMESTAMP</code>
+ produces a more precise
+ value for the <code class="ph codeph">TIMESTAMP</code> than formerly.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Certain function calls involving <code class="ph codeph">DECIMAL</code> literals
+ now succeed, when formerly they failed due to lack of a function
+ signature with a <code class="ph codeph">DOUBLE</code> argument.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved type accuracy for <code class="ph codeph">CASE</code> return values.
+ If all <code class="ph codeph">WHEN</code> clauses of the <code class="ph codeph">CASE</code>
+ expression are of <code class="ph codeph">CHAR</code> type, the final result
+ is also <code class="ph codeph">CHAR</code> instead of being converted to
+ <code class="ph codeph">STRING</code>.
+ </p>
+ </li>
+ <li class="li">
+ <div class="p">
+ The initial release of <span class="keyword">Impala 2.5</span> sometimes has a higher peak memory usage than in previous releases
+ while reading Parquet files.
+ The following query options might help to reduce memory consumption in the Parquet scanner:
+ <ul class="ul">
+ <li class="li">
+ Reduce the number of scanner threads, for example: <code class="ph codeph">set num_scanner_threads=30</code>
+ </li>
+ <li class="li">
+ Reduce the batch size, for example: <code class="ph codeph">set batch_size=512</code>
+ </li>
+ <li class="li">
+ Increase the memory limit, for example: <code class="ph codeph">set mem_limit=64g</code>
+ </li>
+ </ul>
+ You can track the status of the fix for this issue at
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3662" target="_blank">IMPALA-3662</a>.
+ </div>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option, which is enabled by
+ default, increases the speed of <code class="ph codeph">INSERT</code> operations for S3 tables.
+ The speedup applies to regular <code class="ph codeph">INSERT</code>, but not <code class="ph codeph">INSERT OVERWRITE</code>.
+ The tradeoff is the possibility of inconsistent output files left behind if a
+ node fails during <code class="ph codeph">INSERT</code> execution.
+ See <a class="xref" href="impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+ </p>
+ </li>
+ </ul>
+ <p class="p">
+ Certain features are turned off by default, to avoid regressions or unexpected
+ behavior following an upgrade. Consider turning on these features after suitable testing:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala now recognizes the <code class="ph codeph">auth_to_local</code> setting,
+ specified through the HDFS configuration setting
+ <code class="ph codeph">hadoop.security.auth_to_local</code>.
+ This feature is disabled by default; to enable it,
+ specify <code class="ph codeph">--load_auth_to_local_rules=true</code>
+ in the <span class="keyword cmdname">impalad</span> configuration settings.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new query option, <code class="ph codeph">PARQUET_ANNOTATE_STRINGS_UTF8</code>,
+ makes Impala include the <code class="ph codeph">UTF-8</code> annotation
+ metadata for <code class="ph codeph">STRING</code>, <code class="ph codeph">CHAR</code>,
+ and <code class="ph codeph">VARCHAR</code> columns in Parquet files created
+ by <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code>
+ statements.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new query option,
+ <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION</code>,
+ lets Impala locate columns within Parquet files based on
+ column name rather than ordinal position.
+ This enhancement improves interoperability with applications
+ that write Parquet files with a different order or subset of
+ columns than are used in the Impala table.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="incompatible_changes__incompatible_changes_25x">
+
+ <h2 class="title topictitle2" id="ariaid-title10">Incompatible Changes Introduced in Impala 2.5.x</h2>
+
+ <div class="body conbody">
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The admission control default limit for concurrent queries (the <span class="ph uicontrol">max requests</span>
+ setting) is now unlimited instead of 200.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Multiplying a mixture of <code class="ph codeph">DECIMAL</code> and <code class="ph codeph">FLOAT</code> or
+ <code class="ph codeph">DOUBLE</code> values now returns
+ <code class="ph codeph">DOUBLE</code> rather than <code class="ph codeph">DECIMAL</code>. This
+ change avoids some cases where an intermediate value would underflow or overflow
+ and become <code class="ph codeph">NULL</code> unexpectedly. The results of
+ multiplying <code class="ph codeph">DECIMAL</code> and <code class="ph codeph">FLOAT</code> or
+ <code class="ph codeph">DOUBLE</code> might now be slightly less precise than
+ before. Previously, the intermediate types and thus the final result
+ depended on the exact order of the values of different types being
+ multiplied, which made the final result values difficult to
+ reason about.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Previously, the <code class="ph codeph">_</code> and <code class="ph codeph">%</code> wildcard
+ characters for the <code class="ph codeph">LIKE</code> operator would not match
+ characters on the second or subsequent lines of multi-line string values. The fix for issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2204" target="_blank">IMPALA-2204</a> causes
+ the wildcard matching to apply to the entire string for values
+ containing embedded <code class="ph codeph">\n</code> characters. This could cause
+ different results than in previous Impala releases for identical
+ queries on identical data.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Formerly, all Impala UDFs and UDAs required running the
+ <code class="ph codeph">CREATE FUNCTION</code> statements to
+ re-create them after each <span class="keyword cmdname">catalogd</span> restart.
+ In <span class="keyword">Impala 2.5</span> and higher, functions written in C++ are persisted across
+ restarts, and the requirement to
+ re-create functions only applies to functions written in Java. Adapt any
+ function-reloading logic that you have added to your Impala environment.
+ </p>
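+          <p class="p">
+            For example, a Java-based function (all names and paths here are hypothetical)
+            must still be re-created after a <span class="keyword cmdname">catalogd</span> restart:
+          </p>
+<pre class="pre codeblock"><code>-- Java functions are not persisted; re-run statements like this after a restart.
+CREATE FUNCTION my_java_udf LOCATION '/user/impala/udfs/my_udfs.jar'
+  SYMBOL='com.example.MyUdf';
+</code></pre>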
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">CREATE TABLE LIKE</code> no longer inherits HDFS caching settings from the source table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">SHOW DATABASES</code> statement now returns two columns rather than one.
+ The second column includes the associated comment string, if any, for each database.
+ Adjust any application code that examines the list of databases and assumes the
+ result set contains only a single column.
+ </p>
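+          <p class="p">
+            For example, the two-column result set looks like the following (the database
+            names and comment strings are illustrative):
+          </p>
+<pre class="pre codeblock"><code>SHOW DATABASES;
++------------------+----------------------------------------------+
+| name             | comment                                      |
++------------------+----------------------------------------------+
+| _impala_builtins | System database for Impala builtin functions |
+| default          | Default Hive database                        |
++------------------+----------------------------------------------+
+</code></pre>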
+ </li>
+ <li class="li">
+ <p class="p">
+ The output of the <code class="ph codeph">SHOW FUNCTIONS</code> statement includes
+ two new columns, showing the kind of the function (for example,
+ <code class="ph codeph">BUILTIN</code>) and whether or not the function persists
+ across catalog server restarts. For example, the <code class="ph codeph">SHOW
+ FUNCTIONS</code> output for the
+ <code class="ph codeph">_impala_builtins</code> database starts with:
+ </p>
+<pre class="pre codeblock"><code>
++--------------+-------------------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++--------------+-------------------------------------------------+-------------+---------------+
+| BIGINT | abs(BIGINT) | BUILTIN | true |
+| DECIMAL(*,*) | abs(DECIMAL(*,*)) | BUILTIN | true |
+| DOUBLE | abs(DOUBLE) | BUILTIN | true |
+...
+</code></pre>
+ </li>
+ </ul>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="incompatible_changes__incompatible_changes_24x">
+
+ <h2 class="title topictitle2" id="ariaid-title11">Incompatible Changes Introduced in Impala 2.4.x</h2>
+
+ <div class="body conbody">
+ <p class="p">
+ Other than support for DSSD storage, the Impala feature set for <span class="keyword">Impala 2.4</span> is the same as for <span class="keyword">Impala 2.3</span>.
+ Therefore, there are no incompatible changes for Impala introduced in <span class="keyword">Impala 2.4</span>.
+ </p>
+ </div>
+
+ </article>
+
+
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="incompatible_changes__incompatible_changes_23x">
+
+ <h2 class="title topictitle2" id="ariaid-title12">Incompatible Changes Introduced in Impala 2.3.x</h2>
+
+ <div class="body conbody">
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The use of the Llama component for integrated resource management within YARN
+ is no longer supported with <span class="keyword">Impala 2.3</span> and higher.
+ The Llama support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher.
+ </p>
+ <p class="p">
+ For clusters running Impala alongside
+          other data management components, you use static service pools to define the resources
+ available to Impala and other components. Then within the area allocated for Impala,
+ you can create dynamic service pools, each with its own settings for the Impala admission control feature.
+ </p>
+ </div>
+
+ <ul class="ul">
+
+ <li class="li">
+ <p class="p">
+ If Impala encounters a Parquet file that is invalid because of an incorrect magic number,
+ the query skips the file. This change is caused by the fix for issue <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2130" target="_blank">IMPALA-2130</a>.
+ Previously, Impala would attempt to read the file despite the possibility that the file was corrupted.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Previously, calls to overloaded built-in functions could treat parameters as <code class="ph codeph">DOUBLE</code>
+ or <code class="ph codeph">FLOAT</code> when no overload had a signature that matched the exact argument types.
+ Now Impala prefers the function signature with <code class="ph codeph">DECIMAL</code> parameters in this case.
+ This change avoids a possible loss of precision in function calls such as <code class="ph codeph">greatest(0, 99999.8888)</code>;
+ now both parameters are treated as <code class="ph codeph">DECIMAL</code> rather than <code class="ph codeph">DOUBLE</code>, avoiding
+ any loss of precision in the fractional value.
+ This could cause slightly different results than in previous Impala releases for certain function calls.
+ </p>
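+          <p class="p">
+            For example, with the <code class="ph codeph">DECIMAL</code> signature now preferred:
+          </p>
+<pre class="pre codeblock"><code>-- Both arguments are now treated as DECIMAL, so no fractional precision is lost.
+SELECT greatest(0, 99999.8888);
+</code></pre>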
+ </li>
+ <li class="li">
+ <p class="p">
+ Formerly, adding or subtracting a large interval value to a <code class="ph codeph">TIMESTAMP</code> could produce
+ a nonsensical result. Now when the result goes outside the range of <code class="ph codeph">TIMESTAMP</code> values,
+ Impala returns <code class="ph codeph">NULL</code>.
+ </p>
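+          <p class="p">
+            For example (the interval is deliberately chosen to push the result out of range):
+          </p>
+<pre class="pre codeblock"><code>-- The result falls outside the supported TIMESTAMP range, so Impala returns NULL
+-- instead of a nonsensical value.
+SELECT CAST('9999-12-31 00:00:00' AS TIMESTAMP) + INTERVAL 100 YEARS;
+</code></pre>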
+ </li>
+ <li class="li">
+ <p class="p">
+            Formerly, it was possible to accidentally create a table with identical row and column delimiters,
+            for example by specifying one of the delimiters explicitly and using the
+            default value for the other. Now an attempt to use identical delimiters still succeeds,
+ but displays a warning message.
+ </p>
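+          <p class="p">
+            For example, the following statement (table and column names are hypothetical)
+            specifies a field delimiter identical to the default
+            <code class="ph codeph">'\n'</code> line delimiter, and now succeeds with a warning:
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE odd_delims (c1 STRING, c2 STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n';
+</code></pre>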
+ </li>
+ <li class="li">
+ <p class="p">
+ Formerly, Impala could include snippets of table data in log files by default, for example
+ when reporting conversion errors for data values. Now any such log messages are only produced
+ at higher logging levels that you would enable only during debugging.
+ </p>
+ </li>
+
+ </ul>
+ </div>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="incompatible_changes__incompatible_changes_22x">
+
+ <h2 class="title topictitle2" id="ariaid-title13">Incompatible Changes Introduced in Impala 2.2.x</h2>
+
+ <div class="body conbody">
+
+ <section class="section" id="incompatible_changes_22x__files_220"><h3 class="title sectiontitle">
+ Changes to File Handling
+ </h3>
+
+ <p class="p">
+ Impala queries ignore files with extensions commonly used for temporary work files by Hadoop tools. Any
+ files with extensions <code class="ph codeph">.tmp</code> or <code class="ph codeph">.copying</code> are not considered part of the
+ Impala table. The suffix matching is case-insensitive, so for example Impala ignores both
+ <code class="ph codeph">.copying</code> and <code class="ph codeph">.COPYING</code> suffixes.
+ </p>
+ <p class="p">
+ The log rotation feature in Impala 2.2.0 and higher
+ means that older log files are now removed by default.
+ The default is to preserve the latest 10 log files for each
+ severity level, for each Impala-related daemon. If you have
+ set up your own log rotation processes that expect older
+ files to be present, either adjust your procedures or
+ change the Impala <code class="ph codeph">-max_log_files</code> setting.
+ <span class="ph">See <a class="xref" href="impala_logging.html#logs_rotate">Rotating Impala Logs</a> for details.</span>
+ </p>
+ </section>
+
+ <section class="section" id="incompatible_changes_22x__prereqs_210"><h3 class="title sectiontitle">
+ Changes to Prerequisites
+ </h3>
+
+ <p class="p">
+ The prerequisite for CPU architecture has been relaxed in Impala 2.2.0 and higher. From this release
+ onward, Impala works on CPUs that have the SSSE3 instruction set. The SSE4 instruction set is no longer
+ required. This relaxed requirement simplifies the upgrade planning from Impala 1.x releases, which also
+ worked on SSSE3-enabled processors.
+ </p>
+ </section>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="incompatible_changes__incompatible_changes_21x">
+
+ <h2 class="title topictitle2" id="ariaid-title14">Incompatible Changes Introduced in Impala 2.1.x</h2>
+
+ <div class="body conbody">
+
+ <section class="section" id="incompatible_changes_21x__prereqs_210"><h3 class="title sectiontitle">
+ Changes to Prerequisites
+ </h3>
+
+ <p class="p">
+ Currently, Impala 2.1.x does not function on CPUs without the SSE4.1 instruction set. This minimum CPU
+ requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check
+ the CPU level of the hosts in your cluster before upgrading to <span class="keyword">Impala 2.1</span>.
+ </p>
+ </section>
+
+ <section class="section" id="incompatible_changes_21x__output_format_210"><h3 class="title sectiontitle">
+ Changes to Output Format
+ </h3>
+
+ <p class="p">
+ The <span class="q">"small query"</span> optimization feature introduces some new information in the
+ <code class="ph codeph">EXPLAIN</code> plan, which you might need to account for if you parse the text of the plan
+ output.
+ </p>
+ </section>
+
+ <section class="section" id="incompatible_changes_21x__reserved_words_210"><h3 class="title sectiontitle">
+ New Reserved Words
+ </h3>
+
+ <p class="p">
+ New SQL syntax introduces additional reserved words:
+ <code class="ph codeph">FOR</code>, <code class="ph codeph">GRANT</code>, <code class="ph codeph">REVOKE</code>, <code class="ph codeph">ROLE</code>, <code class="ph codeph">ROLES</code>,
+ <code class="ph codeph">INCREMENTAL</code>.
+ <span class="ph">As always, see <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>
+ for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.</span>
+ </p>
+ </section>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="incompatible_changes__incompatible_changes_205">
+
+ <h2 class="title topictitle2" id="ariaid-title15">Incompatible Changes Introduced in Impala 2.0.5</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ No incompatible changes.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title16" id="incompatible_changes__incompatible_changes_204">
+
+ <h2 class="title topictitle2" id="ariaid-title16">Incompatible Changes Introduced in Impala 2.0.4</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ No incompatible changes.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title17" id="incompatible_changes__incompatible_changes_203">
+
+ <h2 class="title topictitle2" id="ariaid-title17">Incompatible Changes Introduced in Impala 2.0.3</h2>
+
+ <div class="body conbody">
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title18" id="incompatible_changes__incompatible_changes_202">
+
+ <h2 class="title topictitle2" id="ariaid-title18">Incompatible Changes Introduced in Impala 2.0.2</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ No incompatible changes.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title19" id="incompatible_changes__incompatible_changes_201">
+
+ <h2 class="title topictitle2" id="ariaid-title19">Incompatible Changes Introduced in Impala 2.0.1</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement has always left behind a hidden work directory inside the data
+ directory of the table. Formerly, this hidden work directory was named
+            <span class="ph filepath">.impala_insert_staging</span>. In Impala 2.0.1 and later, this directory name is changed to
+            <span class="ph filepath">_impala_insert_staging</span>. (While HDFS tools are expected to treat names beginning
+            with either an underscore or a dot as hidden, in practice names beginning with an underscore are more widely
+ supported.) If you have any scripts, cleanup jobs, and so on that rely on the name of this work directory,
+ adjust them to use the new name.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">abs()</code> function now takes a broader range of numeric types as arguments, and the
+ return type is the same as the argument type.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Shorthand notation for character classes in regular expressions, such as <code class="ph codeph">\d</code> for digit,
+            is now available again in regular expression operators and functions such as
+ <code class="ph codeph">regexp_extract()</code> and <code class="ph codeph">regexp_replace()</code>. Some other differences in
+ regular expression behavior remain between Impala 1.x and Impala 2.x releases. See
+ <a class="xref" href="impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
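+          <p class="p">
+            For example (the argument values are illustrative; note the doubled backslash
+            needed inside the string literal):
+          </p>
+<pre class="pre codeblock"><code>-- The \d shorthand for digits works again in Impala 2.0.1 and higher.
+SELECT regexp_extract('abc123def', '(\\d+)', 1);
+</code></pre>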
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title20" id="incompatible_changes__incompatible_changes_200">
+
+ <h2 class="title topictitle2" id="ariaid-title20">Incompatible Changes Introduced in Impala 2.0.0</h2>
+
+ <div class="body conbody">
+
+ <section class="section" id="incompatible_changes_200__prereqs_200"><h3 class="title sectiontitle">
+ Changes to Prerequisites
+ </h3>
+
+ <p class="p">
+ Currently, Impala 2.0.x does not function on CPUs without the SSE4.1 instruction set. This minimum CPU
+ requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check
+ the CPU level of the hosts in your cluster before upgrading to <span class="keyword">Impala 2.0</span>.
+ </p>
+ </section>
+
+ <section class="section" id="incompatible_changes_200__queries_200"><h3 class="title sectiontitle">
+ Changes to Query Syntax
+ </h3>
+
+
+ <p class="p">
+ The new syntax where query hints are allowed in comments causes some changes in the way comments are
+ parsed in the <span class="keyword cmdname">impala-shell</span> interpreter. Previously, you could end a
+ <code class="ph codeph">--</code> comment line with a semicolon and <span class="keyword cmdname">impala-shell</span> would treat that
+ as a no-op statement. Now, a comment line ending with a semicolon is passed as an empty statement to
+ the Impala daemon, where it is flagged as an error.
+ </p>
+
+ <p class="p">
+ Impala 2.0 and later uses a different support library for regular expression parsing than in earlier
+ Impala versions. Now, Impala uses the
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">Google RE2 library</a>
+ rather than Boost for evaluating regular expressions. This implementation change causes some
+ differences in the allowed regular expression syntax, and in the way certain regex operators are
+ interpreted. The following are some of the major differences (not necessarily a complete list):
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">.*?</code> notation for non-greedy matches is now supported, where it was not in earlier
+ Impala releases.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ By default, <code class="ph codeph">^</code> and <code class="ph codeph">$</code> now match only begin/end of buffer, not
+ begin/end of each line. This behavior can be overridden in the regex itself using the
+ <code class="ph codeph">m</code> flag.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ By default, <code class="ph codeph">.</code> does not match newline. This behavior can be overridden in the regex
+ itself using the <code class="ph codeph">s</code> flag.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">\Z</code> is not supported.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+            <code class="ph codeph">&lt;</code> and <code class="ph codeph">&gt;</code> for start of word and end of word are not
+ supported.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Lookahead and lookbehind are not supported.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Shorthand notation for character classes, such as <code class="ph codeph">\d</code> for digit, is not recognized.
+ (This restriction is lifted in Impala 2.0.1, which restores the shorthand notation.)
+ </p>
+ </li>
+ </ul>
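+      <p class="p">
+        For example, the flags mentioned above are embedded in the pattern itself using the
+        RE2 inline syntax (the string values here are illustrative):
+      </p>
+<pre class="pre codeblock"><code>-- Without the s flag, '.' does not match the embedded newline.
+SELECT 'a\nb' REGEXP 'a.b';
+-- With the inline (?s) flag, '.' matches the newline as well.
+SELECT 'a\nb' REGEXP '(?s)a.b';
+</code></pre>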
+ </section>
+
+ <section class="section" id="incompatible_changes_200__output_format_210"><h3 class="title sectiontitle">
+ Changes to Output Format
+ </h3>
+
+
+ <p class="p">
+ In Impala 2.0 and later, <code class="ph codeph">user()</code> returns the full Kerberos principal string, such as
+ <code class="ph codeph">user@example.com</code>, in a Kerberized environment.
+ </p>
+
+ <p class="p">
+ The changed format for the user name in secure environments is also reflected where the user name is
+ displayed in the output of the <code class="ph codeph">PROFILE</code> command.
+ </p>
+
+ <p class="p">
+ In the output from <code class="ph codeph">SHOW FUNCTIONS</code>, <code class="ph codeph">SHOW AGGREGATE FUNCTIONS</code>, and
+ <code class="ph codeph">SHOW ANALYTIC FUNCTIONS</code>, arguments and return types of arbitrary
+ <code class="ph codeph">DECIMAL</code> scale and precision are represented as <code class="ph codeph">DECIMAL(*,*)</code>.
+ Formerly, these items were displayed as <code class="ph codeph">DECIMAL(-1,-1)</code>.
+ </p>
+
+ </section>
+
+ <section class="section" id="incompatible_changes_200__query_options_200"><h3 class="title sectiontitle">
+ Changes to Query Options
+ </h3>
+
+ <p class="p">
+ The <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code> query option has been replaced by the
+ <code class="ph codeph">COMPRESSION_CODEC</code> query option.
+ <span class="ph">See <a class="xref" href="impala_compression_codec.html#compression_codec">COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</a> for details.</span>
+ </p>
+ </section>
+
+ <section class="section" id="incompatible_changes_200__config_options_200"><h3 class="title sectiontitle">
+ Changes to Configuration Options
+ </h3>
+
+
+ <p class="p">
+ The meaning of the <code class="ph codeph">--idle_query_timeout</code> configuration option is changed, to
+ accommodate the new <code class="ph codeph">QUERY_TIMEOUT_S</code> query option. Rather than setting an absolute
+ timeout period that applies to all queries, it now sets a maximum timeout period, which can be adjusted
+ downward for individual queries by specifying a value for the <code class="ph codeph">QUERY_TIMEOUT_S</code> query
+ option. In sessions where no <code class="ph codeph">QUERY_TIMEOUT_S</code> query option is specified, the
+ <code class="ph codeph">--idle_query_timeout</code> timeout period applies the same as in earlier versions.
+ </p>
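+      <p class="p">
+        For example, within a session you can now shorten the timeout below the server-wide
+        maximum (the value shown is arbitrary):
+      </p>
+<pre class="pre codeblock"><code>-- Applies only to this session; the --idle_query_timeout value remains the maximum.
+SET QUERY_TIMEOUT_S=60;
+</code></pre>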
+
+ <p class="p">
+ The <code class="ph codeph">--strict_unicode</code> option of <span class="keyword cmdname">impala-shell</span> was removed. To avoid
+ problems with Unicode values in <span class="keyword cmdname">impala-shell</span>, define the following locale setting
+ before running <span class="keyword cmdname">impala-shell</span>:
+ </p>
+<pre class="pre codeblock"><code>export LC_CTYPE=en_US.UTF-8
+</code></pre>
+
+ </section>
+
+ <section class="section" id="incompatible_changes_200__reserved_words_210"><h3 class="title sectiontitle">
+ New Reserved Words
+ </h3>
+
+ <p class="p">
+ Some new SQL syntax requires the addition of new reserved words: <code class="ph codeph">ANTI</code>,
+ <code class="ph codeph">ANALYTIC</code>, <code class="ph codeph">OVER</code>, <code class="ph codeph">PRECEDING</code>,
+ <code class="ph codeph">UNBOUNDED</code>, <code class="ph codeph">FOLLOWING</code>, <code class="ph codeph">CURRENT</code>,
+ <code class="ph codeph">ROWS</code>, <code class="ph codeph">RANGE</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>.
+ <span class="ph">As always, see <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>
+ for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.</span>
+ </p>
+ </section>
+
+ <section class="section" id="incompatible_changes_200__output_files_200"><h3 class="title sectiontitle">
+ Changes to Data Files
+ </h3>
+
+
+ <p class="p" id="incompatible_changes_200__parquet_block_size">
+ The default Parquet block size for Impala is changed from 1 GB to 256 MB. This change could have
+ implications for the sizes of Parquet files produced by <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE
+ TABLE AS SELECT</code> statements.
+ </p>
+ <p class="p">
+ Although older Impala releases typically produced files that were smaller than the old default size of
+        1 GB, the file size now more closely matches whatever value is specified for the
+ <code class="ph codeph">PARQUET_FILE_SIZE</code> query option. Thus, if you use a non-default value for this setting,
+ the output files could be larger than before. They still might be somewhat smaller than the specified
+ value, because Impala makes conservative estimates about the space needed to represent each column as
+ it encodes the data.
+ </p>
+ <p class="p">
+ When you do not specify an explicit value for the <code class="ph codeph">PARQUET_FILE_SIZE</code> query option,
+ Impala tries to keep the file size within the 256 MB default size, but Impala might adjust the file
+ size to be somewhat larger if needed to accommodate the layout for <dfn class="term">wide</dfn> tables, that is,
+ tables with hundreds or thousands of columns.
+ </p>
+ <p class="p">
+ This change is unlikely to affect memory usage while writing Parquet files, because Impala does not
+ pre-allocate the memory needed to hold the entire Parquet block.
+ </p>
+
+ </section>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title21" id="incompatible_changes__incompatible_changes_144">
+ <h2 class="title topictitle2" id="ariaid-title21">Incompatible Changes Introduced in Impala 1.4.4</h2>
+ <div class="body conbody">
+ <p class="p">
+ No incompatible changes.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title22" id="incompatible_changes__incompatible_changes_143">
+
+ <h2 class="title topictitle2" id="ariaid-title22">Incompatible Changes Introduced in Impala 1.4.3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ No incompatible changes. The TLS/SSL security fix does not require any change in the way you interact with
+ Impala.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title23" id="incompatible_changes__incompatible_changes_142">
+
+ <h2 class="title topictitle2" id="ariaid-title23">Incompatible Changes Introduced in Impala 1.4.2</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ None. Impala 1.4.2 is purely a bug-fix release. It does not include any incompatible changes.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title24" id="incompatible_changes__incompatible_changes_141">
+
+ <h2 class="title topictitle2" id="ariaid-title24">Incompatible Changes Introduced in Impala 1.4.1</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ None. Impala 1.4.1 is purely a bug-fix release. It does not include any incompatible changes.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title25" id="incompatible_changes__incompatible_changes_140">
+
+ <h2 class="title topictitle2" id="ariaid-title25">Incompatible Changes Introduced in Impala 1.4.0</h2>
+
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ There is a slight change to required security privileges in the Sentry framework. To create a new
+            object, you now need the <code class="ph codeph">ALL</code> privilege on the parent object. For example, creating a
+            new table, view, or function requires the <code class="ph codeph">ALL</code> privilege on the database
+ containing the new object. See <a class="xref" href="impala_authorization.html">Enabling Sentry Authorization for Impala</a> for a full list of operations and
+ associated privileges.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ With the ability of <code class="ph codeph">ORDER BY</code> queries to process unlimited amounts of data with no
+ <code class="ph codeph">LIMIT</code> clause, the query options <code class="ph codeph">DEFAULT_ORDER_BY_LIMIT</code> and
+ <code class="ph codeph">ABORT_ON_DEFAULT_LIMIT_EXCEEDED</code> are now deprecated and have no effect.
+ <span class="ph">See <a class="xref" href="impala_order_by.html#order_by">ORDER BY Clause</a> for details about improvements to
+ the <code class="ph codeph">ORDER BY</code> clause.</span>
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ There are some changes to the list of reserved words. <span class="ph">See
+ <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> for the most current list.</span> The following
+ keywords are new:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">API_VERSION</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">BINARY</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">CACHED</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">CLASS</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">PARTITIONS</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">PRODUCED</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">UNCACHED</code>
+ </li>
+ </ul>
+ <p class="p">
+ The following were formerly reserved keywords, but are no longer reserved:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">COUNT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">GROUP_CONCAT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">NDV</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SUM</code>
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The fix for issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-973" target="_blank">IMPALA-973</a>
+ changes the behavior of the <code class="ph codeph">INVALIDATE METADATA</code> statement regarding nonexistent
+ tables. In Impala 1.4.0 and higher, the statement returns an error if the specified table is not in the
+ metastore database at all. It completes successfully if the specified table is in the metastore
+ database but not yet recognized by Impala, for example if the table was created through Hive. Formerly,
+ you could issue this statement for a completely nonexistent table, with no error.
+ </p>
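+          <p class="p">
+            For example (the table names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Fails in Impala 1.4.0 and higher if no such table exists in the metastore.
+INVALIDATE METADATA no_such_table;
+-- Succeeds if the table exists in the metastore, for example created through Hive,
+-- but is not yet recognized by Impala.
+INVALIDATE METADATA table_created_in_hive;
+</code></pre>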
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title26" id="incompatible_changes__incompatible_changes_133">
+
+ <h2 class="title topictitle2" id="ariaid-title26">Incompatible Changes Introduced in Impala 1.3.3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ No incompatible changes. The TLS/SSL security fix does not require any change in the way you interact with
+ Impala.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title27" id="incompatible_changes__incompatible_changes_132">
+
+ <h2 class="title topictitle2" id="ariaid-title27">Incompatible Changes Introduced in Impala 1.3.2</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ With the fix for IMPALA-1019, you can use HDFS caching for files that are accessed by Impala.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title28" id="incompatible_changes__incompatible_changes_131">
+
+ <h2 class="title topictitle2" id="ariaid-title28">Incompatible Changes Introduced in Impala 1.3.1</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In Impala 1.3.1 and higher, the <code class="ph codeph">REGEXP</code> and <code class="ph codeph">RLIKE</code> operators now match a
+ regular expression string that occurs anywhere inside the target string, the same as if the regular
+            expression were enclosed on each side by <code class="ph codeph">.*</code>. See
+ <a class="xref" href="../shared/../topics/impala_operators.html#regexp">REGEXP Operator</a> for examples. Previously, these operators only
+ succeeded when the regular expression matched the entire target string. This change improves compatibility
+ with the regular expression support for popular database systems. There is no change to the behavior of the
+ <code class="ph codeph">regexp_extract()</code> and <code class="ph codeph">regexp_replace()</code> built-in functions.
+ </p>
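+          <p class="p">
+            For example (the string values are illustrative):
+          </p>
+<pre class="pre codeblock"><code>-- True in Impala 1.3.1 and higher; formerly this pattern had to match
+-- the entire string, as if written '.*nan.*'.
+SELECT 'banana' REGEXP 'nan';
+</code></pre>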
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The result set for the <code class="ph codeph">SHOW FUNCTIONS</code> statement includes a new first column, with the
+ data type of the return value. <span class="ph">See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for
+ examples.</span>
+ </p>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title29" id="incompatible_changes__incompatible_changes_130">
+
+ <h2 class="title topictitle2" id="ariaid-title29">Incompatible Changes Introduced in Impala 1.3.0</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">EXPLAIN_LEVEL</code> query option now accepts numeric options from 0 (most concise) to 3
+ (most verbose), rather than only 0 or 1. If you formerly used <code class="ph codeph">SET EXPLAIN_LEVEL=1</code> to
+ get detailed explain plans, switch to <code class="ph codeph">SET EXPLAIN_LEVEL=3</code>. If you used the mnemonic
+ keyword (<code class="ph codeph">SET EXPLAIN_LEVEL=verbose</code>), you do not need to change your code because now
+ level 3 corresponds to <code class="ph codeph">verbose</code>. <span class="ph">See
+ <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a> for details about the allowed explain levels, and
+ <a class="xref" href="impala_explain_plan.html#explain_plan">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</a> for usage information.</span>
+ </p>
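+          <p class="p">
+            For example:
+          </p>
+<pre class="pre codeblock"><code>-- Levels now range from 0 to 3; level 3 corresponds to the former VERBOSE setting.
+SET EXPLAIN_LEVEL=3;
+</code></pre>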
+ </li>
+
+ <li class="li">
+ <div class="p">
+ The keyword <code class="ph codeph">DECIMAL</code> is now a reserved word. If you have any databases, tables,
+ columns, or other objects already named <code class="ph codeph">DECIMAL</code>, quote any references to them using
+ backticks (<code class="ph codeph">``</code>) to avoid name conflicts with the keyword.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Although the <code class="ph codeph">DECIMAL</code> keyword is a reserved word, currently Impala does not support
+ <code class="ph codeph">DECIMAL</code> as a data type for columns.
+ </div>
+ </div>
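+          <p class="p">
+            For example (the table and column names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Quote the identifier with backticks to avoid a conflict with the new keyword.
+SELECT `decimal` FROM t1;
+</code></pre>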
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The query option formerly named <code class="ph codeph">YARN_POOL</code> is now named
+ <code class="ph codeph">REQUEST_POOL</code> to reflect its broader use with the Impala admission control feature.
+ <span class="ph">See <a class="xref" href="impala_request_pool.html#request_pool">REQUEST_POOL Query Option</a> for information about the
+ option, and <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details about its use with the
+ admission control feature.</span>
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ There are some changes to the list of reserved words. <span class="ph">See
+ <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> for the most current list.</span>
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The names of aggregate functions are no longer reserved words, so you can have databases, tables,
+ columns, or other objects named <code class="ph codeph">AVG</code>, <code class="ph codeph">MIN</code>, and so on without any
+ name conflicts.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The internal function names <code class="ph codeph">DISTINCTPC</code> and <code class="ph codeph">DISTINCTPCSA</code> are no
+ longer reserved words, although <code class="ph codeph">DISTINCT</code> is still a reserved word.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The keywords <code class="ph codeph">CLOSE_FN</code> and <code class="ph codeph">PREPARE_FN</code> are now reserved words.
+ <span class="ph">See <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a> for their role in
+ the <code class="ph codeph">CREATE FUNCTION</code> statement, and <a class="xref" href="impala_udf.html#udf_threads">Thread-Safe Work Area for UDFs</a> for
+ usage information.</span>
+ </p>
+ </li>
+ </ul>
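+          <p class="p">
+            For example, the following statements (with hypothetical object names) illustrate these changes:
+          </p>
+<pre class="pre codeblock"><code>-- Aggregate function names are no longer reserved, so AVG and MIN
+-- are valid unquoted column names.
+CREATE TABLE stats_demo (avg DOUBLE, min DOUBLE);
+-- CLOSE_FN and PREPARE_FN are now reserved words, so quote any
+-- pre-existing objects with those names.
+SELECT `close_fn` FROM legacy_table;</code></pre>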
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The HDFS property <code class="ph codeph">dfs.client.file-block-storage-locations.timeout</code> was renamed to
+ <code class="ph codeph">dfs.client.file-block-storage-locations.timeout.millis</code>, to emphasize that the unit of
+ measure is milliseconds, not seconds. Impala requires a timeout of at least 10 seconds, making the
+ minimum value for this setting 10000. If you are not using cluster management software, you might need to
+ edit the <span class="ph filepath">hdfs-site.xml</span> file in the Impala configuration directory for the new name
+ and minimum value.
+ </p>
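+          <p class="p">
+            A minimal <span class="ph filepath">hdfs-site.xml</span> snippet (a hypothetical example) showing the renamed property with the minimum value:
+          </p>
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;dfs.client.file-block-storage-locations.timeout.millis&lt;/name&gt;
+  &lt;value&gt;10000&lt;/value&gt; &lt;!-- 10 seconds, the minimum Impala accepts --&gt;
+&lt;/property&gt;</code></pre>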
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title30" id="incompatible_changes__incompatible_changes_124">
+
+ <h2 class="title topictitle2" id="ariaid-title30">Incompatible Changes Introduced in Impala 1.2.4</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ There are no incompatible changes introduced in Impala 1.2.4.
+ </p>
+
+ <p class="p">
+ Previously, after creating a table in Hive, you had to issue the <code class="ph codeph">INVALIDATE METADATA</code>
+ statement with no table name, a potentially expensive operation on clusters with many databases, tables,
+ and partitions. Starting in Impala 1.2.4, you can issue the statement <code class="ph codeph">INVALIDATE METADATA
+ <var class="keyword varname">table_name</var></code> for a table newly created through Hive. Loading the metadata for
+ only this one table is faster and involves less network overhead. Therefore, you might revisit your setup
+ DDL scripts to add the table name to <code class="ph codeph">INVALIDATE METADATA</code> statements, in cases where you
+ create and populate the tables through Hive before querying them through Impala.
+ </p>
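+      <p class="p">
+        For example, a setup DDL script might change as follows (table name hypothetical):
+      </p>
+<pre class="pre codeblock"><code>-- Before Impala 1.2.4: reloads metadata for all tables and databases.
+INVALIDATE METADATA;
+-- Impala 1.2.4 and higher: loads metadata only for the newly created table.
+INVALIDATE METADATA new_hive_table;</code></pre>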
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title31" id="incompatible_changes__incompatible_changes_123">
+
+ <h2 class="title topictitle2" id="ariaid-title31">Incompatible Changes Introduced in Impala 1.2.3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Because the feature set of Impala 1.2.3 is identical to Impala 1.2.2, there are no new incompatible
+ changes. See <a class="xref" href="impala_incompatible_changes.html#incompatible_changes_122">Incompatible Changes Introduced in Impala 1.2.2</a> if you are upgrading
+ from Impala 1.2.1 or 1.1.x.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title32" id="incompatible_changes__incompatible_changes_122">
+
+ <h2 class="title topictitle2" id="ariaid-title32">Incompatible Changes Introduced in Impala 1.2.2</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following changes to SQL syntax and semantics in Impala 1.2.2 could require updates to your SQL code,
+ or schema objects such as tables or views:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ With the addition of the <code class="ph codeph">CROSS JOIN</code> keyword, you might need to rewrite any queries
+ that refer to a table named <code class="ph codeph">CROSS</code> or use the name <code class="ph codeph">CROSS</code> as a table
+ alias:
+ </p>
+<pre class="pre codeblock"><code>-- Formerly, 'cross' in this query was an alias for t1
+-- and it was a normal join query.
+-- In 1.2.2 and higher, CROSS JOIN is a keyword, so 'cross'
+-- is not interpreted as a table alias, and the query
+-- uses the special CROSS JOIN processing rather than a
+-- regular join.
+select * from t1 cross join t2...
+
+-- Now if CROSS is used in other context such as a table or column name,
+-- use backticks to escape it.
+create table `cross` (x int);
+select * from `cross`;</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Formerly, a <code class="ph codeph">DROP DATABASE</code> statement in Impala would not remove the top-level HDFS
+ directory for that database. The <code class="ph codeph">DROP DATABASE</code> has been enhanced to remove that
+ directory. (You still need to drop all the tables inside the database first; this change only applies
+ to the top-level directory for the entire database.)
+ </p>
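+          <p class="p">
+            For example (database and table names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- In Impala 1.2.2 and higher, this sequence also removes the
+-- top-level HDFS directory for OLD_DB.
+DROP TABLE old_db.t1;
+DROP DATABASE old_db;</code></pre>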
+ </li>
+
+ <li class="li">
+ The keyword <code class="ph codeph">PARQUET</code> is introduced as a synonym for <code class="ph codeph">PARQUETFILE</code> in the
+ <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements, because that is the common
+ name for the file format. (As opposed to SequenceFile and RCFile where the <span class="q">"File"</span> suffix is part of
+ the name.) Documentation examples have been changed to prefer the new shorter keyword. The
+ <code class="ph codeph">PARQUETFILE</code> keyword is still available for backward compatibility with older Impala
+ versions.
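+          <p class="p">
+            For example (table names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Preferred shorter keyword in Impala 1.2.2 and higher:
+CREATE TABLE pq_tbl (x INT) STORED AS PARQUET;
+-- Still accepted for backward compatibility:
+CREATE TABLE pq_tbl2 (x INT) STORED AS PARQUETFILE;</code></pre>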
+ </li>
+
+ <li class="li">
+ New overloads are available for several operators and built-in functions, allowing you to insert their
+ result values into smaller numeric columns such as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>,
+ <code class="ph codeph">TINYINT</code>, and <code class="ph codeph">FLOAT</code> without using a <code class="ph codeph">CAST()</code> call. If you
+ remove the <code class="ph codeph">CAST()</code> calls from <code class="ph codeph">INSERT</code> statements, those statements might
+ not work with earlier versions of Impala.
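+        <p class="p">
+          For example (table and column names hypothetical):
+        </p>
+<pre class="pre codeblock"><code>-- Works in Impala 1.2.2 and higher without an explicit cast,
+-- where TINY_COL is a TINYINT column:
+INSERT INTO small_ints SELECT x + 1 FROM t1;
+-- Backward-compatible form for earlier Impala versions:
+INSERT INTO small_ints SELECT CAST(x + 1 AS TINYINT) FROM t1;</code></pre>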
+ </li>
+ </ul>
+
+ <p class="p">
+ Because many users are likely to upgrade straight from Impala 1.x to Impala 1.2.2, also read
+ <a class="xref" href="impala_incompatible_changes.html#incompatible_changes_121">Incompatible Changes Introduced in Impala 1.2.1</a> for things to note about upgrading
+ to Impala 1.2.x in general.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title33" id="incompatible_changes__incompatible_changes_121">
+
+ <h2 class="title topictitle2" id="ariaid-title33">Incompatible Changes Introduced in Impala 1.2.1</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following changes to SQL syntax and semantics in Impala 1.2.1 could require updates to your SQL code,
+ or schema objects such as tables or views:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In Impala 1.2.1 and higher, all <code class="ph codeph">NULL</code> values come at the end of the result set for
+ <code class="ph codeph">ORDER BY ... ASC</code> queries, and at the beginning of the result set for <code class="ph codeph">ORDER BY ...
+ DESC</code> queries. In effect, <code class="ph codeph">NULL</code> is considered greater than all other values for
+ sorting purposes. The original Impala behavior always put <code class="ph codeph">NULL</code> values at the end, even for
+ <code class="ph codeph">ORDER BY ... DESC</code> queries. The new behavior in Impala 1.2.1 makes Impala more compatible
+ with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting
+ behavior for <code class="ph codeph">NULL</code> by adding the clause <code class="ph codeph">NULLS FIRST</code> or <code class="ph codeph">NULLS
+ LAST</code> at the end of the <code class="ph codeph">ORDER BY</code> clause.
+ </p>
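+          <p class="p">
+            For example (table and column names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Impala 1.2.1 default: NULLs sort last for ASC, first for DESC.
+SELECT x FROM t1 ORDER BY x DESC;
+-- Override the default placement of NULL values:
+SELECT x FROM t1 ORDER BY x DESC NULLS LAST;
+SELECT x FROM t1 ORDER BY x ASC NULLS FIRST;</code></pre>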
+ <p class="p">
+ See <a class="xref" href="impala_literals.html#null">NULL</a> for more information.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ The new <span class="keyword cmdname">catalogd</span> service might require changes to any user-written scripts that stop,
+ start, or restart Impala services, install or upgrade Impala packages, or issue <code class="ph codeph">REFRESH</code> or
+ <code class="ph codeph">INVALIDATE METADATA</code> statements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_install.html#install">Installing Impala</a>,
+ <a class="xref" href="../shared/../topics/impala_upgrading.html#upgrading">Upgrading Impala</a> and
+ <a class="xref" href="../shared/../topics/impala_processes.html#processes">Starting Impala</a>, for usage information for the
+ <span class="keyword cmdname">catalogd</span> daemon.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements are no longer needed
+ when the <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-changing or
+ data-changing operation is performed through Impala. These statements are still needed if such
+ operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the
+ statements only need to be issued on one Impala node rather than on all nodes. See
+ <a class="xref" href="../shared/../topics/impala_refresh.html#refresh">REFRESH Statement</a> and
+ <a class="xref" href="../shared/../topics/impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest usage
+ information for those statements.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_components.html#intro_catalogd">The Impala Catalog Service</a> for background information on the
+ <span class="keyword cmdname">catalogd</span> service.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title34" id="incompatible_changes__incompatible_changes_120">
+
+ <h2 class="title topictitle2" id="ariaid-title34">Incompatible Changes Introduced in Impala 1.2.0 (Beta)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ There are no incompatible changes to SQL syntax in Impala 1.2.0 (beta).
+ </p>
+
+ <p class="p">
+ The new <span class="keyword cmdname">catalogd</span> service might require changes to any user-written scripts that stop,
+ start, or restart Impala services, install or upgrade Impala packages, or issue <code class="ph codeph">REFRESH</code> or
+ <code class="ph codeph">INVALIDATE METADATA</code> statements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_install.html#install">Installing Impala</a>,
+ <a class="xref" href="../shared/../topics/impala_upgrading.html#upgrading">Upgrading Impala</a> and
+ <a class="xref" href="../shared/../topics/impala_processes.html#processes">Starting Impala</a>, for usage information for the
+ <span class="keyword cmdname">catalogd</span> daemon.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements are no longer needed
+ when the <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-changing or
+ data-changing operation is performed through Impala. These statements are still needed if such
+ operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the
+ statements only need to be issued on one Impala node rather than on all nodes. See
+ <a class="xref" href="../shared/../topics/impala_refresh.html#refresh">REFRESH Statement</a> and
+ <a class="xref" href="../shared/../topics/impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest usage
+ information for those statements.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_components.html#intro_catalogd">The Impala Catalog Service</a> for background information on the
+ <span class="keyword cmdname">catalogd</span> service.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ The new resource management feature interacts with both YARN and Llama services.
+ <span class="ph">See
+ <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a> for usage information for Impala resource
+ management.</span>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title35" id="incompatible_changes__incompatible_changes_111">
+
+ <h2 class="title topictitle2" id="ariaid-title35">Incompatible Changes Introduced in Impala 1.1.1</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ There are no incompatible changes in Impala 1.1.1.
+ </p>
+
+
+
+
+
+
+
+
+
+
+
+ <p class="p">
+ Previously, it was not possible to create Parquet data through Impala and reuse that table within Hive. Now
+ that Parquet support is available for Hive 0.10, reusing existing Impala Parquet data files in Hive requires
+ updating the table metadata. Use the following command if you are already running Impala 1.1.1:
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET FILEFORMAT PARQUETFILE;
+</code></pre>
+
+ <p class="p">
+ If you are running a version of Impala older than 1.1.1, do the metadata update through Hive:
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
+ALTER TABLE <var class="keyword varname">table_name</var> SET FILEFORMAT
+ INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
+ OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";
+</code></pre>
+
+ <p class="p">
+ Impala 1.1.1 and higher can reuse Parquet data files created by Hive, without any action required.
+ </p>
+
+ <p class="p">
+ As usual, make sure to upgrade the Impala LZO package to the latest level at the same
+ time as you upgrade the Impala server.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title36" id="incompatible_changes__incompatible_changes_11">
+
+ <h2 class="title topictitle2" id="ariaid-title36">Incompatible Change Introduced in Impala 1.1</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> statement now requires a table name; in Impala 1.0, the table name was
+ optional. This syntax change is part of the internal rework to make <code class="ph codeph">REFRESH</code> a true
+ Impala SQL statement so that it can be called through the JDBC and ODBC APIs. <code class="ph codeph">REFRESH</code>
+ now reloads the metadata immediately, rather than marking it for update the next time any affected
+ table is accessed. The previous behavior, where omitting the table name caused a refresh of the entire
+ Impala metadata catalog, is available through the new <code class="ph codeph">INVALIDATE METADATA</code> statement.
+ <code class="ph codeph">INVALIDATE METADATA</code> can be specified with a table name to affect a single table, or
+ without a table name to affect the entire metadata catalog; the relevant metadata is reloaded the next
+ time it is requested during the processing for a SQL statement. See
+ <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> and
+ <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest details about these
+ statements.
+ </p>
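+          <p class="p">
+            For example (table name hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Impala 1.0 syntax, no longer accepted in 1.1:
+-- REFRESH;
+-- Impala 1.1 equivalents:
+REFRESH t1;                  -- reload metadata for one table immediately
+INVALIDATE METADATA t1;      -- mark one table's metadata stale
+INVALIDATE METADATA;         -- mark all metadata stale, reload lazily</code></pre>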
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title37" id="incompatible_changes__incompatible_changes_10">
+
+ <h2 class="title topictitle2" id="ariaid-title37">Incompatible Changes Introduced in Impala 1.0</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ If you use LZO-compressed text files, when you upgrade Impala to version 1.0, also update the
+ Impala LZO package to the latest level. See <a class="xref" href="impala_txtfile.html#lzo">Using LZO-Compressed Text Files</a> for
+ details.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_invalidate_metadata.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_invalidate_metadata.html b/docs/build3x/html/topics/impala_invalidate_metadata.html
new file mode 100644
index 0000000..ae7e419
--- /dev/null
+++ b/docs/build3x/html/topics/impala_invalidate_metadata.html
@@ -0,0 +1,286 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="invalidate_metadata"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>INVALIDATE METADATA Statement</title></head><body id="invalidate_metadata"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">INVALIDATE METADATA Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Marks the metadata for one or all tables as stale. Required after a table is created through the Hive shell,
+ before the table is available for Impala queries. The next time the current Impala node performs a query
+ against a table whose metadata is invalidated, Impala reloads the associated metadata before the query
+ proceeds. This is a relatively expensive operation compared to the incremental metadata update done by the
+ <code class="ph codeph">REFRESH</code> statement, so in the common scenario of adding new data files to an existing table,
+ prefer <code class="ph codeph">REFRESH</code> rather than <code class="ph codeph">INVALIDATE METADATA</code>. If you are not familiar
+ with the way Impala uses metadata and how it shares the same metastore database as Hive, see
+ <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a> for background information.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>INVALIDATE METADATA [[<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>]</code></pre>
+
+ <p class="p">
+ By default, the cached metadata for all tables is flushed. If you specify a table name, only the metadata for
+ that one table is flushed. Even for a single table, <code class="ph codeph">INVALIDATE METADATA</code> is more expensive
+ than <code class="ph codeph">REFRESH</code>, so prefer <code class="ph codeph">REFRESH</code> in the common case where you add new data
+ files for an existing table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong>
+ </p>
+
+ <p class="p">
+ To accurately respond to queries, Impala must have current metadata about those databases and tables that
+ clients query directly. Therefore, if some other entity modifies information used by Impala in the metastore
+ that Impala and Hive share, the information cached by Impala must be updated. However, this does not mean
+ that all metadata updates require an Impala update.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In Impala 1.2.4 and higher, you can specify a table name with <code class="ph codeph">INVALIDATE METADATA</code> after
+ the table is created in Hive, allowing you to make individual tables visible to Impala without doing a full
+ reload of the catalog metadata. Impala 1.2.4 also includes other changes to make the metadata broadcast
+ mechanism faster and more responsive, especially during Impala startup. See
+ <a class="xref" href="../shared/../topics/impala_new_features.html#new_features_124">New Features in Impala 1.2.4</a> for details.
+ </p>
+ <p class="p">
+ In Impala 1.2 and higher, a dedicated daemon (<span class="keyword cmdname">catalogd</span>) broadcasts DDL changes made
+ through Impala to all Impala nodes. Formerly, after you created a database or table while connected to one
+ Impala node, you needed to issue an <code class="ph codeph">INVALIDATE METADATA</code> statement on another Impala node
+ before accessing the new database or table from the other node. Now, newly created or altered objects are
+ picked up automatically by all Impala nodes. You must still use the <code class="ph codeph">INVALIDATE METADATA</code>
+ technique after creating or altering objects through Hive. See
+ <a class="xref" href="impala_components.html#intro_catalogd">The Impala Catalog Service</a> for more information on the catalog service.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">INVALIDATE METADATA</code> statement is new in Impala 1.1 and higher, and takes over some of
+ the use cases of the Impala 1.0 <code class="ph codeph">REFRESH</code> statement. Because <code class="ph codeph">REFRESH</code> now
+ requires a table name parameter, to flush the metadata for all tables at once, use the <code class="ph codeph">INVALIDATE
+ METADATA</code> statement.
+ </p>
+ <p class="p">
+ Because <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> only works for tables that the current
+ Impala node is already aware of, when you create a new table in the Hive shell, enter
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">new_table</var></code> before you can see the new table in
+ <span class="keyword cmdname">impala-shell</span>. Once the table is known by Impala, you can issue <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> after you add data files for that table.
+ </p>
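+      <p class="p">
+        For example (table name hypothetical):
+      </p>
+<pre class="pre codeblock"><code>-- After creating t_from_hive in the Hive shell:
+INVALIDATE METADATA t_from_hive;   -- make the new table visible to Impala
+-- Later, after adding data files to the table outside Impala:
+REFRESH t_from_hive;               -- the less expensive incremental reload</code></pre>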
+ </div>
+
+ <p class="p">
+ <code class="ph codeph">INVALIDATE METADATA</code> and <code class="ph codeph">REFRESH</code> are counterparts: <code class="ph codeph">INVALIDATE
+ METADATA</code> waits to reload the metadata when needed for a subsequent query, but reloads all the
+ metadata for the table, which can be an expensive operation, especially for large tables with many
+ partitions. <code class="ph codeph">REFRESH</code> reloads the metadata immediately, but only loads the block location
+ data for newly added data files, making it a less expensive operation overall. If data was altered in some
+ more extensive way, such as being reorganized by the HDFS balancer, use <code class="ph codeph">INVALIDATE
+ METADATA</code> to avoid a performance penalty from reduced local reads. If you used Impala version 1.0,
+ the <code class="ph codeph">INVALIDATE METADATA</code> statement works just like the Impala 1.0 <code class="ph codeph">REFRESH</code>
+ statement did, while the Impala 1.1 <code class="ph codeph">REFRESH</code> is optimized for the common use case of adding
+ new data files to an existing table, thus the table name argument is now required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ A metadata update for an <code class="ph codeph">impalad</code> instance <strong class="ph b">is</strong> required if:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ A metadata change occurs.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">and</strong> the change is made from another <code class="ph codeph">impalad</code> instance in your cluster, or through
+ Hive.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">and</strong> the change is made to a metastore database to which clients such as the Impala shell or ODBC directly
+ connect.
+ </li>
+ </ul>
+
+ <p class="p">
+ A metadata update for an Impala node is <strong class="ph b">not</strong> required when you issue queries from the same Impala node
+ where you ran <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-modifying statement.
+ </p>
+
+ <p class="p">
+ Database and table metadata is typically modified by:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Hive - via <code class="ph codeph">ALTER</code>, <code class="ph codeph">CREATE</code>, <code class="ph codeph">DROP</code> or
+ <code class="ph codeph">INSERT</code> operations.
+ </li>
+
+ <li class="li">
+ Impalad - via <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, and <code class="ph codeph">INSERT</code>
+ operations.
+ </li>
+ </ul>
+
+ <p class="p">
+ <code class="ph codeph">INVALIDATE METADATA</code> causes the metadata for that table to be marked as stale, and reloaded
+ the next time the table is referenced. For a huge table, that process could take a noticeable amount of time;
+ thus you might prefer to use <code class="ph codeph">REFRESH</code> where practical, to avoid an unpredictable delay later,
+ for example if the next reference to the table is during a benchmark test.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how you might use the <code class="ph codeph">INVALIDATE METADATA</code> statement after
+ creating new tables (such as SequenceFile or HBase tables) through the Hive shell. Before the
+ <code class="ph codeph">INVALIDATE METADATA</code> statement was issued, Impala would give a <span class="q">"table not found"</span> error
+ if you tried to refer to those table names. The <code class="ph codeph">DESCRIBE</code> statements cause the latest
+ metadata to be immediately loaded for the tables, avoiding a delay the next time those tables are queried.
+ </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] > invalidate metadata;
+[impalad-host:21000] > describe t1;
+...
+[impalad-host:21000] > describe t2;
+... </code></pre>
+
+ <p class="p">
+ For more examples of using <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> with a
+ combination of Impala and Hive operations, see <a class="xref" href="impala_tutorial.html#tutorial_impala_hive">Switching Back and Forth Between Impala and Hive</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have execute
+ permissions for all the relevant directories holding table data.
+ (A table could have data spread across multiple directories,
+ or in unexpected paths, if it uses partitioning or
+ specifies a <code class="ph codeph">LOCATION</code> attribute for
+ individual partitions or the entire table.)
+ Issues with permissions might not cause an immediate error for this statement,
+ but subsequent statements such as <code class="ph codeph">SELECT</code>
+ or <code class="ph codeph">SHOW TABLE STATS</code> could fail.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS considerations:</strong>
+ </p>
+
+ <p class="p">
+ By default, the <code class="ph codeph">INVALIDATE METADATA</code> command checks HDFS permissions of the underlying data
+ files and directories, caching this information so that a statement can be cancelled immediately if for
+ example the <code class="ph codeph">impala</code> user does not have permission to write to the data directory for the
+ table. (This checking does not apply when the <span class="keyword cmdname">catalogd</span> configuration option
+ <code class="ph codeph">--load_catalog_in_background</code> is set to <code class="ph codeph">false</code>, which it is by default.)
+ Impala reports any lack of write permissions as an <code class="ph codeph">INFO</code> message in the log file, in case
+ that represents an oversight. If you change HDFS permissions to make data readable or writeable by the Impala
+ user, issue another <code class="ph codeph">INVALIDATE METADATA</code> to make Impala aware of the change.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This example illustrates creating a new database and new table in Hive, then doing an <code class="ph codeph">INVALIDATE
+ METADATA</code> statement in Impala using the fully qualified table name, after which both the new table
+ and the new database are visible to Impala. The ability to specify <code class="ph codeph">INVALIDATE METADATA
+ <var class="keyword varname">table_name</var></code> for a table created in Hive is a new capability in Impala 1.2.4. In
+ earlier releases, that statement would have returned an error indicating an unknown table, requiring you to
+ do <code class="ph codeph">INVALIDATE METADATA</code> with no table name, a more expensive operation that reloaded metadata
+ for all tables and databases.
+ </p>
+
+<pre class="pre codeblock"><code>$ hive
+hive> create database new_db_from_hive;
+OK
+Time taken: 4.118 seconds
+hive> create table new_db_from_hive.new_table_from_hive (x int);
+OK
+Time taken: 0.618 seconds
+hive> quit;
+$ impala-shell
+[localhost:21000] > show databases like 'new*';
+[localhost:21000] > refresh new_db_from_hive.new_table_from_hive;
+ERROR: AnalysisException: Database does not exist: new_db_from_hive
+[localhost:21000] > invalidate metadata new_db_from_hive.new_table_from_hive;
+[localhost:21000] > show databases like 'new*';
++--------------------+
+| name |
++--------------------+
+| new_db_from_hive |
++--------------------+
+[localhost:21000] > show tables in new_db_from_hive;
++---------------------+
+| name |
++---------------------+
+| new_table_from_hive |
++---------------------+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements also apply to tables
+ whose data resides in the Amazon Simple Storage Service (S3).
+ In particular, issue a <code class="ph codeph">REFRESH</code> for a table after adding or removing files
+ in the associated S3 data directory.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about working with S3 tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Much of the metadata for Kudu tables is handled by the underlying
+ storage layer. Kudu tables have less reliance on the metastore
+ database, and require less metadata caching on the Impala side.
+ For example, information about partitions in Kudu tables is managed
+ by Kudu, and Impala does not cache any block locality metadata
+ for Kudu tables.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code>
+ statements are needed less frequently for Kudu tables than for
+ HDFS-backed tables. Neither statement is needed when data is
+ added to, removed, or updated in a Kudu table, even if the changes
+ are made directly to Kudu through a client program using the Kudu API.
+ Run <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+      for a Kudu table only after making a change to the Kudu table schema
+      through a mechanism other than Impala, such as adding or dropping a column.
+ </p>
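+
+      <p class="p">
+        For example, if a column is added to a Kudu table through the Kudu API or
+        command-line tools rather than through Impala, refresh that table's metadata
+        in Impala afterward. (The table name here is hypothetical.)
+      </p>
+<pre class="pre codeblock"><code>-- After the Kudu table schema was changed outside of Impala:
+REFRESH kudu_metrics;
+-- Or, to discard and reload all cached metadata for the table:
+INVALIDATE METADATA kudu_metrics;</code></pre>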
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a>,
+ <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_isilon.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_isilon.html b/docs/build3x/html/topics/impala_isilon.html
new file mode 100644
index 0000000..b0a2a2a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_isilon.html
@@ -0,0 +1,89 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_isilon"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala with Isilon Storage</title></head><body id="impala_isilon"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Impala with Isilon Storage</h1>
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You can use Impala to query data files that reside on EMC Isilon storage devices, rather than in HDFS.
+ This capability allows convenient query access to a storage system where you might already be
+ managing large volumes of data. The combination of the Impala query engine and Isilon storage is
+ certified on <span class="keyword">Impala 2.2.4</span> or higher.
+ </p>
+
+ <div class="p">
+ Because the EMC Isilon storage devices use a global value for the block size
+ rather than a configurable value for each file, the <code class="ph codeph">PARQUET_FILE_SIZE</code>
+ query option has no effect when Impala inserts data into a table or partition
+ residing on Isilon storage. Use the <code class="ph codeph">isi</code> command to set the
+ default block size globally on the Isilon device. For example, to set the
+ Isilon default block size to 256 MB, the recommended size for Parquet
+ data files for Impala, issue the following command:
+<pre class="pre codeblock"><code>isi hdfs settings modify --default-block-size=256MB</code></pre>
+ </div>
+
+ <p class="p">
+ The typical use case for Impala and Isilon together is to use Isilon for the
+ default filesystem, replacing HDFS entirely. In this configuration,
+ when you create a database, table, or partition, the data always resides on
+ Isilon storage and you do not need to specify any special <code class="ph codeph">LOCATION</code>
+ attribute. If you do specify a <code class="ph codeph">LOCATION</code> attribute, its value refers
+ to a path within the Isilon filesystem.
+ For example:
+ </p>
+<pre class="pre codeblock"><code>-- If the default filesystem is Isilon, all Impala data resides there
+-- and all Impala databases and tables are located there.
+CREATE TABLE t1 (x INT, s STRING);
+
+-- You can specify LOCATION for database, table, or partition,
+-- using values from the Isilon filesystem.
+CREATE DATABASE d1 LOCATION '/some/path/on/isilon/server/d1.db';
+CREATE TABLE d1.t2 (a TINYINT, b BOOLEAN);
+</code></pre>
+
+ <p class="p">
+ Impala can write to, delete, and rename data files and database, table,
+ and partition directories on Isilon storage. Therefore, Impala statements such
+ as
+ <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP TABLE</code>,
+ <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">DROP DATABASE</code>,
+ <code class="ph codeph">ALTER TABLE</code>,
+ and
+ <code class="ph codeph">INSERT</code> work the same with Isilon storage as with HDFS.
+ </p>
+
+ <p class="p">
+ When the Impala spill-to-disk feature is activated by a query that approaches
+ the memory limit, Impala writes all the temporary data to a local (not Isilon)
+ storage device. Because the I/O bandwidth for the temporary data depends on
+ the number of local disks, and clusters using Isilon storage might not have
+ as many local disks attached, pay special attention on Isilon-enabled clusters
+ to any queries that use the spill-to-disk feature. Where practical, tune the
+ queries or allocate extra memory for Impala to avoid spilling.
+ Although you can specify an Isilon storage device as the destination for
+ the temporary data for the spill-to-disk feature, that configuration is
+ not recommended due to the need to transfer the data both ways using remote I/O.
+ </p>
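+
+      <p class="p">
+        For example, you might raise the <code class="ph codeph">MEM_LIMIT</code> query option
+        before running a memory-intensive query, making the query less likely to spill.
+        (The 8 GB value is illustrative only; choose a figure appropriate for your cluster.)
+      </p>
+<pre class="pre codeblock"><code>SET MEM_LIMIT=8gb;
+-- Then run the memory-intensive query in the same session.
+SELECT ...;</code></pre>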
+
+ <p class="p">
+ When tuning Impala queries on HDFS, you typically try to avoid any remote reads.
+ When the data resides on Isilon storage, all the I/O consists of remote reads.
+ Do not be alarmed when you see non-zero numbers for remote read measurements
+ in query profile output. The benefit of the Impala and Isilon integration is
+ primarily convenience of not having to move or copy large volumes of data to HDFS,
+ rather than raw query performance. You can increase the performance of Impala
+ I/O for Isilon systems by increasing the value for the
+ <code class="ph codeph">--num_remote_hdfs_io_threads</code> startup option for the
+ <span class="keyword cmdname">impalad</span> daemon.
+ </p>
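+
+      <p class="p">
+        For example, you might start the daemon with a higher thread count such as
+        the following. (The value 64 is illustrative only; the appropriate number
+        depends on the bandwidth between the cluster and the Isilon device.)
+      </p>
+<pre class="pre codeblock"><code>impalad --num_remote_hdfs_io_threads=64 <var class="keyword varname">other_options</var></code></pre>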
+
+
+ </div>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_jdbc.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_jdbc.html b/docs/build3x/html/topics/impala_jdbc.html
new file mode 100644
index 0000000..33ed714
--- /dev/null
+++ b/docs/build3x/html/topics/impala_jdbc.html
@@ -0,0 +1,340 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_jdbc"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring Impala to Work with JDBC</title></head><body id="impala_jdbc"><main role="main"><article role="article" aria-labelledby="impala_jdbc__jdbc">
+
+ <h1 class="title topictitle1" id="impala_jdbc__jdbc">Configuring Impala to Work with JDBC</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports the standard JDBC interface, allowing access from commercial Business Intelligence tools and
+ custom software written in Java or other programming languages. The JDBC driver allows you to access Impala
+ from a Java program that you write, or a Business Intelligence or similar tool that uses JDBC to communicate
+ with various database products.
+ </p>
+
+ <p class="p">
+ Setting up a JDBC connection to Impala involves the following steps:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Verifying the communication port where the Impala daemons in your cluster are listening for incoming JDBC
+ requests.
+ </li>
+
+ <li class="li">
+ Installing the JDBC driver on every system that runs the JDBC-enabled application.
+ </li>
+
+ <li class="li">
+ Specifying a connection string for the JDBC application to access one of the servers running the
+ <span class="keyword cmdname">impalad</span> daemon, with the appropriate security settings.
+ </li>
+ </ul>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_config.html">Managing Impala</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="impala_jdbc__jdbc_port">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Configuring the JDBC Port</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+        The default port used by JDBC 2.0 and later (as well as ODBC 2.x) is 21050. The Impala server accepts JDBC
+        connections through this port by default. Make sure this port is available for communication
+ with other hosts on your network, for example, that it is not blocked by firewall software. If your JDBC
+ client software connects to a different port, specify that alternative port number with the
+ <code class="ph codeph">--hs2_port</code> option when starting <code class="ph codeph">impalad</code>. See
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a> for details about Impala startup options. See
+ <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> for information about all ports used for communication between Impala
+ and clients or between Impala components.
+ </p>
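+
+    <p class="p">
+      For example, to make <span class="keyword cmdname">impalad</span> listen for JDBC connections
+      on a nondefault port, you might start it as follows. (Port 28000 is an arbitrary example.)
+    </p>
+<pre class="pre codeblock"><code>impalad --hs2_port=28000 <var class="keyword varname">other_options</var></code></pre>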
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="impala_jdbc__jdbc_driver_choice">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Choosing the JDBC Driver</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ In Impala 2.0 and later, you can use the Hive 0.13 JDBC driver. If you are
+ already using JDBC applications with an earlier Impala release, you should update
+ your JDBC driver, because the Hive 0.12 driver that was formerly the only choice
+ is not compatible with Impala 2.0 and later.
+ </p>
+
+ <p class="p">
+ The Hive JDBC driver provides a substantial speed increase for JDBC
+ applications with Impala 2.0 and higher, for queries that return large result sets.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ The Impala complex types (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>)
+ are available in <span class="keyword">Impala 2.3</span> and higher.
+ To use these types with JDBC requires version 2.5.28 or higher of the JDBC Connector for Impala.
+ To use these types with ODBC requires version 2.5.30 or higher of the ODBC Connector for Impala.
+      Consider upgrading all JDBC and ODBC drivers at the same time you upgrade to <span class="keyword">Impala 2.3</span> or higher.
+ </p>
+ <p class="p">
+ Although the result sets from queries involving complex types consist of all scalar values,
+ the queries involve join notation and column references that might not be understood by
+ a particular JDBC or ODBC connector. Consider defining a view that represents the
+ flattened version of a table containing complex type columns, and pointing the JDBC
+ or ODBC application at the view.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+ </p>
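+
+    <p class="p">
+      For example, a view can flatten a complex column in advance, so that the JDBC or ODBC
+      application sees only scalar columns. (The table, column, and view names here are hypothetical.)
+    </p>
+<pre class="pre codeblock"><code>-- T1 has a scalar ID column and an ARRAY&lt;STRING&gt; column named TAGS.
+CREATE VIEW t1_flattened AS
+  SELECT t.id, a.item AS tag
+  FROM t1 t, t.tags a;
+
+-- Point the JDBC or ODBC application at t1_flattened instead of t1.</code></pre>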
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="impala_jdbc__jdbc_setup">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Enabling Impala JDBC Support on Client Systems</h2>
+
+
+ <div class="body conbody">
+
+ <section class="section" id="jdbc_setup__install_hive_driver"><h3 class="title sectiontitle">Using the Hive JDBC Driver</h3>
+
+ <p class="p">
+ You install the Hive JDBC driver (<code class="ph codeph">hive-jdbc</code> package) through the Linux package manager, on
+ hosts within the cluster. The driver consists of several Java JAR files. The same driver can be used by Impala and Hive.
+ </p>
+
+ <p class="p">
+ To get the JAR files, install the Hive JDBC driver on each host in the cluster that will run
+ JDBC applications.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The latest JDBC driver, corresponding to Hive 0.13, provides substantial performance improvements for
+ Impala queries that return large result sets. Impala 2.0 and later are compatible with the Hive 0.13
+ driver. If you already have an older JDBC driver installed, and are running Impala 2.0 or higher, consider
+ upgrading to the latest Hive JDBC driver for best performance with JDBC applications.
+ </div>
+
+ <p class="p">
+        If you are using JDBC-enabled applications on hosts outside the cluster, you cannot use the same install
+        procedure on those hosts. Install the JDBC driver on at least one cluster host using the preceding
+ procedure. Then download the JAR files to each client machine that will use JDBC with Impala:
+ </p>
+
+<pre class="pre codeblock"><code>commons-logging-X.X.X.jar
+hadoop-common.jar
+hive-common-X.XX.X.jar
+hive-jdbc-X.XX.X.jar
+hive-metastore-X.XX.X.jar
+hive-service-X.XX.X.jar
+httpclient-X.X.X.jar
+httpcore-X.X.X.jar
+libfb303-X.X.X.jar
+libthrift-X.X.X.jar
+log4j-X.X.XX.jar
+slf4j-api-X.X.X.jar
+slf4j-log4j12-X.X.X.jar</code></pre>
+
+ <p class="p">
+ <strong class="ph b">To enable JDBC support for Impala on the system where you run the JDBC application:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Download the JAR files listed above to each client machine.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ For Maven users, see
+            <a class="xref" href="https://github.com/onefoursix/Cloudera-Impala-JDBC-Example" target="_blank">this sample GitHub repository</a> for an example of the
+ dependencies you could add to a <code class="ph codeph">pom</code> file instead of downloading the individual JARs.
+ </div>
+ </li>
+
+ <li class="li">
+ Store the JAR files in a location of your choosing, ideally a directory already referenced in your
+ <code class="ph codeph">CLASSPATH</code> setting. For example:
+ <ul class="ul">
+ <li class="li">
+ On Linux, you might use a location such as <code class="ph codeph">/opt/jars/</code>.
+ </li>
+
+ <li class="li">
+ On Windows, you might use a subdirectory underneath <span class="ph filepath">C:\Program Files</span>.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ To successfully load the Impala JDBC driver, client programs must be able to locate the associated JAR
+ files. This often means setting the <code class="ph codeph">CLASSPATH</code> for the client process to include the
+ JARs. Consult the documentation for your JDBC client for more details on how to install new JDBC drivers,
+ but some examples of how to set <code class="ph codeph">CLASSPATH</code> variables include:
+ <ul class="ul">
+ <li class="li">
+ On Linux, if you extracted the JARs to <code class="ph codeph">/opt/jars/</code>, you might issue the following
+ command to prepend the JAR files path to an existing classpath:
+ <pre class="pre codeblock"><code>export CLASSPATH=/opt/jars/*.jar:$CLASSPATH</code></pre>
+ </li>
+
+ <li class="li">
+ On Windows, use the <strong class="ph b">System Properties</strong> control panel item to modify the <strong class="ph b">Environment
+ Variables</strong> for your system. Modify the environment variables to include the path to which you
+ extracted the files.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If the existing <code class="ph codeph">CLASSPATH</code> on your client machine refers to some older version of
+ the Hive JARs, ensure that the new JARs are the first ones listed. Either put the new JAR files
+ earlier in the listings, or delete the other references to Hive JAR files.
+ </div>
+ </li>
+ </ul>
+ </li>
+ </ol>
+ </section>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="impala_jdbc__jdbc_connect">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Establishing JDBC Connections</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The JDBC driver class depends on which driver you select.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If your JDBC or ODBC application connects to Impala through a load balancer such as
+ <code class="ph codeph">haproxy</code>, be cautious about reusing the connections. If the load balancer has set up
+ connection timeout values, either check the connection frequently so that it never sits idle longer than
+ the load balancer timeout value, or check the connection validity before using it and create a new one if
+ the connection has been closed.
+ </div>
+
+ <section class="section" id="jdbc_connect__class_hive_driver"><h3 class="title sectiontitle">Using the Hive JDBC Driver</h3>
+
+
+ <p class="p">
+ For example, with the Hive JDBC driver, the class name is <code class="ph codeph">org.apache.hive.jdbc.HiveDriver</code>.
+        Once you have configured Impala to work with JDBC, you can establish connections from your client applications to Impala.
+ To do so for a cluster that does not use
+ Kerberos authentication, use a connection string of the form
+ <code class="ph codeph">jdbc:hive2://<var class="keyword varname">host</var>:<var class="keyword varname">port</var>/;auth=noSasl</code>.
+
+ For example, you might use:
+ </p>
+
+<pre class="pre codeblock"><code>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</code></pre>
+
+ <p class="p">
+ To connect to an instance of Impala that requires Kerberos authentication, use a connection string of the
+ form
+ <code class="ph codeph">jdbc:hive2://<var class="keyword varname">host</var>:<var class="keyword varname">port</var>/;principal=<var class="keyword varname">principal_name</var></code>.
+ The principal must be the same user principal you used when starting Impala. For example, you might use:
+ </p>
+
+<pre class="pre codeblock"><code>jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM</code></pre>
+
+ <p class="p">
+ To connect to an instance of Impala that requires LDAP authentication, use a connection string of the form
+ <code class="ph codeph">jdbc:hive2://<var class="keyword varname">host</var>:<var class="keyword varname">port</var>/<var class="keyword varname">db_name</var>;user=<var class="keyword varname">ldap_userid</var>;password=<var class="keyword varname">ldap_password</var></code>.
+ For example, you might use:
+ </p>
+
+<pre class="pre codeblock"><code>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.5</span>, the Hive JDBC driver did not support connections that use both Kerberos authentication
+ and SSL encryption. If your cluster is running an older release that has this restriction,
+ use an alternative JDBC driver that supports
+ both of these security features.
+ </p>
+ </div>
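+
+      <p class="p">
+        The following minimal Java program sketches how the driver class and connection
+        string fit together. The host name and query are placeholders, and the program
+        assumes the driver JAR files described earlier are on the <code class="ph codeph">CLASSPATH</code>:
+      </p>
+<pre class="pre codeblock"><code>import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.ResultSet;
+import java.sql.Statement;
+
+public class ImpalaJdbcExample {
+  public static void main(String[] args) throws Exception {
+    // Register the Hive JDBC driver class.
+    Class.forName("org.apache.hive.jdbc.HiveDriver");
+    // Connection string for a cluster without Kerberos authentication.
+    String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
+    try (Connection con = DriverManager.getConnection(url);
+         Statement stmt = con.createStatement();
+         ResultSet rs = stmt.executeQuery("SELECT version()")) {
+      while (rs.next()) {
+        System.out.println(rs.getString(1));
+      }
+    }
+  }
+}</code></pre>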
+
+ </section>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="impala_jdbc__jdbc_odbc_notes">
+ <h2 class="title topictitle2" id="ariaid-title6">Notes about JDBC and ODBC Interaction with Impala SQL Features</h2>
+ <div class="body conbody">
+ <p class="p">
+      Most Impala SQL features work equivalently through the <span class="keyword cmdname">impala-shell</span> interpreter
+      or the JDBC or ODBC APIs. The following are some exceptions to keep in mind when switching between
+ the interactive shell and applications using the APIs:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Queries involving the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>)
+ require notation that might not be available in all levels of JDBC and ODBC drivers.
+ If you have trouble querying such a table due to the driver level or
+ inability to edit the queries used by the application, you can create a view that exposes
+ a <span class="q">"flattened"</span> version of the complex columns and point the application at the view.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The complex types available in <span class="keyword">Impala 2.3</span> and higher are supported by the
+ JDBC <code class="ph codeph">getColumns()</code> API.
+ Both <code class="ph codeph">MAP</code> and <code class="ph codeph">ARRAY</code> are reported as the JDBC SQL Type <code class="ph codeph">ARRAY</code>,
+ because this is the closest matching Java SQL type. This behavior is consistent with Hive.
+ <code class="ph codeph">STRUCT</code> types are reported as the JDBC SQL Type <code class="ph codeph">STRUCT</code>.
+ </p>
+ <div class="p">
+          To be consistent with Hive's behavior, the <code class="ph codeph">TYPE_NAME</code> field is populated
+ with the primitive type name for scalar types, and with the full <code class="ph codeph">toSql()</code>
+ for complex types. The resulting type names are somewhat inconsistent,
+ because nested types are printed differently than top-level types. For example,
+          the following list shows how <code class="ph codeph">toSql()</code> output for Impala types is
+          translated to <code class="ph codeph">TYPE_NAME</code> values:
+<pre class="pre codeblock"><code>DECIMAL(10,10) becomes DECIMAL
+CHAR(10) becomes CHAR
+VARCHAR(10) becomes VARCHAR
+ARRAY&lt;DECIMAL(10,10)&gt; becomes ARRAY&lt;DECIMAL(10,10)&gt;
+ARRAY&lt;CHAR(10)&gt; becomes ARRAY&lt;CHAR(10)&gt;
+ARRAY&lt;VARCHAR(10)&gt; becomes ARRAY&lt;VARCHAR(10)&gt;</code></pre>
+ </div>
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="impala_jdbc__jdbc_kudu">
+ <h2 class="title topictitle2" id="ariaid-title7">Kudu Considerations for DML Statements</h2>
+ <div class="body conbody">
+ <p class="p">
+ Currently, Impala <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, or
+ other DML statements issued through the JDBC interface against a Kudu
+        table do not return JDBC error codes for conditions such as duplicate
+        primary key values. Therefore, for applications that issue a high
+ volume of DML statements, prefer to use the Kudu Java API directly
+ rather than a JDBC application.
+ </p>
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_joins.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_joins.html b/docs/build3x/html/topics/impala_joins.html
new file mode 100644
index 0000000..51ccf6b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_joins.html
@@ -0,0 +1,531 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="joins"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Joins in Impala SELECT Statements</title></head><body id="joins"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Joins in Impala SELECT Statements</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ A join query is a <code class="ph codeph">SELECT</code> statement that combines data from two or more tables,
+ and returns a result set containing items from some or all of those tables. It is a way to
+ cross-reference and correlate related data that is organized into multiple tables, typically
+ using identifiers that are repeated in each of the joined tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ Impala supports a wide variety of <code class="ph codeph">JOIN</code> clauses. Left, right, semi, full, and outer joins
+ are supported in all Impala versions. The <code class="ph codeph">CROSS JOIN</code> operator is available in Impala 1.2.2
+ and higher. During performance tuning, you can override the reordering of join clauses that Impala does
+ internally by including the keyword <code class="ph codeph">STRAIGHT_JOIN</code> immediately after the
+ <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">ALL</code> keywords.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT <var class="keyword varname">select_list</var> FROM
+ <var class="keyword varname">table_or_subquery1</var> [INNER] JOIN <var class="keyword varname">table_or_subquery2</var> |
+ <var class="keyword varname">table_or_subquery1</var> {LEFT [OUTER] | RIGHT [OUTER] | FULL [OUTER]} JOIN <var class="keyword varname">table_or_subquery2</var> |
+ <var class="keyword varname">table_or_subquery1</var> {LEFT | RIGHT} SEMI JOIN <var class="keyword varname">table_or_subquery2</var> |
+ <span class="ph"><var class="keyword varname">table_or_subquery1</var> {LEFT | RIGHT} ANTI JOIN <var class="keyword varname">table_or_subquery2</var> |</span>
+ [ ON <var class="keyword varname">col1</var> = <var class="keyword varname">col2</var> [AND <var class="keyword varname">col3</var> = <var class="keyword varname">col4</var> ...] |
+ USING (<var class="keyword varname">col1</var> [, <var class="keyword varname">col2</var> ...]) ]
+ [<var class="keyword varname">other_join_clause</var> ...]
+[ WHERE <var class="keyword varname">where_clauses</var> ]
+
+SELECT <var class="keyword varname">select_list</var> FROM
+ <var class="keyword varname">table_or_subquery1</var>, <var class="keyword varname">table_or_subquery2</var> [, <var class="keyword varname">table_or_subquery3</var> ...]
+ [<var class="keyword varname">other_join_clause</var> ...]
+WHERE
+ <var class="keyword varname">col1</var> = <var class="keyword varname">col2</var> [AND <var class="keyword varname">col3</var> = <var class="keyword varname">col4</var> ...]
+
+SELECT <var class="keyword varname">select_list</var> FROM
+ <var class="keyword varname">table_or_subquery1</var> CROSS JOIN <var class="keyword varname">table_or_subquery2</var>
+ [<var class="keyword varname">other_join_clause</var> ...]
+[ WHERE <var class="keyword varname">where_clauses</var> ]</code></pre>
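+
+  <p class="p">
+    For example, to make Impala join the tables in the order they are listed in the query,
+    place <code class="ph codeph">STRAIGHT_JOIN</code> immediately after the
+    <code class="ph codeph">SELECT</code> keyword. (The tables <code class="ph codeph">t1</code>,
+    <code class="ph codeph">t2</code>, and <code class="ph codeph">t3</code> here are hypothetical.)
+  </p>
+<pre class="pre codeblock"><code>SELECT STRAIGHT_JOIN t1.c1, t2.c2
+  FROM t1 JOIN t2 ON t1.id = t2.id
+  JOIN t3 ON t2.id = t3.id;</code></pre>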
+
+ <p class="p">
+ <strong class="ph b">SQL-92 and SQL-89 Joins:</strong>
+ </p>
+
+ <p class="p">
+ Queries with the explicit <code class="ph codeph">JOIN</code> keywords are known as SQL-92 style joins, referring to the
+ level of the SQL standard where they were introduced. The corresponding <code class="ph codeph">ON</code> or
+ <code class="ph codeph">USING</code> clauses clearly show which columns are used as the join keys in each case:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT t1.c1, t2.c2 FROM <strong class="ph b">t1 JOIN t2</strong>
+ <strong class="ph b">ON t1.id = t2.id and t1.type_flag = t2.type_flag</strong>
+ WHERE t1.c1 > 100;
+
+SELECT t1.c1, t2.c2 FROM <strong class="ph b">t1 JOIN t2</strong>
+ <strong class="ph b">USING (id, type_flag)</strong>
+ WHERE t1.c1 > 100;</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ON</code> clause is a general way to compare columns across the two tables, even if the column
+ names are different. The <code class="ph codeph">USING</code> clause is a shorthand notation for specifying the join
+ columns, when the column names are the same in both tables. You can code equivalent <code class="ph codeph">WHERE</code>
+ clauses that compare the columns, instead of <code class="ph codeph">ON</code> or <code class="ph codeph">USING</code> clauses, but that
+ practice is not recommended because mixing the join comparisons with other filtering clauses is typically
+ less readable and harder to maintain.
+ </p>
+
+ <p class="p">
+ Queries with a comma-separated list of tables and subqueries are known as SQL-89 style joins. In these
+ queries, the equality comparisons between columns of the joined tables go in the <code class="ph codeph">WHERE</code>
+ clause alongside other kinds of comparisons. This syntax is easy to learn, but it is also easy to
+ accidentally remove a <code class="ph codeph">WHERE</code> clause needed for the join to work correctly.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT t1.c1, t2.c2 FROM <strong class="ph b">t1, t2</strong>
+ WHERE
+ <strong class="ph b">t1.id = t2.id AND t1.type_flag = t2.type_flag</strong>
+ AND t1.c1 > 100;</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Self-joins:</strong>
+ </p>
+
+ <p class="p">
+ Impala can do self-joins, for example to join on two different columns in the same table to represent
+ parent-child relationships or other tree-structured data. There is no explicit syntax for this; just use the
+ same table name for both the left-hand and right-hand table, and assign different table aliases to use when
+ referring to the fully qualified column names:
+ </p>
+
+<pre class="pre codeblock"><code>-- Combine fields from both parent and child rows.
+SELECT lhs.id, rhs.parent, lhs.c1, rhs.c2 FROM tree_data lhs, tree_data rhs WHERE lhs.id = rhs.parent;</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Cartesian joins:</strong>
+ </p>
+
+ <div class="p">
+ To avoid producing huge result sets by mistake, Impala does not allow Cartesian joins of the form:
+<pre class="pre codeblock"><code>SELECT ... FROM t1 JOIN t2;
+SELECT ... FROM t1, t2;</code></pre>
+ If you intend to join the tables based on common values, add <code class="ph codeph">ON</code> or <code class="ph codeph">WHERE</code>
+ clauses to compare columns across the tables. If you truly intend to do a Cartesian join, use the
+ <code class="ph codeph">CROSS JOIN</code> keyword as the join operator. The <code class="ph codeph">CROSS JOIN</code> form does not use
+ any <code class="ph codeph">ON</code> clause, because it produces a result set with all combinations of rows from the
+ left-hand and right-hand tables. The result set can still be filtered by subsequent <code class="ph codeph">WHERE</code>
+ clauses. For example:
+ </div>
+
+<pre class="pre codeblock"><code>SELECT ... FROM t1 CROSS JOIN t2;
+SELECT ... FROM t1 CROSS JOIN t2 WHERE <var class="keyword varname">tests_on_non_join_columns</var>;</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Inner and outer joins:</strong>
+ </p>
+
+ <p class="p">
+ An inner join is the most common and familiar type: rows in the result set contain the requested columns from
+ the appropriate tables, for all combinations of rows where the join columns of the tables have identical
+ values. If a column with the same name occurs in both tables, use a fully qualified name or a column alias to
+ refer to the column in the select list or other clauses. Impala performs inner joins by default for both
+ SQL-89 and SQL-92 join syntax:
+ </p>
+
+<pre class="pre codeblock"><code>-- The following 3 forms are all equivalent.
+SELECT t1.id, c1, c2 FROM t1, t2 WHERE t1.id = t2.id;
+SELECT t1.id, c1, c2 FROM t1 JOIN t2 ON t1.id = t2.id;
+SELECT t1.id, c1, c2 FROM t1 INNER JOIN t2 ON t1.id = t2.id;</code></pre>
+
+ <p class="p">
+ An outer join retrieves all rows from the left-hand table, or the right-hand table, or both; wherever there
+ is no matching data in the table on the other side of the join, the corresponding columns in the result set
+ are set to <code class="ph codeph">NULL</code>. To perform an outer join, include the <code class="ph codeph">OUTER</code> keyword in the
+ join operator, along with either <code class="ph codeph">LEFT</code>, <code class="ph codeph">RIGHT</code>, or <code class="ph codeph">FULL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id;
+SELECT * FROM t1 RIGHT OUTER JOIN t2 ON t1.id = t2.id;
+SELECT * FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;</code></pre>
+
+ <p class="p">
+ For outer joins, Impala requires SQL-92 syntax; that is, the <code class="ph codeph">JOIN</code> keyword instead of
+ comma-separated table names. Impala does not support vendor extensions such as <code class="ph codeph">(+)</code> or
+ <code class="ph codeph">*=</code> notation for doing outer joins with SQL-89 query syntax.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Equijoins and Non-Equijoins:</strong>
+ </p>
+
+ <p class="p">
+ By default, Impala requires an equality comparison between the left-hand and right-hand tables, either
+ through <code class="ph codeph">ON</code>, <code class="ph codeph">USING</code>, or <code class="ph codeph">WHERE</code> clauses. These types of
+ queries are classified broadly as equijoins. Inner, outer, full, and semi joins can all be equijoins based on
+ the presence of equality tests between columns in the left-hand and right-hand tables.
+ </p>
+
+ <p class="p">
+ In Impala 1.2.2 and higher, non-equijoin queries are also possible, with comparisons such as
+ <code class="ph codeph">!=</code> or <code class="ph codeph"><</code> between the join columns. These kinds of queries require care to
+ avoid producing huge result sets that could exceed resource limits. Once you have planned a non-equijoin
+ query that produces a result set of acceptable size, you can code the query using the <code class="ph codeph">CROSS
+ JOIN</code> operator, and add the extra comparisons in the <code class="ph codeph">WHERE</code> clause:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT * FROM t1 CROSS JOIN t2 WHERE t1.total > t2.maximum_price;</code></pre>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, additional non-equijoin queries are possible due to the addition
+ of nested loop joins. These queries typically involve <code class="ph codeph">SEMI JOIN</code>,
+ <code class="ph codeph">ANTI JOIN</code>, or <code class="ph codeph">FULL OUTER JOIN</code> clauses.
+ Impala sometimes also uses nested loop joins internally when evaluating <code class="ph codeph">OUTER JOIN</code>
+ queries involving complex type columns.
+ Query phases involving nested loop joins do not use the spill-to-disk mechanism if they
+ exceed the memory limit. Impala decides internally when to use each join mechanism; you cannot
+ specify any query hint to choose between the nested loop join or the original hash join algorithm.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.int_col < t2.int_col;</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Semi-joins:</strong>
+ </p>
+
+ <p class="p">
+ Semi-joins are a relatively rarely used variation. With the left semi-join, only data from the left-hand
+ table is returned, for rows where there is matching data in the right-hand table, based on comparisons
+ between join columns in <code class="ph codeph">ON</code> or <code class="ph codeph">WHERE</code> clauses. Only one instance of each row
+ from the left-hand table is returned, regardless of how many matching rows exist in the right-hand table.
+ <span class="ph">A right semi-join (available in Impala 2.0 and higher) reverses the comparison and returns
+ data from the right-hand table.</span>
+ </p>
+
+<pre class="pre codeblock"><code>SELECT t1.c1, t1.c2 FROM t1 LEFT SEMI JOIN t2 ON t1.id = t2.id;</code></pre>
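+
+ <p class="p">
+ As an illustration, a right semi-join reverses the direction: only columns from the right-hand table
+ can appear in the select list. (The table and column names here are placeholders following the same
+ pattern as the preceding example.)
+ </p>
+
+<pre class="pre codeblock"><code>-- Available in Impala 2.0 and higher. Returns one instance of each matching
+-- row from the right-hand table, regardless of how many rows in t1 match.
+SELECT t2.c1, t2.c2 FROM t1 RIGHT SEMI JOIN t2 ON t1.id = t2.id;</code></pre>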
+
+ <p class="p">
+ <strong class="ph b">Natural joins (not supported):</strong>
+ </p>
+
+ <p class="p">
+ Impala does not support the <code class="ph codeph">NATURAL JOIN</code> operator, again to avoid inconsistent or huge
+ result sets. Natural joins do away with the <code class="ph codeph">ON</code> and <code class="ph codeph">USING</code> clauses, and
+ instead automatically join on all columns with the same names in the left-hand and right-hand tables. Because
+ such a query can produce different results as columns are added to or removed from tables, it is not
+ recommended for rapidly evolving data structures such as are typically used in Hadoop.
+ </p>
+
+ <p class="p">
+ If you do have any queries that use <code class="ph codeph">NATURAL JOIN</code>, make sure to rewrite them with explicit
+ <code class="ph codeph">USING</code> clauses, because Impala could interpret the <code class="ph codeph">NATURAL</code> keyword as a
+ table alias:
+ </p>
+
+<pre class="pre codeblock"><code>-- 'NATURAL' is interpreted as an alias for 't1' and Impala attempts an inner join,
+-- resulting in an error because inner joins require explicit comparisons between columns.
+SELECT t1.c1, t2.c2 FROM t1 NATURAL JOIN t2;
+ERROR: NotImplementedException: Join with 't2' requires at least one conjunctive equality predicate.
+ To perform a Cartesian product between two tables, use a CROSS JOIN.
+
+-- If you expect the tables to have identically named columns with matching values,
+-- list the corresponding column names in a USING clause.
+SELECT t1.c1, t2.c2 FROM t1 JOIN t2 USING (id, type_flag, name, address);</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Anti-joins (<span class="keyword">Impala 2.0</span> and higher only):</strong>
+ </p>
+
+ <p class="p">
+ Impala supports the <code class="ph codeph">LEFT ANTI JOIN</code> and <code class="ph codeph">RIGHT ANTI JOIN</code> clauses in
+ <span class="keyword">Impala 2.0</span> and higher. The <code class="ph codeph">LEFT</code> or <code class="ph codeph">RIGHT</code>
+ keyword is required for this kind of join. For <code class="ph codeph">LEFT ANTI JOIN</code>, this clause returns those
+ values from the left-hand table that have no matching value in the right-hand table. <code class="ph codeph">RIGHT ANTI
+ JOIN</code> reverses the comparison and returns values from the right-hand table. You can express this
+ negative relationship either through the <code class="ph codeph">ANTI JOIN</code> clause or through a <code class="ph codeph">NOT
+ EXISTS</code> operator with a subquery.
+ </p>
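+
+ <p class="p">
+ As a sketch of this equivalence (using generic tables <code class="ph codeph">t1</code> and
+ <code class="ph codeph">t2</code> with a hypothetical join column <code class="ph codeph">id</code>), the following
+ two queries return the same rows:
+ </p>
+
+<pre class="pre codeblock"><code>-- Rows from t1 with no matching id value in t2, using ANTI JOIN...
+SELECT t1.* FROM t1 LEFT ANTI JOIN t2 ON t1.id = t2.id;
+
+-- ...and the equivalent formulation using NOT EXISTS with a correlated subquery.
+SELECT t1.* FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t1.id = t2.id);</code></pre>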
+
+
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+
+
+ <p class="p">
+ When referring to a column with a complex type (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>)
+ in a query, you use join notation to <span class="q">"unpack"</span> the scalar fields of the struct, the elements of the array, or
+ the key-value pairs of the map. (The join notation is not required for aggregation operations, such as
+ <code class="ph codeph">COUNT()</code> or <code class="ph codeph">SUM()</code> for array elements.) Because Impala recognizes which complex type elements are associated with which row
+ of the result set, you use the same syntax as for a cross or Cartesian join, without an explicit join condition.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types.
+ </p>
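+
+ <p class="p">
+ For example, assuming a hypothetical table <code class="ph codeph">customers</code> with an
+ <code class="ph codeph">ARRAY</code> column named <code class="ph codeph">orders</code>, the join notation
+ unpacks the array elements with no <code class="ph codeph">ON</code> clause:
+ </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical table: customers (id INT, name STRING, orders ARRAY<STRING>).
+-- Each array element is matched up with its containing row automatically,
+-- so no join condition is needed or allowed.
+SELECT c.name, o.item FROM customers c, c.orders o;</code></pre>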
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ You typically use join queries in situations like these:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ When related data arrives from different sources, with each data set physically residing in a separate
+ table. For example, you might have address data from business records that you cross-check against phone
+ listings or census data.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Impala can join tables of different file formats, including Impala-managed tables and HBase tables. For
+ example, you might keep small dimension tables in HBase, for convenience of single-row lookups and
+ updates, and for the larger fact tables use Parquet or other binary file format optimized for scan
+ operations. Then, you can issue a join query to cross-reference the fact tables with the dimension
+ tables.
+ </div>
+ </li>
+
+ <li class="li">
+ When data is normalized, a technique for reducing data duplication by dividing it across multiple tables.
+ This kind of organization is often found in data that comes from traditional relational database systems.
+ For example, instead of repeating some long string such as a customer name in multiple tables, each table
+ might contain a numeric customer ID. Queries that need to display the customer name could <span class="q">"join"</span> the
+ table that specifies which customer ID corresponds to which name.
+ </li>
+
+ <li class="li">
+ When certain columns are rarely needed for queries, so they are moved into separate tables to reduce
+ overhead for common queries. For example, a <code class="ph codeph">biography</code> field might be rarely needed in
+ queries on employee data. Putting that field in a separate table reduces the amount of I/O for common
+ queries on employee addresses or phone numbers. Queries that do need the <code class="ph codeph">biography</code> column
+ can retrieve it by performing a join with that separate table.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 2.3</span> or higher, when referring to complex type columns in queries.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+ </li>
+ </ul>
+
+ <p class="p">
+ When comparing columns with the same names in <code class="ph codeph">ON</code> or <code class="ph codeph">WHERE</code> clauses, use the
+ fully qualified names such as <code class="ph codeph"><var class="keyword varname">db_name</var>.<var class="keyword varname">table_name</var></code>, or
+ assign table aliases, column aliases, or both to make the code more compact and understandable:
+ </p>
+
+<pre class="pre codeblock"><code>select t1.c1 as first_id, t2.c2 as second_id from
+ t1 join t2 on first_id = second_id;
+
+select fact.custno, dimension.custno from
+ customer_data as fact join customer_address as dimension
+ using (custno);</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Performance for join queries is a crucial aspect for Impala, because complex join queries are
+ resource-intensive operations. An efficient join query produces much less network traffic and CPU overhead
+ than an inefficient one. For best results:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Make sure that both <a class="xref" href="impala_perf_stats.html#perf_stats">table and column statistics</a> are
+ available for all the tables involved in a join query, and especially for the columns referenced in any
+ join conditions. Impala uses the statistics to automatically deduce an efficient join order.
+ Use <a class="xref" href="impala_show.html#show"><code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code> and
+ <code class="ph codeph">SHOW COLUMN STATS <var class="keyword varname">table_name</var></code></a> to check if statistics are
+ already present. Issue the <code class="ph codeph">COMPUTE STATS <var class="keyword varname">table_name</var></code> for a nonpartitioned table,
+ or (in Impala 2.1.0 and higher) <code class="ph codeph">COMPUTE INCREMENTAL STATS <var class="keyword varname">table_name</var></code>
+ for a partitioned table, to collect the initial statistics at both the table and column levels, and to keep the
+ statistics up to date after any substantial <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> operations.
+ </li>
+
+ <li class="li">
+ If table or column statistics are not available, join the largest table first. You can check the
+ existence of statistics with the <code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code> and
+ <code class="ph codeph">SHOW COLUMN STATS <var class="keyword varname">table_name</var></code> statements.
+ </li>
+
+ <li class="li">
+ If table or column statistics are not available, join subsequent tables according to which table has the
+ most selective filter, based on overall size and <code class="ph codeph">WHERE</code> clauses. Joining the table with
+ the most selective filter results in the fewest number of rows being returned.
+ </li>
+ </ul>
+ <p class="p">
+ For more information and examples of performance for join queries, see
+ <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>.
+ </p>
+ </div>
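+
+ <p class="p">
+ For illustration, the statistics check and collection steps described above might look like the
+ following (the table names are placeholders):
+ </p>
+
+<pre class="pre codeblock"><code>-- Check whether statistics are already present.
+SHOW TABLE STATS sales_fact;
+SHOW COLUMN STATS sales_fact;
+
+-- Collect table and column statistics for a nonpartitioned table.
+COMPUTE STATS sales_fact;
+
+-- For a partitioned table, in Impala 2.1.0 and higher.
+COMPUTE INCREMENTAL STATS sales_by_region;</code></pre>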
+
+ <p class="p">
+ To control the result set from a join query, include the names of the corresponding columns from both tables
+ in an <code class="ph codeph">ON</code> or <code class="ph codeph">USING</code> clause, or code equality comparisons for those
+ columns in the <code class="ph codeph">WHERE</code> clause.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select c_last_name, ca_city from customer join customer_address where c_customer_sk = ca_address_sk;
++-------------+-----------------+
+| c_last_name | ca_city |
++-------------+-----------------+
+| Lewis | Fairfield |
+| Moses | Fairview |
+| Hamilton | Pleasant Valley |
+| White | Oak Ridge |
+| Moran | Glendale |
+...
+| Richards | Lakewood |
+| Day | Lebanon |
+| Painter | Oak Hill |
+| Bentley | Greenfield |
+| Jones | Stringtown |
++-------------+------------------+
+Returned 50000 row(s) in 9.82s</code></pre>
+
+ <p class="p">
+ One potential downside of joins is the possibility of excess resource usage in poorly constructed queries.
+ Impala imposes restrictions on join queries to guard against such issues. To minimize the chance of runaway
+ queries on large data sets, Impala requires every join query to contain at least one equality predicate
+ between the columns of the various tables. For example, if <code class="ph codeph">T1</code> contains 1000 rows and
+ <code class="ph codeph">T2</code> contains 1,000,000 rows, a query <code class="ph codeph">SELECT <var class="keyword varname">columns</var> FROM t1 JOIN
+ t2</code> could return up to 1 billion rows (1000 * 1,000,000); Impala requires that the query include a
+ clause such as <code class="ph codeph">ON t1.c1 = t2.c2</code> or <code class="ph codeph">WHERE t1.c1 = t2.c2</code>.
+ </p>
+
+ <p class="p">
+ Because the result set can still be large even with equality clauses, as the previous example shows, you
+ might use a <code class="ph codeph">LIMIT</code> clause to return a subset of the results:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select c_last_name, ca_city from customer, customer_address where c_customer_sk = ca_address_sk limit 10;
++-------------+-----------------+
+| c_last_name | ca_city |
++-------------+-----------------+
+| Lewis | Fairfield |
+| Moses | Fairview |
+| Hamilton | Pleasant Valley |
+| White | Oak Ridge |
+| Moran | Glendale |
+| Sharp | Lakeview |
+| Wiles | Farmington |
+| Shipman | Union |
+| Gilbert | New Hope |
+| Brunson | Martinsville |
++-------------+-----------------+
+Returned 10 row(s) in 0.63s</code></pre>
+
+ <p class="p">
+ Or you might use additional comparison operators or aggregation functions to condense a large result set into
+ a smaller set of values:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > -- Find the names of customers who live in one particular town.
+[localhost:21000] > select distinct c_last_name from customer, customer_address where
+ c_customer_sk = ca_address_sk
+ and ca_city = "Green Acres";
++---------------+
+| c_last_name |
++---------------+
+| Hensley |
+| Pearson |
+| Mayer |
+| Montgomery |
+| Ricks |
+...
+| Barrett |
+| Price |
+| Hill |
+| Hansen |
+| Meeks |
++---------------+
+Returned 332 row(s) in 0.97s
+
+[localhost:21000] > -- See how many different customers in this town have names starting with "A".
+[localhost:21000] > select count(distinct c_last_name) from customer, customer_address where
+ c_customer_sk = ca_address_sk
+ and ca_city = "Green Acres"
+ and substr(c_last_name,1,1) = "A";
++-----------------------------+
+| count(distinct c_last_name) |
++-----------------------------+
+| 12 |
++-----------------------------+
+Returned 1 row(s) in 1.00s</code></pre>
+
+ <p class="p">
+ Because a join query can involve reading large amounts of data from disk, sending large amounts of data
+ across the network, and loading large amounts of data into memory to do the comparisons and filtering, you
+ might do benchmarking, performance analysis, and query tuning to find the most efficient join queries for
+ your data set, hardware capacity, network configuration, and cluster workload.
+ </p>
+
+ <p class="p">
+ The two categories of joins in Impala are known as <strong class="ph b">partitioned joins</strong> and <strong class="ph b">broadcast joins</strong>. If
+ inaccurate table or column statistics, or some quirk of the data distribution, causes Impala to choose the
+ wrong mechanism for a particular join, consider using query hints as a temporary workaround. For details, see
+ <a class="xref" href="impala_hints.html#hints">Optimizer Hints</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Handling NULLs in Join Columns:</strong>
+ </p>
+
+ <p class="p">
+ By default, join key columns do not match if either one contains a <code class="ph codeph">NULL</code> value.
+ To treat such columns as equal if both contain <code class="ph codeph">NULL</code>, you can use an expression
+ such as <code class="ph codeph">A = B OR (A IS NULL AND B IS NULL)</code>.
+ In <span class="keyword">Impala 2.5</span> and higher, the <code class="ph codeph"><=></code> operator (shorthand for
+ <code class="ph codeph">IS NOT DISTINCT FROM</code>) performs the same comparison in a concise and efficient form.
+ The <code class="ph codeph"><=></code> operator is more efficient than the <code class="ph codeph">OR</code> expression for comparing join keys in a <code class="ph codeph">NULL</code>-safe
+ manner, because the operator can use a hash join while the <code class="ph codeph">OR</code> expression cannot.
+ </p>
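+
+ <p class="p">
+ For example, the following two queries are logically equivalent, but only the second form can use
+ a hash join (the tables and columns here are placeholders):
+ </p>
+
+<pre class="pre codeblock"><code>-- NULL-safe comparison spelled out with OR; cannot use a hash join.
+SELECT t1.id, t2.id FROM t1 JOIN t2
+ ON t1.a = t2.b OR (t1.a IS NULL AND t2.b IS NULL);
+
+-- Equivalent NULL-safe comparison in Impala 2.5 and higher; can use a hash join.
+SELECT t1.id, t2.id FROM t1 JOIN t2 ON t1.a <=> t2.b;</code></pre>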
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="p">
+ The following examples refer to these simple tables containing small sets of integers:
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (x int);
+[localhost:21000] > insert into t1 values (1), (2), (3), (4), (5), (6);
+
+[localhost:21000] > create table t2 (y int);
+[localhost:21000] > insert into t2 values (2), (4), (6);
+
+[localhost:21000] > create table t3 (z int);
+[localhost:21000] > insert into t3 values (1), (3), (5);
+</code></pre>
+ </div>
+
+
+
+ <p class="p">
+ The following example demonstrates an anti-join, returning the values from <code class="ph codeph">T1</code> that do not
+ exist in <code class="ph codeph">T2</code> (in this case, the odd numbers 1, 3, and 5):
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x from t1 left anti join t2 on (t1.x = t2.y);
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 5 |
++---+
+</code></pre>
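+
+ <p class="p">
+ Using the same tables, a left semi-join returns each <code class="ph codeph">T1</code> value that has a
+ match in <code class="ph codeph">T2</code> (in this case, the even numbers 2, 4, and 6):
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x from t1 left semi join t2 on (t1.x = t2.y);
++---+
+| x |
++---+
+| 2 |
+| 4 |
+| 6 |
++---+
+</code></pre>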
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ See these tutorials for examples of different kinds of joins:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_tutorial.html#tut_cross_join">Cross Joins and Cartesian Products with the CROSS JOIN Operator</a>
+ </li>
+ </ul>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hbase.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hbase.html b/docs/build3x/html/topics/impala_hbase.html
new file mode 100644
index 0000000..ef339ea
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hbase.html
@@ -0,0 +1,772 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_hbase"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala to Query HBase Tables</title></head><body id="impala_hbase"><main role="main"><article role="article" aria-labelledby="impala_hbase__hbase">
+
+ <h1 class="title topictitle1" id="impala_hbase__hbase">Using Impala to Query HBase Tables</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You can use Impala to query HBase tables. This capability allows convenient access to a storage system that
+ is tuned for different kinds of workloads than the default with Impala. The default Impala tables use data
+ files stored on HDFS, which are ideal for bulk loads and queries using full-table scans. In contrast, HBase
+ can do efficient queries for data organized for OLTP-style workloads, with lookups of individual rows or
+ ranges of values.
+ </p>
+
+ <p class="p">
+ From the perspective of an Impala user, coming from an RDBMS background, HBase is a kind of key-value store
+ where the value consists of multiple fields. The key is mapped to one column in the Impala table, and the
+ various fields of the value are mapped to the other columns in the Impala table.
+ </p>
+
+ <p class="p">
+ For background information on HBase, see <a class="xref" href="https://hbase.apache.org/book.html" target="_blank">the Apache HBase documentation</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="impala_hbase__hbase_using">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of Using HBase with Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ When you use Impala with HBase:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ You create the tables on the Impala side using the Hive shell, because the Impala <code class="ph codeph">CREATE
+ TABLE</code> statement currently does not support custom SerDes and some other syntax needed for these
+ tables:
+ <ul class="ul">
+ <li class="li">
+ You designate it as an HBase table using the <code class="ph codeph">STORED BY
+ 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'</code> clause on the Hive <code class="ph codeph">CREATE
+ TABLE</code> statement.
+ </li>
+
+ <li class="li">
+ You map these specially created tables to corresponding tables that exist in HBase, with the clause
+ <code class="ph codeph">TBLPROPERTIES("hbase.table.name" = "<var class="keyword varname">table_name_in_hbase</var>")</code> on the
+ Hive <code class="ph codeph">CREATE TABLE</code> statement.
+ </li>
+
+ <li class="li">
+ See <a class="xref" href="#hbase_queries">Examples of Querying HBase Tables from Impala</a> for a full example.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ You define the column corresponding to the HBase row key as a string with the <code class="ph codeph">#string</code>
+ keyword, or map it to a <code class="ph codeph">STRING</code> column.
+ </li>
+
+ <li class="li">
+ Because Impala and Hive share the same metastore database, once you create the table in Hive, you can
+ query or insert into it through Impala. (After creating a new table through Hive, issue the
+ <code class="ph codeph">INVALIDATE METADATA</code> statement in <span class="keyword cmdname">impala-shell</span> to make Impala aware of
+ the new table.)
+ </li>
+
+ <li class="li">
+ You issue queries against the Impala tables. For efficient queries, use <code class="ph codeph">WHERE</code> clauses to
+ find a single key value or a range of key values wherever practical, by testing the Impala column
+ corresponding to the HBase row key. Avoid queries that do full-table scans, which are efficient for
+ regular Impala tables but inefficient in HBase.
+ </li>
+ </ul>
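+
+ <p class="p">
+ Putting these pieces together, a Hive <code class="ph codeph">CREATE TABLE</code> statement for an
+ HBase-backed table follows this general pattern. (The table, column, and column family names here
+ are placeholders; see the examples later in this topic for a complete walkthrough.)
+ </p>
+
+<pre class="pre codeblock"><code>-- Issued in the Hive shell, not impala-shell.
+CREATE EXTERNAL TABLE hbase_customers (
+ cust_id STRING, -- Mapped to the HBase row key (:key).
+ name STRING,
+ birth_year STRING
+)
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+ "hbase.columns.mapping" = ":key,info:name,info:birth_year"
+)
+TBLPROPERTIES ("hbase.table.name" = "customers");</code></pre>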
+
+ <p class="p">
+ To work with an HBase table from Impala, ensure that the <code class="ph codeph">impala</code> user has read/write
+ privileges for the HBase table, using the <code class="ph codeph">GRANT</code> command in the HBase shell. For details
+ about HBase security, see <a class="xref" href="https://hbase.apache.org/book.html#security" target="_blank">the Security chapter in the Apache HBase documentation</a>.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="impala_hbase__hbase_config">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Configuring HBase for Use with Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ HBase works out of the box with Impala. There is no mandatory configuration needed to use these two
+ components together.
+ </p>
+
+ <p class="p">
+ To avoid delays if HBase is unavailable during Impala startup or after an <code class="ph codeph">INVALIDATE
+ METADATA</code> statement, set timeout values similar to the following in
+ <span class="ph filepath">/etc/impala/conf/hbase-site.xml</span>:
+ </p>
+
+<pre class="pre codeblock"><code><property>
+ <name>hbase.client.retries.number</name>
+ <value>3</value>
+</property>
+<property>
+ <name>hbase.rpc.timeout</name>
+ <value>3000</value>
+</property>
+</code></pre>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="impala_hbase__hbase_types">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Supported Data Types for HBase Columns</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To understand how Impala column data types are mapped to fields in HBase, you should have some background
+ knowledge about HBase first. You set up the mapping by running the <code class="ph codeph">CREATE TABLE</code> statement
+ in the Hive shell. See
+ <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration" target="_blank">the
+ Hive wiki</a> for a starting point, and <a class="xref" href="#hbase_queries">Examples of Querying HBase Tables from Impala</a> for examples.
+ </p>
+
+ <p class="p">
+ HBase works as a kind of <span class="q">"bit bucket"</span>, in the sense that HBase does not enforce any typing for the
+ key or value fields. All the type enforcement is done on the Impala side.
+ </p>
+
+ <p class="p">
+ For best performance of Impala queries against HBase tables, most queries will perform comparisons in the
+ <code class="ph codeph">WHERE</code> clause against the column that corresponds to the HBase row key. When creating the table
+ through the Hive shell, use the <code class="ph codeph">STRING</code> data type for the column that corresponds to the
+ HBase row key. Impala can translate conditional tests (through operators such as <code class="ph codeph">=</code>,
+ <code class="ph codeph"><</code>, <code class="ph codeph">BETWEEN</code>, and <code class="ph codeph">IN</code>) against this column into fast
+ lookups in HBase, but this optimization (<span class="q">"predicate pushdown"</span>) only works when that column is
+ defined as <code class="ph codeph">STRING</code>.
+ </p>
+
+ <p class="p">
+ Starting in Impala 1.1, Impala also supports reading and writing to columns that are defined in the Hive
+ <code class="ph codeph">CREATE TABLE</code> statement using binary data types, represented in the Hive table definition
+ using the <code class="ph codeph">#binary</code> keyword, often abbreviated as <code class="ph codeph">#b</code>. Defining numeric
+ columns as binary can reduce the overall data volume in the HBase tables. You should still define the
+ column that corresponds to the HBase row key as a <code class="ph codeph">STRING</code>, to allow fast lookups using
+ those columns.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="impala_hbase__hbase_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Performance Considerations for the Impala-HBase Integration</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ To understand the performance characteristics of SQL queries against data stored in HBase, you should have
+ some background knowledge about how HBase interacts with SQL-oriented systems first. See
+ <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration" target="_blank">the
+ Hive wiki</a> for a starting point; because Impala shares the same metastore database as Hive, the
+ information about mapping columns from Hive tables to HBase tables is generally applicable to Impala too.
+ </p>
+
+ <p class="p">
+ Impala uses the HBase client API via Java Native Interface (JNI) to query data stored in HBase. This
+ querying does not read HFiles directly. The extra communication overhead makes it important to choose what
+ data to store in HBase or in HDFS, and construct efficient queries that can retrieve the HBase data
+ efficiently:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Use HBase tables for queries that return a single row or a range of rows, not queries that scan the entire
+ table. (If a query has no <code class="ph codeph">WHERE</code> clause, that is a strong indicator that it is an
+ inefficient query for an HBase table.)
+ </li>
+
+ <li class="li">
+ If you have join queries that do aggregation operations on large fact tables and join the results against
+ small dimension tables, consider using Impala for the fact tables and HBase for the dimension tables.
+ (Because Impala does a full scan on the HBase table in this case, rather than doing single-row HBase
+ lookups based on the join column, only use this technique where the HBase table is small enough that
+ doing a full table scan does not cause a performance bottleneck for the query.)
+ </li>
+ </ul>
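+
+ <p class="p">
+ For example, assuming the HBase row key is mapped to a <code class="ph codeph">STRING</code> column named
+ <code class="ph codeph">cust_id</code> in a hypothetical table <code class="ph codeph">hbase_customers</code>,
+ the first two queries below can be turned into efficient HBase lookups, while the third forces a full
+ table scan:
+ </p>
+
+<pre class="pre codeblock"><code>-- Efficient: single-row lookup based on the row key column.
+SELECT * FROM hbase_customers WHERE cust_id = 'id_12345';
+
+-- Efficient: range scan bounded by start and stop keys.
+SELECT * FROM hbase_customers WHERE cust_id BETWEEN 'id_10000' AND 'id_19999';
+
+-- Inefficient: no WHERE clause on the row key means a full table scan.
+SELECT COUNT(*) FROM hbase_customers;</code></pre>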
+
+ <p class="p">
+ Query predicates are applied to row keys as start and stop keys, thereby limiting the scope of a particular
+ lookup. If row keys are not mapped to string columns, ordering is typically incorrect and comparison
+ operations, such as greater than (>) or less than (<), do not work.
+ </p>
+
+ <p class="p">
+ Predicates on non-key columns can be sent to HBase to scan as <code class="ph codeph">SingleColumnValueFilters</code>,
+ providing some performance gain: HBase then returns fewer rows to Impala than if the same predicates
+ were evaluated on the Impala side. The gain is smaller than when start and stop rows are used,
+ because the number of rows that HBase must examine is not limited. As long as a row key predicate
+ applies to a single row, HBase locates and returns that row directly. Conversely, if a non-key
+ predicate is used, even one that applies to only a single row, HBase must still scan the entire
+ table to find the correct result.
+ </p>
+
+ <div class="example"><h3 class="title sectiontitle">Interpreting EXPLAIN Output for HBase Queries</h3>
+
+
+
+ <p class="p">
+ For example, here are some queries against the following Impala table, which is mapped to an HBase table.
+ The examples show excerpts from the output of the <code class="ph codeph">EXPLAIN</code> statement, demonstrating what
+ things to look for to indicate an efficient or inefficient query against an HBase table.
+ </p>
+
+ <p class="p">
+ The first column (<code class="ph codeph">cust_id</code>) was specified as the key column in the <code class="ph codeph">CREATE
+ EXTERNAL TABLE</code> statement; for performance, it is important to declare this column as
+ <code class="ph codeph">STRING</code>. Other columns, such as <code class="ph codeph">BIRTH_YEAR</code> and
+ <code class="ph codeph">NEVER_LOGGED_ON</code>, are also declared as <code class="ph codeph">STRING</code>, rather than their
+ <span class="q">"natural"</span> types of <code class="ph codeph">INT</code> or <code class="ph codeph">BOOLEAN</code>, because Impala can optimize
+ those types more effectively in HBase tables. For comparison, we leave one column,
+ <code class="ph codeph">YEAR_REGISTERED</code>, as <code class="ph codeph">INT</code> to show that filtering on this column is
+ inefficient.
+ </p>
+
+<pre class="pre codeblock"><code>describe hbase_table;
+Query: describe hbase_table
++-----------------------+--------+---------+
+| name                  | type   | comment |
++-----------------------+--------+---------+
+| cust_id               | <strong class="ph b">string</strong> |         |
+| birth_year            | <strong class="ph b">string</strong> |         |
+| never_logged_on       | <strong class="ph b">string</strong> |         |
+| private_email_address | string |         |
+| year_registered       | <strong class="ph b">int</strong>    |         |
++-----------------------+--------+---------+
+</code></pre>
+
+ <p class="p">
+ The best case for performance involves a single row lookup using an equality comparison on the column
+ defined as the row key:
+ </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where cust_id = 'some_user@example.com';
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=1.01GB VCores=1 |
+| WARNING: The following tables are missing relevant table and/or column statistics. |
+| hbase.hbase_table |
+| |
+| 03:AGGREGATE [MERGE FINALIZE] |
+| | output: sum(count(*)) |
+| | |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| start key: some_user@example.com |</strong>
+<strong class="ph b">| stop key: some_user@example.com\0 |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ Another type of efficient query involves a range lookup on the row key column, using SQL operators such
+ as greater than (or equal), less than (or equal), or <code class="ph codeph">BETWEEN</code>. This example also includes
+ an equality test on a non-key column; because that column is a <code class="ph codeph">STRING</code>, Impala can let
+ HBase perform that test, indicated by the <code class="ph codeph">hbase filters:</code> line in the
+ <code class="ph codeph">EXPLAIN</code> output. Doing the filtering within HBase is more efficient than transmitting all
+ the data to Impala and doing the filtering on the Impala side.
+ </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where cust_id between 'a' and 'b'
+ and never_logged_on = 'true';
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| start key: a |</strong>
+<strong class="ph b">| stop key: b\0 |</strong>
+<strong class="ph b">| hbase filters: cols:never_logged_on EQUAL 'true' |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The query is less efficient if Impala has to evaluate any of the predicates, because Impala must scan the
+ entire HBase table. Impala can only push down predicates to HBase for columns declared as
+ <code class="ph codeph">STRING</code>. This example tests a column declared as <code class="ph codeph">INT</code>, and the
+ <code class="ph codeph">predicates:</code> line in the <code class="ph codeph">EXPLAIN</code> output indicates that the test is
+ performed after the data is transmitted to Impala.
+ </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where year_registered = 2010;
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| predicates: year_registered = 2010 |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The same inefficiency applies if the key column is compared to any non-constant value. Here, even though
+ the key column is a <code class="ph codeph">STRING</code>, and is tested using an equality operator, Impala must scan
+ the entire HBase table because the key column is compared to another column value rather than a constant.
+ </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where cust_id = private_email_address;
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| predicates: cust_id = private_email_address |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ Currently, tests on the row key using <code class="ph codeph">OR</code> or <code class="ph codeph">IN</code> clauses are not
+ optimized into direct lookups either. Such limitations might be lifted in the future, so always check the
+ <code class="ph codeph">EXPLAIN</code> output to be sure whether a particular SQL construct results in an efficient
+ query or not for HBase tables.
+ </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where
+ cust_id = 'some_user@example.com' or cust_id = 'other_user@example.com';
++----------------------------------------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| predicates: cust_id = 'some_user@example.com' OR cust_id = 'other_user@example.com' |</strong>
++----------------------------------------------------------------------------------------+
+
+explain select count(*) from hbase_table where
+ cust_id in ('some_user@example.com', 'other_user@example.com');
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| predicates: cust_id IN ('some_user@example.com', 'other_user@example.com') |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ Either rewrite the statement into separate queries for each value and combine the results in the
+ application, or combine the single-row queries using <code class="ph codeph">UNION ALL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>select count(*) from hbase_table where cust_id = 'some_user@example.com';
+select count(*) from hbase_table where cust_id = 'other_user@example.com';
+
+explain
+ select count(*) from hbase_table where cust_id = 'some_user@example.com'
+ union all
+ select count(*) from hbase_table where cust_id = 'other_user@example.com';
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+...
+
+| | 04:AGGREGATE |
+| | | output: count(*) |
+| | | |
+<strong class="ph b">| | 03:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| | start key: other_user@example.com |</strong>
+<strong class="ph b">| | stop key: other_user@example.com\0 |</strong>
+| | |
+| 10:MERGE |
+...
+
+| 02:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 01:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| start key: some_user@example.com |</strong>
+<strong class="ph b">| stop key: some_user@example.com\0 |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ </div>
+
+ <div class="example"><h3 class="title sectiontitle">Configuration Options for Java HBase Applications</h3>
+
+
+
+ <p class="p"> If you have an HBase Java application that calls the
+ <code class="ph codeph">setCacheBlocks</code> or <code class="ph codeph">setCaching</code>
+ methods of the class <a class="xref" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_blank">org.apache.hadoop.hbase.client.Scan</a>, you can set these same
+ caching behaviors through Impala query options, to control the memory
+ pressure on the HBase RegionServer. For example, when doing queries in
+ HBase that result in full-table scans (which by default are
+ inefficient for HBase), you can reduce memory usage and speed up the
+ queries by turning off the <code class="ph codeph">HBASE_CACHE_BLOCKS</code> setting
+ and specifying a large number for the <code class="ph codeph">HBASE_CACHING</code>
+ setting.
+ </p>
+
+ <p class="p">
+ To set these options, issue commands like the following in <span class="keyword cmdname">impala-shell</span>:
+ </p>
+
+<pre class="pre codeblock"><code>-- Same as calling setCacheBlocks(true) or setCacheBlocks(false).
+set hbase_cache_blocks=true;
+set hbase_cache_blocks=false;
+
+-- Same as calling setCaching(rows).
+set hbase_caching=1000;
+</code></pre>
+
+ <p class="p">
+ Or update the <span class="keyword cmdname">impalad</span> defaults file <span class="ph filepath">/etc/default/impala</span> and
+ include settings for <code class="ph codeph">HBASE_CACHE_BLOCKS</code> and/or <code class="ph codeph">HBASE_CACHING</code> in the
+ <code class="ph codeph">-default_query_options</code> setting for <code class="ph codeph">IMPALA_SERVER_ARGS</code>. See
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In Impala 2.0 and later, these options are settable through the JDBC or ODBC interfaces using the
+ <code class="ph codeph">SET</code> statement.
+ </div>
+
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="impala_hbase__hbase_scenarios">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Use Cases for Querying HBase through Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following are popular use cases for using Impala to query HBase tables:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Keeping large fact tables in Impala, and smaller dimension tables in HBase. The fact tables use Parquet
+ or other binary file format optimized for scan operations. Join queries scan through the large Impala
+ fact tables, and cross-reference the dimension tables using efficient single-row lookups in HBase.
+ </li>
+
+ <li class="li">
+ Using HBase to store rapidly incrementing counters, such as how many times a web page has been viewed, or
+ on a social network, how many connections a user has or how many votes a post received. HBase is
+ efficient for capturing such changeable data: the append-only storage mechanism is efficient for writing
+ each change to disk, and a query always returns the latest value. An application could query specific
+ totals like these from HBase, and combine the results with a broader set of data queried from Impala.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Storing very wide tables in HBase. Wide tables have many columns, possibly thousands, typically
+ recording many attributes for an important subject such as a user of an online service. These tables
+ are also often sparse, that is, most of the column values are <code class="ph codeph">NULL</code>, 0,
+ <code class="ph codeph">false</code>, empty string, or other blank or placeholder value. (For example, any particular
+ web site user might have never used some site feature, filled in a certain field in their profile,
+ visited a particular part of the site, and so on.) A typical query against this kind of table is to
+ look up a single row to retrieve all the information about a specific subject, rather than summing,
+ averaging, or filtering millions of rows as in typical Impala-managed tables.
+ </p>
+ <p class="p">
+ Or the HBase table could be joined with a larger Impala-managed table. For example, analyze the large
+ Impala table representing web traffic for a site and pick out 50 users who view the most pages. Join
+ that result with the wide user table in HBase to look up attributes of those users. The HBase side of
+ the join would result in 50 efficient single-row lookups in HBase, rather than scanning the entire user
+ table.
+ </p>
+ </li>
+ </ul>
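+
+ <p class="p">
+ As an illustration of the second pattern, a query along the following lines could drive the 50
+ efficient single-row HBase lookups. This is only a sketch: the table and column names
+ (<code class="ph codeph">web_traffic</code>, <code class="ph codeph">hbase_users</code>, and so on) are
+ hypothetical, not part of any sample schema.
+ </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical schema: web_traffic is a large Impala-managed fact table;
+-- hbase_users is a wide HBase table keyed by a STRING user_id column.
+with top_users as
+  (select user_id, count(*) as page_views
+   from web_traffic
+   group by user_id
+   order by page_views desc
+   limit 50)
+select t.user_id, t.page_views, u.signup_date, u.home_country
+from top_users t join hbase_users u on (t.user_id = u.user_id);
+</code></pre>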
+ </div>
+ </article>
+
+
+
+
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="impala_hbase__hbase_loading">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Loading Data into an HBase Table</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala <code class="ph codeph">INSERT</code> statement works for HBase tables. The <code class="ph codeph">INSERT ... VALUES</code>
+ syntax is ideally suited to HBase tables, because inserting a single row is an efficient operation for an
+ HBase table. (For regular Impala tables, with data files in HDFS, the tiny data files produced by
+ <code class="ph codeph">INSERT ... VALUES</code> are extremely inefficient, so you would not use that technique with
+ tables containing any significant data volume.)
+ </p>
+
+
+
+ <p class="p">
+ When you use the <code class="ph codeph">INSERT ... SELECT</code> syntax, the result in the HBase table could be fewer
+ rows than you expect. HBase only stores the most recent version of each unique row key, so if an
+ <code class="ph codeph">INSERT ... SELECT</code> statement copies over multiple rows containing the same value for the
+ key column, subsequent queries return only one row for each distinct key column value.
+ </p>
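+
+ <p class="p">
+ For example, assuming a hypothetical staging table <code class="ph codeph">new_customers</code> whose
+ rows contain duplicate values for the key column, the effect might look like this sketch:
+ </p>
+
+<pre class="pre codeblock"><code>-- Suppose new_customers contains 1000 rows but only 900 distinct cust_id values.
+insert into hbase_table select * from new_customers;
+-- HBase keeps only the most recent row for each duplicate key, so the
+-- count reflects the number of distinct keys, not the 1000 source rows.
+select count(*) from hbase_table;
+</code></pre>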
+
+ <p class="p">
+ Although Impala does not have an <code class="ph codeph">UPDATE</code> statement, you can achieve the same effect by
+ issuing successive <code class="ph codeph">INSERT</code> statements that use the same value for the key column each time.
+ </p>
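+
+ <p class="p">
+ For example, the following sketch (reusing the column names from the earlier
+ <code class="ph codeph">hbase_table</code> example) shows how a repeated key acts as an update:
+ </p>
+
+<pre class="pre codeblock"><code>-- The second INSERT supersedes the first, because both use the same row key.
+insert into hbase_table (cust_id, birth_year) values ('user1@example.com', '1985');
+insert into hbase_table (cust_id, birth_year) values ('user1@example.com', '1986');
+-- A query now sees only the most recently inserted value for that key.
+select birth_year from hbase_table where cust_id = 'user1@example.com';
+</code></pre>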
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="impala_hbase__hbase_limitations">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Limitations and Restrictions of the Impala and HBase Integration</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala integration with HBase has the following limitations and restrictions, some inherited from the
+ integration between HBase and Hive, and some unique to Impala:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ If you issue a <code class="ph codeph">DROP TABLE</code> for an internal (Impala-managed) table that is mapped to an
+ HBase table, the underlying table is not removed in HBase. The Hive <code class="ph codeph">DROP TABLE</code>
+ statement, by contrast, does remove the underlying HBase table in this case.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INSERT OVERWRITE</code> statement is not available for HBase tables. You can insert new
+ data, or modify an existing row by inserting a new row with the same key value, but not replace the
+ entire contents of the table. You can do an <code class="ph codeph">INSERT OVERWRITE</code> in Hive if you need this
+ capability.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you issue a <code class="ph codeph">CREATE TABLE LIKE</code> statement for a table mapped to an HBase table, the
+ new table is also an HBase table, but inherits the same underlying HBase table name as the original.
+ The new table is effectively an alias for the old one, not a new table with identical column structure.
+ Avoid using <code class="ph codeph">CREATE TABLE LIKE</code> for HBase tables, to avoid any confusion.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Copying data into an HBase table using the Impala <code class="ph codeph">INSERT ... SELECT</code> syntax might
+ produce fewer new rows than are in the query result set. If the result set contains multiple rows with
+ the same value for the key column, each row supersedes any previous rows with the same key value.
+ Because the order of the inserted rows is unpredictable, you cannot rely on this technique to preserve
+ the <span class="q">"latest"</span> version of a particular key value.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Because the complex data types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>)
+ available in <span class="keyword">Impala 2.3</span> and higher are currently only supported in Parquet tables, you cannot
+ use these types in HBase tables that are queried through Impala.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement cannot be used with HBase tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">TABLESAMPLE</code> clause of the <code class="ph codeph">SELECT</code>
+ statement does not apply to a table reference derived from a view, a subquery,
+ or anything other than a real base table. This clause only works for tables
+ backed by HDFS or HDFS-like data files, therefore it does not apply to Kudu or
+ HBase tables.
+ </p>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="impala_hbase__hbase_queries">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Examples of Querying HBase Tables from Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following examples create an HBase table with four column families,
+ create a corresponding table through Hive,
+ then insert and query the table through Impala.
+ </p>
+ <p class="p">
+ In HBase shell, the table
+ name is quoted in <code class="ph codeph">CREATE</code> and <code class="ph codeph">DROP</code> statements. Tables created in HBase
+ begin in <span class="q">"enabled"</span> state; before dropping them through the HBase shell, you must issue a
+ <code class="ph codeph">disable '<var class="keyword varname">table_name</var>'</code> statement.
+ </p>
+
+<pre class="pre codeblock"><code>$ hbase shell
+15/02/10 16:07:45
+HBase Shell; enter 'help&lt;RETURN&gt;' for list of supported commands.
+Type "exit&lt;RETURN&gt;" to leave the HBase Shell
+...
+
+hbase(main):001:0> create 'hbasealltypessmall', 'boolsCF', 'intsCF', 'floatsCF', 'stringsCF'
+0 row(s) in 4.6520 seconds
+
+=> Hbase::Table - hbasealltypessmall
+hbase(main):006:0> quit
+</code></pre>
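+
+ <p class="p">
+ Later, when the table is no longer needed, the corresponding cleanup in the HBase shell would look
+ something like the following:
+ </p>
+
+<pre class="pre codeblock"><code>hbase(main):001:0> disable 'hbasealltypessmall'
+hbase(main):002:0> drop 'hbasealltypessmall'
+</code></pre>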
+
+ <p class="p">
+ Issue the following <code class="ph codeph">CREATE TABLE</code> statement in the Hive shell. (The Impala <code class="ph codeph">CREATE
+ TABLE</code> statement currently does not support the <code class="ph codeph">STORED BY</code> clause, so you switch into Hive to
+ create the table, then back to Impala and the <span class="keyword cmdname">impala-shell</span> interpreter to issue the
+ queries.)
+ </p>
+
+ <p class="p">
+ This example creates an external table mapped to the HBase table, usable by both Impala and Hive. It is
+ defined as an external table so that when dropped by Impala or Hive, the original HBase table is not touched at all.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">WITH SERDEPROPERTIES</code> clause
+ specifies that the first column (<code class="ph codeph">ID</code>) represents the row key, and maps the remaining
+ columns of the SQL table to HBase column families. The mapping relies on the ordinal position of the
+ columns in the table, not on the column names in the <code class="ph codeph">CREATE TABLE</code> statement.
+ The first column is defined to be the lookup key; the
+ <code class="ph codeph">STRING</code> data type produces the fastest key-based lookups for HBase tables.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ For Impala with HBase tables, the most important aspect to ensure good performance is to use a
+ <code class="ph codeph">STRING</code> column as the row key, as shown in this example.
+ </div>
+
+<pre class="pre codeblock"><code>$ hive
+...
+hive> use hbase;
+OK
+Time taken: 4.095 seconds
+hive> CREATE EXTERNAL TABLE hbasestringids (
+ > id string,
+ > bool_col boolean,
+ > tinyint_col tinyint,
+ > smallint_col smallint,
+ > int_col int,
+ > bigint_col bigint,
+ > float_col float,
+ > double_col double,
+ > date_string_col string,
+ > string_col string,
+ > timestamp_col timestamp)
+ > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+ > WITH SERDEPROPERTIES (
+ > "hbase.columns.mapping" =
+ > ":key,boolsCF:bool_col,intsCF:tinyint_col,intsCF:smallint_col,intsCF:int_col,intsCF:\
+ > bigint_col,floatsCF:float_col,floatsCF:double_col,stringsCF:date_string_col,\
+ > stringsCF:string_col,stringsCF:timestamp_col"
+ > )
+ > TBLPROPERTIES("hbase.table.name" = "hbasealltypessmall");
+OK
+Time taken: 2.879 seconds
+hive> quit;
+</code></pre>
+
+ <p class="p">
+ Once you have established the mapping to an HBase table, you can issue DML statements and queries
+ from Impala. The following example shows a series of <code class="ph codeph">INSERT</code>
+ statements followed by a query.
+ The ideal kind of query from a performance standpoint
+ retrieves a row from the table based on a row key
+ mapped to a string column.
+ An initial <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+ statement makes the table created through Hive visible to Impala.
+ </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost -d hbase
+Starting Impala Shell without Kerberos authentication
+Connected to localhost:21000
+...
+Query: use `hbase`
+[localhost:21000] > invalidate metadata hbasestringids;
+Fetched 0 row(s) in 0.09s
+[localhost:21000] > desc hbasestringids;
++-----------------+-----------+---------+
+| name            | type      | comment |
++-----------------+-----------+---------+
+| id              | string    |         |
+| bool_col        | boolean   |         |
+| double_col      | double    |         |
+| float_col       | float     |         |
+| bigint_col      | bigint    |         |
+| int_col         | int       |         |
+| smallint_col    | smallint  |         |
+| tinyint_col     | tinyint   |         |
+| date_string_col | string    |         |
+| string_col      | string    |         |
+| timestamp_col   | timestamp |         |
++-----------------+-----------+---------+
+Fetched 11 row(s) in 0.02s
+[localhost:21000] > insert into hbasestringids values ('0001',true,3.141,9.94,1234567,32768,4000,76,'2014-12-31','Hello world',now());
+Inserted 1 row(s) in 0.26s
+[localhost:21000] > insert into hbasestringids values ('0002',false,2.004,6.196,1500,8000,129,127,'2014-01-01','Foo bar',now());
+Inserted 1 row(s) in 0.12s
+[localhost:21000] > select * from hbasestringids where id = '0001';
++------+----------+------------+-------------------+------------+---------+--------------+-------------+-----------------+-------------+-------------------------------+
+| id   | bool_col | double_col | float_col         | bigint_col | int_col | smallint_col | tinyint_col | date_string_col | string_col  | timestamp_col                 |
++------+----------+------------+-------------------+------------+---------+--------------+-------------+-----------------+-------------+-------------------------------+
+| 0001 | true     | 3.141      | 9.939999580383301 | 1234567    | 32768   | 4000         | 76          | 2014-12-31      | Hello world | 2015-02-10 16:36:59.764838000 |
++------+----------+------------+-------------------+------------+---------+--------------+-------------+-----------------+-------------+-------------------------------+
+Fetched 1 row(s) in 0.54s
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ After you create a table in Hive, such as the HBase mapping table in this example, issue an
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> statement the next time you connect to
+ Impala, to make Impala aware of the new table. (Prior to Impala 1.2.4, you could not specify the table name if
+ Impala was not aware of the table yet; in Impala 1.2.4 and higher, specifying the table name avoids
+ reloading the metadata for other tables that are not changed.)
+ </div>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hbase_cache_blocks.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hbase_cache_blocks.html b/docs/build3x/html/topics/impala_hbase_cache_blocks.html
new file mode 100644
index 0000000..27ebee3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hbase_cache_blocks.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hbase_cache_blocks"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>HBASE_CACHE_BLOCKS Query Option</title></head><body id="hbase_cache_blocks"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">HBASE_CACHE_BLOCKS Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Setting this option is equivalent to calling the
+ <code class="ph codeph">setCacheBlocks</code> method of the class <a class="xref" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_blank">org.apache.hadoop.hbase.client.Scan</a>, in an HBase Java
+ application. This option helps control the memory pressure on the HBase
+ RegionServer, in conjunction with the <code class="ph codeph">HBASE_CACHING</code> query
+ option. </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>,
+ <a class="xref" href="impala_hbase_caching.html#hbase_caching">HBASE_CACHING Query Option</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hbase_caching.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hbase_caching.html b/docs/build3x/html/topics/impala_hbase_caching.html
new file mode 100644
index 0000000..e2082d4
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hbase_caching.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hbase_caching"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>HBASE_CACHING Query Option</title></head><body id="hbase_caching"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">HBASE_CACHING Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Setting this option is equivalent to calling the
+ <code class="ph codeph">setCaching</code> method of the class <a class="xref" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_blank">org.apache.hadoop.hbase.client.Scan</a>, in an HBase Java
+ application. This option helps control the memory pressure on the HBase
+ RegionServer, in conjunction with the <code class="ph codeph">HBASE_CACHE_BLOCKS</code>
+ query option. </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer (number of rows)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>,
+ <a class="xref" href="impala_hbase_cache_blocks.html#hbase_cache_blocks">HBASE_CACHE_BLOCKS Query Option</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hints.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hints.html b/docs/build3x/html/topics/impala_hints.html
new file mode 100644
index 0000000..7777fa2
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hints.html
@@ -0,0 +1,488 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hints"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Optimizer Hints</title></head><body id="hints"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Optimizer Hints</h1>
+
+
+
+ <div class="body conbody">
+
+    <p class="p">
+      Impala SQL supports query hints for fine-tuning the inner workings of
+      queries. Specify hints as a temporary workaround for expensive queries,
+      where missing statistics or other factors cause inefficient performance. </p>
+
+    <p class="p"> Hints are most often used for resource-intensive Impala queries,
+      such as: </p>
+
+ <ul class="ul">
+ <li class="li">
+ Join queries involving large tables, where intermediate result sets are transmitted across the network to
+ evaluate the join conditions.
+ </li>
+
+ <li class="li">
+ Inserting into partitioned Parquet tables, where many memory buffers could be allocated on each host to
+ hold intermediate results for each partition.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+    <p class="p"> In <span class="keyword">Impala 2.0</span> and higher, you can
+      specify hints inside comments that use either the <code class="ph codeph">/*
+        */</code> or <code class="ph codeph">--</code> notation, with a
+      <code class="ph codeph">+</code> symbol immediately before the first hint name.
+      Recently added hints are only available using the <code class="ph codeph">/* */</code> and
+      <code class="ph codeph">--</code> notation. For clarity, the <code class="ph codeph">/* */</code>
+      and <code class="ph codeph">--</code> styles are used in the syntax and examples
+      throughout this section. Multiple hints can be
+      specified separated by commas, for example <code class="ph codeph">/* +clustered,shuffle
+        */</code>.
+    </p>
+
+<pre class="pre codeblock"><code>SELECT STRAIGHT_JOIN <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">join_left_hand_table</var>
+ JOIN /* +BROADCAST|SHUFFLE */
+<var class="keyword varname">join_right_hand_table</var>
+<var class="keyword varname">remainder_of_query</var>;
+
+SELECT <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">join_left_hand_table</var>
+ JOIN -- +BROADCAST|SHUFFLE
+<var class="keyword varname">join_right_hand_table</var>
+<var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+ /* +SHUFFLE|NOSHUFFLE */
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+ -- +SHUFFLE|NOSHUFFLE
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+<span class="ph">
+INSERT /* +SHUFFLE|NOSHUFFLE */
+ <var class="keyword varname">insert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">
+INSERT -- +SHUFFLE|NOSHUFFLE
+ <var class="keyword varname">insert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">
+UPSERT /* +SHUFFLE|NOSHUFFLE */
+ <var class="keyword varname">upsert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">
+UPSERT -- +SHUFFLE|NOSHUFFLE
+ <var class="keyword varname">upsert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">SELECT <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">table_ref</var>
+ /* +{SCHEDULE_CACHE_LOCAL | SCHEDULE_DISK_LOCAL | SCHEDULE_REMOTE}
+ [,RANDOM_REPLICA] */
+<var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">INSERT <var class="keyword varname">insert_clauses</var>
+ -- +CLUSTERED
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+ /* +CLUSTERED */
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">INSERT -- +CLUSTERED
+ <var class="keyword varname">insert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+INSERT /* +CLUSTERED */
+ <var class="keyword varname">insert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+UPSERT -- +CLUSTERED
+ <var class="keyword varname">upsert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+UPSERT /* +CLUSTERED */
+ <var class="keyword varname">upsert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+CREATE /* +SHUFFLE|NOSHUFFLE */
+ <var class="keyword varname">table_clauses</var>
+ AS SELECT <var class="keyword varname">remainder_of_query</var>;
+
+CREATE -- +SHUFFLE|NOSHUFFLE
+ <var class="keyword varname">table_clauses</var>
+ AS SELECT <var class="keyword varname">remainder_of_query</var>;
+
+CREATE /* +CLUSTER|NOCLUSTER */
+ <var class="keyword varname">table_clauses</var>
+ AS SELECT <var class="keyword varname">remainder_of_query</var>;
+
+CREATE -- +CLUSTER|NOCLUSTER
+ <var class="keyword varname">table_clauses</var>
+ AS SELECT <var class="keyword varname">remainder_of_query</var>;
+</code></pre>
+ <p class="p">The square bracket style hints are supported for backward compatibility,
+ but the syntax is deprecated and will be removed in a future release. For
+ that reason, any newly added hints are not available with the square
+ bracket syntax.</p>
+ <pre class="pre codeblock"><code>SELECT STRAIGHT_JOIN <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">join_left_hand_table</var>
+ JOIN [{ /* +BROADCAST */ | /* +SHUFFLE */ }]
+<var class="keyword varname">join_right_hand_table</var>
+<var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+ [{ /* +SHUFFLE */ | /* +NOSHUFFLE */ }]
+ [<span class="ph">/* +CLUSTERED */</span>]
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+<span class="ph">
+UPSERT [{ /* +SHUFFLE */ | /* +NOSHUFFLE */ }]
+ [<span class="ph">/* +CLUSTERED */</span>]
+ <var class="keyword varname">upsert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ With both forms of hint syntax, include the <code class="ph codeph">STRAIGHT_JOIN</code>
+ keyword immediately after the <code class="ph codeph">SELECT</code> and any
+ <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">ALL</code> keywords to prevent Impala from
+ reordering the tables in a way that makes the join-related hints ineffective.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">STRAIGHT_JOIN</code> hint affects the join order of table references in the query
+ block containing the hint. It does not affect the join order of nested queries, such as views,
+ inline views, or <code class="ph codeph">WHERE</code>-clause subqueries. To use this hint for performance
+ tuning of complex queries, apply the hint to all query blocks that need a fixed join order.
+ </p>
+
+ <p class="p">
+ To reduce the need to use hints, run the <code class="ph codeph">COMPUTE STATS</code> statement against all tables involved
+ in joins, or used as the source tables for <code class="ph codeph">INSERT ... SELECT</code> operations where the
+ destination is a partitioned Parquet table. Do this operation after loading data or making substantial
+ changes to the data within each table. Having up-to-date statistics helps Impala choose more efficient query
+ plans without the need for hinting. See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details and
+ examples.
+ </p>
+
+ <p class="p">
+ To see which join strategy is used for a particular query, examine the <code class="ph codeph">EXPLAIN</code> output for
+ that query. See <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for details and examples.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Hints for join queries:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">/* +BROADCAST */</code> and <code class="ph codeph">/* +SHUFFLE */</code> hints control the execution strategy for join
+ queries. Specify one of the following constructs immediately after the <code class="ph codeph">JOIN</code> keyword in a
+ query:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">/* +SHUFFLE */</code> - Makes that join operation use the <span class="q">"partitioned"</span> technique, which divides
+ up corresponding rows from both tables using a hashing algorithm, sending subsets of the rows to other
+ nodes for processing. (The keyword <code class="ph codeph">SHUFFLE</code> is used to indicate a <span class="q">"partitioned join"</span>,
+ because that type of join is not related to <span class="q">"partitioned tables"</span>.) Since the alternative
+ <span class="q">"broadcast"</span> join mechanism is the default when table and index statistics are unavailable, you might
+ use this hint for queries where broadcast joins are unsuitable; typically, partitioned joins are more
+ efficient for joins between large tables of similar size.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">/* +BROADCAST */</code> - Makes that join operation use the <span class="q">"broadcast"</span> technique that sends the
+ entire contents of the right-hand table to all nodes involved in processing the join. This is the default
+ mode of operation when table and index statistics are unavailable, so you would typically only need it if
+ stale metadata caused Impala to mistakenly choose a partitioned join operation. Typically, broadcast joins
+ are more efficient in cases where one table is much smaller than the other. (Put the smaller table on the
+ right side of the <code class="ph codeph">JOIN</code> operator.)
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Hints for INSERT ... SELECT and CREATE TABLE AS SELECT (CTAS):</strong>
+ </p>
+    <p class="p" id="hints__insert_hints">
+      When inserting into partitioned tables, such as when using the Parquet file
+      format, you can include a hint in the <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT (CTAS)</code>
+      statement to fine-tune the overall performance of the operation and its
+      resource usage.</p>
+ <p class="p">
+ You would only use hints if an <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">CTAS</code> into a partitioned table was failing due to
+ capacity limits, or if such an operation was succeeding but with
+ less-than-optimal performance.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">/* +SHUFFLE */</code> and <code class="ph codeph">/* +NOSHUFFLE */</code> Hints
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">/* +SHUFFLE */</code> adds an exchange node, before
+ writing the data, which re-partitions the result of the
+ <code class="ph codeph">SELECT</code> based on the partitioning columns of the
+ target table. With this hint, only one node writes to a partition at
+ a time, minimizing the global number of simultaneous writes and the
+ number of memory buffers holding data for individual partitions.
+ This also reduces fragmentation, resulting in fewer files. Thus it
+ reduces overall resource usage of the <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">CTAS</code> operation and allows some operations to
+ succeed that otherwise would fail. It does involve some data
+ transfer between the nodes so that the data files for a particular
+ partition are all written on the same node.
+
+ <p class="p">
+ Use <code class="ph codeph">/* +SHUFFLE */</code> in cases where an <code class="ph codeph">INSERT</code>
+ or <code class="ph codeph">CTAS</code> statement fails or runs inefficiently due
+ to all nodes attempting to write data for all partitions.
+ </p>
+
+ <p class="p"> If the table is unpartitioned or every partitioning expression
+ is constant, then <code class="ph codeph">/* +SHUFFLE */</code> will cause every
+ write to happen on the coordinator node.
+ </p>
+ </li>
+
+        <li class="li">
+          <code class="ph codeph">/* +NOSHUFFLE */</code> does not add an exchange node before
+          inserting into partitioned tables, and disables re-partitioning. The
+          selected execution plan might be faster overall, but might also
+          produce a larger number of small data files or exceed capacity
+          limits, causing the <code class="ph codeph">INSERT</code> or <code class="ph codeph">CTAS</code>
+          operation to fail.
+
+ <p class="p"> Impala automatically uses the <code class="ph codeph">/*
+ +SHUFFLE */</code> method if any partition key column in the
+ source table, mentioned in the <code class="ph codeph">SELECT</code> clause,
+ does not have column statistics. In this case, use the <code class="ph codeph">/*
+ +NOSHUFFLE */</code> hint if you want to override this default
+ behavior.
+ </p>
+ </li>
+
+ <li class="li">
+ If column statistics are available for all partition key columns
+ in the source table mentioned in the <code class="ph codeph">INSERT ...
+ SELECT</code> or <code class="ph codeph">CTAS</code> query, Impala chooses
+ whether to use the <code class="ph codeph">/* +SHUFFLE */</code> or <code class="ph codeph">/*
+ +NOSHUFFLE */</code> technique based on the estimated number of
+ distinct values in those columns and the number of nodes involved in
+ the operation. In this case, you might need the <code class="ph codeph">/* +SHUFFLE
+ */</code> or the <code class="ph codeph">/* +NOSHUFFLE */</code> hint to
+ override the execution plan selected by Impala.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">/* +CLUSTERED */</code> and <code class="ph codeph">/* +NOCLUSTERED
+ */</code> Hints
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">/* +CLUSTERED */</code> sorts data by the partition
+ columns before inserting to ensure that only one partition is
+ written at a time per node. Use this hint to reduce the number of
+ files kept open and the number of buffers kept in memory
+ simultaneously. This technique is primarily useful for inserts into
+ Parquet tables, where the large block size requires substantial
+ memory to buffer data for multiple output files at once. This hint
+ is available in <span class="keyword">Impala 2.8</span> or higher.
+
+ <p class="p">
+ Starting in <span class="keyword">Impala 3.0</span>, <code class="ph codeph">/*
+ +CLUSTERED */</code> is the default behavior for HDFS tables.
+ </p>
+ </li>
+
+        <li class="li">
+          <code class="ph codeph">/* +NOCLUSTERED */</code> does not sort by primary key
+          before inserting. This hint is available in <span class="keyword">Impala 2.8</span> or higher.
+
+ <p class="p">
+ Use this hint when inserting to Kudu tables.
+ </p>
+
+          <p class="p">
+            In versions lower than <span class="keyword">Impala 3.0</span>,
+            <code class="ph codeph">/* +NOCLUSTERED */</code> is the default for HDFS
+            tables.
+          </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
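+
+  <p class="p">
+      For example, the following hypothetical statement inserts into a Parquet
+      table partitioned by year and month, using <code class="ph codeph">/* +shuffle */</code>
+      so that each partition is written by a single node. (The table and
+      column names are illustrative.)
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical example: one node writes each (year, month) partition,
+-- reducing the number of simultaneous memory buffers and small files.
+INSERT INTO sales_parquet PARTITION (year, month)
+  /* +shuffle */
+  SELECT id, amount, year, month FROM sales_staging;</code></pre>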
+
+    <p class="p">
+      Starting in <span class="keyword">Impala 2.9</span>, <code class="ph codeph">INSERT</code>
+      or <code class="ph codeph">UPSERT</code> operations into Kudu tables automatically have
+      an exchange and sort node added to the plan that partitions and sorts the
+      rows according to the partitioning/primary key scheme of the target table
+      (unless the number of rows to be inserted is small enough to trigger
+      single node execution). Use the <code class="ph codeph">/* +NOCLUSTERED */</code> and
+      <code class="ph codeph">/* +NOSHUFFLE */</code> hints together to disable partitioning
+      and sorting before the rows are sent to Kudu.
+    </p>
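+
+  <p class="p">
+      For example, if the source data is already sorted and partitioned to
+      match the target Kudu table, a hypothetical statement such as the
+      following bypasses the automatic exchange and sort. (The table names
+      are illustrative.)
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical example: skip the automatic exchange and sort nodes
+-- when inserting presorted, prepartitioned data into a Kudu table.
+INSERT /* +noclustered,noshuffle */ INTO kudu_metrics
+  SELECT host, ts, metric_value FROM staging_metrics;</code></pre>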
+
+ <p class="p">
+ <strong class="ph b">Hints for scheduling of HDFS blocks:</strong>
+ </p>
+
+ <p class="p">
+ The hints <code class="ph codeph">/* +SCHEDULE_CACHE_LOCAL */</code>,
+ <code class="ph codeph">/* +SCHEDULE_DISK_LOCAL */</code>, and
+ <code class="ph codeph">/* +SCHEDULE_REMOTE */</code> have the same effect
+ as specifying the <code class="ph codeph">REPLICA_PREFERENCE</code> query
+ option with the respective option settings of <code class="ph codeph">CACHE_LOCAL</code>,
+ <code class="ph codeph">DISK_LOCAL</code>, or <code class="ph codeph">REMOTE</code>.
+ The hint <code class="ph codeph">/* +RANDOM_REPLICA */</code> is the same as
+ enabling the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option.
+ </p>
+
+ <p class="p">
+ You can use these hints in combination by separating them with commas,
+ for example, <code class="ph codeph">/* +SCHEDULE_CACHE_LOCAL,RANDOM_REPLICA */</code>.
+ See <a class="xref" href="impala_replica_preference.html">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a> and
+ <a class="xref" href="impala_schedule_random_replica.html">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a> for information about how
+ these settings influence the way Impala processes HDFS data blocks.
+ </p>
+
+ <p class="p">
+ Specifying the replica preference as a query hint always overrides the
+ query option setting. Specifying either the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code>
+ query option or the corresponding <code class="ph codeph">RANDOM_REPLICA</code> query hint
+ enables the random tie-breaking behavior when processing data blocks
+ during the query.
+ </p>
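+
+  <p class="p">
+      For example, the following hypothetical query prefers replicas cached
+      in HDFS memory and breaks ties among eligible replicas randomly. (The
+      table name is illustrative.)
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical example: scan HDFS-cached replicas where available,
+-- choosing randomly among eligible replicas for each block.
+SELECT COUNT(*) FROM web_logs
+  /* +schedule_cache_local,random_replica */
+WHERE year = 2018;</code></pre>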
+
+ <p class="p">
+ <strong class="ph b">Suggestions versus directives:</strong>
+ </p>
+
+ <p class="p">
+ In early Impala releases, hints were always obeyed and so acted more like directives. Once Impala gained join
+ order optimizations, sometimes join queries were automatically reordered in a way that made a hint
+ irrelevant. Therefore, the hints act more like suggestions in Impala 1.2.2 and higher.
+ </p>
+
+ <p class="p">
+ To force Impala to follow the hinted execution mechanism for a join query, include the
+ <code class="ph codeph">STRAIGHT_JOIN</code> keyword in the <code class="ph codeph">SELECT</code> statement. See
+ <a class="xref" href="impala_perf_joins.html#straight_join">Overriding Join Reordering with STRAIGHT_JOIN</a> for details. When you use this technique, Impala does not
+ reorder the joined tables at all, so you must be careful to arrange the join order to put the largest table
+ (or subquery result set) first, then the smallest, second smallest, third smallest, and so on. This ordering lets Impala do the
+ most I/O-intensive parts of the query using local reads on the DataNodes, and then reduce the size of the
+ intermediate result set as much as possible as each subsequent table or subquery result set is joined.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ Queries that include subqueries in the <code class="ph codeph">WHERE</code> clause can be rewritten internally as join
+ queries. Currently, you cannot apply hints to the joins produced by these types of queries.
+ </p>
+
+ <p class="p">
+ Because hints can prevent queries from taking advantage of new metadata or improvements in query planning,
+ use them only when required to work around performance issues, and be prepared to remove them when they are
+ no longer required, such as after a new Impala release or bug fix.
+ </p>
+
+ <p class="p">
+ In particular, the <code class="ph codeph">/* +BROADCAST */</code> and <code class="ph codeph">/* +SHUFFLE */</code> hints are expected to be
+ needed much less frequently in Impala 1.2.2 and higher, because the join order optimization feature in
+ combination with the <code class="ph codeph">COMPUTE STATS</code> statement now automatically choose join order and join
+ mechanism without the need to rewrite the query and add hints. See
+ <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <p class="p">
+ The hints embedded within <code class="ph codeph">--</code> comments are compatible with Hive queries. The hints embedded
+ within <code class="ph codeph">/* */</code> comments or <code class="ph codeph">[ ]</code> square brackets are not recognized by or not
+ compatible with Hive. For example, Hive raises an error for Impala hints within <code class="ph codeph">/* */</code>
+ comments because it does not recognize the Impala hint names.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Considerations for views:</strong>
+ </p>
+
+ <p class="p">
+ If you use a hint in the query that defines a view, the hint is preserved when you query the view. Impala
+ internally rewrites all hints in views to use the <code class="ph codeph">--</code> comment notation, so that Hive can
+ query such views without errors due to unrecognized hint names.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ For example, this query joins a large customer table with a small lookup table of less than 100 rows. The
+ right-hand table can be broadcast efficiently to all nodes involved in the join. Thus, you would use the
+ <code class="ph codeph">/* +broadcast */</code> hint to force a broadcast join strategy:
+ </p>
+
+<pre class="pre codeblock"><code>select straight_join customer.address, state_lookup.state_name
+ from customer join <strong class="ph b">/* +broadcast */</strong> state_lookup
+ on customer.state_id = state_lookup.state_id;</code></pre>
+
+ <p class="p">
+ This query joins two large tables of unpredictable size. You might benchmark the query with both kinds of
+ hints and find that it is more efficient to transmit portions of each table to other nodes for processing.
+ Thus, you would use the <code class="ph codeph">/* +shuffle */</code> hint to force a partitioned join strategy:
+ </p>
+
+<pre class="pre codeblock"><code>select straight_join weather.wind_velocity, geospatial.altitude
+ from weather join <strong class="ph b">/* +shuffle */</strong> geospatial
+ on weather.lat = geospatial.lat and weather.long = geospatial.long;</code></pre>
+
+ <p class="p">
+ For joins involving three or more tables, the hint applies to the tables on either side of that specific
+ <code class="ph codeph">JOIN</code> keyword. The <code class="ph codeph">STRAIGHT_JOIN</code> keyword ensures that joins are processed
+ in a predictable order from left to right. For example, this query joins
+ <code class="ph codeph">t1</code> and <code class="ph codeph">t2</code> using a partitioned join, then joins that result set to
+ <code class="ph codeph">t3</code> using a broadcast join:
+ </p>
+
+<pre class="pre codeblock"><code>select straight_join t1.name, t2.id, t3.price
+ from t1 join <strong class="ph b">/* +shuffle */</strong> t2 join <strong class="ph b">/* +broadcast */</strong> t3
+ on t1.id = t2.id and t2.id = t3.id;</code></pre>
+
+
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ For more background information about join queries, see <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a>. For
+ performance considerations, see <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_identifiers.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_identifiers.html b/docs/build3x/html/topics/impala_identifiers.html
new file mode 100644
index 0000000..267b91d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_identifiers.html
@@ -0,0 +1,110 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="identifiers"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Identifiers</title></head><body id="identifiers"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Identifiers</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Identifiers are the names of databases, tables, or columns that you specify in a SQL statement. The rules for
+ identifiers govern what names you can give to things you create, the notation for referring to names
+ containing unusual characters, and other aspects such as case sensitivity.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The minimum length of an identifier is 1 character.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The maximum length of an identifier is currently 128 characters, enforced by the metastore database.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ An identifier must start with an alphabetic character. The remainder can contain any combination of
+ alphanumeric characters and underscores. Quoting the identifier with backticks has no effect on the allowed
+ characters in the name.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ An identifier can contain only ASCII characters.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ To use an identifier name that matches one of the Impala reserved keywords (listed in
+ <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>), surround the identifier with <code class="ph codeph">``</code>
+ characters (backticks). Quote the reserved word even if it is part of a fully qualified name.
+ The following example shows how a reserved word can be used as a column name if it is quoted
+ with backticks in the <code class="ph codeph">CREATE TABLE</code> statement, and how the column name
+ must also be quoted with backticks in a query:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table reserved (`data` string);
+
+[localhost:21000] > select data from reserved;
+ERROR: AnalysisException: Syntax error in line 1:
+select data from reserved
+ ^
+Encountered: DATA
+Expected: ALL, CASE, CAST, DISTINCT, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, STRAIGHT_JOIN, TRUE, IDENTIFIER
+CAUSED BY: Exception: Syntax error
+
+[localhost:21000] > select reserved.data from reserved;
+ERROR: AnalysisException: Syntax error in line 1:
+select reserved.data from reserved
+ ^
+Encountered: DATA
+Expected: IDENTIFIER
+CAUSED BY: Exception: Syntax error
+
+[localhost:21000] > select reserved.`data` from reserved;
+
+[localhost:21000] >
+</code></pre>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ Because the list of reserved words grows over time as new SQL syntax is added,
+ consider adopting coding conventions (especially for any automated scripts
+ or in packaged applications) to always quote all identifiers with backticks.
+ Quoting all identifiers protects your SQL from compatibility issues if
+ new reserved words are added in later releases.
+ </div>
+
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala identifiers are always case-insensitive. That is, tables named <code class="ph codeph">t1</code> and
+ <code class="ph codeph">T1</code> always refer to the same table, regardless of quote characters. Internally, Impala
+ always folds all specified table and column names to lowercase. This is why the column headers in query
+ output are always displayed in lowercase.
+ </p>
+ </li>
+ </ul>
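+
+  <p class="p">
+      For example, because identifiers are case-insensitive, the following
+      hypothetical queries all refer to the same table and column, and the
+      column header in the output is displayed in lowercase in each case:
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical example: all three statements reference the same table;
+-- Impala folds the identifiers to lowercase internally.
+SELECT C1 FROM T1;
+SELECT c1 FROM t1;
+SELECT C1 FROM `t1`;</code></pre>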
+
+ <p class="p">
+ See <a class="xref" href="impala_aliases.html#aliases">Overview of Impala Aliases</a> for how to define shorter or easier-to-remember aliases if the
+ original names are long or cryptic identifiers.
+ <span class="ph"> Aliases follow the same rules as identifiers when it comes to case
+ insensitivity. Aliases can be longer than identifiers (up to the maximum length of a Java string) and can
+ include additional characters such as spaces and dashes when they are quoted using backtick characters.
+ </span>
+ </p>
+
+ <p class="p">
+ Another way to define different names for the same tables or columns is to create views. See
+ <a class="xref" href="../shared/../topics/impala_views.html#views">Overview of Impala Views</a> for details.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_optimize_partition_key_scans.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_optimize_partition_key_scans.html b/docs/build3x/html/topics/impala_optimize_partition_key_scans.html
new file mode 100644
index 0000000..6fea36f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_optimize_partition_key_scans.html
@@ -0,0 +1,188 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="optimize_partition_key_scans"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</title></head><body id="optimize_partition_key_scans"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">OPTIMIZE_PARTITION_KEY_SCANS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Enables a fast code path for queries that apply simple aggregate functions to partition key
+ columns: <code class="ph codeph">MIN(<var class="keyword varname">key_column</var>)</code>, <code class="ph codeph">MAX(<var class="keyword varname">key_column</var>)</code>,
+ or <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">key_column</var>)</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In <span class="keyword">Impala 2.5.0</span>, only the value 1 enables the option, and the value
+ <code class="ph codeph">true</code> is not recognized. This limitation is
+ tracked by the issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3334" target="_blank">IMPALA-3334</a>,
+ which shows the releases where the problem is fixed.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This optimization speeds up common <span class="q">"introspection"</span> operations when using queries
+ to calculate the cardinality and range for partition key columns.
+ </p>
+
+ <p class="p">
+ This optimization does not apply if the queries contain any <code class="ph codeph">WHERE</code>,
+ <code class="ph codeph">GROUP BY</code>, or <code class="ph codeph">HAVING</code> clause. The relevant queries
+ should only compute the minimum, maximum, or number of distinct values for the
+ partition key columns across the whole table.
+ </p>
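+
+ <p class="p">
+ For example, the following hypothetical queries (assuming the partitioned table
+ <code class="ph codeph">t1</code> defined later in this topic) illustrate the distinction. The first
+ query can use the fast code path; the second cannot, because its <code class="ph codeph">WHERE</code>
+ clause disqualifies it:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Eligible: a simple aggregate on the partition key column across the whole table.
+SELECT MAX(year) FROM t1;
+
+-- Not eligible: the WHERE clause disqualifies the query from this optimization,
+-- so the data files are scanned as usual.
+SELECT MAX(year) FROM t1 WHERE year > 2015;
+</code></pre>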
+
+ <p class="p">
+ This optimization is enabled by a query option because it skips some consistency checks
+ and therefore can return slightly different partition values if partitions are in the
+ process of being added, dropped, or loaded outside of Impala. Queries might exhibit different
+ behavior depending on the setting of this option in the following cases:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ If files are removed from a partition using HDFS or other non-Impala operations,
+ there is a period until the next <code class="ph codeph">REFRESH</code> of the table where regular
+ queries fail at run time because they detect the missing files. With this optimization
+ enabled, queries that evaluate only the partition key column values (not the contents of
+ the partition itself) succeed, and treat the partition as if it still exists.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ If a partition contains any data files, but the data files do not contain any rows,
+ a regular query considers that the partition does not exist. With this optimization
+ enabled, the partition is treated as if it exists.
+ </p>
+ <p class="p">
+ If the partition includes no files at all, this optimization does not change the query
+ behavior: the partition is considered to not exist whether or not this optimization is enabled.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows initial schema setup and the default behavior of queries that
+ return just the partition key column for a table:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Make a partitioned table with 3 partitions.
+create table t1 (s string) partitioned by (year int);
+insert into t1 partition (year=2015) values ('last year');
+insert into t1 partition (year=2016) values ('this year');
+insert into t1 partition (year=2017) values ('next year');
+
+-- Regardless of the option setting, this query must read the
+-- data files to know how many rows to return for each year value.
+explain select year from t1;
++-----------------------------------------------------+
+| Explain String |
++-----------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+| |
+| F00:PLAN FRAGMENT [UNPARTITIONED] |
+| 00:SCAN HDFS [key_cols.t1] |
+| partitions=3/3 files=4 size=40B |
+| table stats: 3 rows total |
+| column stats: all |
+| hosts=3 per-host-mem=unavailable |
+| tuple-ids=0 row-size=4B cardinality=3 |
++-----------------------------------------------------+
+
+-- The aggregation operation means the query does not need to read
+-- the data within each partition: the result set contains exactly 1 row
+-- per partition, derived from the partition key column value.
+-- By default, Impala still includes a 'scan' operation in the query.
+explain select distinct year from t1;
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+| |
+| 01:AGGREGATE [FINALIZE] |
+| | group by: year |
+| | |
+| 00:SCAN HDFS [key_cols.t1] |
+| partitions=0/0 files=0 size=0B |
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The following examples show how the plan is made more efficient when the
+ <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code> option is enabled:
+ </p>
+
+<pre class="pre codeblock"><code>
+set optimize_partition_key_scans=1;
+OPTIMIZE_PARTITION_KEY_SCANS set to 1
+
+-- The aggregation operation is turned into a UNION internally,
+-- with constant values known in advance based on the metadata
+-- for the partitioned table.
+explain select distinct year from t1;
++-----------------------------------------------------+
+| Explain String |
++-----------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+| |
+| F00:PLAN FRAGMENT [UNPARTITIONED] |
+| 01:AGGREGATE [FINALIZE] |
+| | group by: year |
+| | hosts=1 per-host-mem=unavailable |
+| | tuple-ids=1 row-size=4B cardinality=3 |
+| | |
+| 00:UNION |
+| constant-operands=3 |
+| hosts=1 per-host-mem=unavailable |
+| tuple-ids=0 row-size=4B cardinality=3 |
++-----------------------------------------------------+
+
+-- The same optimization applies to other aggregation queries
+-- that only return values based on partition key columns:
+-- MIN, MAX, COUNT(DISTINCT), and so on.
+explain select min(year) from t1;
++-----------------------------------------------------+
+| Explain String |
++-----------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+| |
+| F00:PLAN FRAGMENT [UNPARTITIONED] |
+| 01:AGGREGATE [FINALIZE] |
+| | output: min(year) |
+| | hosts=1 per-host-mem=unavailable |
+| | tuple-ids=1 row-size=4B cardinality=1 |
+| | |
+| 00:UNION |
+| constant-operands=3 |
+| hosts=1 per-host-mem=unavailable |
+| tuple-ids=0 row-size=4B cardinality=3 |
++-----------------------------------------------------+
+</code></pre>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_order_by.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_order_by.html b/docs/build3x/html/topics/impala_order_by.html
new file mode 100644
index 0000000..b4cc1f3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_order_by.html
@@ -0,0 +1,398 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="order_by"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ORDER BY Clause</title></head><body id="order_by"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ORDER BY Clause</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The familiar <code class="ph codeph">ORDER BY</code> clause of a <code class="ph codeph">SELECT</code> statement sorts the result set
+ based on the values from one or more columns.
+ </p>
+
+ <p class="p">
+ For distributed queries, this is a relatively expensive operation, because the entire result set must be
+ produced and transferred to one node before the sorting can happen. This can require more memory capacity
+ than a query without <code class="ph codeph">ORDER BY</code>. Even if the query takes approximately the same time to finish
+ with or without the <code class="ph codeph">ORDER BY</code> clause, subjectively it can appear slower because no results
+ are available until all processing is finished, rather than results coming back gradually as rows matching
+ the <code class="ph codeph">WHERE</code> clause are found. Therefore, if you only need the first N results from the sorted
+ result set, also include the <code class="ph codeph">LIMIT</code> clause, which reduces network overhead and the memory
+ requirement on the coordinator node.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In Impala 1.4.0 and higher, the <code class="ph codeph">LIMIT</code> clause is now optional (rather than required) for
+ queries that use the <code class="ph codeph">ORDER BY</code> clause. Impala automatically uses a temporary disk work area
+ to perform the sort if the sort operation would otherwise exceed the Impala memory limit for a particular
+ DataNode.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ The full syntax for the <code class="ph codeph">ORDER BY</code> clause is:
+ </p>
+
+<pre class="pre codeblock"><code>ORDER BY <var class="keyword varname">col_ref</var> [, <var class="keyword varname">col_ref</var> ...] [ASC | DESC] [NULLS FIRST | NULLS LAST]
+
+col_ref ::= <var class="keyword varname">column_name</var> | <var class="keyword varname">integer_literal</var>
+</code></pre>
+
+ <p class="p">
+ Although the most common usage is <code class="ph codeph">ORDER BY <var class="keyword varname">column_name</var></code>, you can also
+ specify <code class="ph codeph">ORDER BY 1</code> to sort by the first column of the result set, <code class="ph codeph">ORDER BY
+ 2</code> to sort by the second column, and so on. The number must be a numeric literal, not some other kind
+ of constant expression. (If the argument is some other expression, even a <code class="ph codeph">STRING</code> value, the
+ query succeeds but the order of results is undefined.)
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">ORDER BY <var class="keyword varname">column_number</var></code> can only be used when the query explicitly lists
+ the columns in the <code class="ph codeph">SELECT</code> list, not with <code class="ph codeph">SELECT *</code> queries.
+ </p>
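+
+ <p class="p">
+ For example, the following hypothetical query (assuming a table <code class="ph codeph">t1</code>
+ with a <code class="ph codeph">year</code> column) sorts by the second item in the
+ <code class="ph codeph">SELECT</code> list, the aggregate value, without repeating the expression:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- ORDER BY 2 refers to COUNT(*), the second item in the SELECT list.
+SELECT year, COUNT(*) FROM t1 GROUP BY year ORDER BY 2 DESC;
+</code></pre>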
+
+ <p class="p">
+ <strong class="ph b">Ascending and descending sorts:</strong>
+ </p>
+
+ <p class="p">
+ The default sort order (the same as using the <code class="ph codeph">ASC</code> keyword) puts the smallest values at the
+ start of the result set, and the largest values at the end. Specifying the <code class="ph codeph">DESC</code> keyword
+ reverses that order.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Sort order for NULL values:</strong>
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_literals.html#null">NULL</a> for details about how <code class="ph codeph">NULL</code> values are positioned
+ in the sorted result set, and how to use the <code class="ph codeph">NULLS FIRST</code> and <code class="ph codeph">NULLS LAST</code>
+ clauses. (The sort position for <code class="ph codeph">NULL</code> values in <code class="ph codeph">ORDER BY ... DESC</code> queries is
+ changed in Impala 1.2.1 and higher to be more standards-compliant, and the <code class="ph codeph">NULLS FIRST</code> and
+ <code class="ph codeph">NULLS LAST</code> keywords are new in Impala 1.2.1.)
+ </p>
+
+ <p class="p">
+ Prior to Impala 1.4.0, Impala required any query including an
+ <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_order_by.html#order_by">ORDER BY</a></code> clause to also use a
+ <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_limit.html#limit">LIMIT</a></code> clause. In Impala 1.4.0 and
+ higher, the <code class="ph codeph">LIMIT</code> clause is optional for <code class="ph codeph">ORDER BY</code> queries. In cases where
+ sorting a huge result set requires enough memory to exceed the Impala memory limit for a particular node,
+ Impala automatically uses a temporary disk work area to perform the sort operation.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, the complex data types <code class="ph codeph">STRUCT</code>,
+ <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code> are available. These columns cannot
+ be referenced directly in the <code class="ph codeph">ORDER BY</code> clause.
+ When you query a complex type column, you use join notation to <span class="q">"unpack"</span> the elements
+ of the complex type, and within the join query you can include an <code class="ph codeph">ORDER BY</code>
+ clause to control the order in the result set of the scalar elements from the complex type.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types.
+ </p>
+
+ <p class="p">
+ The following query shows how a complex type column cannot be directly used in an <code class="ph codeph">ORDER BY</code> clause:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE games (id BIGINT, score ARRAY &lt;BIGINT&gt;) STORED AS PARQUET;
+...use LOAD DATA to load externally created Parquet files into the table...
+SELECT id FROM games ORDER BY score DESC;
+ERROR: AnalysisException: ORDER BY expression 'score' with complex type 'ARRAY&lt;BIGINT&gt;' is not supported.
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following query retrieves the user ID and score, only for scores greater than one million,
+ with the highest scores for each user listed first.
+ Because the individual array elements are now represented as separate rows in the result set,
+ they can be used in the <code class="ph codeph">ORDER BY</code> clause, referenced using the <code class="ph codeph">ITEM</code>
+ pseudocolumn that represents each array element.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT id, item FROM games, games.score
+ WHERE item > 1000000
+ORDER BY id, item desc;
+</code></pre>
+
+ <p class="p">
+ The following queries use similar <code class="ph codeph">ORDER BY</code> techniques with variations of the <code class="ph codeph">GAMES</code>
+ table, where the complex type is an <code class="ph codeph">ARRAY</code> containing <code class="ph codeph">STRUCT</code> or <code class="ph codeph">MAP</code>
+ elements to represent additional details about each game that was played.
+ For an array of structures, the fields of the structure are referenced as <code class="ph codeph">ITEM.<var class="keyword varname">field_name</var></code>.
+ For an array of maps, the keys and values within each array element are referenced as <code class="ph codeph">ITEM.KEY</code>
+ and <code class="ph codeph">ITEM.VALUE</code>.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE games2 (id BIGINT, play ARRAY &lt; STRUCT &lt;game_name: STRING, score: BIGINT, high_score: BOOLEAN&gt; &gt;) STORED AS PARQUET;
+...use LOAD DATA to load externally created Parquet files into the table...
+SELECT id, item.game_name, item.score FROM games2, games2.play
+  WHERE item.score > 1000000
+ORDER BY id, item.score DESC;
+
+CREATE TABLE games3 (id BIGINT, play ARRAY &lt; MAP &lt;STRING, BIGINT&gt; &gt;) STORED AS PARQUET;
+...use LOAD DATA to load externally created Parquet files into the table...
+SELECT id, info.key AS k, info.value AS v FROM games3, games3.play AS plays, games3.play.item AS info
+  WHERE info.key = 'score' AND info.value > 1000000
+ORDER BY id, info.value DESC;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Although the <code class="ph codeph">LIMIT</code> clause is now optional on <code class="ph codeph">ORDER BY</code> queries, if your
+ query only needs some number of rows that you can predict in advance, use the <code class="ph codeph">LIMIT</code> clause
+ to reduce unnecessary processing. For example, if the query has a clause <code class="ph codeph">LIMIT 10</code>, each data
+ node sorts its portion of the relevant result set and only returns 10 rows to the coordinator node. The
+ coordinator node picks the 10 highest or lowest row values out of this small intermediate result set.
+ </p>
+
+ <p class="p">
+ If an <code class="ph codeph">ORDER BY</code> clause is applied to an early phase of query processing, such as a subquery
+ or a view definition, Impala ignores the <code class="ph codeph">ORDER BY</code> clause. To get ordered results from a
+ subquery or view, apply an <code class="ph codeph">ORDER BY</code> clause to the outermost or final <code class="ph codeph">SELECT</code>
+ level.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">ORDER BY</code> is often used in combination with <code class="ph codeph">LIMIT</code> to perform <span class="q">"top-N"</span>
+ queries:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT user_id AS "Top 10 Visitors", SUM(page_views) FROM web_stats
+  GROUP BY user_id
+ ORDER BY SUM(page_views) DESC LIMIT 10;
+</code></pre>
+
+ <p class="p">
+ <code class="ph codeph">ORDER BY</code> is sometimes used in combination with <code class="ph codeph">OFFSET</code> and
+ <code class="ph codeph">LIMIT</code> to paginate query results, although it is relatively inefficient to issue multiple
+ queries like this against the large tables typically used with Impala:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT page_title AS "Page 1 of search results", page_url FROM search_content
+  WHERE LOWER(page_title) LIKE '%game%'
+  ORDER BY page_title LIMIT 10 OFFSET 0;
+SELECT page_title AS "Page 2 of search results", page_url FROM search_content
+  WHERE LOWER(page_title) LIKE '%game%'
+  ORDER BY page_title LIMIT 10 OFFSET 10;
+SELECT page_title AS "Page 3 of search results", page_url FROM search_content
+  WHERE LOWER(page_title) LIKE '%game%'
+  ORDER BY page_title LIMIT 10 OFFSET 20;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong>
+ </p>
+
+ <p class="p">
+ Impala sorts the intermediate results of an <code class="ph codeph">ORDER BY</code> clause in memory whenever practical. In
+ a cluster of N DataNodes, each node sorts roughly 1/Nth of the result set, with the exact proportion varying
+ depending on how the data matching the query is distributed in HDFS.
+ </p>
+
+ <p class="p">
+ If the size of the sorted intermediate result set on any DataNode would cause the query to exceed the Impala
+ memory limit, Impala sorts as much as practical in memory, then writes partially sorted data to disk. (This
+ technique is known in industry terminology as <span class="q">"external sorting"</span> and <span class="q">"spilling to disk"</span>.) As each
+ 8 MB batch of data is written to disk, Impala frees the corresponding memory to sort a new 8 MB batch of
+ data. When all the data has been processed, a final merge sort operation is performed to correctly order the
+ in-memory and on-disk results as the result set is transmitted back to the coordinator node. When external
+ sorting becomes necessary, Impala requires approximately 60 MB of RAM at a minimum for the buffers needed to
+ read, write, and sort the intermediate results. If more RAM is available on the DataNode, Impala will use
+ the additional RAM to minimize the amount of disk I/O for sorting.
+ </p>
+
+ <p class="p">
+ This external sort technique is used as appropriate on each DataNode (possibly including the coordinator
+ node) to sort the portion of the result set that is processed on that node. When the sorted intermediate
+ results are sent back to the coordinator node to produce the final result set, the coordinator node uses a
+ merge sort technique to produce a final sorted result set without using any extra resources on the
+ coordinator node.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Configuration for disk usage:</strong>
+ </p>
+
+ <p class="p">
+ By default, intermediate files used during large sort, join, aggregation, or analytic function operations
+ are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span>. These files are removed when the
+ operation finishes. (Multiple concurrent queries can perform operations that use the <span class="q">"spill to disk"</span>
+ technique, without any name conflicts for these temporary files.) You can specify a different location by
+ starting the <span class="keyword cmdname">impalad</span> daemon with the
+ <code class="ph codeph">--scratch_dirs="<var class="keyword varname">path_to_directory</var>"</code> configuration option.
+ You can specify a single directory, or a comma-separated list of directories. The scratch directories must
+ be on the local filesystem, not in HDFS. You might specify different directory paths for different hosts,
+ depending on the capacity and speed
+ of the available storage devices. In <span class="keyword">Impala 2.3</span> or higher, Impala successfully starts (with a warning
+ written to the log) if it cannot create or read and write files
+ in one of the scratch directories. If there is less than 1 GB free on the filesystem where that directory resides,
+ Impala still runs, but writes a warning message to its log. If Impala encounters an error reading or writing
+ files in a scratch directory during a query, Impala logs the error and the query fails.
+ </p>
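+
+ <p class="p">
+ For example, to spread scratch space across two local drives (the paths shown here are
+ hypothetical placeholders, not defaults), you might start the daemon with:
+ </p>
+
+<pre class="pre codeblock"><code>
+impalad --scratch_dirs="/data1/impala-scratch,/data2/impala-scratch" ...
+</code></pre>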
+
+
+
+
+
+ <p class="p">
+ <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an
+ <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the
+ results are not necessarily sorted. An <code class="ph codeph">INSERT ... SELECT</code> operation potentially creates
+ many different data files, prepared on different data nodes, and therefore the notion of the data being
+ stored in sorted order is impractical.
+ </p>
+
+ <div class="p">
+ An <code class="ph codeph">ORDER BY</code> clause without an additional <code class="ph codeph">LIMIT</code> clause is ignored in any
+ view definition. If you need to sort the entire result set from a view, use an <code class="ph codeph">ORDER BY</code>
+ clause in the <code class="ph codeph">SELECT</code> statement that queries the view. You can still make a simple <span class="q">"top
+ 10"</span> report by combining the <code class="ph codeph">ORDER BY</code> and <code class="ph codeph">LIMIT</code> clauses in the same
+ view definition:
+<pre class="pre codeblock"><code>[localhost:21000] > create table unsorted (x bigint);
+[localhost:21000] > insert into unsorted values (1), (9), (3), (7), (5), (8), (4), (6), (2);
+[localhost:21000] > create view sorted_view as select x from unsorted order by x;
+[localhost:21000] > select x from sorted_view; -- ORDER BY clause in view has no effect.
++---+
+| x |
++---+
+| 1 |
+| 9 |
+| 3 |
+| 7 |
+| 5 |
+| 8 |
+| 4 |
+| 6 |
+| 2 |
++---+
+[localhost:21000] > select x from sorted_view order by x; -- View query requires ORDER BY at outermost level.
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
+| 6 |
+| 7 |
+| 8 |
+| 9 |
++---+
+[localhost:21000] > create view top_3_view as select x from unsorted order by x limit 3;
+[localhost:21000] > select x from top_3_view; -- ORDER BY and LIMIT together in view definition are preserved.
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
++---+
+</code></pre>
+ </div>
+
+ <p class="p">
+ With the lifting of the requirement to include a <code class="ph codeph">LIMIT</code> clause in every <code class="ph codeph">ORDER
+ BY</code> query (in Impala 1.4 and higher):
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Now the use of scratch disk space raises the possibility of an <span class="q">"out of disk space"</span> error on a
+ particular DataNode, as opposed to the previous possibility of an <span class="q">"out of memory"</span> error. Make sure
+ to keep at least 1 GB free on the filesystem used for temporary sorting work.
+ </p>
+ </li>
+
+ </ul>
+
+ <p class="p">
+ In Impala 1.2.1 and higher, all <code class="ph codeph">NULL</code> values come at the end of the result set for
+ <code class="ph codeph">ORDER BY ... ASC</code> queries, and at the beginning of the result set for <code class="ph codeph">ORDER BY ...
+ DESC</code> queries. In effect, <code class="ph codeph">NULL</code> is considered greater than all other values for
+ sorting purposes. The original Impala behavior always put <code class="ph codeph">NULL</code> values at the end, even for
+ <code class="ph codeph">ORDER BY ... DESC</code> queries. The new behavior in Impala 1.2.1 makes Impala more compatible
+ with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting
+ behavior for <code class="ph codeph">NULL</code> by adding the clause <code class="ph codeph">NULLS FIRST</code> or <code class="ph codeph">NULLS
+ LAST</code> at the end of the <code class="ph codeph">ORDER BY</code> clause.
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table numbers (x int);
+[localhost:21000] > insert into numbers values (1), (null), (2), (null), (3);
+[localhost:21000] > select x from numbers order by x nulls first;
++------+
+| x |
++------+
+| NULL |
+| NULL |
+| 1 |
+| 2 |
+| 3 |
++------+
+[localhost:21000] > select x from numbers order by x desc nulls first;
++------+
+| x |
++------+
+| NULL |
+| NULL |
+| 3 |
+| 2 |
+| 1 |
++------+
+[localhost:21000] > select x from numbers order by x nulls last;
++------+
+| x |
++------+
+| 1 |
+| 2 |
+| 3 |
+| NULL |
+| NULL |
++------+
+[localhost:21000] > select x from numbers order by x desc nulls last;
++------+
+| x |
++------+
+| 3 |
+| 2 |
+| 1 |
+| NULL |
+| NULL |
++------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for further examples of queries with the <code class="ph codeph">ORDER
+ BY</code> clause.
+ </p>
+
+ <p class="p">
+ Analytic functions use the <code class="ph codeph">ORDER BY</code> clause in a different context to define the sequence in
+ which rows are analyzed. See <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a> for details.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
[32/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
Posted by mi...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_fixed_issues.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_fixed_issues.html b/docs/build3x/html/topics/impala_fixed_issues.html
new file mode 100644
index 0000000..0458052
--- /dev/null
+++ b/docs/build3x/html/topics/impala_fixed_issues.html
@@ -0,0 +1,5961 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="fixed_issues"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Fixed Issues in Apache Impala</title></head><body id="fixed_issues"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Fixed Issues in Apache Impala</span></h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following sections describe the major issues fixed in each Impala release.
+ </p>
+
+ <p class="p">
+ For known issues that are currently unresolved, see <a class="xref" href="impala_known_issues.html#known_issues">Known Issues and Workarounds in Impala</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="fixed_issues__fixed_issues_3_0_0">
+ <h2 class="title topictitle2" id="ariaid-title2">Issues Fixed in <span class="keyword">Impala 3.0</span></h2>
+ <div class="body conbody">
+ <p class="p"> For the full list of issues closed in this release, including bug
+ fixes, see the <a class="xref" href="https://impala.apache.org/docs/changelog-3.0.html" target="_blank">changelog for <span class="keyword">Impala 3.0</span></a>. </p>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="fixed_issues__fixed_issues_2_12_0">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Issues Fixed in <span class="keyword">Impala 2.12</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including bug fixes,
+ see the <a class="xref" href="https://impala.apache.org/docs/changelog-2.12.html" target="_blank">changelog for <span class="keyword">Impala 2.12</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="fixed_issues__fixed_issues_2_11_0">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Issues Fixed in <span class="keyword">Impala 2.11</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including bug fixes,
+ see the <a class="xref" href="https://impala.apache.org/docs/changelog-2.11.html" target="_blank">changelog for <span class="keyword">Impala 2.11</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="fixed_issues__fixed_issues_2100">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Issues Fixed in <span class="keyword">Impala 2.10</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including bug fixes,
+ see the <a class="xref" href="https://impala.apache.org/docs/changelog-2.10.html" target="_blank">changelog for <span class="keyword">Impala 2.10</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="fixed_issues__fixed_issues_290">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Issues Fixed in <span class="keyword">Impala 2.9.0</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including bug fixes,
+ see the <a class="xref" href="https://impala.apache.org/docs/changelog-2.9.html" target="_blank">changelog for <span class="keyword">Impala 2.9</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="fixed_issues__fixed_issues_280">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Issues Fixed in <span class="keyword">Impala 2.8.0</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of Impala fixed issues in <span class="keyword">Impala 2.8</span>, see
+ <a class="xref" href="https://issues.apache.org/jira/issues/?jql=type%20%3D%20bug%20and%20project%20%3D%20IMPALA%20AND%20resolution%20%3D%20fixed%20AND%20affectedVersion%20!%3D%20%22Impala%202.8.0%22%20AND%20fixVersion%20%3D%20%22Impala%202.8.0%22%20and%20not%20labels%20%3D%20broken-build%20order%20by%20priority%20desc" target="_blank">this report in the Impala JIRA tracker</a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="fixed_issues__fixed_issues_270">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Issues Fixed in <span class="keyword">Impala 2.7.0</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ For the full list of Impala fixed issues in Impala 2.7.0, see
+ <a class="xref" href="https://issues.apache.org/jira/issues/?jql=%20type%20%3D%20bug%20and%20project%20%3D%20IMPALA%20AND%20resolution%20%3D%20fixed%20AND%20affectedVersion%20!%3D%20%22Impala%202.7.0%22%20AND%20fixVersion%20%3D%20%22Impala%202.7.0%22%20and%20not%20labels%20%3D%20broken-build%20order%20by%20priority%20desc" target="_blank">this report in the Impala JIRA tracker</a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="fixed_issues__fixed_issues_263">
+ <h2 class="title topictitle2" id="ariaid-title9">Issues Fixed in <span class="keyword">Impala 2.6.3</span></h2>
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="fixed_issues__fixed_issues_262">
+ <h2 class="title topictitle2" id="ariaid-title10">Issues Fixed in <span class="keyword">Impala 2.6.2</span></h2>
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="fixed_issues__fixed_issues_260">
+
+ <h2 class="title topictitle2" id="ariaid-title11">Issues Fixed in <span class="keyword">Impala 2.6.0</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following list contains the most critical fixed issues
+ (<code class="ph codeph">priority='Blocker'</code>) from the JIRA system.
+ For the full list of fixed issues in <span class="keyword">Impala 2.6.0</span>, see
+ <a class="xref" href="https://issues.apache.org/jira/issues/?jql=%20type%20%3D%20bug%20and%20project%20%3D%20IMPALA%20AND%20resolution%20%3D%20fixed%20AND%20affectedVersion%20!%3D%20%22Impala%202.6.0%22%20AND%20fixVersion%20%3D%20%22Impala%202.6.0%22%20and%20not%20labels%20%3D%20broken-build%20order%20by%20priority%20desc" target="_blank">this report in the Impala JIRA tracker</a>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="fixed_issues_260__IMPALA-3385">
+ <h3 class="title topictitle3" id="ariaid-title12">RuntimeState::error_log_ crashes</h3>
+ <div class="body conbody">
+ <p class="p">
+ A crash could occur, with stack trace pointing to <code class="ph codeph">impala::RuntimeState::ErrorLog</code>.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3385" target="_blank">IMPALA-3385</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="fixed_issues_260__IMPALA-3378">
+ <h3 class="title topictitle3" id="ariaid-title13">HiveUdfCall::Open() produces unsynchronized access to JniUtil::global_refs_ vector</h3>
+ <div class="body conbody">
+ <p class="p">
+ A crash could occur because of contention between multiple calls to Java UDFs.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3378" target="_blank">IMPALA-3378</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="fixed_issues_260__IMPALA-3379">
+ <h3 class="title topictitle3" id="ariaid-title14">HBaseTableWriter::CreatePutList() produces unsynchronized access to JniUtil::global_refs_ vector</h3>
+ <div class="body conbody">
+ <p class="p">
+ A crash could occur because of contention between multiple concurrent statements writing to HBase.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3379" target="_blank">IMPALA-3379</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title15" id="fixed_issues_260__IMPALA-3317">
+ <h3 class="title topictitle3" id="ariaid-title15">Stress test failure: sorter.cc:745] Check failed: i == 0 (1 vs. 0) </h3>
+ <div class="body conbody">
+ <p class="p">
+ A crash or wrong results could occur if the spill-to-disk mechanism encountered a zero-length string at
+ the very end of a data block.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3317" target="_blank">IMPALA-3317</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title16" id="fixed_issues_260__IMPALA-3311">
+ <h3 class="title topictitle3" id="ariaid-title16">String data coming out of agg can be corrupted by blocking operators</h3>
+ <div class="body conbody">
+ <p class="p">
+ If a query plan contains an aggregation node producing string values anywhere within a subplan
+ (that is, if in the SQL statement the aggregate function appears within an inline view over a collection column),
+ the results of the aggregation may be incorrect.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3311" target="_blank">IMPALA-3311</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title17" id="fixed_issues_260__IMPALA-3269">
+ <h3 class="title topictitle3" id="ariaid-title17">CTAS with subquery throws AuthzException</h3>
+ <div class="body conbody">
+ <p class="p">
+ A <code class="ph codeph">CREATE TABLE AS SELECT</code> operation could fail with an authorization error,
+ due to a slight difference in the privilege checking for the CTAS operation.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3269" target="_blank">IMPALA-3269</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title18" id="fixed_issues_260__IMPALA-3237">
+ <h3 class="title topictitle3" id="ariaid-title18">Crash on inserting into table with binary and parquet</h3>
+ <div class="body conbody">
+ <p class="p">
+ Impala incorrectly allowed <code class="ph codeph">BINARY</code> to be specified as a column type,
+ resulting in a crash during a write to a Parquet table with a column of that type.
+ </p>
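+ <p class="p">
+ For illustration, a statement of the following shape could trigger the problem
+ (the table and column names are hypothetical, not taken from the original report):
+ </p>
+<pre class="pre codeblock"><code>
+-- BINARY was incorrectly accepted at table creation time;
+-- a later write to the Parquet table could then crash the daemon.
+CREATE TABLE parquet_tbl (b BINARY) STORED AS PARQUET;
+</code></pre>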
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3237" target="_blank">IMPALA-3237</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title19" id="fixed_issues_260__IMPALA-3105">
+ <h3 class="title topictitle3" id="ariaid-title19">RowBatch::MaxTupleBufferSize() calculation incorrect, may lead to memory corruption</h3>
+ <div class="body conbody">
+ <p class="p">
+ A crash could occur while querying tables with very large rows, for example wide tables with many
+ columns or very large string values. This problem was identified in Impala 2.3, but had low
+ reproducibility in subsequent releases. The fix ensures the memory allocation size is correct.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3105" target="_blank">IMPALA-3105</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title20" id="fixed_issues_260__IMPALA-3494">
+ <h3 class="title topictitle3" id="ariaid-title20">Thrift buffer overflows when serializing more than 3355443200 bytes in Impala</h3>
+ <div class="body conbody">
+ <p class="p">
+ A very large memory allocation within the <span class="keyword cmdname">catalogd</span> daemon could exceed an internal Thrift limit,
+ causing a crash.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3494" target="_blank">IMPALA-3494</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title21" id="fixed_issues_260__IMPALA-3314">
+ <h3 class="title topictitle3" id="ariaid-title21">Altering table partition's storage format is not working and crashing the daemon</h3>
+ <div class="body conbody">
+ <p class="p">
+ If a partitioned table used a file format other than Avro, and the file format of an individual partition
+ was changed to Avro, subsequent queries could encounter a crash.
+ </p>
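+ <p class="p">
+ A sketch of the sequence that could lead to the crash (table, column, and partition names are hypothetical):
+ </p>
+<pre class="pre codeblock"><code>
+-- The table originally uses a non-Avro format such as Parquet:
+CREATE TABLE part_tbl (c INT) PARTITIONED BY (p INT) STORED AS PARQUET;
+ALTER TABLE part_tbl ADD PARTITION (p = 1);
+-- Switching an individual partition to Avro could crash later queries:
+ALTER TABLE part_tbl PARTITION (p = 1) SET FILEFORMAT AVRO;
+SELECT * FROM part_tbl;
+</code></pre>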
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3314" target="_blank">IMPALA-3314</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title22" id="fixed_issues_260__IMPALA-3798">
+ <h3 class="title topictitle3" id="ariaid-title22">Race condition may cause scanners to spin with runtime filters on Avro or Sequence files</h3>
+ <div class="body conbody">
+ <p class="p">
+ A timing problem during runtime filter processing could cause queries against Avro or SequenceFile tables
+ to hang.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3798" target="_blank">IMPALA-3798</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ </article>
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title23" id="fixed_issues__fixed_issues_254">
+ <h2 class="title topictitle2" id="ariaid-title23">Issues Fixed in <span class="keyword">Impala 2.5.4</span></h2>
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title24" id="fixed_issues__fixed_issues_252">
+ <h2 class="title topictitle2" id="ariaid-title24">Issues Fixed in <span class="keyword">Impala 2.5.2</span></h2>
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title25" id="fixed_issues__fixed_issues_251">
+
+ <h2 class="title topictitle2" id="ariaid-title25">Issues Fixed in <span class="keyword">Impala 2.5.1</span></h2>
+
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title26" id="fixed_issues__fixed_issues_250">
+
+ <h2 class="title topictitle2" id="ariaid-title26">Issues Fixed in <span class="keyword">Impala 2.5.0</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following list contains the most critical issues (<code class="ph codeph">priority='Blocker'</code>) from the JIRA system.
+ For the full list of fixed issues in <span class="keyword">Impala 2.5</span>, see
+ <a class="xref" href="https://issues.apache.org/jira/issues/?jql=%20type%20%3D%20bug%20and%20project%20%3D%20IMPALA%20AND%20resolution%20%3D%20fixed%20AND%20affectedVersion%20!%3D%20%22Impala%202.5.0%22%20AND%20fixVersion%20%3D%20%22Impala%202.5.0%22%20and%20not%20labels%20%3D%20broken-build%20order%20by%20priority%20desc" target="_blank">this report in the Impala JIRA tracker</a>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title27" id="fixed_issues_250__IMPALA-2683">
+ <h3 class="title topictitle3" id="ariaid-title27">Stress test hit assert in LLVM: external function could not be resolved</h3>
+ <div class="body conbody">
+<p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2683" target="_blank">IMPALA-2683</a></p>
+<p class="p">The stress test was running a build with the TPC-H, TPC-DS, and nested TPC-H queries at scale factor 3.</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title28" id="fixed_issues_250__IMPALA-2365">
+ <h3 class="title topictitle3" id="ariaid-title28">impalad crashes if the UDF JAR is not available in the HDFS location the first time the function is used</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2365" target="_blank">IMPALA-2365</a></p>
+ <p class="p">
+ If a UDF JAR was not available in the HDFS location specified in the <code class="ph codeph">CREATE FUNCTION</code> statement,
+ the <span class="keyword cmdname">impalad</span> daemon could crash.
+ </p>
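+ <p class="p">
+ For example, a function definition of this form (the path, function, and class names are hypothetical)
+ could trigger the crash if the JAR did not exist at the specified HDFS location:
+ </p>
+<pre class="pre codeblock"><code>
+CREATE FUNCTION my_func(STRING) RETURNS STRING
+  LOCATION '/user/impala/udfs/missing.jar'
+  SYMBOL='com.example.MyUdf';
+</code></pre>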
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title29" id="fixed_issues_250__IMPALA-2535-570">
+ <h3 class="title topictitle3" id="ariaid-title29">PAGG hits mem_limit when switching to I/O buffers</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2535" target="_blank">IMPALA-2535</a></p>
+ <p class="p">
+ A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory.
+ The cause was an internal ordering of operations that could lead a later phase of the query to
+ allocate memory required by an earlier phase. The workaround was to either increase
+ or decrease the <code class="ph codeph">MEM_LIMIT</code> query option, because the issue would only occur for a specific
+ combination of memory limit and data volume.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title30" id="fixed_issues_250__IMPALA-2643-570">
+ <h3 class="title topictitle3" id="ariaid-title30">Prevent migrating incorrectly inferred identity predicates into inline views</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2643" target="_blank">IMPALA-2643</a></p>
+ <p class="p">
+ Referring to the same column twice in a view definition could cause the view to omit
+ rows where that column contained a <code class="ph codeph">NULL</code> value. This could cause
+ incorrect results due to an inaccurate <code class="ph codeph">COUNT(*)</code> value or rows missing
+ from the result set.
+ </p>
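+ <p class="p">
+ For illustration, a view of the following shape (table and column names are hypothetical) could exhibit the problem:
+ </p>
+<pre class="pre codeblock"><code>
+-- The same column c1 is referenced twice in the view definition:
+CREATE VIEW v AS SELECT c1 AS a, c1 AS b FROM t;
+-- Rows where c1 was NULL could be omitted, making COUNT(*) inaccurate:
+SELECT COUNT(*) FROM v;
+</code></pre>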
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title31" id="fixed_issues_250__IMPALA-1459-570">
+ <h3 class="title topictitle3" id="ariaid-title31">Fix migration/assignment of On-clause predicates inside inline views</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1459" target="_blank">IMPALA-1459</a></p>
+ <p class="p">
+ Some combinations of <code class="ph codeph">ON</code> clauses in join queries could result in comparisons
+ being applied at the wrong stage of query processing, leading to incorrect results.
+ Wrong predicate assignment could happen under the following conditions:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ The query includes an inline view that contains an outer join.
+ </li>
+ <li class="li">
+ That inline view is joined with another table in the enclosing query block.
+ </li>
+ <li class="li">
+ That join has an <code class="ph codeph">ON</code> clause containing a predicate that
+ only references columns originating from the outer-joined tables inside the inline view.
+ </li>
+ </ul>
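+ <p class="p">
+ The conditions above can be sketched as follows (table and column names are hypothetical):
+ </p>
+<pre class="pre codeblock"><code>
+SELECT v.id, t3.name
+FROM (SELECT t1.id, t2.val
+      FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id) v
+JOIN t3
+  ON v.id = t3.id
+     AND v.val = 10;  -- references only t2, the outer-joined side of the inline view
+</code></pre>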
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title32" id="fixed_issues_250__IMPALA-2093">
+ <h3 class="title topictitle3" id="ariaid-title32">Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2093" target="_blank">IMPALA-2093</a></p>
+ <p class="p">
+ <code class="ph codeph">IN</code> subqueries might return wrong results if the left-hand side of the <code class="ph codeph">IN</code> is a constant.
+ For example:
+ </p>
+<pre class="pre codeblock"><code>
+select * from alltypestiny t1
+ where 10 not in (select sum(int_col) from alltypestiny);
+</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title33" id="fixed_issues_250__IMPALA-2940">
+ <h3 class="title topictitle3" id="ariaid-title33">Parquet DictDecoders accumulate throughout query</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2940" target="_blank">IMPALA-2940</a></p>
+ <p class="p">
+ Parquet dictionary decoders could accumulate throughout query execution, leading to excessive memory usage, because one decoder was created per column per split.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title34" id="fixed_issues_250__IMPALA-3056">
+ <h3 class="title topictitle3" id="ariaid-title34">Planner doesn't set the has_local_target field correctly</h3>
+ <div class="body conbody">
+<p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3056" target="_blank">IMPALA-3056</a></p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title35" id="fixed_issues_250__IMPALA-2742">
+ <h3 class="title topictitle3" id="ariaid-title35">MemPool allocation growth behavior</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2742" target="_blank">IMPALA-2742</a></p>
+ <p class="p">
+ Previously, the MemPool always doubled the size of its last allocation.
+ This could lead to poor behavior if the MemPool transferred ownership of all its data
+ except the last chunk: the next allocation would then double the size of that
+ large chunk, which could be undesirable.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title36" id="fixed_issues_250__IMPALA-3035">
+ <h3 class="title topictitle3" id="ariaid-title36">Drop partition operations don't follow the catalog's locking protocol</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3035" target="_blank">IMPALA-3035</a></p>
+ <p class="p">
+ The <code class="ph codeph">CatalogOpExecutor.alterTableDropPartition()</code> function violated
+ the locking protocol used in the catalog, which requires <code class="ph codeph">catalogLock_</code>
+ to be acquired before any table-level lock. This could cause deadlocks when <code class="ph codeph">ALTER TABLE DROP PARTITION</code>
+ was executed concurrently with other DDL operations.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title37" id="fixed_issues_250__IMPALA-2215">
+ <h3 class="title topictitle3" id="ariaid-title37">HAVING clause without aggregation not applied properly</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2215" target="_blank">IMPALA-2215</a></p>
+ <p class="p">
+ A query with a <code class="ph codeph">HAVING</code> clause but no <code class="ph codeph">GROUP BY</code> clause was not being rejected,
+ despite being invalid syntax. For example:
+ </p>
+
+<pre class="pre codeblock"><code>
+select case when 1=1 then 'didit' end as c1 from (select 1 as one) a having 1!=1;
+</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title38" id="fixed_issues_250__IMPALA-2914">
+ <h3 class="title topictitle3" id="ariaid-title38">Hit DCHECK Check failed: HasDateOrTime()</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2914" target="_blank">IMPALA-2914</a></p>
+ <p class="p">
+ <code class="ph codeph">TimestampValue::ToTimestampVal()</code> requires a valid <code class="ph codeph">TimestampValue</code> as input.
+ This requirement was not enforced in some places, leading to serious errors.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title39" id="fixed_issues_250__IMPALA-2986">
+ <h3 class="title topictitle3" id="ariaid-title39">Aggregation spill loop gives up too early leading to mem limit exceeded errors</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2986" target="_blank">IMPALA-2986</a></p>
+ <p class="p">
+ An aggregation query could fail with an out-of-memory error, despite sufficient memory being reported as available.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title40" id="fixed_issues_250__IMPALA-2592">
+ <h3 class="title topictitle3" id="ariaid-title40">DataStreamSender::Channel::CloseInternal() does not close the channel on an error.</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2592" target="_blank">IMPALA-2592</a></p>
+ <p class="p">
+ Some queries did not close an internal communication channel on an error,
+ causing the node on the other side of the channel to wait indefinitely and the query to hang.
+ For example, this issue could happen on a Kerberos-enabled system if the credential cache was outdated.
+ Although the affected query hung, the <span class="keyword cmdname">impalad</span> daemons continued processing other queries.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title41" id="fixed_issues_250__IMPALA-2184">
+ <h3 class="title topictitle3" id="ariaid-title41">Codegen does not catch exceptions in FROM_UNIXTIME()</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2184" target="_blank">IMPALA-2184</a></p>
+ <p class="p">
+ Querying for the minimum or maximum value of a timestamp cast from a <code class="ph codeph">BIGINT</code> via <code class="ph codeph">from_unixtime()</code>
+ failed silently and crashed <span class="keyword cmdname">impalad</span> instances when the input included a value outside the valid range.
+ </p>
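+ <p class="p">
+ A query of this shape (the table and column names are hypothetical) could trigger the crash
+ when the column contained out-of-range values:
+ </p>
+<pre class="pre codeblock"><code>
+SELECT MIN(CAST(from_unixtime(bigint_col) AS TIMESTAMP)) FROM t;
+</code></pre>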
+
+ <p class="p"><strong class="ph b">Workaround:</strong> Disable native code generation with:</p>
+<pre class="pre codeblock"><code>
+SET disable_codegen=true;
+</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title42" id="fixed_issues_250__IMPALA-2788">
+ <h3 class="title topictitle3" id="ariaid-title42">Impala returns wrong result for function 'conv(bigint, from_base, to_base)'</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2788" target="_blank">IMPALA-2788</a></p>
+ <p class="p">
+ Impala returned a wrong result for the <code class="ph codeph">conv()</code> function:
+ <code class="ph codeph">conv(bigint, from_base, to_base)</code> returned an incorrect result,
+ while <code class="ph codeph">conv(string, from_base, to_base)</code> returned the correct value.
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>
+
+select 2061013007, conv(2061013007, 16, 10), conv('2061013007', 16, 10);
++------------+--------------------------+----------------------------+
+| 2061013007 | conv(2061013007, 16, 10) | conv('2061013007', 16, 10) |
++------------+--------------------------+----------------------------+
+| 2061013007 | 1627467783               | 139066421255               |
++------------+--------------------------+----------------------------+
+Fetched 1 row(s) in 0.65s
+
+select 2061013007, conv(cast(2061013007 as bigint), 16, 10), conv('2061013007', 16, 10);
++------------+------------------------------------------+----------------------------+
+| 2061013007 | conv(cast(2061013007 as bigint), 16, 10) | conv('2061013007', 16, 10) |
++------------+------------------------------------------+----------------------------+
+| 2061013007 | 1627467783                               | 139066421255               |
++------------+------------------------------------------+----------------------------+
+
+select 2061013007, conv(cast(2061013007 as string), 16, 10), conv('2061013007', 16, 10);
++------------+------------------------------------------+----------------------------+
+| 2061013007 | conv(cast(2061013007 as string), 16, 10) | conv('2061013007', 16, 10) |
++------------+------------------------------------------+----------------------------+
+| 2061013007 | 139066421255                             | 139066421255               |
++------------+------------------------------------------+----------------------------+
+
+select 2061013007, conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10), conv('2061013007', 16, 10);
++------------+-----------------------------------------------------------------+----------------------------+
+| 2061013007 | conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10) | conv('2061013007', 16, 10) |
++------------+-----------------------------------------------------------------+----------------------------+
+| 2061013007 | 1627467783                                                      | 139066421255               |
++------------+-----------------------------------------------------------------+----------------------------+
+
+</code></pre>
+
+ <p class="p"><strong class="ph b">Workaround:</strong>
+ Cast the value to string and use <code class="ph codeph">conv(string, from_base, to_base)</code> for conversion.
+ </p>
+ </div>
+ </article>
+
+
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title43" id="fixed_issues__fixed_issues_241">
+
+ <h2 class="title topictitle2" id="ariaid-title43">Issues Fixed in <span class="keyword">Impala 2.4.1</span></h2>
+
+ <div class="body conbody">
+ <p class="p">
+ </p>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title44" id="fixed_issues__fixed_issues_240">
+
+ <h2 class="title topictitle2" id="ariaid-title44">Issues Fixed in <span class="keyword">Impala 2.4.0</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The set of fixes for Impala in <span class="keyword">Impala 2.4.0</span> is the same as
+ in <span class="keyword">Impala 2.3.2</span>.
+
+ </p>
+
+ </div>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title45" id="fixed_issues__fixed_issues_234">
+
+ <h2 class="title topictitle2" id="ariaid-title45">Issues Fixed in <span class="keyword">Impala 2.3.4</span></h2>
+
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title46" id="fixed_issues__fixed_issues_232">
+
+ <h2 class="title topictitle2" id="ariaid-title46">Issues Fixed in <span class="keyword">Impala 2.3.2</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists the most serious or frequently encountered customer
+ issues fixed in <span class="keyword">Impala 2.3.2</span>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title47" id="fixed_issues_232__IMPALA-2829">
+ <h3 class="title topictitle3" id="ariaid-title47">SEGV in AnalyticEvalNode touching NULL input_stream_</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query involving an analytic function could encounter a serious error.
+ This issue occurred infrequently, depending on specific combinations
+ of queries and data.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2829" target="_blank">IMPALA-2829</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title48" id="fixed_issues_232__IMPALA-2722">
+ <h3 class="title topictitle3" id="ariaid-title48">Free local allocations per row batch in non-partitioned AGG and HJ</h3>
+ <div class="body conbody">
+ <p class="p">
+ An outer join query could fail unexpectedly with an out-of-memory error
+ when the <span class="q">"spill to disk"</span> mechanism was turned off.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2722" target="_blank">IMPALA-2722</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title49" id="fixed_issues_232__IMPALA-2612">
+
+ <h3 class="title topictitle3" id="ariaid-title49">Free local allocations once for every row batch when building hash tables</h3>
+ <div class="body conbody">
+ <p class="p">
+ A join query could encounter a serious error due to an internal failure to allocate memory, which
+ resulted in dereferencing a <code class="ph codeph">NULL</code> pointer.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2612" target="_blank">IMPALA-2612</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title50" id="fixed_issues_232__IMPALA-2643">
+ <h3 class="title topictitle3" id="ariaid-title50">Prevent migrating incorrectly inferred identity predicates into inline views</h3>
+ <div class="body conbody">
+ <p class="p">
+ Referring to the same column twice in a view definition could cause the view to omit
+ rows where that column contained a <code class="ph codeph">NULL</code> value. This could cause
+ incorrect results due to an inaccurate <code class="ph codeph">COUNT(*)</code> value or rows missing
+ from the result set.
+ </p>
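+         <p class="p">
+           As a hypothetical sketch (table and column names are invented), a view of this
+           shape could trigger the problem:
+         </p>
+ <pre class="pre codeblock"><code>-- The view refers to column c1 twice; before the fix, an identity
+ -- predicate (c1 = c1) could be inferred incorrectly, silently dropping
+ -- rows where c1 IS NULL and skewing COUNT(*).
+ CREATE VIEW v AS SELECT c1, c1 AS c1_copy FROM t1;
+ SELECT COUNT(*) FROM v;</code></pre>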
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2643" target="_blank">IMPALA-2643</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title51" id="fixed_issues_232__IMPALA-2695">
+ <h3 class="title topictitle3" id="ariaid-title51">Fix GRANTs on URIs with uppercase letters</h3>
+ <div class="body conbody">
+ <p class="p">
+ A <code class="ph codeph">GRANT</code> statement for a URI could be ineffective if the URI
+ contained uppercase letters, for example in an uppercase directory name.
+ Subsequent statements, such as <code class="ph codeph">CREATE EXTERNAL TABLE</code>
+ with a <code class="ph codeph">LOCATION</code> clause, could fail with an authorization exception.
+ </p>
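+         <p class="p">
+           For illustration (the role name, table, and HDFS path are hypothetical), a
+           sequence like the following could fail before the fix because of the uppercase
+           directory names:
+         </p>
+ <pre class="pre codeblock"><code>-- The grant could silently fail to take effect because of the uppercase
+ -- letters in the URI, so the CREATE EXTERNAL TABLE below would hit an
+ -- authorization exception.
+ GRANT ALL ON URI 'hdfs://nameservice1/Data/Sales' TO ROLE analyst_role;
+ CREATE EXTERNAL TABLE sales_ext (id INT)
+   LOCATION 'hdfs://nameservice1/Data/Sales';</code></pre>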
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2695" target="_blank">IMPALA-2695</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="IMPALA-2664-552__IMPALA-2648-552" id="fixed_issues_232__IMPALA-2664-552">
+ <h3 class="title topictitle3" id="IMPALA-2664-552__IMPALA-2648-552">Avoid sending large partition stats objects over thrift</h3>
+ <div class="body conbody">
+ <p class="p">
+ The <span class="keyword cmdname">catalogd</span> daemon could encounter a serious error
+ when loading the incremental statistics metadata for tables with large
+ numbers of partitions and columns. The problem occurred when the
+ internal representation of metadata for the table exceeded 2
+ GB, for example in a table with 20K partitions and 77 columns. The fix causes a
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operation to fail if it
+ would produce metadata that exceeded the maximum size.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2664" target="_blank">IMPALA-2664</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2648" target="_blank">IMPALA-2648</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title53" id="fixed_issues_232__IMPALA-2226">
+ <h3 class="title topictitle3" id="ariaid-title53">Throw AnalysisError if table properties are too large (for the Hive metastore)</h3>
+ <div class="body conbody">
+ <p class="p">
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements could fail with
+ metastore database errors due to length limits on the <code class="ph codeph">SERDEPROPERTIES</code> and <code class="ph codeph">TBLPROPERTIES</code> clauses.
+       (The metastore limit on key length is 256 characters, while the limit on value length is 4000 characters.) The fix makes Impala handle these error conditions
+ more cleanly, by detecting too-long values rather than passing them to the metastore database.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2226" target="_blank">IMPALA-2226</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title54" id="fixed_issues_232__IMPALA-2273-552">
+ <h3 class="title topictitle3" id="ariaid-title54">Make MAX_PAGE_HEADER_SIZE configurable</h3>
+ <div class="body conbody">
+ <p class="p">
+ Impala could fail to access Parquet data files with page headers larger than 8 MB, which could
+ occur, for example, if the minimum or maximum values for a column were long strings. The
+ fix adds a configuration setting <code class="ph codeph">--max_page_header_size</code>, which you can use to
+ increase the Impala size limit to a value higher than 8 MB.
+ </p>
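+       <p class="p">
+         As a sketch (the flag name comes from the fix; the 16 MB value is only an example),
+         the limit can be raised in the <span class="keyword cmdname">impalad</span> startup options:
+       </p>
+ <pre class="pre codeblock"><code># Example only: raise the Parquet page-header limit to 16 MB.
+ # 16777216 bytes = 16 MB; choose a value that fits your data.
+ impalad --max_page_header_size=16777216</code></pre>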
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2273" target="_blank">IMPALA-2273</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title55" id="fixed_issues_232__IMPALA-2473">
+       <h3 class="title topictitle3" id="ariaid-title55">Reduce scanner memory usage</h3>
+ <div class="body conbody">
+ <p class="p">
+ Queries on Parquet tables could consume excessive memory (potentially multiple gigabytes) due to producing
+ large intermediate data values while evaluating groups of rows. The workaround was to reduce the size of
+ the <code class="ph codeph">NUM_SCANNER_THREADS</code> query option, the <code class="ph codeph">BATCH_SIZE</code> query option,
+ or both.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2473" target="_blank">IMPALA-2473</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title56" id="fixed_issues_232__IMPALA-2113">
+ <h3 class="title topictitle3" id="ariaid-title56">Handle error when distinct and aggregates are used with a having clause</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query that included a <code class="ph codeph">DISTINCT</code> operator and a <code class="ph codeph">HAVING</code> clause, but no
+ aggregate functions or <code class="ph codeph">GROUP BY</code>, would fail with an uninformative error message.
+ </p>
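+       <p class="p">
+         A minimal hypothetical example of the affected shape of query:
+       </p>
+ <pre class="pre codeblock"><code>-- DISTINCT plus HAVING, with no aggregate functions and no GROUP BY;
+ -- before the fix, this failed with an uninformative message.
+ SELECT DISTINCT c1 FROM t1 HAVING c1 > 10;</code></pre>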
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2113" target="_blank">IMPALA-2113</a></p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title57" id="fixed_issues_232__IMPALA-2225">
+ <h3 class="title topictitle3" id="ariaid-title57">Handle error when star based select item and aggregate are incorrectly used</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query that included <code class="ph codeph">*</code> in the <code class="ph codeph">SELECT</code> list, in addition to an
+ aggregate function call, would fail with an uninformative message if the query had no
+ <code class="ph codeph">GROUP BY</code> clause.
+ </p>
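+       <p class="p">
+         For illustration, a query of the affected shape (hypothetical table):
+       </p>
+ <pre class="pre codeblock"><code>-- Mixing * with an aggregate call and no GROUP BY is invalid; before
+ -- the fix, it failed with an unclear message instead of a proper error.
+ SELECT *, COUNT(c1) FROM t1;</code></pre>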
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2225" target="_blank">IMPALA-2225</a></p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title58" id="fixed_issues_232__IMPALA-2731-552">
+ <h3 class="title topictitle3" id="ariaid-title58">Refactor MemPool usage in HBase scan node</h3>
+ <div class="body conbody">
+ <p class="p">
+ Queries involving HBase tables used substantially more memory than in earlier Impala versions.
+ The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284.
+ The fix for this issue involves removing a separate memory work area for HBase queries
+ and reusing other memory that was already allocated.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2731" target="_blank">IMPALA-2731</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title59" id="fixed_issues_232__IMPALA-1459-552">
+ <h3 class="title topictitle3" id="ariaid-title59">Fix migration/assignment of On-clause predicates inside inline views</h3>
+ <div class="body conbody">
+ <p class="p">
+ Some combinations of <code class="ph codeph">ON</code> clauses in join queries could result in comparisons
+ being applied at the wrong stage of query processing, leading to incorrect results.
+ Wrong predicate assignment could happen under the following conditions:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ The query includes an inline view that contains an outer join.
+ </li>
+ <li class="li">
+ That inline view is joined with another table in the enclosing query block.
+ </li>
+ <li class="li">
+ That join has an <code class="ph codeph">ON</code> clause containing a predicate that
+ only references columns originating from the outer-joined tables inside the inline view.
+ </li>
+ </ul>
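+       <p class="p">
+         The tables below are hypothetical, but a query of this shape matches all three
+         conditions:
+       </p>
+ <pre class="pre codeblock"><code>-- Inline view v contains an outer join; v is joined to t2 in the
+ -- enclosing block; and the ON clause references only v.x, which
+ -- originates from the outer-joined table c inside the inline view.
+ SELECT v.id, t2.name
+ FROM (SELECT p.id, c.x
+       FROM parent p LEFT OUTER JOIN child c ON p.id = c.parent_id) v
+ JOIN t2 ON v.x = t2.x;</code></pre>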
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1459" target="_blank">IMPALA-1459</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title60" id="fixed_issues_232__IMPALA-2558">
+ <h3 class="title topictitle3" id="ariaid-title60">DCHECK in parquet scanner after block read error</h3>
+ <div class="body conbody">
+ <p class="p">
+           A debug build of Impala could encounter a serious error after certain kinds of I/O
+           errors for Parquet files. Release builds were not affected.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2558" target="_blank">IMPALA-2558</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title61" id="fixed_issues_232__IMPALA-2535">
+ <h3 class="title topictitle3" id="ariaid-title61">PAGG hits mem_limit when switching to I/O buffers</h3>
+ <div class="body conbody">
+ <p class="p">
+ A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory.
+ The cause was the internal ordering of operations that could cause a later phase of the query to
+ allocate memory required by an earlier phase of the query. The workaround was to either increase
+ or decrease the <code class="ph codeph">MEM_LIMIT</code> query option, because the issue would only occur for a specific
+ combination of memory limit and data volume.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2535" target="_blank">IMPALA-2535</a></p>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title62" id="fixed_issues_232__IMPALA-2559">
+ <h3 class="title topictitle3" id="ariaid-title62">Fix check failed: sorter_runs_.back()->is_pinned_</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query could fail with an internal error while calculating the memory limit.
+ This was an infrequent condition uncovered during stress testing.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2559" target="_blank">IMPALA-2559</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title63" id="fixed_issues_232__IMPALA-2614">
+ <h3 class="title topictitle3" id="ariaid-title63">Don't ignore Status returned by DataStreamRecvr::CreateMerger()</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query could fail with an internal error while calculating the memory limit.
+ This was an infrequent condition uncovered during stress testing.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2614" target="_blank">IMPALA-2614</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2559" target="_blank">IMPALA-2559</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title64" id="fixed_issues_232__IMPALA-2591">
+ <h3 class="title topictitle3" id="ariaid-title64">DataStreamSender::Send() does not return an error status if SendBatch() failed</h3>
+ <div class="body conbody">
+
+         <p class="p">
+           An error that occurred while transmitting a row batch between Impala daemons could go
+           unreported, because <code class="ph codeph">DataStreamSender::Send()</code> did not
+           propagate the error status returned by a failed <code class="ph codeph">SendBatch()</code> call.
+         </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2591" target="_blank">IMPALA-2591</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title65" id="fixed_issues_232__IMPALA-2598">
+ <h3 class="title topictitle3" id="ariaid-title65">Re-enable SSL and Kerberos on server-server</h3>
+ <div class="body conbody">
+ <p class="p">
+ These fixes lift the restriction on using SSL encryption and Kerberos authentication together
+ for internal communication between Impala components.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2598" target="_blank">IMPALA-2598</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2747" target="_blank">IMPALA-2747</a></p>
+ </div>
+ </article>
+
+
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title66" id="fixed_issues__fixed_issues_231">
+
+ <h2 class="title topictitle2" id="ariaid-title66">Issues Fixed in <span class="keyword">Impala 2.3.1</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+       <span class="keyword">Impala 2.3.1</span> is identical to <span class="keyword">Impala 2.3.0</span>.
+       There are no new bug fixes, new features, or incompatible changes.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title67" id="fixed_issues__fixed_issues_230">
+
+ <h2 class="title topictitle2" id="ariaid-title67">Issues Fixed in <span class="keyword">Impala 2.3.0</span></h2>
+
+ <div class="body conbody">
+ <p class="p"> This section lists the most serious or frequently encountered customer
+ issues fixed in <span class="keyword">Impala 2.3</span>. Any issues already fixed in
+ <span class="keyword">Impala 2.2</span> maintenance releases (up through <span class="keyword">Impala 2.2.8</span>) are also included.
+ Those issues are listed under the respective <span class="keyword">Impala 2.2</span> sections and are
+ not repeated here.
+ </p>
+
+
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title68" id="fixed_issues_230__serious_230">
+ <h3 class="title topictitle3" id="ariaid-title68">Fixes for Serious Errors</h3>
+ <div class="body conbody">
+ <p class="p">
+ A number of issues were resolved that could result in serious errors
+ when encountered. The most critical or commonly encountered are
+ listed here.
+ </p>
+ <p class="p"><strong class="ph b">Bugs:</strong>
+
+
+
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2168" target="_blank">IMPALA-2168</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2378" target="_blank">IMPALA-2378</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2369" target="_blank">IMPALA-2369</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2357" target="_blank">IMPALA-2357</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2319" target="_blank">IMPALA-2319</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2314" target="_blank">IMPALA-2314</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2016" target="_blank">IMPALA-2016</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title69" id="fixed_issues_230__correctness_230">
+ <h3 class="title topictitle3" id="ariaid-title69">Fixes for Correctness Errors</h3>
+ <div class="body conbody">
+ <p class="p">
+ A number of issues were resolved that could result in wrong results
+ when encountered. The most critical or commonly encountered are
+ listed here.
+ </p>
+ <p class="p"><strong class="ph b">Bugs:</strong>
+
+
+
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2192" target="_blank">IMPALA-2192</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2440" target="_blank">IMPALA-2440</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2090" target="_blank">IMPALA-2090</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2086" target="_blank">IMPALA-2086</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1947" target="_blank">IMPALA-1947</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1917" target="_blank">IMPALA-1917</a>
+ </p>
+ </div>
+ </article>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title70" id="fixed_issues__fixed_issues_2210">
+
+ <h2 class="title topictitle2" id="ariaid-title70">Issues Fixed in <span class="keyword">Impala 2.2.10</span></h2>
+
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title71" id="fixed_issues__fixed_issues_229">
+
+ <h2 class="title topictitle2" id="ariaid-title71">Issues Fixed in <span class="keyword">Impala 2.2.9</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.9</span>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title72" id="fixed_issues_229__IMPALA-1917">
+
+       <h3 class="title topictitle3" id="ariaid-title72">Query returns an empty result if it contains a NULL literal in an inline view</h3>
+ <div class="body conbody">
+ <p class="p">
+ If an inline view in a <code class="ph codeph">FROM</code> clause contained a <code class="ph codeph">NULL</code> literal,
+ the result set was empty.
+ </p>
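+         <p class="p">
+           A hypothetical illustration of the affected pattern:
+         </p>
+ <pre class="pre codeblock"><code>-- Before the fix, the NULL literal in the inline view caused the whole
+ -- result set to be empty, instead of returning t1's rows with a NULL
+ -- flag column.
+ SELECT v.c1, v.flag
+ FROM (SELECT c1, NULL AS flag FROM t1) v;</code></pre>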
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1917" target="_blank">IMPALA-1917</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title73" id="fixed_issues_229__IMPALA-2731">
+
+ <h3 class="title topictitle3" id="ariaid-title73">HBase scan node uses 2-4x memory after upgrade to Impala 2.2.8</h3>
+ <div class="body conbody">
+ <p class="p">
+ Queries involving HBase tables used substantially more memory than in earlier Impala versions.
+ The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284.
+ The fix for this issue involves removing a separate memory work area for HBase queries
+ and reusing other memory that was already allocated.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2731" target="_blank">IMPALA-2731</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title74" id="fixed_issues_229__IMPALA-1459">
+ <h3 class="title topictitle3" id="ariaid-title74">Fix migration/assignment of On-clause predicates inside inline views</h3>
+ <div class="body conbody">
+
+ <p class="p">
+ Some combinations of <code class="ph codeph">ON</code> clauses in join queries could result in comparisons
+ being applied at the wrong stage of query processing, leading to incorrect results.
+ Wrong predicate assignment could happen under the following conditions:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ The query includes an inline view that contains an outer join.
+ </li>
+ <li class="li">
+ That inline view is joined with another table in the enclosing query block.
+ </li>
+ <li class="li">
+ That join has an <code class="ph codeph">ON</code> clause containing a predicate that
+ only references columns originating from the outer-joined tables inside the inline view.
+ </li>
+ </ul>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1459" target="_blank">IMPALA-1459</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title75" id="fixed_issues_229__IMPALA-2446">
+ <h3 class="title topictitle3" id="ariaid-title75">Fix wrong predicate assignment in outer joins</h3>
+ <div class="body conbody">
+ <p class="p">
+ The join predicate for an <code class="ph codeph">OUTER JOIN</code> clause could be applied at the wrong stage
+ of query processing, leading to incorrect results.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2446" target="_blank">IMPALA-2446</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title76" id="fixed_issues_229__IMPALA-2648">
+ <h3 class="title topictitle3" id="ariaid-title76">Avoid sending large partition stats objects over thrift</h3>
+ <div class="body conbody">
+ <p class="p"> The <span class="keyword cmdname">catalogd</span> daemon could encounter a serious error when loading the
+ incremental statistics metadata for tables with large numbers of partitions and columns.
+ The problem occurred when the internal representation of metadata for the table exceeded 2
+ GB, for example in a table with 20K partitions and 77 columns. The fix causes a
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operation to fail if it would produce
+ metadata that exceeded the maximum size. </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2648" target="_blank">IMPALA-2648</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2664" target="_blank">IMPALA-2664</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title77" id="fixed_issues_229__IMPALA-1675">
+ <h3 class="title topictitle3" id="ariaid-title77">Avoid overflow when adding large intervals to TIMESTAMPs</h3>
+ <div class="body conbody">
+ <p class="p"> Adding or subtracting a large <code class="ph codeph">INTERVAL</code> value to a
+ <code class="ph codeph">TIMESTAMP</code> value could produce an incorrect result, with the value
+ wrapping instead of returning an out-of-range error. </p>
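+         <p class="p">
+           For example (the values are illustrative), an expression like the following should
+           return an out-of-range error rather than a wrapped value:
+         </p>
+ <pre class="pre codeblock"><code>-- Adding an extremely large interval now produces an error instead of
+ -- a silently wrapped TIMESTAMP.
+ SELECT CAST('2015-01-01' AS TIMESTAMP) + INTERVAL 999999999 YEARS;</code></pre>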
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1675" target="_blank">IMPALA-1675</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title78" id="fixed_issues_229__IMPALA-1949">
+ <h3 class="title topictitle3" id="ariaid-title78">Analysis exception when a binary operator contains an IN operator with values</h3>
+ <div class="body conbody">
+ <p class="p">
+ An <code class="ph codeph">IN</code> operator with literal values could cause a statement to fail if used
+ as the argument to a binary operator, such as an equality test for a <code class="ph codeph">BOOLEAN</code> value.
+ </p>
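+         <p class="p">
+           A hypothetical example of the affected pattern:
+         </p>
+ <pre class="pre codeblock"><code>-- Using an IN predicate with literal values as an operand of a binary
+ -- operator previously raised an analysis exception.
+ SELECT c1 FROM t1 WHERE (c2 IN (1, 2, 3)) = TRUE;</code></pre>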
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1949" target="_blank">IMPALA-1949</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title79" id="fixed_issues_229__IMPALA-2273">
+
+ <h3 class="title topictitle3" id="ariaid-title79">Make MAX_PAGE_HEADER_SIZE configurable</h3>
+ <div class="body conbody">
+ <p class="p"> Impala could fail to access Parquet data files with page headers larger than 8 MB, which
+ could occur, for example, if the minimum or maximum values for a column were long strings.
+ The fix adds a configuration setting <code class="ph codeph">--max_page_header_size</code>, which you
+ can use to increase the Impala size limit to a value higher than 8 MB. </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2273" target="_blank">IMPALA-2273</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title80" id="fixed_issues_229__IMPALA-2357">
+ <h3 class="title topictitle3" id="ariaid-title80">Fix spilling sorts with var-len slots that are NULL or empty.</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query that activated the spill-to-disk mechanism could fail if it contained a sort expression
+ involving certain combinations of fixed-length or variable-length types.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2357" target="_blank">IMPALA-2357</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title81" id="fixed_issues_229__block_pin_oom">
+ <h3 class="title topictitle3" id="ariaid-title81">Work-around IMPALA-2344: Fail query with OOM in case block->Pin() fails</h3>
+ <div class="body conbody">
+ <p class="p">
+ Some queries that activated the spill-to-disk mechanism could produce a serious error
+ if there was insufficient memory to set up internal work areas. Now those queries
+ produce normal out-of-memory errors instead.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2344" target="_blank">IMPALA-2344</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title82" id="fixed_issues_229__IMPALA-2252">
+ <h3 class="title topictitle3" id="ariaid-title82">Crash (likely race) tearing down BufferedBlockMgr on query failure</h3>
+ <div class="body conbody">
+ <p class="p">
+ A serious error could occur under rare circumstances, due to a race condition while freeing memory during heavily concurrent workloads.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2252" target="_blank">IMPALA-2252</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title83" id="fixed_issues_229__IMPALA-1746">
+ <h3 class="title topictitle3" id="ariaid-title83">QueryExecState doesn't check for query cancellation or errors</h3>
+ <div class="body conbody">
+ <p class="p">
+ A call to <code class="ph codeph">SetError()</code> in a user-defined function (UDF) would not cause the query to fail as expected.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1746" target="_blank">IMPALA-1746</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title84" id="fixed_issues_229__IMPALA-2533">
+       <h3 class="title topictitle3" id="ariaid-title84">Impala throws IllegalStateException when inserting data into a partition while the SELECT
+         subquery groups by the partition columns</h3>
+ <div class="body conbody">
+ <p class="p">
+ An <code class="ph codeph">INSERT ... SELECT</code> operation into a partitioned table could fail if the <code class="ph codeph">SELECT</code> query
+ included a <code class="ph codeph">GROUP BY</code> clause referring to the partition key columns.
+ </p>
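+         <p class="p">
+           A hypothetical sketch of the affected pattern, using a table partitioned by
+           <code class="ph codeph">year</code>:
+         </p>
+ <pre class="pre codeblock"><code>-- Before the fix, grouping by the partition key column in the SELECT
+ -- portion could make the INSERT fail with an IllegalStateException.
+ INSERT INTO part_tab PARTITION (year)
+ SELECT MAX(id), year FROM src GROUP BY year;</code></pre>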
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2533" target="_blank">IMPALA-2533</a></p>
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title85" id="fixed_issues__fixed_issues_228">
+
+ <h2 class="title topictitle2" id="ariaid-title85">Issues Fixed in <span class="keyword">Impala 2.2.8</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.8</span>.
+ </p>
+
+ </div>
+
+
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title86" id="fixed_issues_228__IMPALA-1136">
+       <h3 class="title topictitle3" id="ariaid-title86">Impala is unable to read Hive tables created with the "STORED AS AVRO" clause</h3>
+ <div class="body conbody">
+ <p class="p">Impala could not read Avro tables created in Hive with the <code class="ph codeph">STORED AS AVRO</code> clause.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1136" target="_blank">IMPALA-1136</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2161" target="_blank">IMPALA-2161</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title87" id="fixed_issues_228__IMPALA-2213">
+       <h3 class="title topictitle3" id="ariaid-title87">Make Parquet scanner fail query if the file size metadata is stale</h3>
+ <div class="body conbody">
+ <p class="p">If a Parquet file in HDFS was overwritten by a smaller file, Impala could encounter a serious error.
+ Issuing a <code class="ph codeph">INVALIDATE METADATA</code> statement before a subsequent query would avoid the error.
+ The fix allows Impala to handle such inconsistencies in Parquet file length cleanly regardless of whether the
+ table metadata is up-to-date.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2213" target="_blank">IMPALA-2213</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title88" id="fixed_issues_228__IMPALA-2249">
+ <h3 class="title topictitle3" id="ariaid-title88">Avoid allocating StringBuffer > 1GB in ScannerContext::Stream::GetBytesInternal()</h3>
+ <div class="body conbody">
+ <p class="p">Impala could encounter a serious error when reading compressed text files larger than 1 GB. The fix causes Impala
+ to issue an error message instead in this case.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2249" target="_blank">IMPALA-2249</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title89" id="fixed_issues_228__IMPALA-2284">
+       <h3 class="title topictitle3" id="ariaid-title89">Disallow long (1&lt;&lt;30) strings in group_concat()</h3>
+ <div class="body conbody">
+ <p class="p">A query using the <code class="ph codeph">group_concat()</code> function could encounter a serious error if the returned string value was larger than 1 GB.
+ Now the query fails with an error message in this case.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2284" target="_blank">IMPALA-2284</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title90" id="fixed_issues_228__IMPALA-2270">
+       <h3 class="title topictitle3" id="ariaid-title90">Avoid FnvHash64to32 with empty inputs</h3>
+ <div class="body conbody">
+ <p class="p">An edge case in the algorithm used to distribute data among nodes could result in uneven distribution of work for some queries,
+ with all data sent to the same node.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2270" target="_blank">IMPALA-2270</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title91" id="fixed_issues_228__IMPALA-2348">
+ <h3 class="title topictitle3" id="ariaid-title91">The catalog does not close the connection to HMS during table invalidation</h3>
+ <div class="body conbody">
+ <p class="p">A communication error could occur between Impala and the Hive metastore database, causing Impala operations that update
+ table metadata to fail.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2348" target="_blank">IMPALA-2348</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title92" id="fixed_issues_228__IMPALA-2364-548">
+ <h3 class="title topictitle3" id="ariaid-title92">Wrong DCHECK in PHJ::ProcessProbeBatch</h3>
+ <div class="body conbody">
+ <p class="p">Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2364" target="_blank">IMPALA-2364</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title93" id="fixed_issues_228__IMPALA-2165-548">
+ <h3 class="title topictitle3" id="ariaid-title93">Avoid cardinality 0 in scan nodes of small tables and low selectivity</h3>
+ <div class="body conbody">
+ <p class="p">Impala could generate a suboptimal query plan for some queries involving small tables.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2165" target="_blank">IMPALA-2165</a></p>
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title94" id="fixed_issues__fixed_issues_227">
+
+ <h2 class="title topictitle2" id="ariaid-title94">Issues Fixed in <span class="keyword">Impala 2.2.7</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.7</span>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title95" id="fixed_issues_227__IMPALA-1983">
+ <h3 class="title topictitle3" id="ariaid-title95">Warn if table stats are potentially corrupt.</h3>
+ <div class="body conbody">
+ <p class="p">
+ Impala warns if it detects a discrepancy in table statistics: a table considered to have zero rows even though there are data files present.
+ In this case, Impala also skips query optimizations that are normally applied to very small tables.
+ </p>
+          <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1983" target="_blank">IMPALA-1983</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title96" id="fixed_issues_227__IMPALA-2266">
+ <h3 class="title topictitle3" id="ariaid-title96">Pass correct child node in 2nd phase merge aggregation.</h3>
+ <div class="body conbody">
+ <p class="p">A query could encounter a serious error if it included a particular combination of aggregate functions and inline views.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2266" target="_blank">IMPALA-2266</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title97" id="fixed_issues_227__IMPALA-2216">
+ <h3 class="title topictitle3" id="ariaid-title97">Set the output smap of an EmptySetNode produced from an empty inline view.</h3>
+ <div class="body conbody">
+ <p class="p">A query could encounter a serious error if it included an inline view whose subquery had no <code class="ph codeph">FROM</code> clause.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2216" target="_blank">IMPALA-2216</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title98" id="fixed_issues_227__IMPALA-2203">
+ <h3 class="title topictitle3" id="ariaid-title98">Set an InsertStmt's result exprs from the source statement's result exprs.</h3>
+ <div class="body conbody">
+ <p class="p">
+            A <code class="ph codeph">CREATE TABLE AS SELECT</code> or <code class="ph codeph">INSERT ... SELECT</code> statement could produce
+            different results than a <code class="ph codeph">SELECT</code> statement for queries that included a <code class="ph codeph">FULL JOIN</code> clause
+            and literal values in the select list.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2203" target="_blank">IMPALA-2203</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title99" id="fixed_issues_227__IMPALA-2088">
+ <h3 class="title topictitle3" id="ariaid-title99">Fix planning of empty union operands with analytics.</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query could return incorrect results if it contained a <code class="ph codeph">UNION</code> clause,
+ calls to analytic functions, and a constant expression that evaluated to <code class="ph codeph">FALSE</code>.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2088" target="_blank">IMPALA-2088</a></p>
+ </div>
+ </article>
+
+
+
+
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title100" id="fixed_issues_227__IMPALA-2089">
+ <h3 class="title topictitle3" id="ariaid-title100">Retain eq predicates bound by grouping slots with complex grouping exprs.</h3>
+ <div class="body conbody">
+ <p class="p">
+            A query containing an <code class="ph codeph">INNER JOIN</code> clause could return undesired rows.
+            A predicate specified in the <code class="ph codeph">ON</code> clause could be omitted from the filtering operation.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2089" target="_blank">IMPALA-2089</a></p>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title101" id="fixed_issues_227__IMPALA-2199">
+ <h3 class="title topictitle3" id="ariaid-title101">Row count not set for empty partition when spec is used with compute incremental stats</h3>
+ <div class="body conbody">
+ <p class="p">
+            A <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement could leave the row count for an empty partition as -1,
+ rather than initializing the row count to 0. The missing statistic value could result in reduced query performance.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2199" target="_blank">IMPALA-2199</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title102" id="fixed_issues_227__IMPALA-1898">
+ <h3 class="title topictitle3" id="ariaid-title102">Explicit aliases + ordinals analysis bug</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query could encounter a serious error if it included column aliases with the same names as table columns, and used
+ ordinal numbers in an <code class="ph codeph">ORDER BY</code> or <code class="ph codeph">GROUP BY</code> clause.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1898" target="_blank">IMPALA-1898</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title103" id="fixed_issues_227__IMPALA-1987">
+ <h3 class="title topictitle3" id="ariaid-title103">Fix TupleIsNullPredicate to return false if no tuples are nullable.</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query could return incorrect results if it included an outer join clause, inline views, and calls to functions such as <code class="ph codeph">coalesce()</code>
+ that can generate <code class="ph codeph">NULL</code> values.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1987" target="_blank">IMPALA-1987</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title104" id="fixed_issues_227__IMPALA-2178">
+ <h3 class="title topictitle3" id="ariaid-title104">fix Expr::ComputeResultsLayout() logic</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query could return incorrect results if the table contained multiple <code class="ph codeph">CHAR</code> columns with length of 2 or less,
+ and the query included a <code class="ph codeph">GROUP BY</code> clause that referred to multiple such columns.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2178" target="_blank">IMPALA-2178</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title105" id="fixed_issues_227__IMPALA-1737">
+ <h3 class="title topictitle3" id="ariaid-title105">Substitute an InsertStmt's partition key exprs with the root node's smap.</h3>
+ <div class="body conbody">
+ <p class="p">
+ An <code class="ph codeph">INSERT</code> statement could encounter a serious error if the <code class="ph codeph">SELECT</code>
+ portion called an analytic function.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1737" target="_blank">IMPALA-1737</a></p>
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title106" id="fixed_issues__fixed_issues_225">
+
+    <h2 class="title topictitle2" id="ariaid-title106">Issues Fixed in <span class="keyword">Impala 2.2.5</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.5</span>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title107" id="fixed_issues_225__IMPALA-2048">
+ <h3 class="title topictitle3" id="ariaid-title107">Impala DML/DDL operations corrupt table metadata leading to Hive query failures</h3>
+ <div class="body conbody">
+ <p class="p">
+ When the Impala <code class="ph codeph">COMPUTE STATS</code> statement was run on a partitioned Parquet table that was created in Hive, the table subsequently became inaccessible in Hive.
+ The table was still accessible to Impala. Regaining access in Hive required a workaround of creating a new table. The error displayed in Hive was:
+ </p>
+<pre class="pre codeblock"><code>Error: Error while compiling statement: FAILED: SemanticException Class not found: org.apache.impala.hive.serde.ParquetInputFormat (state=42000,code=40000)</code></pre>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2048" target="_blank">IMPALA-2048</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title108" id="fixed_issues_225__IMPALA-1929">
+ <h3 class="title topictitle3" id="ariaid-title108">Avoiding a DCHECK of NULL hash table in spilled right joins</h3>
+
+ <div class="body conbody">
+ <p class="p">
+ A query could encounter a serious error if it contained a <code class="ph codeph">RIGHT OUTER</code>, <code class="ph codeph">RIGHT ANTI</code>, or <code class="ph codeph">FULL OUTER</code> join clause
+ and approached the memory limit on a host so that the <span class="q">"spill to disk"</span> mechanism was activated.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1929" target="_blank">IMPALA-1929</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title109" id="fixed_issues_225__IMPALA-2136">
+ <h3 class="title topictitle3" id="ariaid-title109">Bug in PrintTColumnValue caused wrong stats for TINYINT partition cols</h3>
+
+ <div class="body conbody">
+ <p class="p">
+ Declaring a partition key column as a <code class="ph codeph">TINYINT</code> caused problems with the <code class="ph codeph">COMPUTE STATS</code> statement.
+ The associated partitions would always have zero estimated rows, leading to potential inefficient query plans.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2136" target="_blank">IMPALA-2136</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title110" id="fixed_issues_225__IMPALA-2018">
+ <h3 class="title topictitle3" id="ariaid-title110">Where clause does not propagate to joins inside nested views</h3>
+
+ <div class="body conbody">
+ <p class="p">
+          A query that referred to a view whose query referred to another view containing a join could return incorrect results.
+ <code class="ph codeph">WHERE</code> clauses for the outermost query were not always applied, causing the result
+ set to include additional rows that should have been filtered out.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2018" target="_blank">IMPALA-2018</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title111" id="fixed_issues_225__IMPALA-2064">
+ <h3 class="title topictitle3" id="ariaid-title111">Add effective_user() builtin</h3>
+
+ <div class="body conbody">
+ <p class="p">
+ The <code class="ph codeph">user()</code> function returned the name of the logged-in user, which might not be the
+ same as the user name being checked for authorization if, for example, delegation was enabled.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2064" target="_blank">IMPALA-2064</a></p>
+ <p class="p"><strong class="ph b">Resolution:</strong> Rather than change the behavior of the <code class="ph codeph">user()</code> function,
+ the fix introduces an additional function <code class="ph codeph">effective_user()</code> that returns the user name that is checked during authorization.</p>
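+        <p class="p">
+          For example, you can compare the two values in a single query. This is an illustrative
+          sketch; the actual values returned depend on your authentication and delegation setup.
+        </p>
+<pre class="pre codeblock"><code>-- user() returns the logged-in user; effective_user() returns the user name
+-- checked during authorization. The two differ only when delegation is in effect.
+SELECT user(), effective_user();</code></pre>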
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title112" id="fixed_issues_225__IMPALA-2125">
+ <h3 class="title topictitle3" id="ariaid-title112">Make UTC to local TimestampValue conversion faster.</h3>
+
+ <div class="body conbody">
+ <p class="p">
+ Query performance was improved substantially for Parquet files containing <code class="ph codeph">TIMESTAMP</code>
+ data written by Hive, when the <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps=true</code> setting
+ is in effect.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2125" target="_blank">IMPALA-2125</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title113" id="fixed_issues_225__IMPALA-2065">
+ <h3 class="title topictitle3" id="ariaid-title113">Workaround IMPALA-1619 in BufferedBlockMgr::ConsumeMemory()</h3>
+
+ <div class="body conbody">
+ <p class="p">
+ A join query could encounter a serious error if the query
+ approached the memory limit on a host so that the <span class="q">"spill to disk"</span> mechanism was activated,
+ and data volume in the join was large enough that an internal memory buffer exceeded 1 GB in size on a particular host.
+ (Exceeding this limit would only happen for huge join queries, because Impala could split this intermediate data
+ into 16 parts during the join query, and the buffer only contains compact bookkeeping data rather than the actual
+ join column data.)
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2065" target="_blank">IMPALA-2065</a></p>
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title114" id="fixed_issues__fixed_issues_223">
+
+ <h2 class="title topictitle2" id="ariaid-title114">Issues Fixed in <span class="keyword">Impala 2.2.3</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.3</span>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title115" id="fixed_issues_223__isilon_support">
+ <h3 class="title topictitle3" id="ariaid-title115">Enable using Isilon as the underlying filesystem.</h3>
+ <div class="body conbody">
+ <p class="p">
+ Enabling Impala to work with the Isilon filesystem involves a number of
+ fixes to performance and flexibility for dealing with I/O using remote reads.
+ See <a class="xref" href="impala_isilon.html#impala_isilon">Using Impala with Isilon Storage</a> for details on using Impala and Isilon together.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1968" target="_blank">IMPALA-1968</a>,
+ <a class="xref" href="https://issues.apache.org/jira/b
<TRUNCATED>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_parquet.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_parquet.html b/docs/build3x/html/topics/impala_parquet.html
new file mode 100644
index 0000000..ce5242e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_parquet.html
@@ -0,0 +1,1421 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content=
"Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content=
"parquet"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the Parquet File Format with Impala Tables</title></head><body id="parquet"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using the Parquet File Format with Impala Tables</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala helps you to create, manage, and query Parquet tables. Parquet is a column-oriented binary file format
+ intended to be highly efficient for the types of large-scale queries that Impala is best at. Parquet is
+ especially good for queries scanning particular columns within a table, for example to query <span class="q">"wide"</span>
+ tables with many columns, or to perform aggregation operations such as <code class="ph codeph">SUM()</code> and
+ <code class="ph codeph">AVG()</code> that need to process most or all of the values from a column. Each data file contains
+ the values for a set of rows (the <span class="q">"row group"</span>). Within a data file, the values from each column are
+ organized so that they are all adjacent, enabling good compression for the values from that column. Queries
+ against a Parquet table can retrieve and analyze these values from any column quickly and with minimal I/O.
+ </p>
+
+ <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">Parquet Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="parquet__entry__1">
+ File Type
+ </th>
+ <th class="entry nocellnorowborder" id="parquet__entry__2">
+ Format
+ </th>
+ <th class="entry nocellnorowborder" id="parquet__entry__3">
+ Compression Codecs
+ </th>
+ <th class="entry nocellnorowborder" id="parquet__entry__4">
+ Impala Can CREATE?
+ </th>
+ <th class="entry nocellnorowborder" id="parquet__entry__5">
+ Impala Can INSERT?
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="parquet__entry__1 ">
+ <a class="xref" href="impala_parquet.html#parquet">Parquet</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="parquet__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="parquet__entry__3 ">
+ Snappy, gzip; currently Snappy by default
+ </td>
+ <td class="entry nocellnorowborder" headers="parquet__entry__4 ">
+ Yes.
+ </td>
+ <td class="entry nocellnorowborder" headers="parquet__entry__5 ">
+ Yes: <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, and query.
+ </td>
+ </tr>
+ </tbody></table>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="parquet__parquet_ddl">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Creating Parquet Tables in Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To create a table named <code class="ph codeph">PARQUET_TABLE</code> that uses the Parquet format, you would use a
+ command like the following, substituting your own table name, column names, and data types:
+ </p>
+
+<pre class="pre codeblock"><code>[impala-host:21000] > create table <var class="keyword varname">parquet_table_name</var> (x INT, y STRING) STORED AS PARQUET;</code></pre>
+
+
+
+ <p class="p">
+ Or, to clone the column names and data types of an existing table:
+ </p>
+
+<pre class="pre codeblock"><code>[impala-host:21000] > create table <var class="keyword varname">parquet_table_name</var> LIKE <var class="keyword varname">other_table_name</var> STORED AS PARQUET;</code></pre>
+
+ <p class="p">
+ In Impala 1.4.0 and higher, you can derive column definitions from a raw Parquet data file, even without an
+ existing Impala table. For example, you can create an external table pointing to an HDFS directory, and
+ base the column definitions on one of the files in that directory:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET '/user/etl/destination/datafile1.dat'
+ STORED AS PARQUET
+ LOCATION '/user/etl/destination';
+</code></pre>
+
+ <p class="p">
+ Or, you can refer to an existing data file and create a new empty table with suitable column definitions.
+ Then you can use <code class="ph codeph">INSERT</code> to create new data files or <code class="ph codeph">LOAD DATA</code> to transfer
+ existing data files into the new table.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE columns_from_data_file LIKE PARQUET '/user/etl/destination/datafile1.dat'
+ STORED AS PARQUET;
+</code></pre>
+
+ <p class="p">
+ The default properties of the newly created table are the same as for any other <code class="ph codeph">CREATE
+ TABLE</code> statement. For example, the default file format is text; if you want the new table to use
+        the Parquet file format, include the <code class="ph codeph">STORED AS PARQUET</code> clause also.
+ </p>
+
+ <p class="p">
+ In this example, the new table is partitioned by year, month, and day. These partition key columns are not
+ part of the data file, so you specify them in the <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE columns_from_data_file LIKE PARQUET '/user/etl/destination/datafile1.dat'
+  PARTITIONED BY (year INT, month TINYINT, day TINYINT)
+ STORED AS PARQUET;
+</code></pre>
+
+ <p class="p">
+ See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for more details about the <code class="ph codeph">CREATE TABLE
+ LIKE PARQUET</code> syntax.
+ </p>
+
+ <p class="p">
+ Once you have created a table, to insert data into that table, use a command similar to the following,
+ again with your own table names:
+ </p>
+
+
+
+<pre class="pre codeblock"><code>[impala-host:21000] > insert overwrite table <var class="keyword varname">parquet_table_name</var> select * from <var class="keyword varname">other_table_name</var>;</code></pre>
+
+ <p class="p">
+ If the Parquet table has a different number of columns or different column names than the other table,
+ specify the names of columns from the other table rather than <code class="ph codeph">*</code> in the
+ <code class="ph codeph">SELECT</code> statement.
+ </p>
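+      <p class="p">
+        For example, if the source table has more columns than the Parquet table, name only the
+        matching columns. The tables below reuse the hypothetical names from the earlier examples:
+      </p>
+<pre class="pre codeblock"><code>-- Copy only the columns x and y, in the order the Parquet table declares them,
+-- rather than using SELECT *.
+INSERT OVERWRITE TABLE parquet_table_name SELECT x, y FROM other_table_name;</code></pre>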
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="parquet__parquet_etl">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Loading Data into Parquet Tables</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Choose from the following techniques for loading data into Parquet tables, depending on whether the
+ original data is already in an Impala table, or exists as raw data files outside Impala.
+ </p>
+
+ <p class="p">
+ If you already have data in an Impala or Hive table, perhaps in a different file format or partitioning
+ scheme, you can transfer the data to a Parquet table using the Impala <code class="ph codeph">INSERT...SELECT</code>
+ syntax. You can convert, filter, repartition, and do other things to the data as part of this same
+ <code class="ph codeph">INSERT</code> statement. See <a class="xref" href="#parquet_compression">Snappy and GZip Compression for Parquet Data Files</a> for some examples showing how to
+ insert data into Parquet tables.
+ </p>
+
+ <div class="p">
+ When inserting into partitioned tables, especially using the Parquet file format, you can include a hint in
+ the <code class="ph codeph">INSERT</code> statement to fine-tune the overall performance of the operation and its
+ resource usage:
+ <ul class="ul">
+
+ <li class="li">
+ You would only use hints if an <code class="ph codeph">INSERT</code> into a partitioned Parquet table was
+ failing due to capacity limits, or if such an <code class="ph codeph">INSERT</code> was succeeding but with
+ less-than-optimal performance.
+ </li>
+
+        <li class="li">
+          To use a hint to influence the <code class="ph codeph">INSERT</code> execution plan, put the hint <code class="ph codeph">/* +SHUFFLE */</code> or <code class="ph codeph">/* +NOSHUFFLE */</code>
+          (including the comment markers) after the <code class="ph codeph">PARTITION</code> clause, immediately before the
+          <code class="ph codeph">SELECT</code> keyword.
+        </li>
+
+ <li class="li">
+ <code class="ph codeph">/* +SHUFFLE */</code> selects an execution plan that reduces the number of files being written
+ simultaneously to HDFS, and the number of memory buffers holding data for individual partitions. Thus
+ it reduces overall resource usage for the <code class="ph codeph">INSERT</code> operation, allowing some
+ <code class="ph codeph">INSERT</code> operations to succeed that otherwise would fail. It does involve some data
+ transfer between the nodes so that the data files for a particular partition are all constructed on the
+ same node.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">/* +NOSHUFFLE */</code> selects an execution plan that might be faster overall, but might also
+ produce a larger number of small data files or exceed capacity limits, causing the
+ <code class="ph codeph">INSERT</code> operation to fail. Use <code class="ph codeph">/* +SHUFFLE */</code> in cases where an
+ <code class="ph codeph">INSERT</code> statement fails or runs inefficiently due to all nodes attempting to construct
+ data for all partitions.
+ </li>
+
+ <li class="li">
+ Impala automatically uses the <code class="ph codeph">/* +SHUFFLE */</code> method if any partition key column in the
+ source table, mentioned in the <code class="ph codeph">INSERT ... SELECT</code> query, does not have column
+ statistics. In this case, only the <code class="ph codeph">/* +NOSHUFFLE */</code> hint would have any effect.
+ </li>
+
+ <li class="li">
+ If column statistics are available for all partition key columns in the source table mentioned in the
+ <code class="ph codeph">INSERT ... SELECT</code> query, Impala chooses whether to use the <code class="ph codeph">/* +SHUFFLE */</code>
+ or <code class="ph codeph">/* +NOSHUFFLE */</code> technique based on the estimated number of distinct values in those
+ columns and the number of nodes involved in the <code class="ph codeph">INSERT</code> operation. In this case, you
+ might need the <code class="ph codeph">/* +SHUFFLE */</code> or the <code class="ph codeph">/* +NOSHUFFLE */</code> hint to override the
+ execution plan selected by Impala.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 2.8</span> or higher, you can make the
+ <code class="ph codeph">INSERT</code> operation organize (<span class="q">"cluster"</span>)
+ the data for each partition to avoid buffering data for multiple partitions
+ and reduce the risk of an out-of-memory condition. Specify the hint as
+ <code class="ph codeph">/* +CLUSTERED */</code>. This technique is primarily
+ useful for inserts into Parquet tables, where the large block
+ size requires substantial memory to buffer data for multiple
+ output files at once.
+ </li>
+
+ </ul>
+ </div>
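+      <p class="p">
+        For example, the hint goes between the <code class="ph codeph">PARTITION</code> clause and the
+        <code class="ph codeph">SELECT</code> keyword. The table and column names below are hypothetical:
+      </p>
+<pre class="pre codeblock"><code>-- Route each partition's data to a single node before writing,
+-- reducing memory usage and the number of files written concurrently.
+INSERT INTO sales_parquet PARTITION (year, month) /* +SHUFFLE */
+  SELECT * FROM sales_staging;</code></pre>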
+
+ <p class="p">
+ Any <code class="ph codeph">INSERT</code> statement for a Parquet table requires enough free space in the HDFS filesystem
+ to write one block. Because Parquet data files use a block size of 1 GB by default, an
+ <code class="ph codeph">INSERT</code> might fail (even for a very small amount of data) if your HDFS is running low on
+ space.
+ </p>
+
+
+
+ <p class="p">
+ Avoid the <code class="ph codeph">INSERT...VALUES</code> syntax for Parquet tables, because
+ <code class="ph codeph">INSERT...VALUES</code> produces a separate tiny data file for each
+ <code class="ph codeph">INSERT...VALUES</code> statement, and the strength of Parquet is in its handling of data
+ (compressing, parallelizing, and so on) in <span class="ph">large</span> chunks.
+ </p>
+
+ <p class="p">
+ If you have one or more Parquet data files produced outside of Impala, you can quickly make the data
+ queryable through Impala by one of the following methods:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">LOAD DATA</code> statement moves a single data file or a directory full of data files into
+ the data directory for an Impala table. It does no validation or conversion of the data. The original
+ data files must be somewhere in HDFS, not the local filesystem.
+
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">CREATE TABLE</code> statement with the <code class="ph codeph">LOCATION</code> clause creates a table
+ where the data continues to reside outside the Impala data directory. The original data files must be
+ somewhere in HDFS, not the local filesystem. For extra safety, if the data is intended to be long-lived
+ and reused by other applications, you can use the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax so that
+ the data files are not deleted by an Impala <code class="ph codeph">DROP TABLE</code> statement.
+
+ </li>
+
+ <li class="li">
+ If the Parquet table already exists, you can copy Parquet data files directly into it, then use the
+ <code class="ph codeph">REFRESH</code> statement to make Impala recognize the newly added data. Remember to preserve
+ the block size of the Parquet data files by using the <code class="ph codeph">hadoop distcp -pb</code> command rather
+ than a <code class="ph codeph">-put</code> or <code class="ph codeph">-cp</code> operation on the Parquet files. See
+ <a class="xref" href="#parquet_compression_multiple">Example of Copying Parquet Data Files</a> for an example of this kind of operation.
+ </li>
+ </ul>
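+
+      <p class="p">
+        The first two methods can be sketched as follows, using hypothetical paths, columns, and table names:
+      </p>
+
+<pre class="pre codeblock"><code>-- Move existing Parquet data files in HDFS into the table's data directory.
+load data inpath '/user/etl/staging_parquet_dir' into table parquet_table;
+
+-- Or leave the files in place and point an external table at their directory.
+create external table parquet_external (id bigint, val string)
+  stored as parquet
+  location '/user/etl/staging_parquet_dir';
+</code></pre>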
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Currently, Impala always decodes the column data in Parquet files based on the ordinal position of the
+ columns, not by looking up the position of each column based on its name. Parquet files produced outside
+ of Impala must write column data in the same order as the columns are declared in the Impala table. Any
+ optional columns that are omitted from the data files must be the rightmost columns in the Impala table
+ definition.
+ </p>
+
+ <p class="p">
+ If you created compressed Parquet files through some tool other than Impala, make sure that any
+ compression codecs are supported in Parquet by Impala. For example, Impala does not currently support LZO
+        compression in Parquet files. Also double-check that you used any recommended compatibility settings in
+ the other tool, such as <code class="ph codeph">spark.sql.parquet.binaryAsString</code> when writing Parquet files
+ through Spark.
+ </p>
+ </div>
+
+ <p class="p">
+ Recent versions of Sqoop can produce Parquet output files using the <code class="ph codeph">--as-parquetfile</code>
+ option.
+ </p>
+
+ <p class="p"> If you use Sqoop to
+ convert RDBMS data to Parquet, be careful with interpreting any
+ resulting values from <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>,
+ or <code class="ph codeph">TIMESTAMP</code> columns. The underlying values are
+ represented as the Parquet <code class="ph codeph">INT64</code> type, which is
+ represented as <code class="ph codeph">BIGINT</code> in the Impala table. The Parquet
+ values represent the time in milliseconds, while Impala interprets
+ <code class="ph codeph">BIGINT</code> as the time in seconds. Therefore, if you have
+ a <code class="ph codeph">BIGINT</code> column in a Parquet table that was imported
+ this way from Sqoop, divide the values by 1000 when interpreting as the
+ <code class="ph codeph">TIMESTAMP</code> type.</p>
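+
+      <p class="p">
+        For example, assuming a hypothetical <code class="ph codeph">BIGINT</code> column named
+        <code class="ph codeph">sqoop_ts</code> that holds milliseconds, the conversion can be sketched as:
+      </p>
+
+<pre class="pre codeblock"><code>-- sqoop_ts holds milliseconds; Impala interprets a BIGINT cast to TIMESTAMP as seconds.
+select cast(sqoop_ts div 1000 as timestamp) from imported_table;
+</code></pre>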
+
+ <p class="p">
+ If the data exists outside Impala and is in some other format, combine both of the preceding techniques.
+ First, use a <code class="ph codeph">LOAD DATA</code> or <code class="ph codeph">CREATE EXTERNAL TABLE ... LOCATION</code> statement to
+ bring the data into an Impala table that uses the appropriate file format. Then, use an
+ <code class="ph codeph">INSERT...SELECT</code> statement to copy the data to the Parquet table, converting to Parquet
+ format as part of the process.
+ </p>
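+
+      <p class="p">
+        As a sketch with hypothetical names, the two-step conversion might look like:
+      </p>
+
+<pre class="pre codeblock"><code>-- Step 1: expose the existing data (here, delimited text) through an external table.
+create external table text_staging (id bigint, val string)
+  row format delimited fields terminated by ','
+  location '/user/etl/csv_dir';
+
+-- Step 2: copy into the Parquet table, converting the file format along the way.
+insert into parquet_table select * from text_staging;
+</code></pre>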
+
+
+
+ <p class="p">
+ Loading data into Parquet tables is a memory-intensive operation, because the incoming data is buffered
+ until it reaches <span class="ph">one data block</span> in size, then that chunk of data is
+ organized and compressed in memory before being written out. The memory consumption can be larger when
+ inserting data into partitioned Parquet tables, because a separate data file is written for each
+ combination of partition key column values, potentially requiring several
+ <span class="ph">large</span> chunks to be manipulated in memory at once.
+ </p>
+
+ <p class="p">
+ When inserting into a partitioned Parquet table, Impala redistributes the data among the nodes to reduce
+ memory consumption. You might still need to temporarily increase the memory dedicated to Impala during the
+ insert operation, or break up the load operation into several <code class="ph codeph">INSERT</code> statements, or both.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ All the preceding techniques assume that the data you are loading matches the structure of the destination
+ table, including column order, column names, and partition layout. To transform or reorganize the data,
+ start by loading the data into a Parquet table that matches the underlying structure of the data, then use
+ one of the table-copying techniques such as <code class="ph codeph">CREATE TABLE AS SELECT</code> or <code class="ph codeph">INSERT ...
+      SELECT</code> to reorder or rename columns, divide the data among multiple partitions, and so on. For
+      example, to take a single comprehensive Parquet data file and load it into a partitioned table, you would
+ use an <code class="ph codeph">INSERT ... SELECT</code> statement with dynamic partitioning to let Impala create separate
+ data files with the appropriate partition values; for an example, see
+ <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>.
+ </div>
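+
+      <p class="p">
+        A minimal sketch of such a dynamic partitioning operation, using hypothetical table and column names:
+      </p>
+
+<pre class="pre codeblock"><code>create table sales_by_year (id bigint, amount double)
+  partitioned by (year int) stored as parquet;
+
+-- Impala routes each row to the appropriate partition based on the year value.
+insert into sales_by_year partition (year)
+  select id, amount, year from sales_unpartitioned;
+</code></pre>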
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="parquet__parquet_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala Parquet Tables</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Query performance for Parquet tables depends on the number of columns needed to process the
+ <code class="ph codeph">SELECT</code> list and <code class="ph codeph">WHERE</code> clauses of the query, the way data is divided into
+ <span class="ph">large data files with block size equal to file size</span>, the reduction in I/O
+ by reading the data for each column in compressed format, which data files can be skipped (for partitioned
+ tables), and the CPU overhead of decompressing the data for each column.
+ </p>
+
+ <div class="p">
+ For example, the following is an efficient query for a Parquet table:
+<pre class="pre codeblock"><code>select avg(income) from census_data where state = 'CA';</code></pre>
+ The query processes only 2 columns out of a large number of total columns. If the table is partitioned by
+ the <code class="ph codeph">STATE</code> column, it is even more efficient because the query only has to read and decode
+ 1 column from each data file, and it can read only the data files in the partition directory for the state
+ <code class="ph codeph">'CA'</code>, skipping the data files for all the other states, which will be physically located
+ in other directories.
+ </div>
+
+ <div class="p">
+ The following is a relatively inefficient query for a Parquet table:
+<pre class="pre codeblock"><code>select * from census_data;</code></pre>
+ Impala would have to read the entire contents of each <span class="ph">large</span> data file,
+ and decompress the contents of each column for each row group, negating the I/O optimizations of the
+ column-oriented format. This query might still be faster for a Parquet table than a table with some other
+ file format, but it does not take advantage of the unique strengths of Parquet data files.
+ </div>
+
+ <p class="p">
+ Impala can optimize queries on Parquet tables, especially join queries, better when statistics are
+ available for all the tables. Issue the <code class="ph codeph">COMPUTE STATS</code> statement for each table after
+ substantial amounts of data are loaded into or appended to it. See
+ <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for details.
+ </p>
+
+ <p class="p">
+ The runtime filtering feature, available in <span class="keyword">Impala 2.5</span> and higher, works best with Parquet tables.
+ The per-row filtering aspect only applies to Parquet tables.
+ See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a> for details.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+ For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+ Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+ in the <span class="ph filepath">core-site.xml</span> configuration file determines
+ how Impala divides the I/O work of reading the data files. This configuration
+ setting is specified in bytes. By default, this
+ value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+ as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+ Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+ Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 268435456 (256 MB) to match the row group size produced by Impala.
+ </p>
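+
+      <p class="p">
+        For example, the setting could appear in <span class="ph filepath">core-site.xml</span> as follows
+        (a configuration sketch; adjust the value to match how your Parquet files were written):
+      </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;fs.s3a.block.size&lt;/name&gt;
+  &lt;!-- 268435456 bytes = 256 MB, matching Impala-written Parquet row groups. --&gt;
+  &lt;value&gt;268435456&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>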
+
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, Parquet files written by Impala include
+ embedded metadata specifying the minimum and maximum values for each column, within
+ each row group and each data page within the row group. Impala-written Parquet files
+ typically contain a single row group; a row group can contain many data pages.
+ Impala uses this information (currently, only the metadata for each row group)
+ when reading each Parquet data file during a query, to quickly determine whether each
+ row group within the file potentially includes any rows that match the conditions in the
+ <code class="ph codeph">WHERE</code> clause. For example, if the column <code class="ph codeph">X</code> within
+ a particular Parquet file has a minimum value of 1 and a maximum value of 100, then
+ a query including the clause <code class="ph codeph">WHERE x > 200</code> can quickly determine
+ that it is safe to skip that particular file, instead of scanning all the associated
+ column values. This optimization technique is especially effective for tables that
+ use the <code class="ph codeph">SORT BY</code> clause for the columns most frequently checked in
+ <code class="ph codeph">WHERE</code> clauses, because any <code class="ph codeph">INSERT</code> operation on
+ such tables produces Parquet data files with relatively narrow ranges of column values
+ within each file.
+ </p>
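+
+      <p class="p">
+        As a sketch with hypothetical names, a table designed to benefit from this row group skipping
+        might look like:
+      </p>
+
+<pre class="pre codeblock"><code>-- Rows are sorted by x within each inserted file, so each row group covers
+-- a narrow range of x values and can often be skipped entirely.
+create table events (x bigint, payload string)
+  sort by (x) stored as parquet;
+
+-- Row groups whose min/max range excludes values over 200 are skipped.
+select count(*) from events where x &gt; 200;
+</code></pre>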
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="parquet_performance__parquet_partitioning">
+
+ <h3 class="title topictitle3" id="ariaid-title5">Partitioning for Parquet Tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ As explained in <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>, partitioning is an important
+ performance technique for Impala generally. This section explains some of the performance considerations
+ for partitioned Parquet tables.
+ </p>
+
+ <p class="p">
+ The Parquet file format is ideal for tables containing many columns, where most queries only refer to a
+ small subset of the columns. As explained in <a class="xref" href="#parquet_data_files">How Parquet Data Files Are Organized</a>, the physical layout of
+ Parquet data files lets Impala read only a small fraction of the data for many queries. The performance
+ benefits of this approach are amplified when you use Parquet tables in combination with partitioning.
+ Impala can skip the data files for certain partitions entirely, based on the comparisons in the
+ <code class="ph codeph">WHERE</code> clause that refer to the partition key columns. For example, queries on
+ partitioned tables often analyze data for time intervals based on columns such as <code class="ph codeph">YEAR</code>,
+ <code class="ph codeph">MONTH</code>, and/or <code class="ph codeph">DAY</code>, or for geographic regions. Remember that Parquet
+ data files use a <span class="ph">large</span> block size, so when deciding how finely to
+ partition the data, try to find a granularity where each partition contains
+ <span class="ph">256 MB</span> or more of data, rather than creating a large number of smaller
+ files split among many partitions.
+ </p>
+
+ <p class="p">
+ Inserting into a partitioned Parquet table can be a resource-intensive operation, because each Impala
+ node could potentially be writing a separate data file to HDFS for each combination of different values
+ for the partition key columns. The large number of simultaneous open files could exceed the HDFS
+ <span class="q">"transceivers"</span> limit. To avoid exceeding this limit, consider the following techniques:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Load different subsets of data using separate <code class="ph codeph">INSERT</code> statements with specific values
+ for the <code class="ph codeph">PARTITION</code> clause, such as <code class="ph codeph">PARTITION (year=2010)</code>.
+ </li>
+
+ <li class="li">
+ Increase the <span class="q">"transceivers"</span> value for HDFS, sometimes spelled <span class="q">"xcievers"</span> (sic). The property
+ value in the <span class="ph filepath">hdfs-site.xml</span> configuration file is
+
+ <code class="ph codeph">dfs.datanode.max.transfer.threads</code>. For example, if you were loading 12 years of data
+ partitioned by year, month, and day, even a value of 4096 might not be high enough. This
+ <a class="xref" href="http://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/" target="_blank">blog post</a> explores the considerations for setting this value
+ higher or lower, using HBase examples for illustration.
+ </li>
+
+ <li class="li">
+ Use the <code class="ph codeph">COMPUTE STATS</code> statement to collect
+ <a class="xref" href="impala_perf_stats.html#perf_column_stats">column statistics</a> on the source table from
+ which data is being copied, so that the Impala query can estimate the number of different values in the
+ partition key columns and distribute the work accordingly.
+ </li>
+ </ul>
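+
+        <p class="p">
+          For example, the transceivers limit could be raised in <span class="ph filepath">hdfs-site.xml</span>
+          as follows (the value 8192 is only an illustration; tune it for your workload):
+        </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;dfs.datanode.max.transfer.threads&lt;/name&gt;
+  &lt;value&gt;8192&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>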
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="parquet__parquet_compression">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Snappy and GZip Compression for Parquet Data Files</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ When Impala writes Parquet data files using the <code class="ph codeph">INSERT</code> statement, the underlying
+ compression is controlled by the <code class="ph codeph">COMPRESSION_CODEC</code> query option. (Prior to Impala 2.0, the
+ query option name was <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code>.) The allowed values for this query option
+ are <code class="ph codeph">snappy</code> (the default), <code class="ph codeph">gzip</code>, and <code class="ph codeph">none</code>. The option
+ value is not case-sensitive. If the option is set to an unrecognized value, all kinds of queries will fail
+ due to the invalid option setting, not just queries involving Parquet tables.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="parquet_compression__parquet_snappy">
+
+ <h3 class="title topictitle3" id="ariaid-title7">Example of Parquet Table with Snappy Compression</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ By default, the underlying data files for a Parquet table are compressed with Snappy. The combination of
+ fast compression and decompression makes it a good choice for many data sets. To ensure Snappy
+ compression is used, for example after experimenting with other compression codecs, set the
+ <code class="ph codeph">COMPRESSION_CODEC</code> query option to <code class="ph codeph">snappy</code> before inserting the data:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create database parquet_compression;
+[localhost:21000] > use parquet_compression;
+[localhost:21000] > create table parquet_snappy like raw_text_data;
+[localhost:21000] > set COMPRESSION_CODEC=snappy;
+[localhost:21000] > insert into parquet_snappy select * from raw_text_data;
+Inserted 1000000000 rows in 181.98s
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="parquet_compression__parquet_gzip">
+
+ <h3 class="title topictitle3" id="ariaid-title8">Example of Parquet Table with GZip Compression</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If you need more intensive compression (at the expense of more CPU cycles for uncompressing during
+ queries), set the <code class="ph codeph">COMPRESSION_CODEC</code> query option to <code class="ph codeph">gzip</code> before
+ inserting the data:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table parquet_gzip like raw_text_data;
+[localhost:21000] > set COMPRESSION_CODEC=gzip;
+[localhost:21000] > insert into parquet_gzip select * from raw_text_data;
+Inserted 1000000000 rows in 1418.24s
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="parquet_compression__parquet_none">
+
+ <h3 class="title topictitle3" id="ariaid-title9">Example of Uncompressed Parquet Table</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If your data compresses very poorly, or you want to avoid the CPU overhead of compression and
+ decompression entirely, set the <code class="ph codeph">COMPRESSION_CODEC</code> query option to <code class="ph codeph">none</code>
+ before inserting the data:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table parquet_none like raw_text_data;
+[localhost:21000] > set COMPRESSION_CODEC=none;
+[localhost:21000] > insert into parquet_none select * from raw_text_data;
+Inserted 1000000000 rows in 146.90s
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="parquet_compression__parquet_compression_examples">
+
+ <h3 class="title topictitle3" id="ariaid-title10">Examples of Sizes and Speeds for Compressed Parquet Tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here are some examples showing differences in data sizes and query speeds for 1 billion rows of synthetic
+ data, compressed with each kind of codec. As always, run similar tests with realistic data sets of your
+ own. The actual compression ratios, and relative insert and query speeds, will vary depending on the
+ characteristics of the actual data.
+ </p>
+
+ <p class="p">
+ In this case, switching from Snappy to GZip compression shrinks the data by an additional 40% or so,
+ while switching from Snappy compression to no compression expands the data also by about 40%:
+ </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -du -h /user/hive/warehouse/parquet_compression.db
+23.1 G /user/hive/warehouse/parquet_compression.db/parquet_snappy
+13.5 G /user/hive/warehouse/parquet_compression.db/parquet_gzip
+32.8 G /user/hive/warehouse/parquet_compression.db/parquet_none
+</code></pre>
+
+ <p class="p">
+ Because Parquet data files are typically <span class="ph">large</span>, each directory will
+ have a different number of data files and the row groups will be arranged differently.
+ </p>
+
+ <p class="p">
+          At the same time, the less aggressive the compression, the faster the data can be decompressed. In this
+ case using a table with a billion rows, a query that evaluates all the values for a particular column
+ runs faster with no compression than with Snappy compression, and faster with Snappy compression than
+ with Gzip compression. Query performance depends on several other factors, so as always, run your own
+ benchmarks with your own data to determine the ideal tradeoff between data size, CPU efficiency, and
+ speed of insert and query operations.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > desc parquet_snappy;
+Query finished, fetching results ...
++-----------+---------+---------+
+| name | type | comment |
++-----------+---------+---------+
+| id | int | |
+| val | int | |
+| zfill | string | |
+| name | string | |
+| assertion | boolean | |
++-----------+---------+---------+
+Returned 5 row(s) in 0.14s
+[localhost:21000] > select avg(val) from parquet_snappy;
+Query finished, fetching results ...
++-----------------+
+| _c0 |
++-----------------+
+| 250000.93577915 |
++-----------------+
+Returned 1 row(s) in 4.29s
+[localhost:21000] > select avg(val) from parquet_gzip;
+Query finished, fetching results ...
++-----------------+
+| _c0 |
++-----------------+
+| 250000.93577915 |
++-----------------+
+Returned 1 row(s) in 6.97s
+[localhost:21000] > select avg(val) from parquet_none;
+Query finished, fetching results ...
++-----------------+
+| _c0 |
++-----------------+
+| 250000.93577915 |
++-----------------+
+Returned 1 row(s) in 3.67s
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="parquet_compression__parquet_compression_multiple">
+
+ <h3 class="title topictitle3" id="ariaid-title11">Example of Copying Parquet Data Files</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here is a final example, to illustrate how the data files using the various compression codecs are all
+ compatible with each other for read operations. The metadata about the compression format is written into
+ each data file, and can be decoded during queries regardless of the <code class="ph codeph">COMPRESSION_CODEC</code>
+ setting in effect at the time. In this example, we copy data files from the
+ <code class="ph codeph">PARQUET_SNAPPY</code>, <code class="ph codeph">PARQUET_GZIP</code>, and <code class="ph codeph">PARQUET_NONE</code> tables
+ used in the previous examples, each containing 1 billion rows, all to the data directory of a new table
+ <code class="ph codeph">PARQUET_EVERYTHING</code>. A couple of sample queries demonstrate that the new table now
+ contains 3 billion rows featuring a variety of compression codecs for the data files.
+ </p>
+
+ <p class="p">
+ First, we create the table in Impala so that there is a destination directory in HDFS to put the data
+ files:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table parquet_everything like parquet_snappy;
+Query: create table parquet_everything like parquet_snappy
+</code></pre>
+
+ <p class="p">
+ Then in the shell, we copy the relevant data files into the data directory for this new table. Rather
+ than using <code class="ph codeph">hdfs dfs -cp</code> as with typical files, we use <code class="ph codeph">hadoop distcp -pb</code>
+        to ensure that the special <span class="ph">block size</span> of the Parquet data files is
+ preserved.
+ </p>
+
+<pre class="pre codeblock"><code>$ hadoop distcp -pb /user/hive/warehouse/parquet_compression.db/parquet_snappy \
+ /user/hive/warehouse/parquet_compression.db/parquet_everything
+...<var class="keyword varname">MapReduce output</var>...
+$ hadoop distcp -pb /user/hive/warehouse/parquet_compression.db/parquet_gzip \
+ /user/hive/warehouse/parquet_compression.db/parquet_everything
+...<var class="keyword varname">MapReduce output</var>...
+$ hadoop distcp -pb /user/hive/warehouse/parquet_compression.db/parquet_none \
+ /user/hive/warehouse/parquet_compression.db/parquet_everything
+...<var class="keyword varname">MapReduce output</var>...
+</code></pre>
+
+ <p class="p">
+ Back in the <span class="keyword cmdname">impala-shell</span> interpreter, we use the <code class="ph codeph">REFRESH</code> statement to
+ alert the Impala server to the new data files for this table, then we can run queries demonstrating that
+ the data files represent 3 billion rows, and the values for one of the numeric columns match what was in
+ the original smaller tables:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > refresh parquet_everything;
+Query finished, fetching results ...
+
+Returned 0 row(s) in 0.32s
+[localhost:21000] > select count(*) from parquet_everything;
+Query finished, fetching results ...
++------------+
+| _c0 |
++------------+
+| 3000000000 |
++------------+
+Returned 1 row(s) in 8.18s
+[localhost:21000] > select avg(val) from parquet_everything;
+Query finished, fetching results ...
++-----------------+
+| _c0 |
++-----------------+
+| 250000.93577915 |
++-----------------+
+Returned 1 row(s) in 13.35s
+</code></pre>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="parquet__parquet_complex_types">
+
+ <h2 class="title topictitle2" id="ariaid-title12">Parquet Tables for Impala Complex Types</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, Impala supports the complex types
+        <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+ Because these data types are currently supported only for the Parquet file format,
+ if you plan to use them, become familiar with the performance and storage aspects
+ of Parquet first.
+ </p>
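+
+      <p class="p">
+        As a sketch with hypothetical column names, a Parquet table using complex types might be declared as:
+      </p>
+
+<pre class="pre codeblock"><code>create table customers_nested
+  (id bigint,
+   phone_numbers array&lt;string&gt;,
+   address struct&lt;street:string, city:string&gt;,
+   attributes map&lt;string,string&gt;)
+  stored as parquet;
+</code></pre>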
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="parquet__parquet_interop">
+
+ <h2 class="title topictitle2" id="ariaid-title13">Exchanging Parquet Data Files with Other Hadoop Components</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+        You can read and write Parquet data files from other Hadoop components.
+ See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+
+
+
+
+
+
+
+
+ <p class="p">
+ Previously, it was not possible to create Parquet data through Impala and reuse that table within Hive. Now
+ that Parquet support is available for Hive, reusing existing Impala Parquet data files in Hive
+ requires updating the table metadata. Use the following command if you are already running Impala 1.1.1 or
+ higher:
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET FILEFORMAT PARQUET;
+</code></pre>
+
+ <p class="p">
+ If you are running a level of Impala that is older than 1.1.1, do the metadata update through Hive:
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
+ALTER TABLE <var class="keyword varname">table_name</var> SET FILEFORMAT
+ INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
+ OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";
+</code></pre>
+
+ <p class="p">
+ Impala 1.1.1 and higher can reuse Parquet data files created by Hive, without any action required.
+ </p>
+
+
+
+ <p class="p">
+ Impala supports the scalar data types that you can encode in a Parquet data file, but not composite or
+ nested types such as maps or arrays. In <span class="keyword">Impala 2.2</span> and higher, Impala can query Parquet data
+ files that include composite or nested types, as long as the query only refers to columns with scalar
+ types.
+
+ </p>
+
+ <p class="p">
+ If you copy Parquet data files between nodes, or even between different directories on the same node, make
+ sure to preserve the block size by using the command <code class="ph codeph">hadoop distcp -pb</code>. To verify that the
+ block size was preserved, issue the command <code class="ph codeph">hdfs fsck -blocks
+ <var class="keyword varname">HDFS_path_of_impala_table_dir</var></code> and check that the average block size is at or
+ near <span class="ph">256 MB (or whatever other size is defined by the
+      <code class="ph codeph">PARQUET_FILE_SIZE</code> query option)</span>. (The <code class="ph codeph">hadoop distcp</code> operation
+ typically leaves some directories behind, with names matching <span class="ph filepath">_distcp_logs_*</span>, that you
+ can delete from the destination directory afterward.)
+
+
+
+      Issue the <span class="keyword cmdname">hadoop distcp</span> command with no arguments to see usage information for the
+      <span class="keyword cmdname">distcp</span> command syntax.
+ </p>
+
+
+
+ <p class="p">
+ Impala can query Parquet files that use the <code class="ph codeph">PLAIN</code>, <code class="ph codeph">PLAIN_DICTIONARY</code>,
+ <code class="ph codeph">BIT_PACKED</code>, and <code class="ph codeph">RLE</code> encodings.
+ Currently, Impala does not support <code class="ph codeph">RLE_DICTIONARY</code> encoding.
+ When creating files outside of Impala for use by Impala, make sure to use one of the supported encodings.
+      In particular, for MapReduce jobs, leave <code class="ph codeph">parquet.writer.version</code> unset
+      (and especially do not set it to <code class="ph codeph">PARQUET_2_0</code>) in the configuration of Parquet MR jobs.
+      Use the default format version, 1.0, which includes some enhancements that are compatible with older versions.
+ Data using the 2.0 format might not be consumable by Impala, due to use of the <code class="ph codeph">RLE_DICTIONARY</code> encoding.
+ </p>
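+
+      <p class="p">
+        For example, if a job's configuration must set the property explicitly, pin the 1.0 format
+        (a configuration sketch):
+      </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;parquet.writer.version&lt;/name&gt;
+  &lt;value&gt;PARQUET_1_0&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>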
+ <div class="p">
+ To examine the internal structure and data of Parquet files, you can use the
+ <span class="keyword cmdname">parquet-tools</span> command. Make sure this
+ command is in your <code class="ph codeph">$PATH</code>. (Typically, it is symlinked from
+ <span class="ph filepath">/usr/bin</span>; sometimes, depending on your installation setup, you
+ might need to locate it under an alternative <code class="ph codeph">bin</code> directory.)
+ The arguments to this command let you perform operations such as:
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">cat</code>: Print a file's contents to standard out. In <span class="keyword">Impala 2.3</span> and higher, you can use
+ the <code class="ph codeph">-j</code> option to output JSON.
+ </li>
+ <li class="li">
+ <code class="ph codeph">head</code>: Print the first few records of a file to standard output.
+ </li>
+ <li class="li">
+ <code class="ph codeph">schema</code>: Print the Parquet schema for the file.
+ </li>
+ <li class="li">
+ <code class="ph codeph">meta</code>: Print the file footer metadata, including key-value properties (like Avro schema), compression ratios,
+ encodings, compression used, and row group information.
+ </li>
+ <li class="li">
+ <code class="ph codeph">dump</code>: Print all data and metadata.
+ </li>
+ </ul>
+ Use <code class="ph codeph">parquet-tools -h</code> to see usage information for all the arguments.
+ Here are some examples showing <span class="keyword cmdname">parquet-tools</span> usage:
+
+<pre class="pre codeblock"><code>
+$ # Be careful doing this for a big file! Use parquet-tools head to be safe.
+$ parquet-tools cat sample.parq
+year = 1992
+month = 1
+day = 2
+dayofweek = 4
+dep_time = 748
+crs_dep_time = 750
+arr_time = 851
+crs_arr_time = 846
+carrier = US
+flight_num = 53
+actual_elapsed_time = 63
+crs_elapsed_time = 56
+arrdelay = 5
+depdelay = -2
+origin = CMH
+dest = IND
+distance = 182
+cancelled = 0
+diverted = 0
+
+year = 1992
+month = 1
+day = 3
+...
+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+$ parquet-tools head -n 2 sample.parq
+year = 1992
+month = 1
+day = 2
+dayofweek = 4
+dep_time = 748
+crs_dep_time = 750
+arr_time = 851
+crs_arr_time = 846
+carrier = US
+flight_num = 53
+actual_elapsed_time = 63
+crs_elapsed_time = 56
+arrdelay = 5
+depdelay = -2
+origin = CMH
+dest = IND
+distance = 182
+cancelled = 0
+diverted = 0
+
+year = 1992
+month = 1
+day = 3
+...
+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+$ parquet-tools schema sample.parq
+message schema {
+ optional int32 year;
+ optional int32 month;
+ optional int32 day;
+ optional int32 dayofweek;
+ optional int32 dep_time;
+ optional int32 crs_dep_time;
+ optional int32 arr_time;
+ optional int32 crs_arr_time;
+ optional binary carrier;
+ optional int32 flight_num;
+...
+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+$ parquet-tools meta sample.parq
+creator: impala version 2.2.0-...
+
+file schema: schema
+-------------------------------------------------------------------
+year: OPTIONAL INT32 R:0 D:1
+month: OPTIONAL INT32 R:0 D:1
+day: OPTIONAL INT32 R:0 D:1
+dayofweek: OPTIONAL INT32 R:0 D:1
+dep_time: OPTIONAL INT32 R:0 D:1
+crs_dep_time: OPTIONAL INT32 R:0 D:1
+arr_time: OPTIONAL INT32 R:0 D:1
+crs_arr_time: OPTIONAL INT32 R:0 D:1
+carrier: OPTIONAL BINARY R:0 D:1
+flight_num: OPTIONAL INT32 R:0 D:1
+...
+
+row group 1: RC:20636601 TS:265103674
+-------------------------------------------------------------------
+year: INT32 SNAPPY DO:4 FPO:35 SZ:10103/49723/4.92 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+month: INT32 SNAPPY DO:10147 FPO:10210 SZ:11380/35732/3.14 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+day: INT32 SNAPPY DO:21572 FPO:21714 SZ:3071658/9868452/3.21 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+dayofweek: INT32 SNAPPY DO:3093276 FPO:3093319 SZ:2274375/5941876/2.61 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+dep_time: INT32 SNAPPY DO:5367705 FPO:5373967 SZ:28281281/28573175/1.01 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+crs_dep_time: INT32 SNAPPY DO:33649039 FPO:33654262 SZ:10220839/11574964/1.13 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+arr_time: INT32 SNAPPY DO:43869935 FPO:43876489 SZ:28562410/28797767/1.01 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+crs_arr_time: INT32 SNAPPY DO:72432398 FPO:72438151 SZ:10908972/12164626/1.12 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+carrier: BINARY SNAPPY DO:83341427 FPO:83341558 SZ:114916/128611/1.12 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+flight_num: INT32 SNAPPY DO:83456393 FPO:83488603 SZ:10216514/11474301/1.12 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+...
+
+</code></pre>
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="parquet__parquet_data_files">
+
+ <h2 class="title topictitle2" id="ariaid-title14">How Parquet Data Files Are Organized</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Although Parquet is a column-oriented file format, do not expect to find one data file for each column.
+ Parquet keeps all the data for a row within the same data file, to ensure that the columns for a row are
+ always available on the same node for processing. What Parquet does is to set a large HDFS block size and a
+ matching maximum data file size, to ensure that I/O and network transfer requests apply to large batches of
+ data.
+ </p>
+
+ <p class="p">
+ Within that data file, the data for a set of rows is rearranged so that all the values from the first
+ column are organized in one contiguous block, then all the values from the second column, and so on.
+ Putting the values from the same column next to each other lets Impala use effective compression techniques
+ on the values in that column.
+ </p>
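As a conceptual illustration, the rearrangement of row data into contiguous column blocks can be sketched in Python (an illustration only, not Impala's actual writer code):

```python
# Conceptual sketch of Parquet's columnar layout within a row group:
# values from each column are gathered into one contiguous run.
# This is an illustration only, not Impala's implementation.
def to_columnar(rows):
    """Group all values of each column into one contiguous list."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

rows = [
    {"year": 1992, "month": 1, "carrier": "US"},
    {"year": 1992, "month": 1, "carrier": "US"},
    {"year": 1992, "month": 2, "carrier": "DL"},
]
columnar = to_columnar(rows)
assert columnar["carrier"] == ["US", "US", "DL"]
```

Storing each column's values together is what makes per-column compression effective.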
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Impala <code class="ph codeph">INSERT</code> statements write Parquet data files using an HDFS block size
+ <span class="ph">that matches the data file size</span>, to ensure that each data file is
+ represented by a single HDFS block, and the entire file can be processed on a single node without
+ requiring any remote reads.
+ </p>
+
+ <p class="p">
+ If you create Parquet data files outside of Impala, such as through a MapReduce or Pig job, ensure that
+ the HDFS block size is greater than or equal to the file size, so that the <span class="q">"one file per block"</span>
+ relationship is maintained. Set the <code class="ph codeph">dfs.block.size</code> or the <code class="ph codeph">dfs.blocksize</code>
+ property large enough that each file fits within a single HDFS block, even if that size is larger than
+ the normal HDFS block size.
+ </p>
+
+ <p class="p">
+ If the block size is reset to a lower value during a file copy, you will see lower performance for
+ queries involving those files, and the <code class="ph codeph">PROFILE</code> statement will reveal that some I/O is
+ being done suboptimally, through remote reads. See
+ <a class="xref" href="impala_parquet.html#parquet_compression_multiple">Example of Copying Parquet Data Files</a> for an example showing how to preserve the
+ block size when copying Parquet data files.
+ </p>
+ </div>
+
+ <p class="p">
+ When Impala retrieves or tests the data for a particular column, it opens all the data files, but only
+ reads the portion of each file containing the values for that column. The column values are stored
+ consecutively, minimizing the I/O required to process the values within a single column. If other columns
+ are named in the <code class="ph codeph">SELECT</code> list or <code class="ph codeph">WHERE</code> clauses, the data for all columns
+ in the same row is available within that same data file.
+ </p>
+
+ <p class="p">
+ If an <code class="ph codeph">INSERT</code> statement brings in less than <span class="ph">one Parquet
+ block's worth</span> of data, the resulting data file is smaller than ideal. Thus, if you do split up an ETL
+ job to use multiple <code class="ph codeph">INSERT</code> statements, try to keep the volume of data for each
+ <code class="ph codeph">INSERT</code> statement to approximately <span class="ph">256 MB, or a multiple of
+ 256 MB</span>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title15" id="parquet_data_files__parquet_encoding">
+
+ <h3 class="title topictitle3" id="ariaid-title15">RLE and Dictionary Encoding for Parquet Data Files</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Parquet uses some automatic compression techniques, such as run-length encoding (RLE) and dictionary
+ encoding, based on analysis of the actual data values. Once the data values are encoded in a compact
+ form, the encoded data can optionally be further compressed using a compression algorithm. Parquet data
+ files created by Impala can use Snappy, GZip, or no compression; the Parquet spec also allows LZO
+ compression, but currently Impala does not support LZO-compressed Parquet files.
+ </p>
+
+ <p class="p">
+ RLE and dictionary encoding are compression techniques that Impala applies automatically to groups of
+ Parquet data values, in addition to any Snappy or GZip compression applied to the entire data files.
+ These automatic optimizations can save you time and planning that are normally needed for a traditional
+ data warehouse. For example, dictionary encoding reduces the need to create numeric IDs as abbreviations
+ for longer string values.
+ </p>
+
+ <p class="p">
+ Run-length encoding condenses sequences of repeated data values. For example, if many consecutive rows
+ all contain the same value for a country code, those repeating values can be represented by the value
+ followed by a count of how many times it appears consecutively.
+ </p>
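A minimal Python sketch of the idea (illustrative only; Parquet's actual RLE encoding also involves bit-packing details not shown here):

```python
# Run-length encoding sketch: collapse consecutive repeats into
# (value, count) pairs, as with the repeated country codes above.
def rle_encode(values):
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]

def rle_decode(runs):
    decoded = []
    for value, count in runs:
        decoded.extend([value] * count)
    return decoded

codes = ["US", "US", "US", "DE", "DE", "FR"]
assert rle_encode(codes) == [("US", 3), ("DE", 2), ("FR", 1)]
assert rle_decode(rle_encode(codes)) == codes
```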
+
+ <p class="p">
+ Dictionary encoding takes the different values present in a column, and represents each one in compact
+ 2-byte form rather than the original value, which could be several bytes. (Additional compression is
+ applied to the compacted values, for extra space savings.) This type of encoding applies when the number
+ of different values for a column is less than 2**16 (65,536). It does not apply to columns of data type
+ <code class="ph codeph">BOOLEAN</code>, which are already very short. <code class="ph codeph">TIMESTAMP</code> columns sometimes have
+ a unique value for each row, in which case they can quickly exceed the 2**16 limit on distinct values.
+ The 2**16 limit on different values within a column is reset for each data file, so if several different
+ data files each contained 10,000 different city names, the city name column in each data file could still
+ be condensed using dictionary encoding.
+ </p>
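The mechanism can be sketched in Python as follows (a simplified illustration; the `max_distinct` fallback mirrors the per-file limit on distinct values described above, but the real encoder works at the byte level):

```python
# Dictionary-encoding sketch: replace each distinct value with a
# small integer index into a per-file dictionary. When the number
# of distinct values exceeds the limit, give up so the caller can
# fall back to plain encoding. Simplified illustration only.
def dict_encode(values, max_distinct=2**16):
    dictionary = {}
    indexes = []
    for value in values:
        if value not in dictionary:
            if len(dictionary) >= max_distinct:
                return None  # too many distinct values
            dictionary[value] = len(dictionary)
        indexes.append(dictionary[value])
    return list(dictionary), indexes

cities = ["Columbus", "Indianapolis", "Columbus", "Columbus"]
dictionary, indexes = dict_encode(cities)
assert dictionary == ["Columbus", "Indianapolis"]
assert indexes == [0, 1, 0, 0]
```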
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title16" id="parquet__parquet_compacting">
+
+ <h2 class="title topictitle2" id="ariaid-title16">Compacting Data Files for Parquet Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If you reuse existing table structures or ETL processes for Parquet tables, you might encounter a <span class="q">"many
+ small files"</span> situation, which is suboptimal for query efficiency. For example, statements like these
+ might produce inefficiently organized data files:
+ </p>
+
+<pre class="pre codeblock"><code>-- In an N-node cluster, each node produces a data file
+-- for the INSERT operation. If you have less than
+-- N GB of data to copy, some files are likely to be
+-- much smaller than the <span class="ph">default Parquet</span> block size.
+insert into parquet_table select * from text_table;
+
+-- Even if this operation involves an overall large amount of data,
+-- when split up by year/month/day, each partition might only
+-- receive a small amount of data. Then the data files for
+-- the partition might be divided between the N nodes in the cluster.
+-- A multi-gigabyte copy operation might produce files of only
+-- a few MB each.
+insert into partitioned_parquet_table partition (year, month, day)
+ select year, month, day, url, referer, user_agent, http_code, response_time
+ from web_stats;
+</code></pre>
+
+ <p class="p">
+ Here are techniques to help you produce large data files in Parquet <code class="ph codeph">INSERT</code> operations, and
+ to compact existing too-small data files:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ When inserting into a partitioned Parquet table, use statically partitioned <code class="ph codeph">INSERT</code>
+ statements where the partition key values are specified as constant values. Ideally, use a separate
+ <code class="ph codeph">INSERT</code> statement for each partition.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You might set the <code class="ph codeph">NUM_NODES</code> option to 1 briefly, during <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> statements. Normally, those statements produce one or more data
+ files per data node. If the write operation involves small amounts of data, a Parquet table, and/or a
+ partitioned table, the default behavior could produce many small files when intuitively you might expect
+ only a single output file. <code class="ph codeph">SET NUM_NODES=1</code> turns off the <span class="q">"distributed"</span> aspect of the
+ write operation, making it more likely to produce only one or a few data files.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Be prepared to reduce the number of partition key columns from what you are used to with traditional
+ analytic database systems.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Do not expect Impala-written Parquet files to fill up the entire Parquet block size. Impala estimates
+ on the conservative side when figuring out how much data to write to each Parquet file. Typically, the
+ amount of uncompressed data in memory is substantially reduced on disk by the compression and encoding
+ techniques in the Parquet file format.
+
+ The final data file size varies depending on the compressibility of the data. Therefore, it is not an
+ indication of a problem if <span class="ph">256 MB</span> of text data is turned into 2
+ Parquet data files, each less than <span class="ph">256 MB</span>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you accidentally end up with a table with many small data files, consider using one or more of the
+ preceding techniques and copying all the data into a new Parquet table, either through <code class="ph codeph">CREATE
+ TABLE AS SELECT</code> or <code class="ph codeph">INSERT ... SELECT</code> statements.
+ </p>
+
+ <p class="p">
+ To avoid rewriting queries to change table names, you can adopt a convention of always running
+ important queries against a view. Changing the view definition immediately switches any subsequent
+ queries to use the new underlying tables:
+ </p>
+<pre class="pre codeblock"><code>create view production_table as select * from table_with_many_small_files;
+-- CTAS or INSERT...SELECT all the data into a more efficient layout...
+alter view production_table as select * from table_with_few_big_files;
+select * from production_table where c1 = 100 and c2 < 50 and ...;
+</code></pre>
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title17" id="parquet__parquet_schema_evolution">
+
+ <h2 class="title topictitle2" id="ariaid-title17">Schema Evolution for Parquet Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Schema evolution refers to using the statement <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> to change
+ the names, data type, or number of columns in a table. You can perform schema evolution for Parquet tables
+ as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The Impala <code class="ph codeph">ALTER TABLE</code> statement never changes any data files in the tables. From the
+ Impala side, schema evolution involves interpreting the same data files in terms of a new table
+ definition. Some types of schema changes make sense and are represented correctly. Other types of
+ changes cannot be represented in a sensible way, and produce special result values or conversion errors
+ during queries.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement always creates data using the latest table definition. You might
+ end up with data files with different numbers of columns or internal data representations if you do a
+ sequence of <code class="ph codeph">INSERT</code> and <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> statements.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> to define additional columns at the end,
+ when the original data files are used in a query, these final columns are considered to be all
+ <code class="ph codeph">NULL</code> values.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> to define fewer columns than before, when
+ the original data files are used in a query, the unused columns still present in the data file are
+ ignored.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Parquet represents the <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, and <code class="ph codeph">INT</code>
+ types the same internally, all stored in 32-bit integers.
+ </p>
+ <ul class="ul">
+ <li class="li">
+ That means it is easy to promote a <code class="ph codeph">TINYINT</code> column to <code class="ph codeph">SMALLINT</code> or
+ <code class="ph codeph">INT</code>, or a <code class="ph codeph">SMALLINT</code> column to <code class="ph codeph">INT</code>. The numbers are
+ represented exactly the same in the data file, and the columns being promoted would not contain any
+ out-of-range values.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you change any of these column types to a smaller type, any values that are out-of-range for the
+ new type are returned incorrectly, typically as negative numbers.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You cannot change a <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, or <code class="ph codeph">INT</code>
+ column to <code class="ph codeph">BIGINT</code>, or the other way around. Although the <code class="ph codeph">ALTER
+ TABLE</code> succeeds, any attempt to query those columns results in conversion errors.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Any other type conversion for columns produces a conversion error during queries. For example,
+ <code class="ph codeph">INT</code> to <code class="ph codeph">STRING</code>, <code class="ph codeph">FLOAT</code> to <code class="ph codeph">DOUBLE</code>,
+ <code class="ph codeph">TIMESTAMP</code> to <code class="ph codeph">STRING</code>, <code class="ph codeph">DECIMAL(9,0)</code> to
+ <code class="ph codeph">DECIMAL(5,2)</code>, and so on.
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
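Because all three types share the same 32-bit storage, narrowing a column simply reinterprets the stored bytes. The following Python sketch illustrates the effect (it is not Impala's code) and shows why out-of-range values often come back negative:

```python
import struct

# TINYINT/SMALLINT/INT are all stored as 32-bit integers in Parquet.
# Reading a stored value through a narrower type effectively keeps
# only the low-order byte(s), shown here with little-endian packing.
def reinterpret_as_tinyint(stored_int32):
    low_byte = struct.pack("<i", stored_int32)[:1]
    return struct.unpack("<b", low_byte)[0]

assert reinterpret_as_tinyint(100) == 100   # in range: unchanged
assert reinterpret_as_tinyint(300) == 44    # out of range: garbage
assert reinterpret_as_tinyint(200) == -56   # often appears negative
```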
+
+ <div class="p">
+ You might find that you have Parquet files where the columns do not line up in the same
+ order as in your Impala table. For example, you might have a Parquet file that was part of
+ a table with columns <code class="ph codeph">C1,C2,C3,C4</code>, and now you want to reuse the same
+ Parquet file in a table with columns <code class="ph codeph">C4,C2</code>. By default, Impala expects the
+ columns in the data file to appear in the same order as the columns defined for the table,
+ making it impractical to do some kinds of file reuse or schema evolution. In <span class="keyword">Impala 2.6</span>
+ and higher, the query option <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION=name</code> lets Impala
+ resolve columns by name, and therefore handle out-of-order or extra columns in the data file.
+ For example:
+
+<pre class="pre codeblock"><code>
+create database schema_evolution;
+use schema_evolution;
+create table t1 (c1 int, c2 boolean, c3 string, c4 timestamp)
+ stored as parquet;
+insert into t1 values
+ (1, true, 'yes', now()),
+ (2, false, 'no', now() + interval 1 day);
+
+select * from t1;
++----+-------+-----+-------------------------------+
+| c1 | c2 | c3 | c4 |
++----+-------+-----+-------------------------------+
+| 1 | true | yes | 2016-06-28 14:53:26.554369000 |
+| 2 | false | no | 2016-06-29 14:53:26.554369000 |
++----+-------+-----+-------------------------------+
+
+desc formatted t1;
+...
+| Location: | /user/hive/warehouse/schema_evolution.db/t1 |
+...
+
+-- T2 declares only two of T1's columns, in a different order.
+create table t2 (c4 timestamp, c2 boolean) stored as parquet;
+
+-- Make T2 have the same data file as in T1, including 2
+-- unused columns and column order different from what T2 expects.
+load data inpath '/user/hive/warehouse/schema_evolution.db/t1'
+ into table t2;
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+
+-- 'position' is the default setting.
+-- Impala cannot read the Parquet file if the column order does not match.
+set PARQUET_FALLBACK_SCHEMA_RESOLUTION=position;
+PARQUET_FALLBACK_SCHEMA_RESOLUTION set to position
+
+select * from t2;
+WARNINGS:
+File 'schema_evolution.db/t2/45331705_data.0.parq'
+has an incompatible Parquet schema for column 'schema_evolution.t2.c4'.
+Column type: TIMESTAMP, Parquet schema: optional int32 c1 [i:0 d:1 r:0]
+
+File 'schema_evolution.db/t2/45331705_data.0.parq'
+has an incompatible Parquet schema for column 'schema_evolution.t2.c4'.
+Column type: TIMESTAMP, Parquet schema: optional int32 c1 [i:0 d:1 r:0]
+
+-- With the 'name' setting, Impala can read the Parquet data files
+-- despite mismatching column order.
+set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;
+PARQUET_FALLBACK_SCHEMA_RESOLUTION set to name
+
+select * from t2;
++-------------------------------+-------+
+| c4 | c2 |
++-------------------------------+-------+
+| 2016-06-28 14:53:26.554369000 | true |
+| 2016-06-29 14:53:26.554369000 | false |
++-------------------------------+-------+
+
+</code></pre>
+
+ See <a class="xref" href="impala_parquet_fallback_schema_resolution.html#parquet_fallback_schema_resolution">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</a>
+ for more details.
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title18" id="parquet__parquet_data_types">
+
+ <h2 class="title topictitle2" id="ariaid-title18">Data Type Considerations for Parquet Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Parquet format defines a set of data types whose names differ from the names of the corresponding
+ Impala data types. If you are preparing Parquet files using other Hadoop components such as Pig or
+ MapReduce, you might need to work with the type names defined by Parquet. The following lists show the
+ Parquet-defined types and the equivalent types in Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Primitive types:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>BINARY -> STRING
+BOOLEAN -> BOOLEAN
+DOUBLE -> DOUBLE
+FLOAT -> FLOAT
+INT32 -> INT
+INT64 -> BIGINT
+INT96 -> TIMESTAMP
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Logical types:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>BINARY + OriginalType UTF8 -> STRING
+BINARY + OriginalType ENUM -> STRING
+BINARY + OriginalType DECIMAL -> DECIMAL
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex types:</strong>
+ </p>
+
+ <p class="p">
+ For the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code>)
+ available in <span class="keyword">Impala 2.3</span> and higher, Impala only supports queries
+ against those types in Parquet tables.
+ </p>
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_parquet_annotate_strings_utf8.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_parquet_annotate_strings_utf8.html b/docs/build3x/html/topics/impala_parquet_annotate_strings_utf8.html
new file mode 100644
index 0000000..f72b664
--- /dev/null
+++ b/docs/build3x/html/topics/impala_parquet_annotate_strings_utf8.html
@@ -0,0 +1,54 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_annotate_strings_utf8"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</title></head><body id="parquet_annotate_strings_utf8"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Causes Impala <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements
+ to write Parquet files that use the UTF-8 annotation for <code class="ph codeph">STRING</code> columns.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ By default, Impala represents a <code class="ph codeph">STRING</code> column in Parquet as an unannotated binary field.
+ </p>
+ <p class="p">
+ Impala always uses the UTF-8 annotation when writing <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+ columns to Parquet files. An alternative to using the query option is to cast <code class="ph codeph">STRING</code>
+ values to <code class="ph codeph">VARCHAR</code>.
+ </p>
+ <p class="p">
+ This option helps make Impala-written data more interoperable with other data processing engines.
+ Impala itself currently does not support all operations on UTF-8 data.
+ Although data processed by Impala is typically represented in ASCII, it is valid to designate the
+ data as UTF-8 when storing on disk, because ASCII is a subset of UTF-8.
+ </p>
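The "ASCII is a subset of UTF-8" point is easy to verify: for any ASCII string, the ASCII and UTF-8 byte representations are identical.

```python
# ASCII is a strict subset of UTF-8: an ASCII byte sequence is
# already valid UTF-8 and decodes to the same characters, which is
# why annotating ASCII data as UTF-8 on disk is safe.
value = "CMH"
assert value.encode("ascii") == value.encode("utf-8")
assert value.encode("ascii").decode("utf-8") == value
```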
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_parquet_array_resolution.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_parquet_array_resolution.html b/docs/build3x/html/topics/impala_parquet_array_resolution.html
new file mode 100644
index 0000000..831ac46
--- /dev/null
+++ b/docs/build3x/html/topics/impala_parquet_array_resolution.html
@@ -0,0 +1,180 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_array_resolution"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_ARRAY_RESOLUTION Query Option (Impala 2.9 or higher only)</title></head><body id="parquet_array_resolution"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">
+ PARQUET_ARRAY_RESOLUTION Query Option (<span class="keyword">Impala 2.9</span> or higher only)
+ </h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">PARQUET_ARRAY_RESOLUTION</code> query option controls the
+ behavior of the indexed-based resolution for nested arrays in Parquet.
+ </p>
+
+ <p class="p">
+ In Parquet, you can represent an array using a 2-level or 3-level
+ representation. The modern, standard representation is 3-level. The legacy
+ 2-level scheme is supported for compatibility with older Parquet files.
+ However, there is no reliable metadata within Parquet files to indicate
+ which encoding was used. It is even possible to have mixed encodings within
+ the same file if there are multiple arrays. The
+ <code class="ph codeph">PARQUET_ARRAY_RESOLUTION</code> option controls the resolution
+ process that matches each column/field reference from a query to a
+ column in the Parquet file.</p>
+
+ <p class="p">
+ The supported values for the query option are:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">THREE_LEVEL</code>: Assumes arrays are encoded with the 3-level
+ representation, and does not attempt the 2-level resolution.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">TWO_LEVEL</code>: Assumes arrays are encoded with the 2-level
+ representation, and does not attempt the 3-level resolution.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">TWO_LEVEL_THEN_THREE_LEVEL</code>: First tries to resolve
+ assuming a 2-level representation, and if unsuccessful, tries a 3-level
+ representation.
+ </li>
+ </ul>
+
+ <p class="p">
+ All of the above options resolve arrays encoded with a single level.
+ </p>
+
+ <p class="p">
+ A failure to resolve a column/field reference in a query with a given array
+ resolution policy does not necessarily result in a warning or error returned
+ by the query. A mismatch might be treated like a missing column (returns
+ NULL values), and it is not possible to reliably distinguish the 'bad
+ resolution' and 'legitimately missing column' cases.
+ </p>
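The policy's effect can be sketched as a fallback loop in Python. The schema model and resolver functions here are hypothetical stand-ins for Impala's internal logic, shown only to illustrate the order of attempts and the silent-NULL outcome:

```python
# Hypothetical sketch of PARQUET_ARRAY_RESOLUTION policies: attempt
# one interpretation of the array encoding, optionally fall back to
# the other, and return None (surfaced as NULLs) on failure.
def try_two_level(schema):
    # 2-level: the repeated node is itself the array element.
    return schema["repeated"] if schema.get("levels") == 2 else None

def try_three_level(schema):
    # 3-level: the repeated node wraps a single element field.
    return schema["repeated"]["element"] if schema.get("levels") == 3 else None

def resolve_array(schema, policy):
    attempts = {
        "TWO_LEVEL": [try_two_level],
        "THREE_LEVEL": [try_three_level],
        "TWO_LEVEL_THEN_THREE_LEVEL": [try_two_level, try_three_level],
    }[policy]
    for attempt in attempts:
        resolved = attempt(schema)
        if resolved is not None:
            return resolved
    return None  # unresolved: column reads back as NULLs, not an error

two_level_file = {"levels": 2, "repeated": "list_of_ints_tuple"}
assert resolve_array(two_level_file, "THREE_LEVEL") is None
assert resolve_array(two_level_file,
                     "TWO_LEVEL_THEN_THREE_LEVEL") == "list_of_ints_tuple"
```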
+
+ <p class="p">
+ The name-based policy generally does not have the problem of ambiguous
+ array representations. You specify to use the name-based policy by setting
+ the <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION</code> query option to
+ <code class="ph codeph">NAME</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Enum of <code class="ph codeph">TWO_LEVEL</code>,
+ <code class="ph codeph">TWO_LEVEL_THEN_THREE_LEVEL</code>, <code class="ph codeph">THREE_LEVEL</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">THREE_LEVEL</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ EXAMPLE A: The following Parquet schema of a file can be interpreted as a
+ 2-level or 3-level:
+ </p>
+
+<pre class="pre codeblock"><code>
+ParquetSchemaExampleA {
+ optional group single_element_groups (LIST) {
+ repeated group single_element_group {
+ required int64 count;
+ }
+ }
+}
+</code></pre>
+
+ <p class="p">
+ The following table schema corresponds to a 2-level interpretation:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE t (col1 array<struct<f1: bigint>>) STORED AS PARQUET;
+</code></pre>
+
+ <p class="p">
+ Successful query with a 2-level interpretation:
+ </p>
+
+<pre class="pre codeblock"><code>
+SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL;
+SELECT ITEM.f1 FROM t.col1;
+</code></pre>
+
+ <p class="p">
+ The following table schema corresponds to a 3-level interpretation:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE t (col1 array<bigint>) STORED AS PARQUET;
+</code></pre>
+
+ <p class="p">
+ Successful query with a 3-level interpretation:
+ </p>
+
+<pre class="pre codeblock"><code>
+SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL;
+SELECT ITEM FROM t.col1
+</code></pre>
+
+ <p class="p">
+ EXAMPLE B: The following Parquet schema of a file can only be successfully
+ interpreted as a 2-level:
+ </p>
+
+<pre class="pre codeblock"><code>
+ParquetSchemaExampleB {
+ required group list_of_ints (LIST) {
+ repeated int32 list_of_ints_tuple;
+ }
+}
+</code></pre>
+
+ <p class="p">
+ The following table schema corresponds to a 2-level interpretation:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE t (col1 array<int>) STORED AS PARQUET;
+</code></pre>
+
+ <p class="p">
+ Successful query with a 2-level interpretation:
+ </p>
+
+<pre class="pre codeblock"><code>
+SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL;
+SELECT ITEM FROM t.col1
+</code></pre>
+
+ <p class="p">
+ Unsuccessful query with a 3-level interpretation. The query returns
+ <code class="ph codeph">NULL</code>s as if the column was missing in the file:
+ </p>
+
+<pre class="pre codeblock"><code>
+SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL;
+SELECT ITEM FROM t.col1
+</code></pre>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_parquet_compression_codec.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_parquet_compression_codec.html b/docs/build3x/html/topics/impala_parquet_compression_codec.html
new file mode 100644
index 0000000..ac5551a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_parquet_compression_codec.html
@@ -0,0 +1,17 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_compression_codec"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_COMPRESSION_CODEC Query Option</title></head><body id="parquet_compression_codec"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">PARQUET_COMPRESSION_CODEC Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Deprecated. Use <code class="ph codeph">COMPRESSION_CODEC</code> in Impala 2.0 and later. See
+ <a class="xref" href="impala_compression_codec.html#compression_codec">COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</a> for details.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_planning.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_planning.html b/docs/build3x/html/topics/impala_planning.html
new file mode 100644
index 0000000..e571e42
--- /dev/null
+++ b/docs/build3x/html/topics/impala_planning.html
@@ -0,0 +1,20 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prereqs.html#prereqs"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_design.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="planning"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Planning for Impala Deployment</title></head><body id="planning"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Planning for Impala Deployment</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Before you set up Impala in production, do some planning to make sure that your hardware setup has sufficient
+ capacity, that your cluster topology is optimal for Impala queries, and that your schema design and ETL
+ processes follow the best practices for Impala.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_prereqs.html#prereqs">Impala Requirements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schema_design.html">Guidelines for Designing Impala Schemas</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_porting.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_porting.html b/docs/build3x/html/topics/impala_porting.html
new file mode 100644
index 0000000..8a8ba7e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_porting.html
@@ -0,0 +1,603 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="porting"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Porting SQL from Other Database Systems to Impala</title></head><body id="porting"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Porting SQL from Other Database Systems to Impala</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Although Impala uses standard SQL for queries, you might need to modify SQL source when bringing applications
+ to Impala, due to variations in data types, built-in functions, vendor language extensions, and
+ Hadoop-specific syntax. Even when SQL is working correctly, you might make further minor modifications for
+ best performance.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="porting__porting_ddl_dml">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Porting DDL and DML Statements</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When adapting SQL code from a traditional database system to Impala, expect to find a number of differences
+ in the DDL statements that you use to set up the schema. Clauses related to physical layout of files,
+ tablespaces, and indexes have no equivalent in Impala. You might restructure your schema considerably to
+ account for the Impala partitioning scheme and Hadoop file formats.
+ </p>
+
+ <p class="p">
+ Expect SQL queries to have a much higher degree of compatibility. With modest rewriting to address vendor
+ extensions and features not yet supported in Impala, you might be able to run identical or almost-identical
+ query text on both systems.
+ </p>
+
+ <p class="p">
+ Therefore, consider separating out the DDL into a separate Impala-specific setup script. Focus your reuse
+ and ongoing tuning efforts on the code for SQL queries.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="porting__porting_data_types">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Porting Data Types from Other Database Systems</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Change any <code class="ph codeph">VARCHAR</code>, <code class="ph codeph">VARCHAR2</code>, and <code class="ph codeph">CHAR</code> columns to
+ <code class="ph codeph">STRING</code>. Remove any length constraints from the column declarations; for example,
+ change <code class="ph codeph">VARCHAR(32)</code> or <code class="ph codeph">CHAR(1)</code> to <code class="ph codeph">STRING</code>. Impala is
+ very flexible about the length of string values; it does not impose any length constraints
+ or do any special processing (such as blank-padding) for <code class="ph codeph">STRING</code> columns.
+ (In Impala 2.0 and higher, there are data types <code class="ph codeph">VARCHAR</code> and <code class="ph codeph">CHAR</code>,
+ with length constraints for both types and blank-padding for <code class="ph codeph">CHAR</code>.
+ However, for performance reasons, it is still preferable to use <code class="ph codeph">STRING</code>
+ columns where practical.)
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For national language character types such as <code class="ph codeph">NCHAR</code>, <code class="ph codeph">NVARCHAR</code>, or
+ <code class="ph codeph">NCLOB</code>, be aware that while Impala can store and query UTF-8 character data, currently
+ some string manipulation operations only work correctly with ASCII data. See
+ <a class="xref" href="impala_string.html#string">STRING Data Type</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Change any <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>, or <code class="ph codeph">TIME</code> columns to
+ <code class="ph codeph">TIMESTAMP</code>. Remove any precision constraints. Remove any timezone clauses, and make
+ sure your application logic or ETL process accounts for the fact that Impala expects all
+ <code class="ph codeph">TIMESTAMP</code> values to be in
+ <a class="xref" href="http://en.wikipedia.org/wiki/Coordinated_Universal_Time" target="_blank">Coordinated
+ Universal Time (UTC)</a>. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for information about
+ the <code class="ph codeph">TIMESTAMP</code> data type, and
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for conversion functions for different
+ date and time formats.
+ </p>
+ <p class="p">
+ You might also need to adapt date- and time-related literal values and format strings to use the
+          supported Impala date and time formats. If your date and time literals use different separators, or a
+          different number of placeholders (<code class="ph codeph">YY</code>, <code class="ph codeph">MM</code>, and so on) than Impala
+ expects, consider using calls to <code class="ph codeph">regexp_replace()</code> to transform those values to the
+ Impala-compatible format. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for information about the
+ allowed formats for date and time literals, and
+ <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a> for string conversion functions such as
+ <code class="ph codeph">regexp_replace()</code>.
+ </p>
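+          <p class="p">
+            For example, the following sketch (using a hypothetical table
+            <code class="ph codeph">raw_events</code> whose dates are stored as
+            <code class="ph codeph">MM/DD/YYYY</code> strings) rearranges the fields into the
+            <code class="ph codeph">YYYY-MM-DD</code> form that Impala can cast to
+            <code class="ph codeph">TIMESTAMP</code>:
+          </p>
+<pre class="pre codeblock"><code>-- Rearrange MM/DD/YYYY into YYYY-MM-DD, then cast to TIMESTAMP.
+-- Backslashes in the backreferences are doubled inside the SQL string literal.
+SELECT CAST(regexp_replace(event_date, '([0-9]{2})/([0-9]{2})/([0-9]{4})', '\\3-\\1-\\2') AS TIMESTAMP)
+  FROM raw_events;</code></pre>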
+ <p class="p">
+ Instead of <code class="ph codeph">SYSDATE</code>, call the function <code class="ph codeph">NOW()</code>.
+ </p>
+ <p class="p">
+ Instead of adding or subtracting directly from a date value to produce a value <var class="keyword varname">N</var>
+ days in the past or future, use an <code class="ph codeph">INTERVAL</code> expression, for example <code class="ph codeph">NOW() +
+ INTERVAL 30 DAYS</code>.
+ </p>
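+          <p class="p">
+            For example (a sketch; the exact equivalents depend on which functions the
+            source system used):
+          </p>
+<pre class="pre codeblock"><code>-- Instead of date arithmetic such as SYSDATE + 30 or SYSDATE - 14:
+SELECT NOW() + INTERVAL 30 DAYS;
+SELECT NOW() - INTERVAL 2 WEEKS;</code></pre>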
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Although Impala supports <code class="ph codeph">INTERVAL</code> expressions for datetime arithmetic, as shown in
+ <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>, <code class="ph codeph">INTERVAL</code> is not available as a column
+ data type in Impala. For any <code class="ph codeph">INTERVAL</code> values stored in tables, convert them to numeric
+ values that you can add or subtract using the functions in
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>. For example, if you had a table
+ <code class="ph codeph">DEADLINES</code> with an <code class="ph codeph">INT</code> column <code class="ph codeph">TIME_PERIOD</code>, you could
+            construct dates <var class="keyword varname">N</var> days in the future like so:
+ </p>
+<pre class="pre codeblock"><code>SELECT NOW() + INTERVAL time_period DAYS from deadlines;</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For <code class="ph codeph">YEAR</code> columns, change to the smallest Impala integer type that has sufficient
+ range. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges, casting, and so on
+ for the various numeric data types.
+ </p>
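+          <p class="p">
+            For example, <code class="ph codeph">SMALLINT</code> (range -32768 to 32767) has
+            sufficient range for four-digit year values; the table and column names below
+            are hypothetical:
+          </p>
+<pre class="pre codeblock"><code>-- Source system: CREATE TABLE events (event_year YEAR, event_name VARCHAR(100));
+CREATE TABLE events (event_year SMALLINT, event_name STRING);</code></pre>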
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Change any <code class="ph codeph">DECIMAL</code> and <code class="ph codeph">NUMBER</code> types. If fixed-point precision is not
+ required, you can use <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code> on the Impala side depending on
+ the range of values. For applications that require precise decimal values, such as financial data, you
+ might need to make more extensive changes to table structure and application logic, such as using
+ separate integer columns for dollars and cents, or encoding numbers as string values and writing UDFs
+ to manipulate them. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges,
+ casting, and so on for the various numeric data types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, and <code class="ph codeph">REAL</code> types are supported in
+ Impala. Remove any precision and scale specifications. (In Impala, <code class="ph codeph">REAL</code> is just an
+ alias for <code class="ph codeph">DOUBLE</code>; columns declared as <code class="ph codeph">REAL</code> are turned into
+ <code class="ph codeph">DOUBLE</code> behind the scenes.) See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for
+ details about ranges, casting, and so on for the various numeric data types.
+ </p>
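+          <p class="p">
+            For example, a column declared as <code class="ph codeph">REAL</code> is reported as
+            <code class="ph codeph">DOUBLE</code> afterward (table and column names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- FLOAT(7) or DOUBLE PRECISION(10,2) from another system become plain FLOAT or DOUBLE:
+CREATE TABLE measurements (reading REAL);
+DESCRIBE measurements;  -- the READING column shows up as DOUBLE</code></pre>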
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Most integer types from other systems have equivalents in Impala, perhaps under different names such as
+ <code class="ph codeph">BIGINT</code> instead of <code class="ph codeph">INT8</code>. For any that are unavailable, for example
+ <code class="ph codeph">MEDIUMINT</code>, switch to the smallest Impala integer type that has sufficient range.
+ Remove any precision specifications. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details
+ about ranges, casting, and so on for the various numeric data types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Remove any <code class="ph codeph">UNSIGNED</code> constraints. All Impala numeric types are signed. See
+ <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges, casting, and so on for the
+ various numeric data types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For any types holding bitwise values, use an integer type with enough range to hold all the relevant
+ bits within a positive integer. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about
+ ranges, casting, and so on for the various numeric data types.
+ </p>
+ <p class="p">
+            For example, <code class="ph codeph">TINYINT</code> has a maximum positive value of 127, not 255, so to manipulate
+            8-bit bitfields as positive numbers, switch to the next largest type, <code class="ph codeph">SMALLINT</code>.
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select cast(127*2 as tinyint);
++--------------------------+
+| cast(127 * 2 as tinyint) |
++--------------------------+
+| -2 |
++--------------------------+
+[localhost:21000] > select cast(128 as tinyint);
++----------------------+
+| cast(128 as tinyint) |
++----------------------+
+| -128 |
++----------------------+
+[localhost:21000] > select cast(127*2 as smallint);
++---------------------------+
+| cast(127 * 2 as smallint) |
++---------------------------+
+| 254 |
++---------------------------+</code></pre>
+ <p class="p">
+ Impala does not support notation such as <code class="ph codeph">b'0101'</code> for bit literals.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For BLOB values, use <code class="ph codeph">STRING</code> to represent <code class="ph codeph">CLOB</code> or
+ <code class="ph codeph">TEXT</code> types (character based large objects) up to 32 KB in size. Binary large objects
+            such as <code class="ph codeph">BLOB</code>, <code class="ph codeph">RAW</code>, <code class="ph codeph">BINARY</code>, and
+ <code class="ph codeph">VARBINARY</code> do not currently have an equivalent in Impala.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For Boolean-like types such as <code class="ph codeph">BOOL</code>, use the Impala <code class="ph codeph">BOOLEAN</code> type.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Because Impala currently does not support composite or nested types, any spatial data types in other
+ database systems do not have direct equivalents in Impala. You could represent spatial values in string
+ format and write UDFs to process them. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details. Where
+ practical, separate spatial types into separate tables so that Impala can still work with the
+ non-spatial data.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Take out any <code class="ph codeph">DEFAULT</code> clauses. Impala can use data files produced from many different
+ sources, such as Pig, Hive, or MapReduce jobs. The fast import mechanisms of <code class="ph codeph">LOAD DATA</code>
+ and external tables mean that Impala is flexible about the format of data files, and Impala does not
+ necessarily validate or cleanse data before querying it. When copying data through Impala
+ <code class="ph codeph">INSERT</code> statements, you can use conditional functions such as <code class="ph codeph">CASE</code> or
+ <code class="ph codeph">NVL</code> to substitute some other value for <code class="ph codeph">NULL</code> fields; see
+ <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+ </p>
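+          <p class="p">
+            For example, instead of a <code class="ph codeph">DEFAULT 'unknown'</code> clause on a
+            column, you could substitute the value at copy time (table and column names
+            hypothetical):
+          </p>
+<pre class="pre codeblock"><code>INSERT INTO clean_table
+  SELECT id, NVL(category, 'unknown') FROM staging_table;</code></pre>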
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Take out any constraints from your <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+ statements, for example <code class="ph codeph">PRIMARY KEY</code>, <code class="ph codeph">FOREIGN KEY</code>,
+ <code class="ph codeph">UNIQUE</code>, <code class="ph codeph">NOT NULL</code>, <code class="ph codeph">UNSIGNED</code>, or
+ <code class="ph codeph">CHECK</code> constraints. Impala can use data files produced from many different sources,
+ such as Pig, Hive, or MapReduce jobs. Therefore, Impala expects initial data validation to happen
+ earlier during the ETL or ELT cycle. After data is loaded into Impala tables, you can perform queries
+ to test for <code class="ph codeph">NULL</code> values. When copying data through Impala <code class="ph codeph">INSERT</code>
+ statements, you can use conditional functions such as <code class="ph codeph">CASE</code> or <code class="ph codeph">NVL</code> to
+ substitute some other value for <code class="ph codeph">NULL</code> fields; see
+ <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+ </p>
+ <p class="p">
+ Do as much verification as practical before loading data into Impala. After data is loaded into Impala,
+ you can do further verification using SQL queries to check if values have expected ranges, if values
+ are <code class="ph codeph">NULL</code> or not, and so on. If there is a problem with the data, you will need to
+ re-run earlier stages of the ETL process, or do an <code class="ph codeph">INSERT ... SELECT</code> statement in
+ Impala to copy the faulty data to a new table and transform or filter out the bad values.
+ </p>
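+          <p class="p">
+            For example, checks that would have been <code class="ph codeph">NOT NULL</code> or
+            <code class="ph codeph">CHECK</code> constraints can become verification queries run
+            after loading (table and column names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Rows that would have violated a NOT NULL constraint:
+SELECT count(*) FROM t1 WHERE c1 IS NULL;
+-- Rows outside the range a CHECK (c2 BETWEEN 0 AND 100) would have enforced:
+SELECT count(*) FROM t1 WHERE c2 NOT BETWEEN 0 AND 100;</code></pre>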
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Take out any <code class="ph codeph">CREATE INDEX</code>, <code class="ph codeph">DROP INDEX</code>, and <code class="ph codeph">ALTER
+ INDEX</code> statements, and equivalent <code class="ph codeph">ALTER TABLE</code> statements. Remove any
+ <code class="ph codeph">INDEX</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">PRIMARY KEY</code> clauses from
+ <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements. Impala is optimized for bulk
+ read operations for data warehouse-style queries, and therefore does not support indexes for its
+ tables.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+            Calls to built-in functions with out-of-range or otherwise incorrect arguments return
+            <code class="ph codeph">NULL</code> in Impala instead of raising exceptions. (This rule applies even when the
+            <code class="ph codeph">ABORT_ON_ERROR=true</code> query option is in effect.) Run small-scale queries using
+            representative data to double-check that calls to built-in functions return expected values
+ rather than <code class="ph codeph">NULL</code>. For example, unsupported <code class="ph codeph">CAST</code> operations do not
+ raise an error in Impala:
+ </p>
+<pre class="pre codeblock"><code>select cast('foo' as int);
++--------------------+
+| cast('foo' as int) |
++--------------------+
+| NULL |
++--------------------+</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For any other type not supported in Impala, you could represent their values in string format and write
+ UDFs to process them. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+            To detect the presence of unsupported or unconvertible data types in data files, do initial testing
+ with the <code class="ph codeph">ABORT_ON_ERROR=true</code> query option in effect. This option causes queries to
+ fail immediately if they encounter disallowed type conversions. See
+ <a class="xref" href="impala_abort_on_error.html#abort_on_error">ABORT_ON_ERROR Query Option</a> for details. For example:
+ </p>
+<pre class="pre codeblock"><code>set abort_on_error=true;
+select count(*) from (select * from t1);
+-- The above query will fail if the data files for T1 contain any
+-- values that can't be converted to the expected Impala data types.
+-- For example, if T1.C1 is defined as INT but the column contains
+-- floating-point values like 1.1, the query will return an error.</code></pre>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="porting__porting_statements">
+
+ <h2 class="title topictitle2" id="ariaid-title4">SQL Statements to Remove or Adapt</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Some SQL statements or clauses that you might be familiar with are not currently supported in Impala:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala has no <code class="ph codeph">DELETE</code> statement. Impala is intended for data warehouse-style operations
+ where you do bulk moves and transforms of large quantities of data. Instead of using
+ <code class="ph codeph">DELETE</code>, use <code class="ph codeph">INSERT OVERWRITE</code> to entirely replace the contents of a
+ table or partition, or use <code class="ph codeph">INSERT ... SELECT</code> to copy a subset of data (everything but
+ the rows you intended to delete) from one table to another. See <a class="xref" href="impala_dml.html#dml">DML Statements</a> for
+ an overview of Impala DML statements.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala has no <code class="ph codeph">UPDATE</code> statement. Impala is intended for data warehouse-style operations
+ where you do bulk moves and transforms of large quantities of data. Instead of using
+ <code class="ph codeph">UPDATE</code>, do all necessary transformations early in the ETL process, such as in the job
+ that generates the original data, or when copying from one table to another to convert to a particular
+ file format or partitioning scheme. See <a class="xref" href="impala_dml.html#dml">DML Statements</a> for an overview of Impala DML
+ statements.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala has no transactional statements, such as <code class="ph codeph">COMMIT</code> or <code class="ph codeph">ROLLBACK</code>.
+ Impala effectively works like the <code class="ph codeph">AUTOCOMMIT</code> mode in some database systems, where
+ changes take effect as soon as they are made.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If your database, table, column, or other names conflict with Impala reserved words, use different
+ names or quote the names with backticks. See <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>
+ for the current list of Impala reserved words.
+ </p>
+ <p class="p">
+ Conversely, if you use a keyword that Impala does not recognize, it might be interpreted as a table or
+ column alias. For example, in <code class="ph codeph">SELECT * FROM t1 NATURAL JOIN t2</code>, Impala does not
+ recognize the <code class="ph codeph">NATURAL</code> keyword and interprets it as an alias for the table
+ <code class="ph codeph">t1</code>. If you experience any unexpected behavior with queries, check the list of reserved
+ words to make sure all keywords in join and <code class="ph codeph">WHERE</code> clauses are recognized.
+ </p>
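+          <p class="p">
+            For example, to keep identifiers that clash with reserved words (the names
+            here are hypothetical), quote them with backticks:
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE `update` (`location` STRING, `date` TIMESTAMP);
+SELECT `location`, `date` FROM `update`;</code></pre>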
+ </li>
+
+ <li class="li">
+ <p class="p">
+            Impala supports subqueries only in the <code class="ph codeph">FROM</code> clause of a query, not in
+ <code class="ph codeph">WHERE</code> clauses. Therefore, you cannot use clauses such as <code class="ph codeph">WHERE
+ <var class="keyword varname">column</var> IN (<var class="keyword varname">subquery</var>)</code>. Also, Impala does not allow
+ <code class="ph codeph">EXISTS</code> or <code class="ph codeph">NOT EXISTS</code> clauses (although <code class="ph codeph">EXISTS</code> is a
+ reserved keyword).
+ </p>
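+          <p class="p">
+            One possible rewrite is to turn the subquery into a join; this sketch assumes
+            hypothetical tables <code class="ph codeph">t1</code> and <code class="ph codeph">t2</code>:
+          </p>
+<pre class="pre codeblock"><code>-- Instead of: SELECT * FROM t1 WHERE c1 IN (SELECT c1 FROM t2);
+SELECT t1.*
+  FROM t1 JOIN (SELECT DISTINCT c1 FROM t2) t2_keys
+  ON t1.c1 = t2_keys.c1;</code></pre>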
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala supports <code class="ph codeph">UNION</code> and <code class="ph codeph">UNION ALL</code> set operators, but not
+ <code class="ph codeph">INTERSECT</code>. <span class="ph">Prefer <code class="ph codeph">UNION ALL</code> over <code class="ph codeph">UNION</code> when you know the
+ data sets are disjoint or duplicate values are not a problem; <code class="ph codeph">UNION ALL</code> is more efficient
+ because it avoids materializing and sorting the entire result set to eliminate duplicate values.</span>
+ </p>
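+          <p class="p">
+            For example (table and column names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Keeps any duplicate rows; no duplicate-elimination pass is needed:
+SELECT c1 FROM t1 UNION ALL SELECT c1 FROM t2;</code></pre>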
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Within queries, Impala requires query aliases for any subqueries:
+ </p>
+<pre class="pre codeblock"><code>-- Without the alias 'contents_of_t1' at the end, query gives syntax error.
+select count(*) from (select * from t1) contents_of_t1;</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ When an alias is declared for an expression in a query, that alias cannot be referenced again within
+ the same query block:
+ </p>
+<pre class="pre codeblock"><code>-- Can't reference AVERAGE twice in the SELECT list where it's defined.
+select avg(x) as average, average+1 from t1 group by x;
+ERROR: AnalysisException: couldn't resolve column reference: 'average'
+
+-- Although it can be referenced again later in the same query.
+select avg(x) as average from t1 group by x having average > 3;</code></pre>
+ <p class="p">
+ For Impala, either repeat the expression again, or abstract the expression into a <code class="ph codeph">WITH</code>
+ clause, creating named columns that can be referenced multiple times anywhere in the base query:
+ </p>
+<pre class="pre codeblock"><code>-- The following 2 query forms are equivalent.
+select avg(x) as average, avg(x)+1 from t1 group by x;
+with avg_t as (select avg(x) average from t1 group by x) select average, average+1 from avg_t;</code></pre>
+
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala does not support certain rarely used join types that are less appropriate for high-volume tables
+ used for data warehousing. In some cases, Impala supports join types but requires explicit syntax to
+ ensure you do not do inefficient joins of huge tables by accident. For example, Impala does not support
+ natural joins or anti-joins, and requires the <code class="ph codeph">CROSS JOIN</code> operator for Cartesian
+ products. See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details on the syntax for Impala join clauses.
+ </p>
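+          <p class="p">
+            For example, a deliberate Cartesian product must use the explicit operator
+            (table names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- A comma join with no join condition is not accepted as an implicit Cartesian product; use:
+SELECT a.x, b.y FROM a CROSS JOIN b;</code></pre>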
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala has a limited choice of partitioning types. Partitions are defined based on each distinct
+ combination of values for one or more partition key columns. Impala does not redistribute or check data
+ to create evenly distributed partitions; you must choose partition key columns based on your knowledge
+ of the data volume and distribution. Adapt any tables that use range, list, hash, or key partitioning
+ to use the Impala partition syntax for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+ statements. Impala partitioning is similar to range partitioning where every range has exactly one
+ value, or key partitioning where the hash function produces a separate bucket for every combination of
+ key values. See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for usage details, and
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because the number of separate partitions is potentially higher than in other database systems, keep a
+ close eye on the number of partitions and the volume of data in each one; scale back the number of
+ partition key columns if you end up with too many partitions with a small volume of data in each one.
+ Remember, to distribute work for a query across a cluster, you need at least one HDFS block per node.
+ HDFS blocks are typically multiple megabytes, <span class="ph">especially</span> for Parquet
+ files. Therefore, if each partition holds only a few megabytes of data, you are unlikely to see much
+ parallelism in the query because such a small amount of data is typically processed by a single node.
+ </div>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For <span class="q">"top-N"</span> queries, Impala uses the <code class="ph codeph">LIMIT</code> clause rather than comparing against a
+ pseudocolumn named <code class="ph codeph">ROWNUM</code> or <code class="ph codeph">ROW_NUM</code>. See
+ <a class="xref" href="impala_limit.html#limit">LIMIT Clause</a> for details.
+ </p>
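+          <p class="p">
+            For example (table and column names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Instead of: SELECT ... WHERE ROWNUM &lt;= 10:
+SELECT c1 FROM t1 ORDER BY c1 DESC LIMIT 10;</code></pre>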
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="porting__porting_antipatterns">
+
+    <h2 class="title topictitle2" id="ariaid-title5">SQL Constructs to Double-Check</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Some SQL constructs that are supported have behavior or defaults more oriented towards convenience than
+ optimal performance. Also, sometimes machine-generated SQL, perhaps issued through JDBC or ODBC
+ applications, might have inefficiencies or exceed internal Impala limits. As you port SQL code, be alert
+ and change these things where appropriate:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ A <code class="ph codeph">CREATE TABLE</code> statement with no <code class="ph codeph">STORED AS</code> clause creates data files
+ in plain text format, which is convenient for data interchange but not a good choice for high-volume
+ data with high-performance queries. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for why and
+ how to use specific file formats for compact data and high-performance queries. Especially see
+ <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>, for details about the file format most heavily optimized for
+ large-scale data warehouse queries.
+ </p>
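+          <p class="p">
+            For example, to store the data in Parquet format rather than the default text format,
+            include the <code class="ph codeph">STORED AS</code> clause explicitly. The table and column
+            names here are hypothetical:
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE sales_data (id BIGINT, amount DECIMAL(10,2), sale_date TIMESTAMP)
+  STORED AS PARQUET;</code></pre>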
+ </li>
+
+ <li class="li">
+ <p class="p">
+ A <code class="ph codeph">CREATE TABLE</code> statement with no <code class="ph codeph">PARTITIONED BY</code> clause stores all the
+ data files in the same physical location, which can lead to scalability problems when the data volume
+ becomes large.
+ </p>
+ <p class="p">
+ On the other hand, adapting tables that were already partitioned in a different database system could
+ produce an Impala table with a high number of partitions and not enough data in each one, leading to
+ underutilization of Impala's parallel query features.
+ </p>
+ <p class="p">
+ See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for details about setting up partitioning and
+ tuning the performance of queries on partitioned tables.
+ </p>
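+          <p class="p">
+            A minimal sketch of a partitioned table, with hypothetical table and column names:
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE web_logs (ip STRING, url STRING, request_time TIMESTAMP)
+  PARTITIONED BY (log_year INT, log_month INT)
+  STORED AS PARQUET;</code></pre>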
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INSERT ... VALUES</code> syntax is suitable for setting up toy tables with a few rows for
+ functional testing, but because each such statement creates a separate tiny file in HDFS, it is not a
+ scalable technique for loading megabytes or gigabytes (let alone petabytes) of data. Consider revising
+ your data load process to produce raw data files outside of Impala, then setting up Impala external
+ tables or using the <code class="ph codeph">LOAD DATA</code> statement to use those data files instantly in Impala
+ tables, with no conversion or indexing stage. See <a class="xref" href="impala_tables.html#external_tables">External Tables</a> and
+ <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> for details about the Impala techniques for working with
+ data files produced outside of Impala; see <a class="xref" href="impala_tutorial.html#tutorial_etl">Data Loading and Querying Examples</a> for examples
+ of ETL workflow for Impala.
+ </p>
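+          <p class="p">
+            A sketch of both techniques; the table names and HDFS paths are hypothetical:
+          </p>
+<pre class="pre codeblock"><code>-- Attach Impala to data files produced outside Impala.
+CREATE EXTERNAL TABLE raw_events (id BIGINT, payload STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+  LOCATION '/user/etl/raw_events';
+
+-- Or move already-staged HDFS files into an existing table, with no conversion step.
+LOAD DATA INPATH '/user/etl/staging/events.csv' INTO TABLE raw_events;</code></pre>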
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If your ETL process is not optimized for Hadoop, you might end up with highly fragmented small data
+ files, or a single giant data file that cannot take advantage of distributed parallel queries or
+ partitioning. In this case, use an <code class="ph codeph">INSERT ... SELECT</code> statement to copy the data into a
+ new table and reorganize into a more efficient layout in the same operation. See
+ <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for details about the <code class="ph codeph">INSERT</code> statement.
+ </p>
+ <p class="p">
+ You can do <code class="ph codeph">INSERT ... SELECT</code> into a table with a more efficient file format (see
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>) or from an unpartitioned table into a partitioned
+ one (see <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>).
+ </p>
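+          <p class="p">
+            For example, the following sketch copies data from a hypothetical fragmented text table
+            into a compact, partitioned Parquet layout in a single operation:
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE events_parquet (id BIGINT, payload STRING)
+  PARTITIONED BY (event_year INT)
+  STORED AS PARQUET;
+
+INSERT INTO events_parquet PARTITION (event_year)
+  SELECT id, payload, event_year FROM raw_events_text;</code></pre>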
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The number of expressions allowed in an Impala query might be smaller than for some other database
+ systems, causing failures for very complicated queries (typically produced by automated SQL
+ generators). Where practical, keep the number of expressions in the <code class="ph codeph">WHERE</code> clauses to
+ approximately 2000 or fewer. As a workaround, set the query option
+ <code class="ph codeph">DISABLE_CODEGEN=true</code> if queries fail for this reason. See
+ <a class="xref" href="impala_disable_codegen.html#disable_codegen">DISABLE_CODEGEN Query Option</a> for details.
+ </p>
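+          <p class="p">
+            The workaround can be applied for the current session only:
+          </p>
+<pre class="pre codeblock"><code>SET DISABLE_CODEGEN=true;
+-- Rerun the failing query, then restore the default:
+SET DISABLE_CODEGEN=false;</code></pre>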
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If practical, rewrite <code class="ph codeph">UNION</code> queries to use the <code class="ph codeph">UNION ALL</code> operator
+ instead. <span class="ph">Prefer <code class="ph codeph">UNION ALL</code> over <code class="ph codeph">UNION</code> when you know the
+ data sets are disjoint or duplicate values are not a problem; <code class="ph codeph">UNION ALL</code> is more efficient
+ because it avoids materializing and sorting the entire result set to eliminate duplicate values.</span>
+ </p>
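+          <p class="p">
+            For example, when duplicates are acceptable or known to be impossible:
+          </p>
+<pre class="pre codeblock"><code>-- Instead of:
+SELECT c1 FROM t1 UNION SELECT c1 FROM t2;
+-- prefer:
+SELECT c1 FROM t1 UNION ALL SELECT c1 FROM t2;</code></pre>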
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="porting__porting_next">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Next Porting Steps after Verifying Syntax and Semantics</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Throughout this section, some of the decisions you make during the porting process also have a substantial
+        impact on performance. After your SQL code is ported and working correctly, double-check the
+ performance-related aspects of your schema design, physical layout, and queries to make sure that the
+ ported application is taking full advantage of Impala's parallelism, performance-related SQL features, and
+ integration with Hadoop components.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Have you run the <code class="ph codeph">COMPUTE STATS</code> statement on each table involved in join queries? Have
+ you also run <code class="ph codeph">COMPUTE STATS</code> for each table used as the source table in an <code class="ph codeph">INSERT
+ ... SELECT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> statement?
+ </li>
+
+ <li class="li">
+ Are you using the most efficient file format for your data volumes, table structure, and query
+ characteristics?
+ </li>
+
+ <li class="li">
+ Are you using partitioning effectively? That is, have you partitioned on columns that are often used for
+ filtering in <code class="ph codeph">WHERE</code> clauses? Have you partitioned at the right granularity so that there
+ is enough data in each partition to parallelize the work for each query?
+ </li>
+
+ <li class="li">
+ Does your ETL process produce a relatively small number of multi-megabyte data files (good) rather than a
+ huge number of small files (bad)?
+ </li>
+ </ul>
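+      <p class="p">
+        The statistics item in the checklist above can be addressed with statements like the
+        following; the table name is hypothetical:
+      </p>
+<pre class="pre codeblock"><code>COMPUTE STATS sales_data;
+SHOW TABLE STATS sales_data;
+SHOW COLUMN STATS sales_data;</code></pre>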
+
+ <p class="p">
+ See <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> for details about the whole performance tuning
+ process.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ports.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ports.html b/docs/build3x/html/topics/impala_ports.html
new file mode 100644
index 0000000..5acc1b6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ports.html
@@ -0,0 +1,421 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ports"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Ports Used by Impala</title></head><body id="ports"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Ports Used by Impala</h1>
+
+
+ <div class="body conbody" id="ports__conbody_ports">
+
+ <p class="p">
+
+ Impala uses the TCP ports listed in the following table. Before deploying Impala, ensure these ports are open
+ on each system.
+ </p>
+
+ <table class="table"><caption></caption><colgroup><col style="width:18.181818181818183%"><col style="width:27.27272727272727%"><col style="width:9.090909090909092%"><col style="width:18.181818181818183%"><col style="width:27.27272727272727%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__1">
+ Component
+ </th>
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__2">
+ Service
+ </th>
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__3">
+ Port
+ </th>
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__4">
+ Access Requirement
+ </th>
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__5">
+ Comment
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Impala Daemon Frontend Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 21000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Used to transmit commands and receive results by <code class="ph codeph">impala-shell</code> and
+ some ODBC drivers.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Impala Daemon Frontend Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 21050
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Used to transmit commands and receive results by applications, such as Business Intelligence tools,
+ using JDBC, the Beeswax query editor in Hue, and some ODBC drivers.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Impala Daemon Backend Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 22000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. Impala daemons use this port to communicate with each other.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStoreSubscriber Service Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 23000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. Impala daemons listen on this port for updates from the statestore daemon.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Catalog Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStoreSubscriber Service Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 23020
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. The catalog daemon listens on this port for updates from the statestore daemon.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Impala Daemon HTTP Server Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 25000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Impala web interface for administrators to monitor and troubleshoot.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala StateStore Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStore HTTP Server Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 25010
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ StateStore web interface for administrators to monitor and troubleshoot.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Catalog Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Catalog HTTP Server Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 25020
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Catalog service web interface for administrators to monitor and troubleshoot. New in Impala 1.2 and
+ higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala StateStore Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStore Service Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 24000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. The statestore daemon listens on this port for registration/unregistration
+ requests.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Catalog Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+              Catalog Service Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 26000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. The catalog service uses this port to communicate with the Impala daemons. New
+ in Impala 1.2 and higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Llama Callback Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 28000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+              Internal use only. Impala daemons use this port to communicate with Llama. New in <span class="keyword">Impala 1.3</span> and higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Llama ApplicationMaster
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Llama Thrift Admin Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 15002
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. New in <span class="keyword">Impala 1.3</span> and higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Llama ApplicationMaster
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Llama Thrift Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 15000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. New in <span class="keyword">Impala 1.3</span> and higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Llama ApplicationMaster
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Llama HTTP Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 15001
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Llama service web interface for administrators to monitor and troubleshoot.
+ New in <span class="keyword">Impala 1.3</span> and higher.
+ </p>
+ </td>
+ </tr>
+ </tbody></table>
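+    <p class="p">
+      As a quick check after deployment, you can probe the externally visible ports from a client
+      machine; the hostname below is hypothetical:
+    </p>
+<pre class="pre codeblock"><code>nc -z -w 5 impala-host.example.com 21000 &amp;&amp; echo "shell port 21000 open"
+nc -z -w 5 impala-host.example.com 21050 &amp;&amp; echo "JDBC/ODBC port 21050 open"</code></pre>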
+ </div>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_prefetch_mode.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_prefetch_mode.html b/docs/build3x/html/topics/impala_prefetch_mode.html
new file mode 100644
index 0000000..b7cc1f5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_prefetch_mode.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="prefetch_mode"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PREFETCH_MODE Query Option (Impala 2.6 or higher only)</title></head><body id="prefetch_mode"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">PREFETCH_MODE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Determines whether the prefetching optimization is applied during
+ join query processing.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric (0, 1)
+ or corresponding mnemonic strings (<code class="ph codeph">NONE</code>, <code class="ph codeph">HT_BUCKET</code>).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 1 (equivalent to <code class="ph codeph">HT_BUCKET</code>)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The default mode is 1, which means that hash table buckets are
+ prefetched during join query processing.
+ </p>
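+    <p class="p">
+      For example, to disable prefetching for the current session and later restore the default:
+    </p>
+<pre class="pre codeblock"><code>SET PREFETCH_MODE=NONE;
+-- ...run queries...
+SET PREFETCH_MODE=HT_BUCKET;</code></pre>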
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a>,
+ <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_prereqs.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_prereqs.html b/docs/build3x/html/topics/impala_prereqs.html
new file mode 100644
index 0000000..88293d4
--- /dev/null
+++ b/docs/build3x/html/topics/impala_prereqs.html
@@ -0,0 +1,275 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_planning.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="prereqs"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Requirements</title></head><body id="prereqs"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Requirements</h1>
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ To perform as expected, Impala depends on the availability of the software, hardware, and configurations
+ described in the following sections.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_planning.html">Planning for Impala Deployment</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="prereqs__prereqs_os">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Supported Operating Systems</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+
+
+
+
+ Apache Impala runs on Linux systems only. See the <span class="ph filepath">README.md</span>
+ file for more information.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="prereqs__prereqs_hive">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Hive Metastore and Related Configuration</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ Impala can interoperate with data stored in Hive, and uses the same infrastructure as Hive for tracking
+ metadata about schema objects such as tables and columns. The following components are prerequisites for
+ Impala:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ MySQL or PostgreSQL, to act as a metastore database for both Impala and Hive.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Installing and configuring a Hive metastore is an Impala requirement. Impala does not work without
+ the metastore database. For the process of installing and configuring the metastore, see
+ <a class="xref" href="impala_install.html#install">Installing Impala</a>.
+ </p>
+
+ <p class="p">
+ Always configure a <strong class="ph b">Hive metastore service</strong> rather than connecting directly to the metastore
+ database. The Hive metastore service is required to interoperate between different levels of
+ metastore APIs if this is necessary for your environment, and using it avoids known issues with
+ connecting directly to the metastore database.
+ </p>
+
+ <p class="p">
+ A summary of the metastore installation process is as follows:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Install a MySQL or PostgreSQL database. Start the database if it is not started after installation.
+ </li>
+
+ <li class="li">
+ Download the
+ <a class="xref" href="http://www.mysql.com/products/connector/" target="_blank">MySQL
+ connector</a> or the
+ <a class="xref" href="http://jdbc.postgresql.org/download.html" target="_blank">PostgreSQL
+ connector</a> and place it in the <code class="ph codeph">/usr/share/java/</code> directory.
+ </li>
+
+ <li class="li">
+ Use the appropriate command line tool for your database to create the metastore database.
+ </li>
+
+ <li class="li">
+ Use the appropriate command line tool for your database to grant privileges for the metastore
+ database to the <code class="ph codeph">hive</code> user.
+ </li>
+
+ <li class="li">
+ Modify <code class="ph codeph">hive-site.xml</code> to include information matching your particular database: its
+ URL, username, and password. You will copy the <code class="ph codeph">hive-site.xml</code> file to the Impala
+ Configuration Directory later in the Impala installation process.
+ </li>
+ </ul>
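+          <p class="p">
+            For example, a MySQL-backed metastore might use <code class="ph codeph">hive-site.xml</code>
+            entries like the following; the host, user, and password values are placeholders:
+          </p>
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;javax.jdo.option.ConnectionURL&lt;/name&gt;
+  &lt;value&gt;jdbc:mysql://metastore-host.example.com/metastore&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;javax.jdo.option.ConnectionDriverName&lt;/name&gt;
+  &lt;value&gt;com.mysql.jdbc.Driver&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;javax.jdo.option.ConnectionUserName&lt;/name&gt;
+  &lt;value&gt;hive&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;javax.jdo.option.ConnectionPassword&lt;/name&gt;
+  &lt;value&gt;hive_password&lt;/value&gt;
+&lt;/property&gt;</code></pre>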
+ </div>
+ </li>
+
+ <li class="li">
+ <strong class="ph b">Optional:</strong> Hive. Although only the Hive metastore database is required for Impala to function, you
+ might install Hive on some client machines to create and load data into tables that use certain file
+ formats. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. Hive does not need to be
+ installed on the same DataNodes as Impala; it just needs access to the same metastore database.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="prereqs__prereqs_java">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Java Dependencies</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ Although Impala is primarily written in C++, it does use Java to communicate with various Hadoop
+ components:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The officially supported JVM for Impala is the Oracle JVM. Other JVMs might cause issues, typically
+ resulting in a failure at <span class="keyword cmdname">impalad</span> startup. In particular, the JamVM used by default on
+ certain levels of Ubuntu systems can cause <span class="keyword cmdname">impalad</span> to fail to start.
+ </li>
+
+ <li class="li">
+ Internally, the <span class="keyword cmdname">impalad</span> daemon relies on the <code class="ph codeph">JAVA_HOME</code> environment
+ variable to locate the system Java libraries. Make sure the <span class="keyword cmdname">impalad</span> service is not run
+ from an environment with an incorrect setting for this variable.
+ </li>
+
+ <li class="li">
+ All Java dependencies are packaged in the <code class="ph codeph">impala-dependencies.jar</code> file, which is located
+ at <code class="ph codeph">/usr/lib/impala/lib/</code>. These map to everything that is built under
+ <code class="ph codeph">fe/target/dependency</code>.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="prereqs__prereqs_network">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Networking Configuration Requirements</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+      As part of ensuring best performance, Impala attempts to complete tasks on local data rather than
+      using network connections to work with remote data. To support this goal, Impala matches
+      the <strong class="ph b">hostname</strong> provided to each Impala daemon with the <strong class="ph b">IP address</strong> of each DataNode by
+      resolving the hostname flag to an IP address. For Impala to work with local data, use a single IP interface
+      for the DataNode and the Impala daemon on each machine, and ensure that the Impala daemon's hostname
+      flag resolves to the IP address of the DataNode. For single-homed machines this is usually automatic,
+      but for multi-homed machines, ensure that the hostname resolves to the correct interface. Impala
+      tries to detect the correct hostname at start-up, and prints the derived hostname at the start of the log
+ </p>
+
+<pre class="pre codeblock"><code>Using hostname: impala-daemon-1.example.com</code></pre>
+
+ <p class="p">
+ In the majority of cases, this automatic detection works correctly. If you need to explicitly set the
+ hostname, do so by setting the <code class="ph codeph">--hostname</code> flag.
+ </p>
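+    <p class="p">
+      For example, on a multi-homed machine you might start the daemon with an explicit hostname;
+      the value shown is hypothetical:
+    </p>
+<pre class="pre codeblock"><code>impalad --hostname=impala-daemon-1.example.com</code></pre>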
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="prereqs__prereqs_hardware">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Hardware Requirements</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+
+
+
+
+ During join operations, portions of data from each joined table are loaded into memory. Data sets can be
+ very large, so ensure your hardware has sufficient memory to accommodate the joins you anticipate
+ completing.
+ </p>
+
+ <p class="p">
+ While requirements vary according to data set size, the following is generally recommended:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ CPU - Impala version 2.2 and higher uses the SSSE3 instruction set, which is included in newer processors.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ This required level of processor is the same as in Impala version 1.x. The Impala 2.0 and 2.1 releases
+ had a stricter requirement for the SSE4.1 instruction set, which has now been relaxed.
+ </div>
+
+ </li>
+
+ <li class="li">
+ Memory - 128 GB or more recommended, ideally 256 GB or more. If the intermediate results during query
+ processing on a particular node exceed the amount of memory available to Impala on that node, the query
+ writes temporary work data to disk, which can lead to long query times. Note that because the work is
+ parallelized, and intermediate results for aggregate queries are typically smaller than the original
+ data, Impala can query and join tables that are much larger than the memory available on an individual
+ node.
+ </li>
+
+ <li class="li">
+ Storage - DataNodes with 12 or more disks each. I/O speeds are often the limiting factor for disk
+ performance with Impala. Ensure that you have sufficient disk space to store the data Impala will be
+ querying.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="prereqs__prereqs_account">
+
+ <h2 class="title topictitle2" id="ariaid-title7">User Account Requirements</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ Impala creates and uses a user and group named <code class="ph codeph">impala</code>. Do not delete this account or group
+ and do not modify the account's or group's permissions and rights. Ensure no existing systems obstruct the
+ functioning of these accounts and groups. For example, if you have scripts that delete user accounts not in
+ a white-list, add these accounts to the list of permitted accounts.
+ </p>
+
+ <p class="p">
+ For correct file deletion during <code class="ph codeph">DROP TABLE</code> operations, Impala must be able to move files
+ to the HDFS trashcan. You might need to create an HDFS directory <span class="ph filepath">/user/impala</span>,
+ writeable by the <code class="ph codeph">impala</code> user, so that the trashcan can be created. Otherwise, data files
+ might remain behind after a <code class="ph codeph">DROP TABLE</code> statement.
+ </p>
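The trashcan setup described above can be sketched with standard HDFS shell commands. This is a hedged sketch, not a prescribed procedure: it assumes a running HDFS cluster where the `hdfs` account is the HDFS superuser, and that the trashcan location is the `impala` user's HDFS home directory.

```shell
# Sketch: create the HDFS home directory for the impala user so that
# DROP TABLE can move data files into the trashcan (.Trash under this path).
# Assumes the "hdfs" account is the HDFS superuser on your cluster.
sudo -u hdfs hdfs dfs -mkdir -p /user/impala
sudo -u hdfs hdfs dfs -chown impala /user/impala
```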
+
+ <p class="p">
+ Impala should not run as root. Best Impala performance is achieved using direct reads, but root is not
+ permitted to use direct reads. Therefore, running Impala as root negatively affects performance.
+ </p>
+
+ <p class="p">
+ By default, any user can connect to Impala and access all the associated databases and tables. You can
+ enable authorization and authentication based on the Linux OS user who connects to the Impala server, and
+ the associated groups for that user. See <a class="xref" href="impala_security.html#security">Impala Security</a> for details. These
+ security features do not change the underlying file permission requirements; the <code class="ph codeph">impala</code>
+ user still needs to be able to access the data files.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_processes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_processes.html b/docs/build3x/html/topics/impala_processes.html
new file mode 100644
index 0000000..4d64072
--- /dev/null
+++ b/docs/build3x/html/topics/impala_processes.html
@@ -0,0 +1,115 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="processes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Starting Impala</title></head><body id="processes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Starting Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ To activate Impala if it is installed but not yet started:
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Set any necessary configuration options for the Impala services. See
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details.
+ </li>
+
+ <li class="li">
+ Start one instance of the Impala statestore. The statestore helps Impala to distribute work efficiently,
+ and to continue running in the event of availability problems for other Impala nodes. If the statestore
+ becomes unavailable, Impala continues to function.
+ </li>
+
+ <li class="li">
+ Start one instance of the Impala catalog service.
+ </li>
+
+ <li class="li">
+ Start the main Impala service on one or more DataNodes, ideally on all DataNodes to maximize local
+ processing and avoid network traffic due to remote reads.
+ </li>
+ </ol>
+
+ <p class="p">
+ Once Impala is running, you can conduct interactive experiments using the instructions in
+ <a class="xref" href="impala_tutorial.html#tutorial">Impala Tutorials</a> and try <a class="xref" href="impala_impala_shell.html#impala_shell">Using the Impala Shell (impala-shell Command)</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_config_options.html">Modifying Impala Startup Options</a></strong><br></li></ul></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="processes__starting_via_cmdline">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Starting Impala from the Command Line</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To start the Impala statestore and Impala from the command line or a script, you can either use the
+ <span class="keyword cmdname">service</span> command or you can start the daemons directly through the
+ <span class="keyword cmdname">impalad</span>, <code class="ph codeph">statestored</code>, and <span class="keyword cmdname">catalogd</span> executables.
+ </p>
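Starting the daemons directly, rather than through the service scripts, can be sketched as follows. This is a hedged sketch: it assumes the binaries are on the PATH, that default ports are in use, and that the log directory exists and is writable; the flag values shown are illustrative.

```shell
# Sketch: start the three Impala daemons directly, statestore first.
# --log_dir is a standard Impala daemon flag; the path is an example.
sudo -u impala statestored --log_dir=/var/log/impala &
sudo -u impala catalogd --log_dir=/var/log/impala &
sudo -u impala impalad --log_dir=/var/log/impala &
```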
+
+ <p class="p">
+ Start the Impala statestore and then start <code class="ph codeph">impalad</code> instances. You can modify the values
+ the service initialization scripts use when starting the statestore and Impala by editing
+ <code class="ph codeph">/etc/default/impala</code>.
+ </p>
+
+ <p class="p">
+ Start the statestore service using a command similar to the following:
+ </p>
+
+ <div class="p">
+<pre class="pre codeblock"><code>$ sudo service impala-state-store start</code></pre>
+ </div>
+
+ <p class="p">
+ Start the catalog service using a command similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-catalog start</code></pre>
+
+ <p class="p">
+ Start the Impala service on each DataNode using a command similar to the following:
+ </p>
+
+ <div class="p">
+<pre class="pre codeblock"><code>$ sudo service impala-server start</code></pre>
+ </div>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
+ Java UDFs are also persisted, if they were created with the new <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs,
+ where the Java function argument and return types are omitted.
+ Java-based UDFs created with the old <code class="ph codeph">CREATE FUNCTION</code> syntax do not persist across restarts
+ because they are held in the memory of the <span class="keyword cmdname">catalogd</span> daemon.
+ Until you re-create such Java UDFs using the new <code class="ph codeph">CREATE FUNCTION</code> syntax,
+ you must reload those Java-based UDFs by running the original <code class="ph codeph">CREATE FUNCTION</code> statements again each time
+ you restart the <span class="keyword cmdname">catalogd</span> daemon.
+ Prior to <span class="keyword">Impala 2.5</span> the requirement to reload functions after a restart applied to both C++ and Java functions.
+ </p>
+ </div>
+
+ <div class="p">
+ If any of the services fail to start, review:
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_logging.html#logs_debug">Reviewing Impala Logs</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_troubleshooting.html#troubleshooting">Troubleshooting Impala</a>
+ </li>
+ </ul>
+ </div>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_security_files.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_security_files.html b/docs/build3x/html/topics/impala_security_files.html
new file mode 100644
index 0000000..b7fa280
--- /dev/null
+++ b/docs/build3x/html/topics/impala_security_files.html
@@ -0,0 +1,58 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="secure_files"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Securing Impala Data and Log Files</title></head><body id="secure_files"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Securing Impala Data and Log Files</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ One aspect of security is to protect files from unauthorized access at the filesystem level. For example, if
+ you store sensitive data in HDFS, you specify permissions on the associated files and directories in HDFS to
+ restrict read and write permissions to the appropriate users and groups.
+ </p>
+
+ <p class="p">
+ If you issue queries containing sensitive values in the <code class="ph codeph">WHERE</code> clause, such as financial
+ account numbers, those values are stored in Impala log files in the Linux filesystem and you must secure
+ those files also. For the locations of Impala log files, see <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>.
+ </p>
+
+ <p class="p">
+ All Impala read and write operations are performed under the filesystem privileges of the
+ <code class="ph codeph">impala</code> user. The <code class="ph codeph">impala</code> user must be able to read all directories and data
+ files that you query, and write into all the directories and data files for <code class="ph codeph">INSERT</code> and
+ <code class="ph codeph">LOAD DATA</code> statements. At a minimum, make sure the <code class="ph codeph">impala</code> user is in the
+ <code class="ph codeph">hive</code> group so that it can access files and directories shared between Impala and Hive. See
+ <a class="xref" href="impala_prereqs.html#prereqs_account">User Account Requirements</a> for more details.
+ </p>
+
+ <p class="p">
+ Setting file permissions is necessary for Impala to function correctly, but is not an effective security
+ practice by itself:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The way to ensure that only authorized users can submit requests for databases and tables they are allowed
+ to access is to set up Sentry authorization, as explained in
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>. With authorization enabled, the checking of the user
+ ID and group is done by Impala, and unauthorized access is blocked by Impala itself. The actual low-level
+ read and write requests are still done by the <code class="ph codeph">impala</code> user, so you must have appropriate
+ file and directory permissions for that user ID.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You must also set up Kerberos authentication, as described in <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a>,
+ so that users can only connect from trusted hosts. With Kerberos enabled, if someone connects a new host to
+ the network and creates user IDs that match your privileged IDs, they will be blocked from connecting to
+ Impala at all from that host.
+ </p>
+ </li>
+ </ul>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_security_guidelines.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_security_guidelines.html b/docs/build3x/html/topics/impala_security_guidelines.html
new file mode 100644
index 0000000..c8bc24c
--- /dev/null
+++ b/docs/build3x/html/topics/impala_security_guidelines.html
@@ -0,0 +1,99 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_guidelines"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Security Guidelines for Impala</title></head><body id="security_guidelines"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Security Guidelines for Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following are the major steps to harden a cluster running Impala against accidents and mistakes, or
+ malicious attackers trying to access sensitive data:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Secure the <code class="ph codeph">root</code> account. The <code class="ph codeph">root</code> user can tamper with the
+ <span class="keyword cmdname">impalad</span> daemon, read and write the data files in HDFS, log into other user accounts, and
+ access other system services that are beyond the control of Impala.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Restrict membership in the <code class="ph codeph">sudoers</code> list (in the <span class="ph filepath">/etc/sudoers</span> file).
+ The users who can run the <code class="ph codeph">sudo</code> command can do many of the same things as the
+ <code class="ph codeph">root</code> user.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Ensure the Hadoop ownership and permissions for Impala data files are restricted.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Ensure the Hadoop ownership and permissions for Impala log files are restricted.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Ensure that the Impala web UI (available by default on port 25000 on each Impala node) is
+ password-protected. See <a class="xref" href="impala_webui.html#webui">Impala Web User Interface for Debugging</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Create a policy file that specifies which Impala privileges are available to users in particular Hadoop
+ groups (which by default map to Linux OS groups). Create the associated Linux groups using the
+ <span class="keyword cmdname">groupadd</span> command if necessary.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The Impala authorization feature makes use of the HDFS file ownership and permissions mechanism; for
+ background information, see the
+ <a class="xref" href="https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html" target="_blank">HDFS Permissions Guide</a>.
+ Set up users and assign them to groups at the OS level, corresponding to the
+ different categories of users with different access levels for various databases, tables, and HDFS
+ locations (URIs). Create the associated Linux users using the <span class="keyword cmdname">useradd</span> command if
+ necessary, and add them to the appropriate groups with the <span class="keyword cmdname">usermod</span> command.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Design your databases, tables, and views so that policy rules can be simple and
+ consistent. For example, if all tables related to an application are inside a single
+ database, you can assign privileges for that database and use the <code class="ph codeph">*</code> wildcard for the table
+ name. If you are creating views with different privileges than the underlying base tables, you might put
+ the views in a separate database so that you can use the <code class="ph codeph">*</code> wildcard for the database
+ containing the base tables, while specifying the precise names of the individual views. (For specifying
+ table or database names, you either specify the exact name or <code class="ph codeph">*</code> to mean all the databases
+ on a server, or all the tables and views in a database.)
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Enable authorization by running the <code class="ph codeph">impalad</code> daemons with the <code class="ph codeph">-server_name</code>
+ and <code class="ph codeph">-authorization_policy_file</code> options on all nodes. (The authorization feature does not
+ apply to the <span class="keyword cmdname">statestored</span> daemon, which has no access to schema objects or data files.)
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Set up authentication using Kerberos, to make sure users really are who they say they are.
+ </p>
+ </li>
+ </ul>
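The OS-level group and user setup mentioned in the list above can be sketched as follows. The group and user names are illustrative assumptions, and the commands require root privileges.

```shell
# Sketch: one Linux group per access category, mapped to a Hadoop group.
# "analysts", "alice", and "bob" are hypothetical names.
groupadd analysts          # create the group
useradd -G analysts alice  # create a user and add to the group
usermod -aG analysts bob   # add an existing user to the group
```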
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_security_install.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_security_install.html b/docs/build3x/html/topics/impala_security_install.html
new file mode 100644
index 0000000..09d4e38
--- /dev/null
+++ b/docs/build3x/html/topics/impala_security_install.html
@@ -0,0 +1,17 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_install"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Installation Considerations for Impala Security</title></head><body id="security_install"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Installation Considerations for Impala Security</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala 1.1 comes set up with all the software and settings needed to enable security when you run the
+ <span class="keyword cmdname">impalad</span> daemon with the new security-related options (<code class="ph codeph">-server_name</code> and
+ <code class="ph codeph">-authorization_policy_file</code>). You do not need to change any environment variables or install
+ any additional JAR files.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_security_metastore.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_security_metastore.html b/docs/build3x/html/topics/impala_security_metastore.html
new file mode 100644
index 0000000..b9034a8
--- /dev/null
+++ b/docs/build3x/html/topics/impala_security_metastore.html
@@ -0,0 +1,30 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_metastore"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Securing the Hive Metastore Database</title></head><body id="security_metastore"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Securing the Hive Metastore Database</h1>
+
+
+ <div class="body conbody">
+
+
+
+ <p class="p">
+ It is important to secure the Hive metastore, so that users cannot access the names or other information
+ about databases and tables through the Hive client or by querying the metastore database. Do this by
+ turning on Hive metastore security, using the instructions in
+ <span class="xref">the documentation for your Apache Hadoop distribution</span> for securing different Hive components:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Secure the Hive Metastore.
+ </li>
+
+ <li class="li">
+ In addition, allow access to the metastore only from the HiveServer2 server, and then disable local access
+ to the HiveServer2 server.
+ </li>
+ </ul>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_security_webui.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_security_webui.html b/docs/build3x/html/topics/impala_security_webui.html
new file mode 100644
index 0000000..44f7a19
--- /dev/null
+++ b/docs/build3x/html/topics/impala_security_webui.html
@@ -0,0 +1,57 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_webui"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Securing the Impala Web User Interface</title></head><body id="security_webui"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Securing the Impala Web User Interface</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The instructions in this section presume you are familiar with the
+ <a class="xref" href="http://en.wikipedia.org/wiki/.htpasswd" target="_blank">
+ <span class="ph filepath">.htpasswd</span> mechanism</a> commonly used to password-protect pages on web servers.
+ </p>
+
+ <p class="p">
+ Password-protect the Impala web UI that listens on port 25000 by default. Set up a
+ <span class="ph filepath">.htpasswd</span> file in the <code class="ph codeph">$IMPALA_HOME</code> directory, or start both the
+ <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> daemons with the
+ <code class="ph codeph">--webserver_password_file</code> option to specify a different location (including the filename).
+ </p>
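Generating an entry for the password file can be sketched as follows. This is a hedged sketch: it assumes the file uses the HTTP digest (htdigest-style) format <code>user:domain:MD5("user:domain:password")</code>, which you should verify against your Impala version, and the username, domain, and password shown are illustrative.

```shell
# Sketch: append one entry to the web UI password file.
# ASSUMPTION: the file uses the htdigest-style format
#   user:domain:MD5("user:domain:password")
WEBUI_USER=admin          # illustrative username
WEBUI_DOMAIN=mydomain.com # must match --webserver_authentication_domain
WEBUI_PASS=secret         # illustrative password

HASH=$(printf '%s:%s:%s' "$WEBUI_USER" "$WEBUI_DOMAIN" "$WEBUI_PASS" | md5sum | cut -d' ' -f1)
printf '%s:%s:%s\n' "$WEBUI_USER" "$WEBUI_DOMAIN" "$HASH" >> .htpasswd
chmod 600 .htpasswd       # keep readable only by Impala and administrators
```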
+
+ <p class="p">
+ This file should only be readable by the Impala process and machine administrators, because it contains
+ (hashed) versions of passwords. The username / password pairs are not derived from Unix usernames, Kerberos
+ users, or any other system. The <code class="ph codeph">domain</code> field in the password file must match the domain
+ supplied to Impala by the new command-line option <code class="ph codeph">--webserver_authentication_domain</code>. The
+ default is <code class="ph codeph">mydomain.com</code>.
+
+ </p>
+
+ <p class="p">
+ Impala also supports using HTTPS for secure web traffic. To do so, set
+ <code class="ph codeph">--webserver_certificate_file</code> to refer to a valid <code class="ph codeph">.pem</code> TLS/SSL certificate file.
+ Impala will automatically start using HTTPS once the TLS/SSL certificate has been read and validated. A
+ <code class="ph codeph">.pem</code> file consists of a private key followed by a signed TLS/SSL certificate; make sure to
+ concatenate both parts when constructing the <code class="ph codeph">.pem</code> file.
+
+ </p>
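The concatenation step described above can be sketched as follows. The file names are illustrative, and the key and certificate bodies below are placeholders standing in for real base64-encoded material.

```shell
# Sketch: build the .pem file for --webserver_certificate_file by
# concatenating the private key and the signed certificate, key first.
# server.key and server.crt stand in for your real key and certificate.
cat > server.key <<'EOF'
-----BEGIN PRIVATE KEY-----
(placeholder key material)
-----END PRIVATE KEY-----
EOF
cat > server.crt <<'EOF'
-----BEGIN CERTIFICATE-----
(placeholder certificate)
-----END CERTIFICATE-----
EOF
cat server.key server.crt > server.pem  # key first, then certificate
chmod 600 server.pem                    # restrict access to the private key
```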
+
+ <p class="p">
+ If Impala cannot find or parse the <code class="ph codeph">.pem</code> file, it prints an error message and quits.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ If the private key is encrypted using a passphrase, Impala will ask for that passphrase on startup, which
+ is not useful for a large cluster. In that case, remove the passphrase and make the <code class="ph codeph">.pem</code>
+ file readable only by Impala and administrators.
+ </p>
+ <p class="p">
+ When you turn on TLS/SSL for the Impala web UI, the associated URLs change from <code class="ph codeph">http://</code>
+ prefixes to <code class="ph codeph">https://</code>. Adjust any bookmarks or application code that refers to those URLs.
+ </p>
+ </div>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_select.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_select.html b/docs/build3x/html/topics/impala_select.html
new file mode 100644
index 0000000..9d99913
--- /dev/null
+++ b/docs/build3x/html/topics/impala_select.html
@@ -0,0 +1,236 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_joins.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_order_by.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_group_by.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_having.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_offset.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_union.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_subqueries.html"><meta name="DC.Relation" scheme="U
RI" content="../topics/impala_tablesample.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_with.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_distinct.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="select"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SELECT Statement</title></head><body id="select"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SELECT Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">SELECT</code> statement performs queries, retrieving data from one or more tables and producing
+ result sets consisting of rows and columns.
+ </p>
+
+ <p class="p">
+ The Impala <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement also typically ends
+ with a <code class="ph codeph">SELECT</code> statement, to define data to copy from one table to another.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>[WITH <em class="ph i">name</em> AS (<em class="ph i">select_expression</em>) [, ...] ]
+SELECT
+ [ALL | DISTINCT]
+ [STRAIGHT_JOIN]
+ <em class="ph i">expression</em> [, <em class="ph i">expression</em> ...]
+FROM <em class="ph i">table_reference</em> [, <em class="ph i">table_reference</em> ...]
+[[FULL | [LEFT | RIGHT] INNER | [LEFT | RIGHT] OUTER | [LEFT | RIGHT] SEMI | [LEFT | RIGHT] ANTI | CROSS]
+ JOIN <em class="ph i">table_reference</em>
+ [ON <em class="ph i">join_equality_clauses</em> | USING (<var class="keyword varname">col1</var>[, <var class="keyword varname">col2</var> ...])]] ...
+WHERE <em class="ph i">conditions</em>
+GROUP BY { <em class="ph i">column</em> | <em class="ph i">expression</em> [, ...] }
+HAVING <em class="ph i">conditions</em>
+ORDER BY { <em class="ph i">column</em> | <em class="ph i">expression</em> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, ...] }
+LIMIT <em class="ph i">expression</em> [OFFSET <em class="ph i">expression</em>]
+[UNION [ALL] <em class="ph i">select_statement</em>] ...
+
+table_reference := { <var class="keyword varname">table_name</var> | (<var class="keyword varname">subquery</var>) }
+ <span class="ph">[ TABLESAMPLE SYSTEM(<var class="keyword varname">percentage</var>) [REPEATABLE(<var class="keyword varname">seed</var>)] ]</span>
+</code></pre>
+
+ <p class="p">
+ Impala <code class="ph codeph">SELECT</code> queries support:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ SQL scalar data types: <code class="ph codeph"><a class="xref" href="impala_boolean.html#boolean">BOOLEAN</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_tinyint.html#tinyint">TINYINT</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_smallint.html#smallint">SMALLINT</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_int.html#int">INT</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_bigint.html#bigint">BIGINT</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_decimal.html#decimal">DECIMAL</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_float.html#float">FLOAT</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_double.html#double">DOUBLE</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_string.html#string">STRING</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_varchar.html#varchar">VARCHAR</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_char.html#char">CHAR</a></code>.
+ </li>
+
+
+ <li class="li">
+ The complex data types <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>
+ are available in <span class="keyword">Impala 2.3</span> and higher.
+ Queries involving these types typically involve special qualified names
+ using dot notation for referring to the complex column fields,
+ and join clauses for bringing the complex columns into the result set.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+ </li>
+
+ <li class="li">
+ An optional <a class="xref" href="impala_with.html#with"><code class="ph codeph">WITH</code> clause</a> before the
+ <code class="ph codeph">SELECT</code> keyword, to define a subquery whose name or column names can be referenced from
+ later in the main query. This clause lets you abstract repeated clauses, such as aggregation functions,
+ that are referenced multiple times in the same query.
+ </li>
+
+ <li class="li">
+ By default, one <code class="ph codeph">DISTINCT</code> clause per query. See <a class="xref" href="impala_distinct.html#distinct">DISTINCT Operator</a>
+ for details. See <a class="xref" href="impala_appx_count_distinct.html#appx_count_distinct">APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</a> for a query option to
+ allow multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions in the same query.
+ </li>
+
+ <li class="li">
+ Subqueries in a <code class="ph codeph">FROM</code> clause. In <span class="keyword">Impala 2.0</span> and higher,
+ subqueries can also go in the <code class="ph codeph">WHERE</code> clause, for example with the
+ <code class="ph codeph">IN()</code>, <code class="ph codeph">EXISTS</code>, and <code class="ph codeph">NOT EXISTS</code> operators.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">WHERE</code>, <code class="ph codeph">GROUP BY</code>, and <code class="ph codeph">HAVING</code> clauses.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph"><a class="xref" href="impala_order_by.html#order_by">ORDER BY</a></code>. Prior to Impala 1.4.0, Impala
+ required that queries using an <code class="ph codeph">ORDER BY</code> clause also include a
+ <code class="ph codeph"><a class="xref" href="impala_limit.html#limit">LIMIT</a></code> clause. In Impala 1.4.0 and higher, this
+ restriction is lifted; sort operations that would exceed the Impala memory limit automatically use a
+ temporary disk work area to perform the sort.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala supports a wide variety of <code class="ph codeph">JOIN</code> clauses. Left, right, semi, full, and outer joins
+ are supported in all Impala versions. The <code class="ph codeph">CROSS JOIN</code> operator is available in Impala 1.2.2
+ and higher. During performance tuning, you can override the reordering of join clauses that Impala does
+ internally by including the keyword <code class="ph codeph">STRAIGHT_JOIN</code> immediately after the
+ <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">ALL</code> keywords.
+ </p>
+ <p class="p">
+ See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details and examples of join queries.
+ </p>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">UNION ALL</code>.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">LIMIT</code>.
+ </li>
+
+ <li class="li">
+ External tables.
+ </li>
+
+ <li class="li">
+ Relational operators such as greater than, less than, or equal to.
+ </li>
+
+ <li class="li">
+ Arithmetic operators such as addition or subtraction.
+ </li>
+
+ <li class="li">
+ Logical/Boolean operators <code class="ph codeph">AND</code>, <code class="ph codeph">OR</code>, and <code class="ph codeph">NOT</code>. Impala does
+ not support the corresponding symbols <code class="ph codeph">&&</code>, <code class="ph codeph">||</code>, and
+ <code class="ph codeph">!</code>.
+ </li>
+
+ <li class="li">
+ Common SQL built-in functions such as <code class="ph codeph">COUNT</code>, <code class="ph codeph">SUM</code>, <code class="ph codeph">CAST</code>,
+ <code class="ph codeph">LIKE</code>, <code class="ph codeph">IN</code>, <code class="ph codeph">BETWEEN</code>, and <code class="ph codeph">COALESCE</code>. Impala
+ specifically supports built-ins described in <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 2.9</span> and higher, an optional <code class="ph codeph">TABLESAMPLE</code>
+ clause immediately after a table reference, to specify that the query only processes a
+ specified percentage of the table data. See <a class="xref" href="impala_tablesample.html">TABLESAMPLE Clause</a> for details.
+ </li>
+ </ul>
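+
+ <p class="p">
+ For example, a single query can combine several of these features. The table and column
+ names in this sketch (<code class="ph codeph">orders</code>, <code class="ph codeph">customers</code>, and so on) are hypothetical:
+ </p>
+
+<pre class="pre codeblock"><code>with big_spenders as
+ (select customer_id, sum(total) as lifetime_total
+ from orders
+ group by customer_id
+ having sum(total) &gt; 1000)
+select c.name, b.lifetime_total
+ from big_spenders b join customers c on (b.customer_id = c.customer_id)
+ where c.region in ('EMEA', 'Asia')
+ order by b.lifetime_total desc
+ limit 10;</code></pre>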
+
+ <p class="p">
+ Impala queries ignore files with extensions commonly used for temporary work files by Hadoop tools. Any
+ files with extensions <code class="ph codeph">.tmp</code> or <code class="ph codeph">.copying</code> are not considered part of the
+ Impala table. The suffix matching is case-insensitive, so for example Impala ignores both
+ <code class="ph codeph">.copying</code> and <code class="ph codeph">.COPYING</code> suffixes.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+ <p class="p">
+ If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+ identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+ other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+ For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+ Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+ in the <span class="ph filepath">core-site.xml</span> configuration file determines
+ how Impala divides the I/O work of reading the data files. This configuration
+ setting is specified in bytes. By default, this
+ value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+ as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+ Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+ Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 268435456 (256 MB) to match the row group size produced by Impala.
+ </p>
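+
+ <p class="p">
+ For example, to make Impala treat S3 data files as 128 MB blocks, you might set the
+ property in <span class="ph filepath">core-site.xml</span> as follows (adjust the value to
+ match the row group size of your own files):
+ </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+ &lt;name&gt;fs.s3a.block.size&lt;/name&gt;
+ &lt;!-- 128 MB expressed in bytes. --&gt;
+ &lt;value&gt;134217728&lt;/value&gt;
+&lt;/property&gt;</code></pre>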
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Can be cancelled. To cancel this statement, use Ctrl-C from the
+ <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
+ <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
+ in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
+ (port 25000).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read
+ permissions for the files in all applicable directories in all source tables,
+ and read and execute permissions for the relevant data directories.
+ (A <code class="ph codeph">SELECT</code> operation could read files from multiple different HDFS directories
+ if the source table is partitioned.)
+ If a query attempts to read a data file and is unable to because of an HDFS permission error,
+ the query halts and does not return any further results.
+ </p>
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">SELECT</code> syntax is so extensive that it forms its own category of statements: queries. The
+ other major classifications of SQL statements are data definition language (see
+ <a class="xref" href="impala_ddl.html#ddl">DDL Statements</a>) and data manipulation language (see <a class="xref" href="impala_dml.html#dml">DML Statements</a>).
+ </p>
+
+ <p class="p">
+ Because the focus of Impala is on fast queries with interactive response times over huge data sets, query
+ performance and scalability are important considerations. See
+ <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> and <a class="xref" href="impala_scalability.html#scalability">Scalability Considerations for Impala</a> for
+ details.
+ </p>
+ </div>
+
+
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_joins.html">Joins in Impala SELECT Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_order_by.html">ORDER BY Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_group_by.html">GROUP BY Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_having.html">HAVING Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_limit.html">LIMIT Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_offset.html">OFFSET Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_union.html">UNION Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_subqueries.html">Subqueries in Impala SELECT Statements</a></strong><
br></li><li class="link ulchildlink"><strong><a href="../topics/impala_tablesample.html">TABLESAMPLE Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_with.html">WITH Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_distinct.html">DISTINCT Operator</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_seqfile.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_seqfile.html b/docs/build3x/html/topics/impala_seqfile.html
new file mode 100644
index 0000000..5899ba3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_seqfile.html
@@ -0,0 +1,240 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="seqfile"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the SequenceFile File Format with Impala Tables</title></head><body id="seqfile"><main role="main"><article role="article" aria-labelledby="seqfile__sequencefile">
+
+ <h1 class="title topictitle1" id="seqfile__sequencefile">Using the SequenceFile File Format with Impala Tables</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports using SequenceFile data files.
+ </p>
+
+ <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">SequenceFile Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="seqfile__entry__1">
+ File Type
+ </th>
+ <th class="entry nocellnorowborder" id="seqfile__entry__2">
+ Format
+ </th>
+ <th class="entry nocellnorowborder" id="seqfile__entry__3">
+ Compression Codecs
+ </th>
+ <th class="entry nocellnorowborder" id="seqfile__entry__4">
+ Impala Can CREATE?
+ </th>
+ <th class="entry nocellnorowborder" id="seqfile__entry__5">
+ Impala Can INSERT?
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="seqfile__entry__1 ">
+ <a class="xref" href="impala_seqfile.html#seqfile">SequenceFile</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="seqfile__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="seqfile__entry__3 ">
+ Snappy, gzip, deflate, bzip2
+ </td>
+ <td class="entry nocellnorowborder" headers="seqfile__entry__4 ">Yes.</td>
+ <td class="entry nocellnorowborder" headers="seqfile__entry__5 ">
+ No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+ <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+ </td>
+
+ </tr>
+ </tbody></table>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="seqfile__seqfile_create">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Creating SequenceFile Tables and Loading Data</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If you do not have an existing data file to use, begin by creating one in the appropriate format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To create a SequenceFile table:</strong>
+ </p>
+
+ <p class="p">
+ In the <code class="ph codeph">impala-shell</code> interpreter, issue a command similar to:
+ </p>
+
+<pre class="pre codeblock"><code>create table sequencefile_table (<var class="keyword varname">column_specs</var>) stored as sequencefile;</code></pre>
+
+ <p class="p">
+ Because Impala can query some kinds of tables that it cannot currently write to, after creating tables of
+ certain file formats, you might use the Hive shell to load the data. See
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. After loading data into a table through
+ Hive or other mechanism outside of Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+ statement the next time you connect to the Impala node, before querying the table, to make Impala recognize
+ the new data.
+ </p>
+
+ <p class="p">
+ For example, here is how you might create some SequenceFile tables in Impala (by specifying the columns
+ explicitly, or cloning the structure of another table), load data through Hive, and query them through
+ Impala:
+ </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost
+[localhost:21000] > create table seqfile_table (x int) stored as sequencefile;
+[localhost:21000] > create table seqfile_clone like some_other_table stored as sequencefile;
+[localhost:21000] > quit;
+
+$ hive
+hive> insert into table seqfile_table select x from some_other_table;
+3 Rows loaded to seqfile_table
+Time taken: 19.047 seconds
+hive> quit;
+
+$ impala-shell -i localhost
+[localhost:21000] > select * from seqfile_table;
+Returned 0 row(s) in 0.23s
+[localhost:21000] > -- Make Impala recognize the data loaded through Hive;
+[localhost:21000] > refresh seqfile_table;
+[localhost:21000] > select * from seqfile_table;
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
++---+
+Returned 3 row(s) in 0.23s</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ Although you can create tables in this file format using
+ the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+ and <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher,
+ currently, Impala can query these types only in Parquet tables.
+ <span class="ph">
+ The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on SequenceFile tables that include complex types.
+ Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher.
+ </span>
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="seqfile__seqfile_compression">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Enabling Compression for SequenceFile Tables</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You may want to enable compression on existing tables. Enabling compression provides performance gains in
+ most cases and is supported for SequenceFile tables. For example, to enable Snappy compression, you would
+ specify the following additional settings when loading data through the Hive shell:
+ </p>
+
+<pre class="pre codeblock"><code>hive> SET hive.exec.compress.output=true;
+hive> SET mapred.max.split.size=256000000;
+hive> SET mapred.output.compression.type=BLOCK;
+hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive> insert overwrite table <var class="keyword varname">new_table</var> select * from <var class="keyword varname">old_table</var>;</code></pre>
+
+ <p class="p">
+ If you are converting partitioned tables, you must complete additional steps. In such a case, specify
+ additional settings similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>hive> create table <var class="keyword varname">new_table</var> (<var class="keyword varname">your_cols</var>) partitioned by (<var class="keyword varname">partition_cols</var>) stored as <var class="keyword varname">new_format</var>;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> SET hive.exec.dynamic.partition=true;
+hive> insert overwrite table <var class="keyword varname">new_table</var> partition(<var class="keyword varname">comma_separated_partition_cols</var>) select * from <var class="keyword varname">old_table</var>;</code></pre>
+
+ <p class="p">
+ Note that you do not need to tell Hive the file format of the source table. To convert a
+ table to a Snappy-compressed SequenceFile, you would combine the settings outlined
+ previously and specify statements similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>hive> create table TBL_SEQ (int_col int, string_col string) STORED AS SEQUENCEFILE;
+hive> SET hive.exec.compress.output=true;
+hive> SET mapred.max.split.size=256000000;
+hive> SET mapred.output.compression.type=BLOCK;
+hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> SET hive.exec.dynamic.partition=true;
+hive> INSERT OVERWRITE TABLE tbl_seq SELECT * FROM tbl;</code></pre>
+
+ <p class="p">
+ To complete a similar process for a table that includes partitions, you would specify settings similar to
+ the following:
+ </p>
+
+<pre class="pre codeblock"><code>hive> CREATE TABLE tbl_seq (int_col INT, string_col STRING) PARTITIONED BY (year INT) STORED AS SEQUENCEFILE;
+hive> SET hive.exec.compress.output=true;
+hive> SET mapred.max.split.size=256000000;
+hive> SET mapred.output.compression.type=BLOCK;
+hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> SET hive.exec.dynamic.partition=true;
+hive> INSERT OVERWRITE TABLE tbl_seq PARTITION(year) SELECT * FROM tbl;</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The compression type is specified in the following command:
+ </p>
+<pre class="pre codeblock"><code>SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;</code></pre>
+ <p class="p">
+ You could elect to specify alternative codecs such as <code class="ph codeph">GzipCodec</code> here.
+ </p>
+ </div>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="seqfile__seqfile_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala SequenceFile Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In general, expect query performance with SequenceFile tables to be
+ faster than with tables using text data, but slower than with
+ Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+ for information about using the Parquet file format for
+ high-performance analytic queries.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+ For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+ Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+ in the <span class="ph filepath">core-site.xml</span> configuration file determines
+ how Impala divides the I/O work of reading the data files. This configuration
+ setting is specified in bytes. By default, this
+ value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+ as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+ Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+ Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 268435456 (256 MB) to match the row group size produced by Impala.
+ </p>
+
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_set.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_set.html b/docs/build3x/html/topics/impala_set.html
new file mode 100644
index 0000000..4dd5f77
--- /dev/null
+++ b/docs/build3x/html/topics/impala_set.html
@@ -0,0 +1,280 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="set"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SET Statement</title></head><body id="set"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SET Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Specifies values for query options that control the runtime behavior of other statements within the same
+ session.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, <code class="ph codeph">SET</code> also defines user-specified substitution variables for
+ the <span class="keyword cmdname">impala-shell</span> interpreter. This feature uses the <code class="ph codeph">SET</code> command
+ built into <span class="keyword cmdname">impala-shell</span> instead of the SQL <code class="ph codeph">SET</code> statement.
+ Therefore the substitution mechanism only works with queries processed by <span class="keyword cmdname">impala-shell</span>,
+ not with queries submitted through JDBC or ODBC.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, the output of the <code class="ph codeph">SET</code>
+ statement changes in some important ways:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The options are divided into groups: <code class="ph codeph">Regular Query Options</code>,
+ <code class="ph codeph">Advanced Query Options</code>, <code class="ph codeph">Development Query Options</code>, and
+ <code class="ph codeph">Deprecated Query Options</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The advanced options are intended for use in specific
+ kinds of performance tuning and debugging scenarios. The development options are
+ related to internal development of Impala or features that are not yet finalized;
+ these options might be changed or removed without notice.
+ The deprecated options are related to features that are removed or changed so that
+ the options no longer have any purpose; these options might be removed in future
+ versions.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ By default, only the first two groups (regular and advanced) are
+ displayed by the <code class="ph codeph">SET</code> command. Use the syntax <code class="ph codeph">SET ALL</code>
+ to see all groups of options.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <span class="keyword cmdname">impala-shell</span> options and user-specified variables are always displayed
+ at the end of the list of query options, after all appropriate option groups.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ When the <code class="ph codeph">SET</code> command is run through the JDBC or ODBC interfaces,
+ the result set has a new third column, <code class="ph codeph">level</code>, indicating which
+ group each option belongs to. The same distinction of <code class="ph codeph">SET</code>
+ returning the regular and advanced options, and <code class="ph codeph">SET ALL</code>
+ returning all option groups, applies to JDBC and ODBC also.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SET [<var class="keyword varname">query_option</var>=<var class="keyword varname">option_value</var>]
+<span class="ph">SET ALL</span>
+</code></pre>
+
+ <p class="p">
+ <code class="ph codeph">SET</code> and <code class="ph codeph">SET ALL</code> with no arguments return a
+ result set consisting of all the applicable query options and their current values.
+ </p>
+
+ <p class="p">
+ The query option name and any string argument values are case-insensitive.
+ </p>
+
+ <p class="p">
+ Each query option has a specific allowed notation for its arguments. Boolean options can be enabled and
+ disabled by assigning a value of either <code class="ph codeph">true</code> or <code class="ph codeph">false</code>, or
+ <code class="ph codeph">1</code> or <code class="ph codeph">0</code>. Some numeric options accept a final character signifying the unit,
+ such as <code class="ph codeph">2g</code> for 2 gigabytes or <code class="ph codeph">100m</code> for 100 megabytes. See
+ <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a> for the details of each query option.
+ </p>
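+
+ <p class="p">
+ For example, the following statements illustrate the Boolean and unit-suffix notations,
+ using query options discussed elsewhere in this documentation:
+ </p>
+
+<pre class="pre codeblock"><code>set mem_limit=2g;
+set appx_count_distinct=true;
+set abort_on_error=0;</code></pre>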
+
+ <p class="p">
+ <strong class="ph b">Setting query options during impala-shell invocation:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use one or more command-line options
+ of the form <code class="ph codeph">--query_option=<var class="keyword varname">option</var>=<var class="keyword varname">value</var></code>
+ when running the <span class="keyword cmdname">impala-shell</span> command. The corresponding query option settings
+ take effect for that <span class="keyword cmdname">impala-shell</span> session.
+ </p>
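+
+ <p class="p">
+ For example, the following invocation (with illustrative option values) sets two query
+ options for the duration of the session:
+ </p>
+
+<pre class="pre codeblock"><code>$ impala-shell --query_option=mem_limit=2g --query_option=abort_on_error=1</code></pre>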
+
+ <p class="p">
+ <strong class="ph b">User-specified substitution variables:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can specify your own names and string substitution values
+ within the <span class="keyword cmdname">impala-shell</span> interpreter. Once a substitution variable is set up,
+ its value is inserted into any SQL statement in that same <span class="keyword cmdname">impala-shell</span> session
+ that contains the notation <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>.
+ Using <code class="ph codeph">SET</code> in an interactive <span class="keyword cmdname">impala-shell</span> session overrides
+ any value for that same variable passed in through the <code class="ph codeph">--var=<var class="keyword varname">varname</var>=<var class="keyword varname">value</var></code>
+ command-line option.
+ </p>
+
+ <p class="p">
+ For example, to set up some default parameters for report queries, but then override those default
+ within an <span class="keyword cmdname">impala-shell</span> session, you might issue commands and statements such as
+ the following:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Initial setup for this example.
+create table staging_table (s string);
+insert into staging_table values ('foo'), ('bar'), ('bletch');
+
+create table production_table (s string);
+insert into production_table values ('North America'), ('EMEA'), ('Asia');
+quit;
+
+-- Start impala-shell with user-specified substitution variables,
+-- run a query, then override the variables with SET and run the query again.
+$ impala-shell --var=table_name=staging_table --var=cutoff=2
+... <var class="keyword varname">banner message</var> ...
+[localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff};
+Query: select s from staging_table order by s limit 2
++--------+
+| s |
++--------+
+| bar |
+| bletch |
++--------+
+Fetched 2 row(s) in 1.06s
+
+[localhost:21000] > set var:table_name=production_table;
+Variable TABLE_NAME set to production_table
+[localhost:21000] > set var:cutoff=3;
+Variable CUTOFF set to 3
+
+[localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff};
+Query: select s from production_table order by s limit 3
++---------------+
+| s |
++---------------+
+| Asia |
+| EMEA |
+| North America |
++---------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how <code class="ph codeph">SET ALL</code> with no parameters displays
+ all groups of query options, shell options, and user-specified substitution variables, and how
+ <code class="ph codeph">UNSET</code> removes a substitution variable entirely:
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > set all;
+Query options (defaults shown in []):
+ABORT_ON_ERROR: [0]
+COMPRESSION_CODEC: []
+DISABLE_CODEGEN: [0]
+...
+
+Advanced Query Options:
+APPX_COUNT_DISTINCT: [0]
+BUFFER_POOL_LIMIT: []
+DEFAULT_JOIN_DISTRIBUTION_MODE: [0]
+...
+
+Development Query Options:
+BATCH_SIZE: [0]
+DEBUG_ACTION: []
+DECIMAL_V2: [0]
+...
+
+Deprecated Query Options:
+ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0]
+ALLOW_UNSUPPORTED_FORMATS: [0]
+DEFAULT_ORDER_BY_LIMIT: [-1]
+...
+
+Shell Options
+ LIVE_PROGRESS: False
+ LIVE_SUMMARY: False
+
+Variables:
+ CUTOFF: 3
+ TABLE_NAME: production_table
+
+[localhost:21000] > unset var:cutoff;
+Unsetting variable CUTOFF
+[localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff};
+Error: Unknown variable CUTOFF
+</code></pre>
+
+ <p class="p">
+ See <a class="xref" href="impala_shell_running_commands.html">Running Commands and SQL Statements in impala-shell</a> for more examples of using the
+ <code class="ph codeph">--var</code>, <code class="ph codeph">SET</code>, and <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>
+ substitution technique in <span class="keyword cmdname">impala-shell</span>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">MEM_LIMIT</code> is probably the most commonly used query option. You can specify a high value to
+ allow a resource-intensive query to complete. For testing how queries would work on memory-constrained
+ systems, you might specify an artificially low value.
+ </p>
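+
+ <p class="p">
+ For example (the values shown are illustrative; choose limits appropriate for your cluster):
+ </p>
+
+<pre class="pre codeblock"><code>-- Allow a resource-intensive query to use up to 10 GB of memory per host.
+set mem_limit=10g;
+
+-- Simulate a memory-constrained system for testing.
+set mem_limit=200m;
+</code></pre>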
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example sets some numeric and some Boolean query options to control usage of memory, disk
+ space, and timeout periods, then runs a query whose success could depend on the options in effect:
+ </p>
+
+<pre class="pre codeblock"><code>set mem_limit=64g;
+set DISABLE_UNSAFE_SPILLS=true;
+set parquet_file_size=400m;
+set RESERVATION_REQUEST_TIMEOUT=900000;
+insert overwrite parquet_table select c1, c2, count(c3) from text_table group by c1, c2;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">SET</code> has always been available as an <span class="keyword cmdname">impala-shell</span> command. Promoting it to
+ a SQL statement lets you use this feature in client applications through the JDBC and ODBC APIs.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a> for the query options you can adjust using this
+ statement.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_query_options.html">Query Options for the SET Statement</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_shell_commands.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_shell_commands.html b/docs/build3x/html/topics/impala_shell_commands.html
new file mode 100644
index 0000000..1d67a69
--- /dev/null
+++ b/docs/build3x/html/topics/impala_shell_commands.html
@@ -0,0 +1,416 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shell_commands"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>impala-shell Command Reference</title></head><body id="shell_commands"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">impala-shell Command Reference</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Use the following commands within <code class="ph codeph">impala-shell</code> to pass requests to the
+ <code class="ph codeph">impalad</code> daemon that the shell is connected to. You can enter a command interactively at the
+ prompt, or pass it as the argument to the <code class="ph codeph">-q</code> option of <code class="ph codeph">impala-shell</code>. Most
+ of these commands are passed to the Impala daemon as SQL statements; refer to the corresponding
+ <a class="xref" href="impala_langref_sql.html#langref_sql">SQL language reference sections</a> for full syntax
+ details.
+ </p>
+
+ <table class="table"><caption></caption><colgroup><col style="width:20%"><col style="width:80%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="shell_commands__entry__1">
+ Command
+ </th>
+ <th class="entry nocellnorowborder" id="shell_commands__entry__2">
+ Explanation
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row" id="shell_commands__alter_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">alter</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Changes the underlying structure or settings of an Impala table, or a table shared between Impala
+ and Hive. See <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> and
+ <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__compute_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">compute stats</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Gathers important performance-related information for a table, used by Impala to optimize queries.
+ See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__connect_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">connect</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Connects to the specified instance of <code class="ph codeph">impalad</code>. The default port of 21000 is
+ assumed unless you provide another value. You can connect to any host in your cluster that is
+ running <code class="ph codeph">impalad</code>. If you connect to an instance of <code class="ph codeph">impalad</code> that
+ was started with an alternate port specified by the <code class="ph codeph">--fe_port</code> flag, you must
+ provide that alternate port. See <a class="xref" href="impala_connecting.html#connecting">Connecting to impalad through impala-shell</a> for examples.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">SET</code> statement has no effect until the <span class="keyword cmdname">impala-shell</span> interpreter is
+ connected to an Impala server. Once you are connected, any query options you set remain in effect as you
+ issue a subsequent <code class="ph codeph">CONNECT</code> command to connect to a different Impala host.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__describe_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">describe</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Shows the columns, column data types, and any column comments for a specified table.
+ <code class="ph codeph">DESCRIBE FORMATTED</code> shows additional information such as the HDFS data directory,
+ partitions, and internal properties for the table. See <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>
+ for details about the basic <code class="ph codeph">DESCRIBE</code> output and the <code class="ph codeph">DESCRIBE
+ FORMATTED</code> variant. You can use <code class="ph codeph">DESC</code> as shorthand for the
+ <code class="ph codeph">DESCRIBE</code> command.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__drop_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">drop</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Removes a schema object, and in some cases its associated data files. See
+ <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>, <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>,
+ <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>, and
+ <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__explain_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">explain</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Provides the execution plan for a query. <code class="ph codeph">EXPLAIN</code> represents a query as a series of
+ steps. For example, these steps might be table scans, join and aggregation nodes, or data exchange
+ operations between hosts. See <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> and
+ <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__help_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">help</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Displays a list of all available commands and options.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__history_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">history</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Maintains an enumerated cross-session command history. This history is stored in the
+ <span class="ph filepath">~/.impalahistory</span> file.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__insert_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">insert</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Writes the results of a query to a specified table. This either overwrites table data or appends
+ data to the existing table content. See <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__invalidate_metadata_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">invalidate metadata</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Updates <span class="keyword cmdname">impalad</span> metadata for table existence and structure. Use this command
+ after creating, dropping, or altering databases, tables, or partitions in Hive. See
+ <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__profile_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">profile</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Displays low-level information about the most recent query. Used for performance diagnosis and
+ tuning. <span class="ph"> The report starts with the same information as produced by the
+ <code class="ph codeph">EXPLAIN</code> statement and the <code class="ph codeph">SUMMARY</code> command.</span> See
+ <a class="xref" href="impala_explain_plan.html#perf_profile">Using the Query Profile for Performance Tuning</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__quit_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">quit</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Exits the shell. Remember to include the final semicolon so that the shell recognizes the end of
+ the command.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__refresh_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">refresh</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Refreshes <span class="keyword cmdname">impalad</span> metadata for the locations of HDFS blocks corresponding to
+ Impala data files. Use this command after loading new data files into an Impala table through Hive
+ or through HDFS commands. See <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__rerun_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">rerun</code> or <code class="ph codeph">@</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Executes a previous <span class="keyword cmdname">impala-shell</span> command again,
+ from the list of commands displayed by the <code class="ph codeph">history</code>
+ command. These could be SQL statements, or commands specific to
+ <span class="keyword cmdname">impala-shell</span> such as <code class="ph codeph">quit</code>
+ or <code class="ph codeph">profile</code>.
+ </p>
+ <p class="p">
+ Specify an integer argument. A positive integer <code class="ph codeph">N</code>
+ represents the command labeled <code class="ph codeph">N</code> in the history list.
+ A negative integer <code class="ph codeph">-N</code> represents the <code class="ph codeph">N</code>th
+ command from the end of the list, such as -1 for the most recent command.
+ Commands that are executed again do not produce new entries in the
+ history list.
+ </p>
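+ <p class="p">
+ For example, in a session whose history contained the following entries (the
+ statements shown are hypothetical), you could re-execute them as follows:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > history;
+[1]: show tables;
+[2]: select count(*) from web_logs;
+[3]: profile;
+[localhost:21000] > @2;        -- Rerun the command labeled 2 in the history list.
+[localhost:21000] > rerun -1;  -- Rerun the most recent command.
+</code></pre>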
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__select_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">select</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Specifies the data set on which to perform some action. All information returned from
+ <code class="ph codeph">select</code> can be sent to an output such as the console or a file, or can be used in
+ another element of a query. See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__set_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">set</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Manages query options for an <span class="keyword cmdname">impala-shell</span> session. The available options are the
+ ones listed in <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a>. These options are used for
+ query tuning and troubleshooting. Issue <code class="ph codeph">SET</code> with no arguments to see the current
+ query options, either based on the <span class="keyword cmdname">impalad</span> defaults, as specified by you at
+ <span class="keyword cmdname">impalad</span> startup, or based on earlier <code class="ph codeph">SET</code> statements in the same
+ session. To modify option values, issue commands with the syntax <code class="ph codeph">set
+ <var class="keyword varname">option</var>=<var class="keyword varname">value</var></code>. To restore an option to its default,
+ use the <code class="ph codeph">unset</code> command. Some options take Boolean values of <code class="ph codeph">true</code>
+ and <code class="ph codeph">false</code>. Others take numeric arguments, or quoted string values.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">SET</code> statement has no effect until the <span class="keyword cmdname">impala-shell</span> interpreter is
+ connected to an Impala server. Once you are connected, any query options you set remain in effect as you
+ issue a subsequent <code class="ph codeph">CONNECT</code> command to connect to a different Impala host.
+ </p>
+
+ <p class="p">
+ In Impala 2.0 and later, <code class="ph codeph">SET</code> is available as a SQL statement for any kind of
+ application, not only through <span class="keyword cmdname">impala-shell</span>. See
+ <a class="xref" href="impala_set.html#set">SET Statement</a> for details.
+ </p>
+
+ <p class="p">
+ In Impala 2.5 and later, you can use <code class="ph codeph">SET</code> to define your own substitution variables
+ within an <span class="keyword cmdname">impala-shell</span> session.
+ Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__shell_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">shell</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Executes the specified command in the operating system shell without exiting
+ <code class="ph codeph">impala-shell</code>. You can use the <code class="ph codeph">!</code> character as shorthand for the
+ <code class="ph codeph">shell</code> command.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Quote any instances of the <code class="ph codeph">--</code> or <code class="ph codeph">/*</code> tokens to avoid them being
+ interpreted as the start of a comment. To embed comments within <code class="ph codeph">source</code> or
+ <code class="ph codeph">!</code> commands, use the shell comment character <code class="ph codeph">#</code> before the comment
+ portion of the line.
+ </div>
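+ <p class="p">
+ For example (the operating system commands and paths shown are illustrative):
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > shell ls -l /tmp;
+[localhost:21000] > ! hdfs dfs -ls /user/impala; # Comment after the shell comment character.
+</code></pre>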
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__show_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">show</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Displays metastore data for schema objects created and accessed through Impala, Hive, or both.
+ <code class="ph codeph">show</code> can be used to gather information about objects such as databases, tables, and functions.
+ See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__source_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">source</code> or <code class="ph codeph">src</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Executes one or more statements residing in a specified file from the local filesystem.
+ Allows you to perform the same kinds of batch operations as with the <code class="ph codeph">-f</code> option,
+ but interactively within the interpreter. The file can contain SQL statements and other
+ <span class="keyword cmdname">impala-shell</span> commands, including additional <code class="ph codeph">SOURCE</code> commands
+ to perform a flexible sequence of actions. Each command or statement, except the last one in the file,
+ must end with a semicolon.
+ See <a class="xref" href="impala_shell_running_commands.html#shell_running_commands">Running Commands and SQL Statements in impala-shell</a> for examples.
+ </p>
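+ <p class="p">
+ For example, a file named <span class="ph filepath">refresh_stats.sql</span> (a
+ hypothetical name; the database and table are also illustrative) might hold a
+ sequence of statements to run as a batch:
+ </p>
+<pre class="pre codeblock"><code>-- Contents of refresh_stats.sql:
+use analytics_db;
+refresh web_logs;
+compute stats web_logs;
+
+-- Within impala-shell:
+[localhost:21000] > source refresh_stats.sql;
+</code></pre>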
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__summary_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">summary</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Summarizes the work performed in various stages of a query. It provides a higher-level view of the
+ information displayed by the <code class="ph codeph">EXPLAIN</code> command. Added in Impala 1.4.0. See
+ <a class="xref" href="impala_explain_plan.html#perf_summary">Using the SUMMARY Report for Performance Tuning</a> for details about the report format
+ and how to interpret it.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, you can see a continuously updated report of
+ the summary information while a query is in progress.
+ See <a class="xref" href="impala_live_summary.html#live_summary">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__unset_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">unset</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Removes any user-specified value for a query option and returns the option to its default value.
+ See <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a> for the available query options.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, it can also remove user-specified substitution variables
+ using the notation <code class="ph codeph">UNSET VAR:<var class="keyword varname">variable_name</var></code>.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__use_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">use</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Indicates the database against which to execute subsequent commands. Lets you avoid using fully
+ qualified names when referring to tables in databases other than <code class="ph codeph">default</code>. See
+ <a class="xref" href="impala_use.html#use">USE Statement</a> for details. Not effective with the <code class="ph codeph">-q</code> option,
+ because that option only allows a single statement in the argument.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__version_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">version</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Returns Impala version information.
+ </p>
+ </td>
+ </tr>
+ </tbody></table>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_complex_types.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_complex_types.html b/docs/build3x/html/topics/impala_complex_types.html
new file mode 100644
index 0000000..32e40d5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_complex_types.html
@@ -0,0 +1,2606 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="complex_types"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Complex Types (Impala 2.3 or higher only)</title></head><body id="complex_types"><main role="main"><article role="article" aria-labelledby="complex_types__nested_types">
+
+ <h1 class="title topictitle1" id="complex_types__nested_types">Complex Types (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ <dfn class="term">Complex types</dfn> (also referred to as <dfn class="term">nested types</dfn>) let you represent multiple data values within a single
+ row/column position. They differ from the familiar column types such as <code class="ph codeph">BIGINT</code> and <code class="ph codeph">STRING</code>, known as
+ <dfn class="term">scalar types</dfn> or <dfn class="term">primitive types</dfn>, which represent a single data value within a given row/column position.
+ Impala supports the complex types <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> in <span class="keyword">Impala 2.3</span>
+ and higher. The Hive <code class="ph codeph">UNION</code> type is not currently supported.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ <p class="p">
+ Once you understand the basics of complex types, refer to the individual type topics when you need to refresh your memory about syntax
+ and examples:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a>
+ </li>
+ </ul>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="complex_types__complex_types_benefits">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Benefits of Impala Complex Types</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The reasons for using Impala complex types include the following:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You already have data produced by Hive or another non-Impala component that uses complex type columns. You might need to
+ convert the underlying data to Parquet to use it with Impala.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Your data model originates with a non-SQL programming language or a NoSQL data management system. For example, if you are
+ representing Python data expressed as nested lists, dictionaries, and tuples, those data structures correspond closely to Impala
+ <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Your analytic queries involving multiple tables could benefit from greater locality during join processing. By packing more
+ related data items within each HDFS data block, complex types let join queries avoid the network overhead of the traditional
+ Hadoop shuffle or broadcast join techniques.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ The Impala complex type support produces result sets with all scalar values, and the scalar components of complex types can be used
+ with all SQL clauses, such as <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">ORDER BY</code>, all kinds of joins, subqueries, and inline
+ views. The ability to process complex type data entirely in SQL reduces the need to write application-specific code in Java or other
+ programming languages to deconstruct the underlying data structures.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="complex_types__complex_types_overview">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala Complex Types</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> types are closely related: they represent collections with arbitrary numbers of
+ elements, where each element is the same type. In contrast, <code class="ph codeph">STRUCT</code> groups together a fixed number of items into a
+ single element. The parts of a <code class="ph codeph">STRUCT</code> element (the <dfn class="term">fields</dfn>) can be of different types, and each field
+ has a name.
+ </p>
+
+ <p class="p">
+ The elements of an <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code>, or the fields of a <code class="ph codeph">STRUCT</code>, can also be other
+ complex types. You can construct elaborate data structures with up to 100 levels of nesting. For example, you can make an
+ <code class="ph codeph">ARRAY</code> whose elements are <code class="ph codeph">STRUCT</code>s. Within each <code class="ph codeph">STRUCT</code>, you can have some fields
+ that are <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, or another kind of <code class="ph codeph">STRUCT</code>. The Impala documentation uses the
+ terms complex and nested types interchangeably; for simplicity, it primarily uses the term complex types to encompass all the
+ properties of these types.
+ </p>
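+
+ <p class="p">
+ For example, the following hypothetical table definition nests a <code class="ph codeph">MAP</code> inside a
+ <code class="ph codeph">STRUCT</code>, inside an <code class="ph codeph">ARRAY</code>:
+ </p>
+
+<pre class="pre codeblock"><code>create table contacts
+(
+  id bigint,
+  phones array&lt;struct&lt;
+    label: string,
+    numbers: map&lt;string,string&gt;
+  &gt;&gt;
+)
+stored as parquet;
+</code></pre>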
+
+ <p class="p">
+ When visualizing your data model in familiar SQL terms, you can think of each <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code> as a
+ miniature table, and each <code class="ph codeph">STRUCT</code> as a row within such a table. By default, the table represented by an
+        <code class="ph codeph">ARRAY</code> has two columns: <code class="ph codeph">POS</code>, representing the ordering of the elements, and <code class="ph codeph">ITEM</code>,
+        representing the value of each element. Likewise, by default, the table represented by a <code class="ph codeph">MAP</code> encodes key-value
+        pairs, and therefore has two columns, <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code>.
+
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">ITEM</code> and <code class="ph codeph">VALUE</code> names are only required for the very simplest kinds of <code class="ph codeph">ARRAY</code>
+ and <code class="ph codeph">MAP</code> columns, ones that hold only scalar values. When the elements within the <code class="ph codeph">ARRAY</code> or
+ <code class="ph codeph">MAP</code> are of type <code class="ph codeph">STRUCT</code> rather than a scalar type, then the result set contains columns with names
+ corresponding to the <code class="ph codeph">STRUCT</code> fields rather than <code class="ph codeph">ITEM</code> or <code class="ph codeph">VALUE</code>.
+ </p>
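+      <p class="p">
+        As a sketch of this naming behavior (the table and column names here are hypothetical), a scalar
+        <code class="ph codeph">ARRAY</code> is referenced through the <code class="ph codeph">ITEM</code> pseudocolumn, while an
+        <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> is referenced through its field names:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical table: pets is ARRAY < STRING >,
+-- addresses is ARRAY < STRUCT < city: STRING, zip: STRING > >.
+
+-- Scalar ARRAY: the result set exposes the ITEM pseudocolumn.
+SELECT name, pets.item
+FROM people_demo, people_demo.pets;
+
+-- ARRAY of STRUCT: the result set exposes the STRUCT field names instead.
+SELECT name, addresses.city, addresses.zip
+FROM people_demo, people_demo.addresses;
+</code></pre>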
+
+
+
+ <p class="p">
+ You write most queries that process complex type columns using familiar join syntax, even though the data for both sides of the join
+ resides in a single table. The join notation brings together the scalar values from a row with the values from the complex type
+ columns for that same row. The final result set contains all scalar values, allowing you to do all the familiar filtering,
+ aggregation, ordering, and so on for the complex data entirely in SQL or using business intelligence tools that issue SQL queries.
+
+ </p>
+
+ <p class="p">
+ Behind the scenes, Impala ensures that the processing for each row is done efficiently on a single host, without the network traffic
+ involved in broadcast or shuffle joins. The most common type of join query for tables with complex type columns is <code class="ph codeph">INNER
+ JOIN</code>, which returns results only in those cases where the complex type contains some elements. Therefore, most query
+ examples in this section use either the <code class="ph codeph">INNER JOIN</code> clause or the equivalent comma notation.
+ </p>
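+      <p class="p">
+        For example, the following hypothetical queries (table and column names are illustrative only) show the
+        <code class="ph codeph">INNER JOIN</code> clause and the equivalent comma notation for unpacking an
+        <code class="ph codeph">ARRAY</code> column:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical table: interests is an ARRAY < STRING > column.
+SELECT c.name, i.item
+FROM customers_demo c INNER JOIN c.interests i;
+
+-- Equivalent comma notation:
+SELECT c.name, i.item
+FROM customers_demo c, c.interests i;
+</code></pre>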
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Although Impala can query complex types that are present in Parquet files, Impala currently cannot create new Parquet files
+ containing complex types. Therefore, the discussion and examples presume that you are working with existing Parquet data produced
+ through Hive, Spark, or some other source. See <a class="xref" href="#complex_types_ex_hive_etl">Constructing Parquet Files with Complex Columns Using Hive</a> for examples of constructing Parquet data
+ files with complex type columns.
+ </p>
+
+ <p class="p">
+ For learning purposes, you can create empty tables with complex type columns and practice query syntax, even if you do not have
+ sample data with the required structure.
+ </p>
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="complex_types__complex_types_design">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Design Considerations for Complex Types</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When planning to use Impala complex types, and designing the Impala schema, first learn how this kind of schema differs from
+ traditional table layouts from the relational database and data warehousing fields. Because you might have already encountered
+ complex types in a Hadoop context while using Hive for ETL, also learn how to write high-performance analytic queries for complex
+ type data using Impala SQL syntax.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="complex_types_design__complex_types_vs_rdbms">
+
+ <h3 class="title topictitle3" id="ariaid-title5">How Complex Types Differ from Traditional Data Warehouse Schemas</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Complex types let you associate arbitrary data structures with a particular row. If you are familiar with schema design for
+ relational database management systems or data warehouses, a schema with complex types has the following differences:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Logically, related values can now be grouped tightly together in the same table.
+ </p>
+
+ <p class="p">
+ In traditional data warehousing, related values were typically arranged in one of two ways:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Split across multiple normalized tables. Foreign key columns specified which rows from each table were associated with
+ each other. This arrangement avoided duplicate data and therefore the data was compact, but join queries could be
+ expensive because the related data had to be retrieved from separate locations. (In the case of distributed Hadoop
+ queries, the joined tables might even be transmitted between different hosts in a cluster.)
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Flattened into a single denormalized table. Although this layout eliminated some potential performance issues by removing
+ the need for join queries, the table typically became larger because values were repeated. The extra data volume could
+ cause performance issues in other parts of the workflow, such as longer ETL cycles or more expensive full-table scans
+ during queries.
+ </p>
+ </li>
+ </ul>
+ <p class="p">
+ Complex types represent a middle ground that addresses these performance and volume concerns. By physically locating related
+ data within the same data files, complex types increase locality and reduce the expense of join queries. By associating an
+ arbitrary amount of data with a single row, complex types avoid the need to repeat lengthy values such as strings. Because
+ Impala knows which complex type values are associated with each row, you can save storage by avoiding artificial foreign key
+ values that are only used for joins. The flexibility of the <code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, and
+ <code class="ph codeph">MAP</code> types lets you model familiar constructs such as fact and dimension tables from a data warehouse, and
+ wide tables representing sparse matrixes.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="complex_types_design__complex_types_physical">
+
+ <h3 class="title topictitle3" id="ariaid-title6">Physical Storage for Complex Types</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Physically, the scalar and complex columns in each row are located adjacent to each other in the same Parquet data file, ensuring
+ that they are processed on the same host rather than being broadcast across the network when cross-referenced within a query. This
+          co-location simplifies the process of copying, converting, and backing up all the columns at once. Because of the column-oriented
+ layout of Parquet files, you can still query only the scalar columns of a table without imposing the I/O penalty of reading the
+ (possibly large) values of the composite columns.
+ </p>
+
+ <p class="p">
+ Within each Parquet data file, the constituent parts of complex type columns are stored in column-oriented format:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Each field of a <code class="ph codeph">STRUCT</code> type is stored like a column, with all the scalar values adjacent to each other and
+ encoded, compressed, and so on using the Parquet space-saving techniques.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For an <code class="ph codeph">ARRAY</code> containing scalar values, all those values (represented by the <code class="ph codeph">ITEM</code>
+ pseudocolumn) are stored adjacent to each other.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For a <code class="ph codeph">MAP</code>, the values of the <code class="ph codeph">KEY</code> pseudocolumn are stored adjacent to each other. If the
+ <code class="ph codeph">VALUE</code> pseudocolumn is a scalar type, its values are also stored adjacent to each other.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If an <code class="ph codeph">ARRAY</code> element, <code class="ph codeph">STRUCT</code> field, or <code class="ph codeph">MAP</code> <code class="ph codeph">VALUE</code> part is
+ another complex type, the column-oriented storage applies to the next level down (or the next level after that, and so on for
+ deeply nested types) where the final elements, fields, or values are of scalar types.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ The numbers represented by the <code class="ph codeph">POS</code> pseudocolumn of an <code class="ph codeph">ARRAY</code> are not physically stored in the
+ data files. They are synthesized at query time based on the order of the <code class="ph codeph">ARRAY</code> elements associated with each row.
+ </p>
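+        <p class="p">
+          For example, a hypothetical query (the table and column names are illustrative only) can still retrieve those
+          synthesized position values alongside the stored elements:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- POS values are synthesized at query time from the element order;
+-- they are not stored in the Parquet data files.
+SELECT name, pets.pos, pets.item
+FROM people_demo, people_demo.pets;
+</code></pre>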
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="complex_types_design__complex_types_file_formats">
+
+ <h3 class="title topictitle3" id="ariaid-title7">File Format Support for Impala Complex Types</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Currently, Impala queries support complex type data only in the Parquet file format. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+ for details about the performance benefits and physical layout of this file format.
+ </p>
+
+ <p class="p">
+          Each table, or each partition within a table, can have a separate file format, and you can change the file format at the table or
+ partition level through an <code class="ph codeph">ALTER TABLE</code> statement. Because this flexibility makes it difficult to guarantee ahead
+ of time that all the data files for a table or partition are in a compatible format, Impala does not throw any errors when you
+ change the file format for a table or partition using <code class="ph codeph">ALTER TABLE</code>. Any errors come at runtime when Impala
+ actually processes a table or partition that contains nested types and is not in one of the supported formats. If a query on a
+ partitioned table only processes some partitions, and all those partitions are in one of the supported formats, the query
+ succeeds.
+ </p>
+
+ <p class="p">
+ Because Impala does not parse the data structures containing nested types for unsupported formats such as text, Avro,
+ SequenceFile, or RCFile, you cannot use data files in these formats with Impala, even if the query does not refer to the nested
+ type columns. Also, if a table using an unsupported format originally contained nested type columns, and then those columns were
+ dropped from the table using <code class="ph codeph">ALTER TABLE ... DROP COLUMN</code>, any existing data files in the table still contain the
+ nested type data and Impala queries on that table will generate errors.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types.
+ Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher.
+ </p>
+ </div>
+
+ <p class="p">
+ You can perform DDL operations (even <code class="ph codeph">CREATE TABLE</code>) for tables involving complex types in file formats other than
+ Parquet. The DDL support lets you set up intermediate tables in your ETL pipeline, to be populated by Hive, before the final stage
+ where the data resides in a Parquet table and is queryable by Impala. Also, you can have a partitioned table with complex type
+ columns that uses a non-Parquet format, and use <code class="ph codeph">ALTER TABLE</code> to change the file format to Parquet for individual
+ partitions. When you put Parquet data files into those partitions, Impala can execute queries against that data as long as the
+ query does not involve any of the non-Parquet partitions.
+ </p>
+
+ <p class="p">
+ If you use the <span class="keyword cmdname">parquet-tools</span> command to examine the structure of a Parquet data file that includes complex
+ types, you see that both <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> are represented as a <code class="ph codeph">Bag</code> in Parquet
+ terminology, with all fields marked <code class="ph codeph">Optional</code> because Impala allows any column to be nullable.
+ </p>
+
+ <p class="p">
+          Impala supports either 2-level or 3-level encoding within each Parquet data file. When constructing Parquet data files outside
+ Impala, use either encoding style but do not mix 2-level and 3-level encoding within the same data file.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="complex_types_design__complex_types_vs_normalization">
+
+ <h3 class="title topictitle3" id="ariaid-title8">Choosing Between Complex Types and Normalized Tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Choosing between multiple normalized fact and dimension tables, or a single table containing complex types, is an important design
+ decision.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ If you are coming from a traditional database or data warehousing background, you might be familiar with how to split up data
+ between tables. Your business intelligence tools might already be optimized for dealing with this kind of multi-table scenario
+ through join queries.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you are pulling data from Impala into an application written in a programming language that has data structures analogous
+ to the complex types, such as Python or Java, complex types in Impala could simplify data interchange and improve
+ understandability and reliability of your program logic.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You might already be faced with existing infrastructure or receive high volumes of data that assume one layout or the other.
+              For example, complex types are popular with web-oriented applications, such as keeping information about an online user
+              all in one place for convenient lookup and analysis, or dealing with sparse or constantly evolving data fields.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If some parts of the data change over time while related data remains constant, using multiple normalized tables lets you
+ replace certain parts of the data without reloading the entire data set. Conversely, if you receive related data all bundled
+ together, such as in JSON files, using complex types can save the overhead of splitting the related items across multiple
+ tables.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ From a performance perspective:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In Parquet tables, Impala can skip columns that are not referenced in a query, avoiding the I/O penalty of reading the
+ embedded data. When complex types are nested within a column, the data is physically divided at a very granular level; for
+ example, a query referring to data nested multiple levels deep in a complex type column does not have to read all the data
+ from that column, only the data for the relevant parts of the column type hierarchy.
+
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Complex types avoid the possibility of expensive join queries when data from fact and dimension tables is processed in
+                parallel across multiple hosts. All the information for a row containing complex types is typically in the same data
+ block, and therefore does not need to be transmitted across the network when joining fields that are all part of the same
+ row.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The tradeoff with complex types is that fewer rows fit in each data block. Whether it is better to have more data blocks
+ with fewer rows, or fewer data blocks with many rows, depends on the distribution of your data and the characteristics of
+ your query workload. If the complex columns are rarely referenced, using them might lower efficiency. If you are seeing
+ low parallelism due to a small volume of data (relatively few data blocks) in each table partition, increasing the row
+ size by including complex columns might produce more data blocks and thus spread the work more evenly across the cluster.
+ See <a class="xref" href="impala_scalability.html#scalability">Scalability Considerations for Impala</a> for more on this advanced topic.
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="complex_types_design__complex_types_hive">
+
+ <h3 class="title topictitle3" id="ariaid-title9">Differences Between Impala and Hive Complex Types</h3>
+
+ <div class="body conbody">
+
+
+
+
+
+
+
+ <p class="p">
+ Impala can query Parquet tables containing <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> columns
+ produced by Hive. There are some differences to be aware of between the Impala SQL and HiveQL syntax for complex types, primarily
+ for queries.
+ </p>
+
+ <p class="p">
+ The syntax for specifying <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> types in a <code class="ph codeph">CREATE
+ TABLE</code> statement is compatible between Impala and Hive.
+ </p>
+
+ <p class="p">
+ Because Impala <code class="ph codeph">STRUCT</code> columns include user-specified field names, you use the <code class="ph codeph">NAMED_STRUCT()</code>
+ constructor in Hive rather than the <code class="ph codeph">STRUCT()</code> constructor when you populate an Impala <code class="ph codeph">STRUCT</code>
+ column using a Hive <code class="ph codeph">INSERT</code> statement.
+ </p>
+
+ <p class="p">
+ The Hive <code class="ph codeph">UNION</code> type is not currently supported in Impala.
+ </p>
+
+ <p class="p">
+ While Impala usually aims for a high degree of compatibility with HiveQL query syntax, Impala syntax differs from Hive for queries
+ involving complex types. The differences are intended to provide extra flexibility for queries involving these kinds of tables.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Impala uses dot notation for referring to element names or elements within complex types, and join notation for
+ cross-referencing scalar columns with the elements of complex types within the same row, rather than the <code class="ph codeph">LATERAL
+ VIEW</code> clause and <code class="ph codeph">EXPLODE()</code> function of HiveQL.
+ </li>
+
+ <li class="li">
+            Using join notation lets you use all kinds of join queries with complex type columns. For example, you can use a
+ <code class="ph codeph">LEFT OUTER JOIN</code>, <code class="ph codeph">LEFT ANTI JOIN</code>, or <code class="ph codeph">LEFT SEMI JOIN</code> query to evaluate
+ different scenarios where the complex columns do or do not contain any elements.
+ </li>
+
+ <li class="li">
+ You can include references to collection types inside subqueries and inline views. For example, you can construct a
+ <code class="ph codeph">FROM</code> clause where one of the <span class="q">"tables"</span> is a subquery against a complex type column, or use a subquery
+ against a complex type column as the argument to an <code class="ph codeph">IN</code> or <code class="ph codeph">EXISTS</code> clause.
+ </li>
+
+ <li class="li">
+ The Impala pseudocolumn <code class="ph codeph">POS</code> lets you retrieve the position of elements in an array along with the elements
+ themselves, equivalent to the <code class="ph codeph">POSEXPLODE()</code> function of HiveQL. You do not use index notation to retrieve a
+ single array element in a query; the join query loops through the array elements and you use <code class="ph codeph">WHERE</code> clauses to
+ specify which elements to return.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Join clauses involving complex type columns do not require an <code class="ph codeph">ON</code> or <code class="ph codeph">USING</code> clause. Impala
+ implicitly applies the join key so that the correct array entries or map elements are associated with the correct row from the
+ table.
+ </p>
+ </li>
+
+ </ul>
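+        <p class="p">
+          As a sketch of the syntax difference (the table and column names here are hypothetical), the following queries
+          retrieve the same array elements and positions in HiveQL and in Impala SQL:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- HiveQL: explode the ARRAY column with LATERAL VIEW.
+SELECT t.id, p.pos, p.val
+FROM events_demo t
+LATERAL VIEW POSEXPLODE(t.tags) p AS pos, val;
+
+-- Impala SQL: join notation plus the POS and ITEM pseudocolumns.
+SELECT t.id, tags.pos, tags.item
+FROM events_demo t, t.tags tags;
+</code></pre>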
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="complex_types_design__complex_types_limits">
+
+ <h3 class="title topictitle3" id="ariaid-title10">Limitations and Restrictions for Complex Types</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Complex type columns can only be used in tables or partitions with the Parquet file format.
+ </p>
+
+ <p class="p">
+ Complex type columns cannot be used as partition key columns in a partitioned table.
+ </p>
+
+ <p class="p">
+ When you use complex types with the <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">HAVING</code>, or
+ <code class="ph codeph">WHERE</code> clauses, you cannot refer to the column name by itself. Instead, you refer to the names of the scalar
+ values within the complex type, such as the <code class="ph codeph">ITEM</code>, <code class="ph codeph">POS</code>, <code class="ph codeph">KEY</code>, or
+ <code class="ph codeph">VALUE</code> pseudocolumns, or the field names from a <code class="ph codeph">STRUCT</code>.
+ </p>
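+        <p class="p">
+          For example, a hypothetical query (table and column names are illustrative only) filters and groups on the
+          scalar values inside a <code class="ph codeph">MAP</code> column rather than on the column itself:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical table: readings is a MAP < STRING,BIGINT > column.
+-- Refer to the KEY and VALUE pseudocolumns, not the MAP column name itself.
+SELECT r.key, COUNT(*) AS high_readings
+FROM sensors_demo s, s.readings r
+WHERE r.value > 100
+GROUP BY r.key;
+</code></pre>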
+
+ <p class="p">
+ The maximum depth of nesting for complex types is 100 levels.
+ </p>
+
+ <p class="p">
+ The maximum length of the column definition for any complex type, including declarations for any nested types,
+ is 4000 characters.
+ </p>
+
+ <p class="p">
+ For ideal performance and scalability, use small or medium-sized collections, where all the complex columns contain at most a few
+ hundred megabytes per row. Remember, all the columns of a row are stored in the same HDFS data block, whose size in Parquet files
+ typically ranges from 256 MB to 1 GB.
+ </p>
+
+ <p class="p">
+ Including complex type columns in a table introduces some overhead that might make queries that do not reference those columns
+ somewhat slower than Impala queries against tables without any complex type columns. Expect at most a 2x slowdown compared to
+ tables that do not have any complex type columns.
+ </p>
+
+ <p class="p">
+ Currently, the <code class="ph codeph">COMPUTE STATS</code> statement does not collect any statistics for columns containing complex types.
+ Impala uses heuristics to construct execution plans involving complex type columns.
+ </p>
+
+ <p class="p">
+ Currently, Impala built-in functions and user-defined functions cannot accept complex types as parameters or produce them as
+ function return values. (When the complex type values are materialized in an Impala result set, the result set contains the scalar
+ components of the values, such as the <code class="ph codeph">POS</code> or <code class="ph codeph">ITEM</code> for an <code class="ph codeph">ARRAY</code>, the
+ <code class="ph codeph">KEY</code> or <code class="ph codeph">VALUE</code> for a <code class="ph codeph">MAP</code>, or the fields of a <code class="ph codeph">STRUCT</code>; these
+ scalar data items <em class="ph i">can</em> be used with built-in functions and UDFs as usual.)
+ </p>
+
+ <p class="p">
+ Impala currently cannot write new data files containing complex type columns.
+ Therefore, although the <code class="ph codeph">SELECT</code> statement works for queries
+ involving complex type columns, you cannot use a statement form that writes
+ data to complex type columns, such as <code class="ph codeph">CREATE TABLE AS SELECT</code> or <code class="ph codeph">INSERT ... SELECT</code>.
+ To create data files containing complex type data, use the Hive <code class="ph codeph">INSERT</code> statement, or another
+ ETL mechanism such as MapReduce jobs, Spark jobs, Pig, and so on.
+ </p>
+
+ <p class="p">
+ Currently, Impala can query complex type columns only from Parquet tables or Parquet partitions within partitioned tables.
+ Although you can use complex types in tables with Avro, text, and other file formats as part of your ETL pipeline, for example as
+ intermediate tables populated through Hive, doing analytics through Impala requires that the data eventually ends up in a Parquet
+ table. The requirement for Parquet data files means that you can use complex types with Impala tables hosted on other kinds of
+ file storage systems such as Isilon and Amazon S3, but you cannot use Impala to query complex types from HBase tables. See
+ <a class="xref" href="impala_complex_types.html#complex_types_file_formats">File Format Support for Impala Complex Types</a> for more details.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="complex_types__complex_types_using">
+
+ <h2 class="title topictitle2" id="ariaid-title11">Using Complex Types from SQL</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+        When using complex types through SQL in Impala, you learn the <code class="ph codeph">< ></code> delimiter notation for declaring complex
+        type columns in <code class="ph codeph">CREATE TABLE</code> statements, and how to construct join queries to <span class="q">"unpack"</span> the scalar values
+        nested inside the complex data structures. You might need to condense a traditional RDBMS or data warehouse schema into a smaller
+        number of Parquet tables, and use Hive, Spark, Pig, or another mechanism outside Impala to populate the tables with data.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="complex_types_using__nested_types_ddl">
+
+ <h3 class="title topictitle3" id="ariaid-title12">Complex Type Syntax for DDL Statements</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The definition of <var class="keyword varname">data_type</var>, as seen in the <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+ statements, now includes complex types in addition to primitive types:
+ </p>
+
+<pre class="pre codeblock"><code> primitive_type
+| array_type
+| map_type
+| struct_type
+</code></pre>
+
+ <p class="p">
+ Unions are not currently supported.
+ </p>
+
+ <p class="p">
+ Array, struct, and map column type declarations are specified in the <code class="ph codeph">CREATE TABLE</code> statement. You can also add or
+ change the type of complex columns through the <code class="ph codeph">ALTER TABLE</code> statement.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Currently, Impala queries allow complex types only in tables that use the Parquet format. If an Impala query encounters complex
+ types in a table or partition using another file format, the query returns a runtime error.
+ </p>
+
+ <p class="p">
+ The Impala DDL support for complex types works for all file formats, so that you can create tables using text or other
+ non-Parquet formats for Hive to use as staging tables in an ETL cycle that ends with the data in a Parquet table. You can also
+ use <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT PARQUET</code> to change the file format of an existing table containing complex
+ types to Parquet, after which Impala can query it. Make sure to load Parquet files into the table after changing the file
+ format, because the <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT</code> statement does not convert existing data to the new file
+ format.
+ </p>
+ </div>
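+        <p class="p">
+          For example, a hypothetical staging table populated through Hive in text format could be switched to Parquet
+          before Parquet data files are loaded into it:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical table name. The statement changes only the metadata;
+-- load Parquet data files into the table afterwards.
+ALTER TABLE staging_nested SET FILEFORMAT PARQUET;
+</code></pre>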
+
+ <p class="p">
+ Partitioned tables can contain complex type columns.
+ All the partition key columns must be scalar types.
+ </p>
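+        <p class="p">
+          For example, a hypothetical partitioned table could combine a complex type column with a scalar partition key:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- Complex type columns are allowed in the table itself;
+-- all partition key columns must be scalar types.
+CREATE TABLE events_by_day
+(
+  id BIGINT
+  , details MAP < STRING,STRING >
+)
+PARTITIONED BY (event_date STRING)
+STORED AS PARQUET;
+</code></pre>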
+
+ <p class="p">
+ Because use cases for Impala complex types require that you already have Parquet data files produced outside of Impala, you can
+ use the Impala <code class="ph codeph">CREATE TABLE LIKE PARQUET</code> syntax to produce a table with columns that match the structure of an
+ existing Parquet file, including complex type columns for nested data structures. Remember to include the <code class="ph codeph">STORED AS
+ PARQUET</code> clause in this case, because even with <code class="ph codeph">CREATE TABLE LIKE PARQUET</code>, the default file format of the
+ resulting table is still text.
+ </p>
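+        <p class="p">
+          As a sketch of this technique (the table name and HDFS path here are hypothetical):
+        </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical HDFS path. Include STORED AS PARQUET, because even with
+-- CREATE TABLE LIKE PARQUET, the default file format is still text.
+CREATE TABLE nested_from_file
+  LIKE PARQUET '/user/etl/sample_nested.parq'
+  STORED AS PARQUET;
+</code></pre>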
+
+ <p class="p">
+ Because the complex columns are omitted from the result set of an Impala <code class="ph codeph">SELECT *</code> or <code class="ph codeph">SELECT
+ <var class="keyword varname">col_name</var></code> query, and because Impala currently does not support writing Parquet files with complex type
+ columns, you cannot use the <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax to create a table with nested type columns.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Once you have a table set up with complex type columns, use the <code class="ph codeph">DESCRIBE</code> and <code class="ph codeph">SHOW CREATE TABLE</code>
+ statements to see the correct notation with <code class="ph codeph"><</code> and <code class="ph codeph">></code> delimiters and comma and colon
+ separators within the complex type definitions. If you do not have existing data with the same layout as the table, you can
+ query the empty table to practice with the notation for the <code class="ph codeph">SELECT</code> statement. In the <code class="ph codeph">SELECT</code>
+ list, you use dot notation and pseudocolumns such as <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, and <code class="ph codeph">VALUE</code> for
+ referring to items within the complex type columns. In the <code class="ph codeph">FROM</code> clause, you use join notation to construct
+ table aliases for any referenced <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> columns.
+ </p>
+ </div>
+
+
+
+ <p class="p">
+ For example, when defining a table that holds contact information, you might represent phone numbers differently depending on the
+ expected layout and relationships of the data, and how well you can predict those properties in advance.
+ </p>
+
+ <p class="p">
+ Here are different ways that you might represent phone numbers in a traditional relational schema, with equivalent representations
+ using complex types.
+ </p>
+
+ <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_flat_fixed"><figcaption><span class="fig--title-label">Figure 1. </span>Traditional Relational Representation of Phone Numbers: Single Table</figcaption>
+
+
+
+ <p class="p">
+ The traditional, simplest way to represent phone numbers in a relational table is to store all contact info in a single table,
+ with all columns having scalar types, and each potential phone number represented as a separate column. In this example, each
+ person can only have these 3 types of phone numbers. If the person does not have a particular kind of phone number, the
+ corresponding column is <code class="ph codeph">NULL</code> for that row.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE contacts_fixed_phones
+(
+ id BIGINT
+ , name STRING
+ , address STRING
+ , home_phone STRING
+ , work_phone STRING
+ , mobile_phone STRING
+) STORED AS PARQUET;
+</code></pre>
+
+ </figure>
+
+ <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_array"><figcaption><span class="fig--title-label">Figure 2. </span>An Array of Phone Numbers</figcaption>
+
+
+
+ <p class="p">
+ Using a complex type column to represent the phone numbers adds some extra flexibility. Now there could be an unlimited number
+ of phone numbers. Because the array elements have an order but not symbolic names, you could decide in advance that
+ phone_number[0] is the home number, [1] is the work number, [2] is the mobile number, and so on. (In subsequent examples, you
+ will see how to create a more flexible naming scheme using other complex type variations, such as a <code class="ph codeph">MAP</code> or an
+ <code class="ph codeph">ARRAY</code> where each element is a <code class="ph codeph">STRUCT</code>.)
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE contacts_array_of_phones
+(
+ id BIGINT
+ , name STRING
+ , address STRING
+ , phone_number ARRAY < STRING >
+) STORED AS PARQUET;
+
+</code></pre>
+
+ </figure>
+
+ <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_map"><figcaption><span class="fig--title-label">Figure 3. </span>A Map of Phone Numbers</figcaption>
+
+
+
+ <p class="p">
+ Another way to represent an arbitrary set of phone numbers is with a <code class="ph codeph">MAP</code> column. With a <code class="ph codeph">MAP</code>,
+ each element is associated with a key value that you specify, which could be a numeric, string, or other scalar type. This
+ example uses a <code class="ph codeph">STRING</code> key to give each phone number a name, such as <code class="ph codeph">'home'</code> or
+ <code class="ph codeph">'mobile'</code>. A query could filter the data based on the key values, or display the key values in reports.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE contacts_unlimited_phones
+(
+ id BIGINT, name STRING, address STRING, phone_number MAP < STRING,STRING >
+) STORED AS PARQUET;
+
+</code></pre>
+
+ </figure>
+
+ <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_flat_normalized"><figcaption><span class="fig--title-label">Figure 4. </span>Traditional Relational Representation of Phone Numbers: Normalized Tables</figcaption>
+
+
+
+ <p class="p">
+ If you are an experienced database designer, you already know how to work around the limitations of the single-table schema from
+ <a class="xref" href="#nested_types_ddl__complex_types_phones_flat_fixed">Figure 1</a>. By normalizing the schema, with the phone numbers in their own
+ table, you can associate an arbitrary set of phone numbers with each person, and associate additional details with each phone
+ number, such as whether it is a home, work, or mobile phone.
+ </p>
+
+ <p class="p">
+ The flexibility of this approach comes with some drawbacks. Reconstructing all the data for a particular person requires a join
+ query, which might require performance tuning on Hadoop because the data from each table might be transmitted from a different
+ host. Data management tasks such as backups and refreshing the data require dealing with multiple tables instead of a single
+ table.
+ </p>
+
+ <p class="p">
+ This example illustrates a traditional database schema that stores contact info normalized across two tables. The fact table
+ establishes the identity and basic information about each person. A dimension table stores information only about phone numbers,
+ using an ID value to associate each phone number with a person ID from the fact table. Each person can have 0, 1, or many
+ phones; the categories are not restricted to a few predefined ones; and the phone table can contain as many columns as desired,
+ to represent all sorts of details about each phone number.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE fact_contacts (id BIGINT, name STRING, address STRING) STORED AS PARQUET;
+CREATE TABLE dim_phones
+(
+ contact_id BIGINT
+ , category STRING
+ , international_code STRING
+ , area_code STRING
+ , exchange STRING
+ , extension STRING
+ , mobile BOOLEAN
+ , carrier STRING
+ , current BOOLEAN
+ , service_start_date TIMESTAMP
+ , service_end_date TIMESTAMP
+)
+STORED AS PARQUET;
+</code></pre>
+
+ </figure>
+
+ <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_array_struct"><figcaption><span class="fig--title-label">Figure 5. </span>Phone Numbers Represented as an Array of Structs</figcaption>
+
+
+
+ <p class="p">
+ To represent a schema equivalent to the one from <a class="xref" href="#nested_types_ddl__complex_types_phones_flat_normalized">Figure 4</a> using
+ complex types, this example uses an <code class="ph codeph">ARRAY</code> where each array element is a <code class="ph codeph">STRUCT</code>. As with the
+ earlier complex type examples, each person can have an arbitrary set of associated phone numbers. Making each array element into
+ a <code class="ph codeph">STRUCT</code> lets us associate multiple data items with each phone number, and give a separate name and type to
+ each data item. The <code class="ph codeph">STRUCT</code> fields of the <code class="ph codeph">ARRAY</code> elements reproduce the columns of the dimension
+ table from the previous example.
+ </p>
+
+ <p class="p">
+ You can do all the same kinds of queries with the complex type schema as with the normalized schema from the previous example.
+ The advantages of the complex type design are in the areas of convenience and performance. Now your backup and ETL processes
+ only deal with a single table. When a query uses a join to cross-reference the information about a person with their associated
+ phone numbers, all the relevant data for each row resides in the same HDFS data block, meaning each row can be processed on a
+ single host without requiring network transmission.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE contacts_detailed_phones
+(
+ id BIGINT, name STRING, address STRING
+ , phone ARRAY < STRUCT <
+ category: STRING
+ , international_code: STRING
+ , area_code: STRING
+ , exchange: STRING
+ , extension: STRING
+ , mobile: BOOLEAN
+ , carrier: STRING
+ , current: BOOLEAN
+ , service_start_date: TIMESTAMP
+ , service_end_date: TIMESTAMP
+ >>
+) STORED AS PARQUET;
+
+</code></pre>
+
+ </figure>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="complex_types_using__complex_types_sql">
+
+ <h3 class="title topictitle3" id="ariaid-title13">SQL Statements that Support Complex Types</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala SQL statements that support complex types are currently
+ <code class="ph codeph"><a class="xref" href="impala_create_table.html#create_table">CREATE TABLE</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_describe.html#describe">DESCRIBE</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_load_data.html#load_data">LOAD DATA</a></code>, and
+ <code class="ph codeph"><a class="xref" href="impala_select.html#select">SELECT</a></code>. That is, Impala can create or alter tables
+ containing complex type columns, examine the structure of a table containing complex type columns, import existing data files
+ containing complex type columns into a table, and query Parquet tables containing complex types.
+ </p>
+
+ <p class="p">
+ Impala currently cannot write new data files containing complex type columns.
+ Therefore, although the <code class="ph codeph">SELECT</code> statement works for queries
+ involving complex type columns, you cannot use a statement form that writes
+ data to complex type columns, such as <code class="ph codeph">CREATE TABLE AS SELECT</code> or <code class="ph codeph">INSERT ... SELECT</code>.
+ To create data files containing complex type data, use the Hive <code class="ph codeph">INSERT</code> statement, or another
+ ETL mechanism such as MapReduce jobs, Spark jobs, Pig, and so on.
+ </p>
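+
+ <p class="p">
+ For example, you might populate the <code class="ph codeph">contacts_array_of_phones</code> table shown earlier by running an <code class="ph codeph">INSERT</code> statement in Hive, then refreshing the table metadata in Impala. (This is a minimal sketch; the exact constructor syntax available in Hive, such as the <code class="ph codeph">array()</code> function shown here, depends on your Hive version.)
+ </p>
+
+<pre class="pre codeblock"><code>-- Run in Hive, which can write data files containing complex types:
+INSERT INTO contacts_array_of_phones
+ SELECT 1, 'Alice', '123 Main St', array('555-0100','555-0101');
+
+-- Then run in Impala, to pick up the newly written data files:
+REFRESH contacts_array_of_phones;
+</code></pre>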
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title14" id="complex_types_sql__complex_types_ddl">
+
+ <h4 class="title topictitle4" id="ariaid-title14">DDL Statements and Complex Types</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Column specifications for complex or nested types use <code class="ph codeph"><</code> and <code class="ph codeph">></code> delimiters:
+ </p>
+
+<pre class="pre codeblock"><code>-- What goes inside the < > for an ARRAY is a single type, either a scalar or another
+-- complex type (ARRAY, STRUCT, or MAP).
+CREATE TABLE array_t
+(
+ id BIGINT,
+ a1 ARRAY <STRING>,
+ a2 ARRAY <BIGINT>,
+ a3 ARRAY <TIMESTAMP>,
+ a4 ARRAY <STRUCT <f1: STRING, f2: INT, f3: BOOLEAN>>
+)
+STORED AS PARQUET;
+
+-- What goes inside the < > for a MAP is two comma-separated types specifying the types of the key-value pair:
+-- a scalar type representing the key, and a scalar or complex type representing the value.
+CREATE TABLE map_t
+(
+ id BIGINT,
+ m1 MAP <STRING, STRING>,
+ m2 MAP <STRING, BIGINT>,
+ m3 MAP <BIGINT, STRING>,
+ m4 MAP <BIGINT, BIGINT>,
+ m5 MAP <STRING, ARRAY <STRING>>
+)
+STORED AS PARQUET;
+
+-- What goes inside the < > for a STRUCT is a comma-separated list of fields, each field defined as
+-- name:type. The type can be a scalar or a complex type. The field names are scoped to their own STRUCT, so they
+-- can repeat the names of table columns or of fields in other STRUCTs without conflict. A STRUCT is most often used inside
+-- an ARRAY or a MAP rather than as a top-level column.
+CREATE TABLE struct_t
+(
+ id BIGINT,
+ s1 STRUCT <f1: STRING, f2: BIGINT>,
+ s2 ARRAY <STRUCT <f1: INT, f2: TIMESTAMP>>,
+ s3 MAP <BIGINT, STRUCT <name: STRING, birthday: TIMESTAMP>>
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title15" id="complex_types_sql__complex_types_queries">
+
+ <h4 class="title topictitle4" id="ariaid-title15">Queries and Complex Types</h4>
+
+ <div class="body conbody">
+
+
+
+
+
+ <p class="p">
+ The result set of an Impala query always contains only scalar values; the elements and fields within any complex type columns must
+ be <span class="q">"unpacked"</span> using join queries. A query cannot directly retrieve the entire value for a complex type column. Impala
+ returns an error in this case. Queries using <code class="ph codeph">SELECT *</code> are allowed for tables with complex types, but the
+ columns with complex types are skipped.
+ </p>
+
+ <p class="p">
+ The following example shows how referring directly to a complex type column returns an error, while <code class="ph codeph">SELECT *</code> on
+ the same table succeeds, but only retrieves the scalar columns.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Many of the complex type examples refer to tables
+ such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+ adapted from the tables used in the TPC-H benchmark.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+ for the table definitions.
+ </div>
+
+
+
+<pre class="pre codeblock"><code>SELECT c_orders FROM customer LIMIT 1;
+ERROR: AnalysisException: Expr 'c_orders' in select list returns a complex type 'ARRAY<STRUCT<o_orderkey:BIGINT,o_orderstatus:STRING, ... l_receiptdate:STRING,l_shipinstruct:STRING,l_shipmode:STRING,l_comment:STRING>>>>'.
+Only scalar types are allowed in the select list.
+
+-- The original table has several scalar columns and one complex column.
+DESCRIBE customer;
++--------------+------------------------------------+
+| name | type |
++--------------+------------------------------------+
+| c_custkey | bigint |
+| c_name | string |
+...
+| c_orders | array<struct< |
+| | o_orderkey:bigint, |
+| | o_orderstatus:string, |
+| | o_totalprice:decimal(12,2), |
+...
+| | >> |
++--------------+------------------------------------+
+
+-- When we SELECT * from that table, only the scalar columns come back in the result set.
+CREATE TABLE select_star_customer STORED AS PARQUET AS SELECT * FROM customer;
++------------------------+
+| summary |
++------------------------+
+| Inserted 150000 row(s) |
++------------------------+
+
+-- The c_orders column, being of complex type, was not included in the SELECT * result set.
+DESC select_star_customer;
++--------------+---------------+
+| name | type |
++--------------+---------------+
+| c_custkey | bigint |
+| c_name | string |
+| c_address | string |
+| c_nationkey | smallint |
+| c_phone | string |
+| c_acctbal | decimal(12,2) |
+| c_mktsegment | string |
+| c_comment | string |
++--------------+---------------+
+
+</code></pre>
+
+
+
+ <p class="p">
+ References to fields within <code class="ph codeph">STRUCT</code> columns use dot notation. If the field name is unambiguous, you can omit
+ qualifiers such as table name, column name, or even the <code class="ph codeph">ITEM</code> or <code class="ph codeph">VALUE</code> pseudocolumn names for
+ <code class="ph codeph">STRUCT</code> elements inside an <code class="ph codeph">ARRAY</code> or a <code class="ph codeph">MAP</code>.
+ </p>
+
+
+
+
+
+
+
+<pre class="pre codeblock"><code>SELECT id, address.city FROM customers WHERE address.zip = 94305;
+</code></pre>
+
+ <p class="p">
+ References to elements within <code class="ph codeph">ARRAY</code> columns use the <code class="ph codeph">ITEM</code> pseudocolumn:
+ </p>
+
+
+
+<pre class="pre codeblock"><code>select r_name, r_nations.item.n_name from region, region.r_nations limit 7;
++--------+----------------+
+| r_name | item.n_name |
++--------+----------------+
+| EUROPE | UNITED KINGDOM |
+| EUROPE | RUSSIA |
+| EUROPE | ROMANIA |
+| EUROPE | GERMANY |
+| EUROPE | FRANCE |
+| ASIA | VIETNAM |
+| ASIA | CHINA |
++--------+----------------+
+</code></pre>
+
+ <p class="p">
+ References to fields within <code class="ph codeph">MAP</code> columns use the <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> pseudocolumns.
+ In this example, once the query establishes the alias <code class="ph codeph">MAP_FIELD</code> for a <code class="ph codeph">MAP</code> column with a
+ <code class="ph codeph">STRING</code> key and an <code class="ph codeph">INT</code> value, the query can refer to <code class="ph codeph">MAP_FIELD.KEY</code> and
+ <code class="ph codeph">MAP_FIELD.VALUE</code>, which have zero, one, or many instances for each row from the containing table.
+ </p>
+
+<pre class="pre codeblock"><code>DESCRIBE table_0;
++---------+-----------------------+
+| name | type |
++---------+-----------------------+
+| field_0 | string |
+| field_1 | map<string,int> |
+...
+
+SELECT field_0, map_field.key, map_field.value
+ FROM table_0, table_0.field_1 AS map_field
+WHERE length(field_0) = 1
+LIMIT 10;
++---------+-----------+-------+
+| field_0 | key | value |
++---------+-----------+-------+
+| b | gshsgkvd | NULL |
+| b | twrtcxj6 | 18 |
+| b | 2vp5 | 39 |
+| b | fh0s | 13 |
+| v | 2 | 41 |
+| v | 8b58mz | 20 |
+| v | hw | 16 |
+| v | 65l388pyt | 29 |
+| v | 03k68g91z | 30 |
+| v | r2hlg5b | NULL |
++---------+-----------+-------+
+
+</code></pre>
+
+
+
+ <p class="p">
+ When complex types are nested inside each other, you use a combination of joins, pseudocolumn names, and dot notation to refer
+ to specific fields at the appropriate level. This is the most frequent form of query syntax for complex columns, because the
+ typical use case involves two levels of complex types, such as an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> elements.
+ </p>
+
+
+
+
+
+<pre class="pre codeblock"><code>SELECT id, phone_numbers.area_code FROM contact_info_many_structs INNER JOIN contact_info_many_structs.phone_numbers phone_numbers LIMIT 3;
+</code></pre>
+
+ <p class="p">
+ You can express relationships between <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> columns at different levels as joins. You
+ include comparison operators between fields at the top level and within the nested type columns so that Impala can do the
+ appropriate join operation.
+ </p>
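+
+ <p class="p">
+ For example, using the <code class="ph codeph">CUSTOMER</code> table from the sample TPC-H schema, the following sketch joins the <code class="ph codeph">C_ORDERS</code> array to its containing row and compares a field of the array elements (<code class="ph codeph">O_TOTALPRICE</code>) against a scalar column from the same row (<code class="ph codeph">C_ACCTBAL</code>):
+ </p>
+
+<pre class="pre codeblock"><code>-- The comparison in the WHERE clause relates a field inside the
+-- complex column to a top-level scalar column of the same table.
+SELECT c.c_name, o.o_orderkey, o.o_totalprice
+ FROM customer c, c.c_orders o
+WHERE o.o_totalprice > c.c_acctbal
+LIMIT 5;
+</code></pre>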
+
+
+
+
+
+
+
+
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Many of the complex type examples refer to tables
+ such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+ adapted from the tables used in the TPC-H benchmark.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+ for the table definitions.
+ </div>
+
+ <p class="p">
+ For example, the following queries work equivalently. They each return customer and order data for customers that have at least
+ one order.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT c.c_name, o.o_orderkey FROM customer c, c.c_orders o LIMIT 5;
++--------------------+------------+
+| c_name | o_orderkey |
++--------------------+------------+
+| Customer#000072578 | 558821 |
+| Customer#000072578 | 2079810 |
+| Customer#000072578 | 5768068 |
+| Customer#000072578 | 1805604 |
+| Customer#000072578 | 3436389 |
++--------------------+------------+
+
+SELECT c.c_name, o.o_orderkey FROM customer c INNER JOIN c.c_orders o LIMIT 5;
++--------------------+------------+
+| c_name | o_orderkey |
++--------------------+------------+
+| Customer#000072578 | 558821 |
+| Customer#000072578 | 2079810 |
+| Customer#000072578 | 5768068 |
+| Customer#000072578 | 1805604 |
+| Customer#000072578 | 3436389 |
++--------------------+------------+
+</code></pre>
+
+ <p class="p">
+ The following query using an outer join returns customers that have orders, plus customers with no orders (no entries in the
+ <code class="ph codeph">C_ORDERS</code> array):
+ </p>
+
+<pre class="pre codeblock"><code>SELECT c.c_custkey, o.o_orderkey
+ FROM customer c LEFT OUTER JOIN c.c_orders o
+LIMIT 5;
++-----------+------------+
+| c_custkey | o_orderkey |
++-----------+------------+
+| 60210 | NULL |
+| 147873 | NULL |
+| 72578 | 558821 |
+| 72578 | 2079810 |
+| 72578 | 5768068 |
++-----------+------------+
+
+</code></pre>
+
+ <p class="p">
+ The following query returns <em class="ph i">only</em> customers that have no orders. (With <code class="ph codeph">LEFT ANTI JOIN</code> or <code class="ph codeph">LEFT
+ SEMI JOIN</code>, the query can only refer to columns from the left-hand table, because by definition there is no matching
+ information in the right-hand table.)
+ </p>
+
+<pre class="pre codeblock"><code>SELECT c.c_custkey, c.c_name
+ FROM customer c LEFT ANTI JOIN c.c_orders o
+LIMIT 5;
++-----------+--------------------+
+| c_custkey | c_name |
++-----------+--------------------+
+| 60210 | Customer#000060210 |
+| 147873 | Customer#000147873 |
+| 141576 | Customer#000141576 |
+| 85365 | Customer#000085365 |
+| 70998 | Customer#000070998 |
++-----------+--------------------+
+
+</code></pre>
+
+
+
+ <p class="p">
+ You can also perform correlated subqueries to examine the properties of complex type columns for each row in the result set.
+ </p>
+
+ <p class="p">
+ Count the number of orders per customer. Note the correlated reference to the table alias <code class="ph codeph">C</code>. The
+ <code class="ph codeph">COUNT(*)</code> operation applies to all the elements of the <code class="ph codeph">C_ORDERS</code> array for the corresponding
+ row, avoiding the need for a <code class="ph codeph">GROUP BY</code> clause.
+ </p>
+
+<pre class="pre codeblock"><code>select c_name, howmany FROM customer c, (SELECT COUNT(*) howmany FROM c.c_orders) v limit 5;
++--------------------+---------+
+| c_name | howmany |
++--------------------+---------+
+| Customer#000030065 | 15 |
+| Customer#000065455 | 18 |
+| Customer#000113644 | 21 |
+| Customer#000111078 | 0 |
+| Customer#000024621 | 0 |
++--------------------+---------+
+</code></pre>
+
+ <p class="p">
+ Count the number of orders per customer, ignoring any customers that have not placed any orders:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT c_name, howmany_orders
+FROM
+ customer c,
+ (SELECT COUNT(*) howmany_orders FROM c.c_orders) subq1
+WHERE howmany_orders > 0
+LIMIT 5;
++--------------------+----------------+
+| c_name | howmany_orders |
++--------------------+----------------+
+| Customer#000072578 | 7 |
+| Customer#000046378 | 26 |
+| Customer#000069815 | 11 |
+| Customer#000079058 | 12 |
+| Customer#000092239 | 26 |
++--------------------+----------------+
+</code></pre>
+
+ <p class="p">
+ Count the number of line items in each order. The reference to <code class="ph codeph">C.C_ORDERS</code> in the <code class="ph codeph">FROM</code> clause
+ is needed because the <code class="ph codeph">O_ORDERKEY</code> field is a member of the elements in the <code class="ph codeph">C_ORDERS</code> array. The
+ subquery labelled <code class="ph codeph">SUBQ1</code> is correlated: it is re-evaluated for the <code class="ph codeph">C_ORDERS.O_LINEITEMS</code> array
+ from each row of the <code class="ph codeph">CUSTOMERS</code> table.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT c_name, o_orderkey, howmany_line_items
+FROM
+ customer c,
+ c.c_orders t2,
+ (SELECT COUNT(*) howmany_line_items FROM c.c_orders.o_lineitems) subq1
+WHERE howmany_line_items > 0
+LIMIT 5;
++--------------------+------------+--------------------+
+| c_name | o_orderkey | howmany_line_items |
++--------------------+------------+--------------------+
+| Customer#000020890 | 1884930 | 95 |
+| Customer#000020890 | 4570754 | 95 |
+| Customer#000020890 | 3771072 | 95 |
+| Customer#000020890 | 2555489 | 95 |
+| Customer#000020890 | 919171 | 95 |
++--------------------+------------+--------------------+
+</code></pre>
+
+ <p class="p">
+ Get the number of orders, the average order price, and the maximum items in any order per customer. For this example, the
+ subqueries labelled <code class="ph codeph">SUBQ1</code> and <code class="ph codeph">SUBQ2</code> are correlated: they are re-evaluated for each row from
+ the original <code class="ph codeph">CUSTOMER</code> table, and only apply to the complex columns associated with that row.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT c_name, howmany, average_price, most_items
+FROM
+ customer c,
+ (SELECT COUNT(*) howmany, AVG(o_totalprice) average_price FROM c.c_orders) subq1,
+ (SELECT MAX(l_quantity) most_items FROM c.c_orders.o_lineitems ) subq2
+LIMIT 5;
++--------------------+---------+---------------+------------+
+| c_name | howmany | average_price | most_items |
++--------------------+---------+---------------+------------+
+| Customer#000030065 | 15 | 128908.34 | 50.00 |
+| Customer#000088191 | 0 | NULL | NULL |
+| Customer#000101555 | 10 | 164250.31 | 50.00 |
+| Customer#000022092 | 0 | NULL | NULL |
+| Customer#000036277 | 27 | 166040.06 | 50.00 |
++--------------------+---------+---------------+------------+
+</code></pre>
+
+ <p class="p">
+ For example, these queries show how to access information about the <code class="ph codeph">ARRAY</code> elements within the
+ <code class="ph codeph">CUSTOMER</code> table from the <span class="q">"nested TPC-H"</span> schema, starting with the initial <code class="ph codeph">ARRAY</code> elements
+ and progressing to examine the <code class="ph codeph">STRUCT</code> fields of the <code class="ph codeph">ARRAY</code>, and then the elements nested within
+ another <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>:
+ </p>
+
+<pre class="pre codeblock"><code>-- How many orders does each customer have?
+-- The type of the ARRAY column doesn't matter; this query just counts the elements.
+SELECT c_custkey, count(*)
+ FROM customer, customer.c_orders
+GROUP BY c_custkey
+LIMIT 5;
++-----------+----------+
+| c_custkey | count(*) |
++-----------+----------+
+| 61081 | 21 |
+| 115987 | 15 |
+| 69685 | 19 |
+| 109124 | 15 |
+| 50491 | 12 |
++-----------+----------+
+
+-- How many line items are part of each customer order?
+-- Now we examine a field from a STRUCT nested inside the ARRAY.
+SELECT c_custkey, c_orders.o_orderkey, count(*)
+ FROM customer, customer.c_orders c_orders, c_orders.o_lineitems
+GROUP BY c_custkey, c_orders.o_orderkey
+LIMIT 5;
++-----------+------------+----------+
+| c_custkey | o_orderkey | count(*) |
++-----------+------------+----------+
+| 63367 | 4985959 | 7 |
+| 53989 | 1972230 | 2 |
+| 143513 | 5750498 | 5 |
+| 17849 | 4857989 | 1 |
+| 89881 | 1046437 | 1 |
++-----------+------------+----------+
+
+-- What are the line items in each customer order?
+-- One of the STRUCT fields inside the ARRAY is another
+-- ARRAY containing STRUCT elements. The join finds
+-- all the related items from both levels of ARRAY.
+SELECT c_custkey, o_orderkey, l_partkey
+ FROM customer, customer.c_orders, c_orders.o_lineitems
+LIMIT 5;
++-----------+------------+-----------+
+| c_custkey | o_orderkey | l_partkey |
++-----------+------------+-----------+
+| 113644 | 2738497 | 175846 |
+| 113644 | 2738497 | 27309 |
+| 113644 | 2738497 | 175873 |
+| 113644 | 2738497 | 88559 |
+| 113644 | 2738497 | 8032 |
++-----------+------------+-----------+
+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title16" id="complex_types_using__pseudocolumns">
+
+ <h3 class="title topictitle3" id="ariaid-title16">Pseudocolumns for ARRAY and MAP Types</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Each element in an <code class="ph codeph">ARRAY</code> type has a position, indexed starting from zero, and a value. Each element in a
+ <code class="ph codeph">MAP</code> type represents a key-value pair. Impala provides pseudocolumns that let you retrieve this metadata as part
+ of a query, or filter query results by referring to this metadata in a <code class="ph codeph">WHERE</code> clause. You refer to the pseudocolumns as
+ part of qualified column names in queries:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">ITEM</code>: The value of an array element. If the <code class="ph codeph">ARRAY</code> contains <code class="ph codeph">STRUCT</code> elements,
+ you can refer to either <code class="ph codeph"><var class="keyword varname">array_name</var>.ITEM.<var class="keyword varname">field_name</var></code> or use the shorthand
+ <code class="ph codeph"><var class="keyword varname">array_name</var>.<var class="keyword varname">field_name</var></code>.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">POS</code>: The position of an element within an array.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">KEY</code>: The value forming the first part of a key-value pair in a map. It is not necessarily unique.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">VALUE</code>: The data item forming the second part of a key-value pair in a map. If the <code class="ph codeph">VALUE</code> part
+ of the <code class="ph codeph">MAP</code> element is a <code class="ph codeph">STRUCT</code>, you can refer to either
+ <code class="ph codeph"><var class="keyword varname">map_name</var>.VALUE.<var class="keyword varname">field_name</var></code> or use the shorthand
+ <code class="ph codeph"><var class="keyword varname">map_name</var>.<var class="keyword varname">field_name</var></code>.
+ </li>
+ </ul>
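+
+ <p class="p">
+ As a brief sketch, using the <code class="ph codeph">array_t</code> and <code class="ph codeph">map_t</code> tables defined earlier under <span class="q">"DDL Statements and Complex Types"</span>, the pseudocolumns appear in queries like the following:
+ </p>
+
+<pre class="pre codeblock"><code>-- ITEM and POS for an ARRAY of scalar elements.
+SELECT t.id, a.ITEM, a.POS
+ FROM array_t t, t.a1 a;
+
+-- KEY and VALUE for a MAP column.
+SELECT t.id, m.KEY, m.VALUE
+ FROM map_t t, t.m2 m;
+</code></pre>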
+
+
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested3" aria-labelledby="item__pos" id="pseudocolumns__item">
+
+ <h4 class="title topictitle4" id="item__pos">ITEM and POS Pseudocolumns</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When an <code class="ph codeph">ARRAY</code> column contains <code class="ph codeph">STRUCT</code> elements, you can refer to a field within the
+ <code class="ph codeph">STRUCT</code> using a qualified name of the form
+ <code class="ph codeph"><var class="keyword varname">array_column</var>.<var class="keyword varname">field_name</var></code>. If the <code class="ph codeph">ARRAY</code> contains scalar
+ values, Impala recognizes the special name <code class="ph codeph"><var class="keyword varname">array_column</var>.ITEM</code> to represent the value of each
+ scalar array element. For example, if a column contained an <code class="ph codeph">ARRAY</code> where each element was a
+ <code class="ph codeph">STRING</code>, you would use <code class="ph codeph"><var class="keyword varname">array_name</var>.ITEM</code> to refer to each scalar value in the
+ <code class="ph codeph">SELECT</code> list, or the <code class="ph codeph">WHERE</code> or other clauses.
+ </p>
+
+ <p class="p">
+ This example shows a table with two <code class="ph codeph">ARRAY</code> columns whose elements are of the scalar type
+ <code class="ph codeph">STRING</code>. When referring to the values of the array elements in the <code class="ph codeph">SELECT</code> list,
+ <code class="ph codeph">WHERE</code> clause, or <code class="ph codeph">ORDER BY</code> clause, you use the <code class="ph codeph">ITEM</code> pseudocolumn because
+ within the array, the individual elements have no defined names.
+ </p>
+
+<pre class="pre codeblock"><code>create TABLE persons_of_interest
+(
+person_id BIGINT,
+aliases ARRAY <STRING>,
+associates ARRAY <STRING>,
+real_name STRING
+)
+STORED AS PARQUET;
+
+-- Get all the aliases of each person.
+SELECT real_name, aliases.ITEM
+ FROM persons_of_interest, persons_of_interest.aliases
+ORDER BY real_name, aliases.item;
+
+-- Search for particular associates of each person.
+SELECT real_name, associates.ITEM
+ FROM persons_of_interest, persons_of_interest.associates
+WHERE associates.item LIKE '% MacGuffin';
+
+</code></pre>
+
+ <p class="p">
+ Because an array is inherently an ordered data structure, Impala recognizes the special name
+ <code class="ph codeph"><var class="keyword varname">array_column</var>.POS</code> to represent the numeric position of each element within the array. The
+ <code class="ph codeph">POS</code> pseudocolumn lets you filter or reorder the result set based on the sequence of array elements.
+ </p>
+
+ <p class="p">
+ The following example uses a table from a flattened version of the TPC-H schema. The <code class="ph codeph">REGION</code> table only has a
+ few rows, such as one row for Europe and one for Asia. The row for each region represents all the countries in that region as an
+ <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> elements:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > desc region;
++-------------+--------------------------------------------------------------------+
+| name | type |
++-------------+--------------------------------------------------------------------+
+| r_regionkey | smallint |
+| r_name | string |
+| r_comment | string |
+| r_nations | array<struct<n_nationkey:smallint,n_name:string,n_comment:string>> |
++-------------+--------------------------------------------------------------------+
+
+</code></pre>
+
+ <p class="p">
+ To find the countries within a specific region, you use a join query. To find out the order of elements in the array, you also
+ refer to the <code class="ph codeph">POS</code> pseudocolumn in the select list:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > SELECT r1.r_name, r2.n_name, <strong class="ph b">r2.POS</strong>
+ > FROM region r1 INNER JOIN r1.r_nations r2
+ > WHERE r1.r_name = 'ASIA';
++--------+-----------+-----+
+| r_name | n_name | pos |
++--------+-----------+-----+
+| ASIA | VIETNAM | 0 |
+| ASIA | CHINA | 1 |
+| ASIA | JAPAN | 2 |
+| ASIA | INDONESIA | 3 |
+| ASIA | INDIA | 4 |
++--------+-----------+-----+
+</code></pre>
+
+ <p class="p">
+ Once you know the positions of the elements, you can use that information in subsequent queries, for example to change the
+ ordering of results from the complex type column or to filter certain elements from the array:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > SELECT r1.r_name, r2.n_name, r2.POS
+ > FROM region r1 INNER JOIN r1.r_nations r2
+ > WHERE r1.r_name = 'ASIA'
+ > <strong class="ph b">ORDER BY r2.POS DESC</strong>;
++--------+-----------+-----+
+| r_name | n_name | pos |
++--------+-----------+-----+
+| ASIA | INDIA | 4 |
+| ASIA | INDONESIA | 3 |
+| ASIA | JAPAN | 2 |
+| ASIA | CHINA | 1 |
+| ASIA | VIETNAM | 0 |
++--------+-----------+-----+
+[localhost:21000] > SELECT r1.r_name, r2.n_name, r2.POS
+ > FROM region r1 INNER JOIN r1.r_nations r2
+                  > WHERE r1.r_name = 'ASIA' AND <strong class="ph b">r2.POS BETWEEN 1 AND 3</strong>;
++--------+-----------+-----+
+| r_name | n_name | pos |
++--------+-----------+-----+
+| ASIA | CHINA | 1 |
+| ASIA | JAPAN | 2 |
+| ASIA | INDONESIA | 3 |
++--------+-----------+-----+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="key__value" id="pseudocolumns__key">
+
+ <h4 class="title topictitle4" id="key__value">KEY and VALUE Pseudocolumns</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">MAP</code> data type is suitable for representing sparse or wide data structures, where each row might only have
+ entries for a small subset of named fields. Because the element names (the map keys) vary depending on the row, a query must be
+ able to refer to both the key and the value parts of each key-value pair. The <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code>
+ pseudocolumns let you refer to the parts of the key-value pair independently within the query, as
+ <code class="ph codeph"><var class="keyword varname">map_column</var>.KEY</code> and <code class="ph codeph"><var class="keyword varname">map_column</var>.VALUE</code>.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">KEY</code> must always be a scalar type, such as <code class="ph codeph">STRING</code>, <code class="ph codeph">BIGINT</code>, or
+ <code class="ph codeph">TIMESTAMP</code>. It can be <code class="ph codeph">NULL</code>. Values of the <code class="ph codeph">KEY</code> field are not necessarily unique
+ within the same <code class="ph codeph">MAP</code>. You apply any required <code class="ph codeph">DISTINCT</code>, <code class="ph codeph">GROUP BY</code>, and other
+ clauses in the query, and loop through the result set to process all the values matching any specified keys.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">VALUE</code> can be either a scalar type or another complex type. If the <code class="ph codeph">VALUE</code> is a
+ <code class="ph codeph">STRUCT</code>, you can construct a qualified name
+ <code class="ph codeph"><var class="keyword varname">map_column</var>.VALUE.<var class="keyword varname">struct_field</var></code> to refer to the individual fields inside
+ the value part. If the <code class="ph codeph">VALUE</code> is an <code class="ph codeph">ARRAY</code> or another <code class="ph codeph">MAP</code>, you must include
+ another join condition that establishes a table alias for <code class="ph codeph"><var class="keyword varname">map_column</var>.VALUE</code>, and then
+ construct another qualified name using that alias, for example <code class="ph codeph"><var class="keyword varname">table_alias</var>.ITEM</code> or
+          <code class="ph codeph"><var class="keyword varname">table_alias</var>.KEY</code> and <code class="ph codeph"><var class="keyword varname">table_alias</var>.VALUE</code>.
+ </p>
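+
+      <p class="p">
+        The following sketch shows the extra join and alias needed when the <code class="ph codeph">VALUE</code> part is itself an
+        <code class="ph codeph">ARRAY</code>. The <code class="ph codeph">events</code> table and its columns are hypothetical, used here only
+        to illustrate the qualified-name syntax:
+      </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical schema: events (event_id BIGINT, attributes MAP <STRING, ARRAY <STRING>>)
+-- The first join exposes attrs.KEY and attrs.VALUE; the second join
+-- gives the ARRAY inside the VALUE its own alias, so that its elements
+-- can be referenced as vals.ITEM.
+SELECT e.event_id, attrs.KEY, vals.ITEM
+  FROM events e, e.attributes attrs, attrs.VALUE vals
+WHERE attrs.KEY = 'tags';
+</code></pre>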
+
+ <p class="p">
+ The following example shows different ways to access a <code class="ph codeph">MAP</code> column using the <code class="ph codeph">KEY</code> and
+ <code class="ph codeph">VALUE</code> pseudocolumns. The <code class="ph codeph">DETAILS</code> column has a <code class="ph codeph">STRING</code> first part with short,
+ standardized values such as <code class="ph codeph">'Recurring'</code>, <code class="ph codeph">'Lucid'</code>, or <code class="ph codeph">'Anxiety'</code>. This is the
+ <span class="q">"key"</span> that is used to look up particular kinds of elements from the <code class="ph codeph">MAP</code>. The second part, also a
+ <code class="ph codeph">STRING</code>, is a longer free-form explanation. Impala gives you the standard pseudocolumn names
+ <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> for the two parts, and you apply your own conventions and interpretations to the
+ underlying values.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If you find that the single-item nature of the <code class="ph codeph">VALUE</code> makes it difficult to model your data accurately, the
+ solution is typically to add some nesting to the complex type. For example, to have several sets of key-value pairs, make the
+ column an <code class="ph codeph">ARRAY</code> whose elements are <code class="ph codeph">MAP</code>. To make a set of key-value pairs that holds more
+ elaborate information, make a <code class="ph codeph">MAP</code> column whose <code class="ph codeph">VALUE</code> part contains an <code class="ph codeph">ARRAY</code>
+ or a <code class="ph codeph">STRUCT</code>.
+ </div>
+
+<pre class="pre codeblock"><code>CREATE TABLE dream_journal
+(
+ dream_id BIGINT,
+ details MAP <STRING,STRING>
+)
+STORED AS PARQUET;
+
+
+-- What are all the types of dreams that are recorded?
+SELECT DISTINCT details.KEY FROM dream_journal, dream_journal.details;
+
+-- How many lucid dreams were recorded?
+-- Because there is no GROUP BY, we count the 'Lucid' keys across all rows.
+SELECT <strong class="ph b">COUNT(details.KEY)</strong>
+ FROM dream_journal, dream_journal.details
+WHERE <strong class="ph b">details.KEY = 'Lucid'</strong>;
+
+-- Print a report of a subset of dreams, filtering based on both the lookup key
+-- and the detailed value.
+SELECT dream_id, <strong class="ph b">details.KEY AS "Dream Type"</strong>, <strong class="ph b">details.VALUE AS "Dream Summary"</strong>
+ FROM dream_journal, dream_journal.details
+WHERE
+ <strong class="ph b">details.KEY IN ('Happy', 'Pleasant', 'Joyous')</strong>
+ AND <strong class="ph b">details.VALUE LIKE '%childhood%'</strong>;
+</code></pre>
+
+ <p class="p">
+ The following example shows a more elaborate version of the previous table, where the <code class="ph codeph">VALUE</code> part of the
+ <code class="ph codeph">MAP</code> entry is a <code class="ph codeph">STRUCT</code> rather than a scalar type. Now instead of referring to the
+ <code class="ph codeph">VALUE</code> pseudocolumn directly, you use dot notation to refer to the <code class="ph codeph">STRUCT</code> fields inside it.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE better_dream_journal
+(
+ dream_id BIGINT,
+ details MAP <STRING,STRUCT <summary: STRING, when_happened: TIMESTAMP, duration: DECIMAL(5,2), woke_up: BOOLEAN> >
+)
+STORED AS PARQUET;
+
+
+-- Do more elaborate reporting and filtering by examining multiple attributes within the same dream.
+SELECT dream_id, <strong class="ph b">details.KEY AS "Dream Type"</strong>, <strong class="ph b">details.VALUE.summary AS "Dream Summary"</strong>, <strong class="ph b">details.VALUE.duration AS "Duration"</strong>
+ FROM better_dream_journal, better_dream_journal.details
+WHERE
+ <strong class="ph b">details.KEY IN ('Anxiety', 'Nightmare')</strong>
+ AND <strong class="ph b">details.VALUE.duration > 60</strong>
+ AND <strong class="ph b">details.VALUE.woke_up = TRUE</strong>;
+
+-- Remember that if the ITEM or VALUE contains a STRUCT, you can reference
+-- the STRUCT fields directly without the .ITEM or .VALUE qualifier.
+SELECT dream_id, <strong class="ph b">details.KEY AS "Dream Type"</strong>, <strong class="ph b">details.summary AS "Dream Summary"</strong>, <strong class="ph b">details.duration AS "Duration"</strong>
+ FROM better_dream_journal, better_dream_journal.details
+WHERE
+ <strong class="ph b">details.KEY IN ('Anxiety', 'Nightmare')</strong>
+ AND <strong class="ph b">details.duration > 60</strong>
+ AND <strong class="ph b">details.woke_up = TRUE</strong>;
+</code></pre>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title19" id="complex_types_using__complex_types_etl">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title19">Loading Data Containing Complex Types</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Because the Impala <code class="ph codeph">INSERT</code> statement does not currently support creating new data with complex type columns, or
+ copying existing complex type values from one table to another, you primarily use Impala to query Parquet tables with complex
+ types where the data was inserted through Hive, or create tables with complex types where you already have existing Parquet data
+ files.
+
<TRUNCATED>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_new_features.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_new_features.html b/docs/build3x/html/topics/impala_new_features.html
new file mode 100644
index 0000000..cd1ecc5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_new_features.html
@@ -0,0 +1,3806 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="new_features"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>New Features in Apache Impala</title></head><body id="new_features"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">New Features in Apache Impala</span></h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ This release of Impala contains the following changes and enhancements from previous releases.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="new_features__new_features_300">
+ <h2 class="title topictitle2" id="ariaid-title2">New Features in <span class="keyword">Impala 3.0</span></h2>
+ <div class="body conbody">
+ <p class="p">
+ For the full list of issues closed in this release, including the
+ issues marked as <span class="q">"new features"</span> or <span class="q">"improvements"</span>, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-3.0.html" target="_blank">changelog for <span class="keyword">Impala 3.0</span></a>.
+ </p>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="new_features__new_features_2120">
+
+ <h2 class="title topictitle2" id="ariaid-title3">New Features in <span class="keyword">Impala 2.12</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including the issues
+ marked as <span class="q">"new features"</span> or <span class="q">"improvements"</span>, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.12.html" target="_blank">changelog for <span class="keyword">Impala 2.12</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="new_features__new_features_2110">
+
+ <h2 class="title topictitle2" id="ariaid-title4">New Features in <span class="keyword">Impala 2.11</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including the issues
+ marked as <span class="q">"new features"</span> or <span class="q">"improvements"</span>, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.11.html" target="_blank">changelog for <span class="keyword">Impala 2.11</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="new_features__new_features_2100">
+
+ <h2 class="title topictitle2" id="ariaid-title5">New Features in <span class="keyword">Impala 2.10</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including the issues
+ marked as <span class="q">"new features"</span> or <span class="q">"improvements"</span>, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.10.html" target="_blank">changelog for <span class="keyword">Impala 2.10</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="new_features__new_features_290">
+
+ <h2 class="title topictitle2" id="ariaid-title6">New Features in <span class="keyword">Impala 2.9</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including the issues
+ marked as <span class="q">"new features"</span> or <span class="q">"improvements"</span>, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.9.html" target="_blank">changelog for <span class="keyword">Impala 2.9</span></a>.
+ </p>
+
+ <p class="p">
+ The following are some of the most significant new features in this release:
+ </p>
+
+ <ul class="ul" id="new_features_290__feature_list">
+ <li class="li">
+ <p class="p">
+ A new function, <code class="ph codeph">replace()</code>, which is faster than
+ <code class="ph codeph">regexp_replace()</code> for simple string substitutions.
+ See <a class="xref" href="impala_string_functions.html">Impala String Functions</a> for details.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Startup flags for the <span class="keyword cmdname">impalad</span> daemon, <code class="ph codeph">is_executor</code>
+ and <code class="ph codeph">is_coordinator</code>, let you divide the work on a large, busy cluster
+ between a small number of hosts acting as query coordinators, and a larger number of
+ hosts acting as query executors. By default, each host can act in both roles,
+ potentially introducing bottlenecks during heavily concurrent workloads.
+ See <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> for details.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="new_features__new_features_280">
+
+ <h2 class="title topictitle2" id="ariaid-title7">New Features in <span class="keyword">Impala 2.8</span></h2>
+
+ <div class="body conbody">
+
+ <ul class="ul" id="new_features_280__feature_list">
+ <li class="li">
+ <p class="p">
+ Performance and scalability improvements:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement can
+ take advantage of multithreading.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved scalability for highly concurrent loads by reducing the possibility of TCP/IP timeouts.
+ A configuration setting, <code class="ph codeph">accepted_cnxn_queue_depth</code>, can be adjusted upwards to
+ avoid this type of timeout on large clusters.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Several performance improvements were made to the mechanism for generating native code:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Some queries involving analytic functions can take better advantage of native code generation.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Modules produced during intermediate code generation are organized
+ to be easier to cache and reuse during the lifetime of a long-running or complicated query.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement is more efficient
+ (less time for the codegen phase) for tables with a large number
+ of columns, especially for tables containing <code class="ph codeph">TIMESTAMP</code>
+ columns.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The logic for determining whether or not to use a runtime filter is more reliable, and the
+ evaluation process itself is faster because of native code generation.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">MT_DOP</code> query option enables
+ multithreading for a number of Impala operations.
+ <code class="ph codeph">COMPUTE STATS</code> statements for Parquet tables
+ use a default of <code class="ph codeph">MT_DOP=4</code> to improve the
+ intra-node parallelism and CPU efficiency of this data-intensive
+ operation.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement is more efficient
+ (less time for the codegen phase) for tables with a large number
+ of columns.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new hint, <code class="ph codeph">CLUSTERED</code>,
+ allows Impala <code class="ph codeph">INSERT</code> operations on a Parquet table
+ that use dynamic partitioning to process a high number of
+ partitions in a single statement. The data is ordered based on the
+ partition key columns, and each partition is only written
+ by a single host, reducing the amount of memory needed to buffer
+ Parquet data while the data blocks are being constructed.
+ </p>
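+          <p class="p">
+            For example (the table and column names here are illustrative, and the hint is placed immediately before the
+            <code class="ph codeph">SELECT</code> portion, the usual position for Impala insert hints):
+          </p>
+<pre class="pre codeblock"><code>-- Sorts the incoming rows by the partition key columns, so each host
+-- buffers Parquet data for only one partition at a time.
+INSERT INTO sales_parquet PARTITION (year, month)
+  /* +CLUSTERED */
+  SELECT amount, region, year, month FROM sales_staging;
+</code></pre>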
+ </li>
+ <li class="li">
+ <p class="p">
+ The new configuration setting <code class="ph codeph">inc_stats_size_limit_bytes</code>
+ lets you reduce the load on the catalog server when running the
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement for very large tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala folds many constant expressions within query statements,
+ rather than evaluating them for each row. This optimization
+ is especially useful when using functions to manipulate and
+ format <code class="ph codeph">TIMESTAMP</code> values, such as the result
+ of an expression such as <code class="ph codeph">to_date(now() - interval 1 day)</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Parsing of complicated expressions is faster. This speedup is
+ especially useful for queries containing large <code class="ph codeph">CASE</code>
+ expressions.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Evaluation is faster for <code class="ph codeph">IN</code> operators with many constant
+ arguments. The same performance improvement applies to other functions
+ with many constant arguments.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala optimizes identical comparison operators within multiple <code class="ph codeph">OR</code>
+ blocks.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The reporting for wall-clock times and total CPU time in profile output is more accurate.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new query option, <code class="ph codeph">SCRATCH_LIMIT</code>, lets you restrict the amount of
+ space used when a query exceeds the memory limit and activates the <span class="q">"spill to disk"</span> mechanism.
+ This option helps to avoid runaway queries or make queries <span class="q">"fail fast"</span> if they require more
+ memory than anticipated. You can prevent runaway queries from using excessive amounts of spill space,
+ without restarting the cluster to turn the spilling feature off entirely.
+ See <a class="xref" href="impala_scratch_limit.html#scratch_limit">SCRATCH_LIMIT Query Option</a> for details.
+ </p>
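+          <p class="p">
+            For example (the specific values are illustrative; see the linked topic for the accepted formats):
+          </p>
+<pre class="pre codeblock"><code>-- Cap spill-to-disk space at 2 GB for subsequent queries in this session.
+SET SCRATCH_LIMIT='2g';
+-- Remove the cap again; -1 means unlimited scratch space.
+SET SCRATCH_LIMIT=-1;
+</code></pre>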
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ Integration with Apache Kudu:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The experimental Impala support for the Kudu storage layer has been folded
+ into the main Impala development branch. Impala can now directly access Kudu tables,
+ opening up new capabilities such as enhanced DML operations and continuous ingestion.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DELETE</code> statement is a flexible way to remove data from a Kudu table. Previously,
+ removing data from an Impala table involved removing or rewriting the underlying data files, dropping entire partitions,
+ or rewriting the entire table. This Impala statement only works for Kudu tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">UPDATE</code> statement is a flexible way to modify data within a Kudu table. Previously,
+ updating data in an Impala table involved replacing the underlying data files, dropping entire partitions,
+ or rewriting the entire table. This Impala statement only works for Kudu tables.
+ </p>
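+          <p class="p">
+            For example, with a hypothetical Kudu table <code class="ph codeph">metrics</code> (both statements are rejected for
+            non-Kudu tables):
+          </p>
+<pre class="pre codeblock"><code>-- Modify matching rows in place.
+UPDATE metrics SET load_avg = 0.5 WHERE host = 'node2';
+-- Remove matching rows without rewriting any data files.
+DELETE FROM metrics WHERE host = 'node1';
+</code></pre>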
+ </li>
+ <li class="li">
+ <p class="p">
+          The <code class="ph codeph">UPSERT</code> statement is a flexible way to ingest new data, modify existing data, or both, within a Kudu table. Previously,
+ ingesting data that might contain duplicates involved an inefficient multi-stage operation, and there was no
+ built-in protection against duplicate data. The <code class="ph codeph">UPSERT</code> statement, in combination with
+ the primary key designation for Kudu tables, lets you add or replace rows in a single operation, and
+ automatically avoids creating any duplicate data.
+ </p>
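+          <p class="p">
+            For example, with a hypothetical Kudu table <code class="ph codeph">metrics</code> whose primary key is
+            <code class="ph codeph">host</code>:
+          </p>
+<pre class="pre codeblock"><code>-- Inserts a new row if no row with host 'node1' exists; otherwise
+-- replaces the existing row, with no duplicates either way.
+UPSERT INTO metrics (host, load_avg) VALUES ('node1', 0.85);
+</code></pre>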
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">CREATE TABLE</code> statement gains some new clauses that are specific to Kudu tables:
+ <code class="ph codeph">PARTITION BY</code>, <code class="ph codeph">PARTITIONS</code>, <code class="ph codeph">STORED AS KUDU</code>, and column
+ attributes <code class="ph codeph">PRIMARY KEY</code>, <code class="ph codeph">NULL</code> and <code class="ph codeph">NOT NULL</code>,
+ <code class="ph codeph">ENCODING</code>, <code class="ph codeph">COMPRESSION</code>, <code class="ph codeph">DEFAULT</code>, and <code class="ph codeph">BLOCK_SIZE</code>.
+ These clauses replace the explicit <code class="ph codeph">TBLPROPERTIES</code> settings that were required in the
+ early experimental phases of integration between Impala and Kudu.
+ </p>
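+          <p class="p">
+            The following sketch shows several of these clauses together; the table name, columns, and partition count are
+            illustrative only:
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE kudu_example
+(
+  id BIGINT PRIMARY KEY,
+  name STRING NOT NULL,
+  note STRING NULL
+)
+PARTITION BY HASH (id) PARTITIONS 4
+STORED AS KUDU;
+</code></pre>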
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">ALTER TABLE</code> statement can change certain attributes of Kudu tables.
+ You can add, drop, or rename columns.
+ You can add or drop range partitions.
+ You can change the <code class="ph codeph">TBLPROPERTIES</code> value to rename or point to a different underlying Kudu table,
+ independently from the Impala table name in the metastore database.
+ You cannot change the data type of an existing column in a Kudu table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">SHOW PARTITIONS</code> statement displays information about the distribution of data
+ between partitions in Kudu tables. A new variation, <code class="ph codeph">SHOW RANGE PARTITIONS</code>,
+ displays information about the Kudu-specific partitions that apply across ranges of key values.
+ </p>
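+          <p class="p">
+            For example (<code class="ph codeph">SHOW RANGE PARTITIONS</code> applies only to Kudu tables that use range
+            partitioning; the table name is illustrative):
+          </p>
+<pre class="pre codeblock"><code>SHOW PARTITIONS kudu_example;
+SHOW RANGE PARTITIONS kudu_example;
+</code></pre>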
+ </li>
+ <li class="li">
+ <p class="p">
+ Not all Impala data types are supported in Kudu tables. In particular, currently the Impala
+ <code class="ph codeph">TIMESTAMP</code> type is not allowed in a Kudu table. Impala does not recognize the
+ <code class="ph codeph">UNIXTIME_MICROS</code> Kudu type when it is present in a Kudu table. (These two
+ representations of date/time data use different units and are not directly compatible.)
+ You cannot create columns of type <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">DECIMAL</code>,
+ <code class="ph codeph">VARCHAR</code>, or <code class="ph codeph">CHAR</code> within a Kudu table. Within a query, you can
+ cast values in a result set to these types. Certain types, such as <code class="ph codeph">BOOLEAN</code>,
+ cannot be used as primary key columns.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Currently, Kudu tables are not interchangeable between Impala and Hive the way other kinds of Impala tables are.
+ Although the metadata for Kudu tables is stored in the metastore database, currently Hive cannot access Kudu tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement works for Kudu tables. The organization
+ of the Kudu data makes it more efficient than with HDFS-backed tables to insert
+ data in small batches, such as with the <code class="ph codeph">INSERT ... VALUES</code> syntax.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Some audit data is recorded for data governance purposes.
+ All <code class="ph codeph">UPDATE</code>, <code class="ph codeph">DELETE</code>, and <code class="ph codeph">UPSERT</code> statements are characterized
+ as <code class="ph codeph">INSERT</code> operations in the audit log. Currently, lineage metadata is not generated for
+ <code class="ph codeph">UPDATE</code> and <code class="ph codeph">DELETE</code> operations on Kudu tables.
+ </p>
+ </li>
+ <li class="li">
+ <div class="p">
+ Currently, Kudu tables have limited support for Sentry:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Access to Kudu tables must be granted to roles as usual.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Currently, access to a Kudu table through Sentry is <span class="q">"all or nothing"</span>.
+ You cannot enforce finer-grained permissions such as at the column level,
+ or permissions on certain operations such as <code class="ph codeph">INSERT</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Only users with <code class="ph codeph">ALL</code> privileges on <code class="ph codeph">SERVER</code> can create external Kudu tables.
+ </p>
+ </li>
+ </ul>
+ Because non-SQL APIs can access Kudu data without going through Sentry
+ authorization, currently the Sentry support is considered preliminary.
+ </div>
+ </li>
+ <li class="li">
+ <p class="p">
+ Equality and <code class="ph codeph">IN</code> predicates in Impala queries are pushed to
+ Kudu and evaluated efficiently by the Kudu storage layer.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ <strong class="ph b">Security:</strong>
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala can take advantage of the S3 encrypted credential
+ store, to avoid exposing the secret key when accessing
+ data stored on S3.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> statement now updates information about HDFS block locations.
+ Therefore, you can perform a fast and efficient <code class="ph codeph">REFRESH</code> after doing an HDFS
+ rebalancing operation instead of the more expensive <code class="ph codeph">INVALIDATE METADATA</code> statement.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1654" target="_blank">IMPALA-1654</a>]
+ Several kinds of DDL operations
+ can now work on a range of partitions. The partitions can be specified
+ using operators such as <code class="ph codeph"><</code>, <code class="ph codeph">>=</code>, and
+ <code class="ph codeph">!=</code> rather than just an equality predicate applying to a single
+ partition.
+ This new feature extends the syntax of several clauses
+ of the <code class="ph codeph">ALTER TABLE</code> statement
+ (<code class="ph codeph">DROP PARTITION</code>, <code class="ph codeph">SET [UN]CACHED</code>,
+ <code class="ph codeph">SET FILEFORMAT | SERDEPROPERTIES | TBLPROPERTIES</code>),
+ the <code class="ph codeph">SHOW FILES</code> statement, and the
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement.
+ It does not apply to statements that are defined to only apply to a single
+ partition, such as <code class="ph codeph">LOAD DATA</code>, <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code>,
+ <code class="ph codeph">SET LOCATION</code>, and <code class="ph codeph">INSERT</code> with a static
+ partitioning clause.
+ </p>
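+          <p class="p">
+            For example (illustrative table and partition key names):
+          </p>
+<pre class="pre codeblock"><code>-- Drop every partition earlier than a cutoff year in one statement.
+ALTER TABLE historical_data DROP PARTITION (year < 1996);
+-- Compute incremental stats across a range of recent partitions.
+COMPUTE INCREMENTAL STATS historical_data PARTITION (year >= 2015);
+</code></pre>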
+ </li>
+ <li class="li">
+ <p class="p">
+          The <code class="ph codeph">instr()</code> function has optional third and fourth arguments, representing
+          the character position at which to begin searching for the substring, and which occurrence
+          of the substring to find.
+ </p>
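+          <p class="p">
+            The following sketch illustrates the effect of the optional arguments (positions are 1-based):
+          </p>
+<pre class="pre codeblock"><code>SELECT instr('foo bar bar', 'bar');        -- 5: first match
+SELECT instr('foo bar bar', 'bar', 6);     -- 9: search starts past the first match
+SELECT instr('foo bar bar', 'bar', 1, 2);  -- 9: second occurrence from the start
+</code></pre>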
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved error handling for malformed Avro data. In particular, incorrect
+ precision or scale for <code class="ph codeph">DECIMAL</code> types is now handled.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala debug web UI:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In addition to <span class="q">"inflight"</span> and <span class="q">"finished"</span> queries, the web UI
+ now also includes a section for <span class="q">"queued"</span> queries.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <span class="ph uicontrol">/sessions</span> tab now clarifies how many of the displayed
+ sections are active, and lets you sort by <span class="ph uicontrol">Expired</span> status
+ to distinguish active sessions from expired ones.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved stability when DDL operations such as <code class="ph codeph">CREATE DATABASE</code>
+ or <code class="ph codeph">DROP DATABASE</code> are run in Hive at the same time as an Impala
+ <code class="ph codeph">INVALIDATE METADATA</code> statement.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <span class="q">"out of memory"</span> error report was made more user-friendly, with additional
+ diagnostic information to help identify the spot where the memory limit was exceeded.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved disk space usage for Java-based UDFs. Temporary copies of the associated JAR
+ files are removed when no longer needed, so that they do not accumulate across restarts
+ of the <span class="keyword cmdname">catalogd</span> daemon and potentially cause an out-of-space condition.
+ These temporary files are also created in the directory specified by the <code class="ph codeph">local_library_dir</code>
+ configuration setting, so that the storage for these temporary files can be independent
+ from any capacity limits on the <span class="ph filepath">/tmp</span> filesystem.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="new_features__new_features_270">
+
+ <h2 class="title topictitle2" id="ariaid-title8">New Features in <span class="keyword">Impala 2.7</span></h2>
+
+ <div class="body conbody">
+
+ <ul class="ul" id="new_features_270__feature_list">
+ <li class="li">
+ <p class="p">
+ Performance improvements:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3206" target="_blank">IMPALA-3206</a>]
+ Speedup for queries against <code class="ph codeph">DECIMAL</code> columns in Avro tables.
+ The code that parses <code class="ph codeph">DECIMAL</code> values from Avro now uses
+ native code generation.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3674" target="_blank">IMPALA-3674</a>]
+ Improved efficiency in LLVM code generation can reduce codegen time, especially
+ for short queries.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2979" target="_blank">IMPALA-2979</a>]
+ Improvements to scheduling on worker nodes,
+ enabled by the <code class="ph codeph">REPLICA_PREFERENCE</code> query option.
+ See <a class="xref" href="impala_replica_preference.html#replica_preference">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a> for details.
+ </p>
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1683" target="_blank">IMPALA-1683</a>]
+ The <code class="ph codeph">REFRESH</code> statement can be applied to a single partition,
+ rather than the entire table. See <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a>
+ and <a class="xref" href="impala_partitioning.html#partition_refresh">Refreshing a Single Partition</a> for details.
+ </p>
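+          <p class="p">
+            For example, a sketch of refreshing the metadata for a single newly loaded
+            partition (the table and partition names are illustrative):
+          </p>
+<pre class="pre codeblock"><code>-- Reload the file listing for one partition instead of the whole table:
+REFRESH sales_table PARTITION (year=2017, month=12);</code></pre>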
+ </li>
+ <li class="li">
+ <p class="p">
+ Improvements to the Impala web user interface:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2767" target="_blank">IMPALA-2767</a>]
+ You can now force a session to expire by clicking a link in the web UI,
+ on the <span class="ph uicontrol">/sessions</span> tab.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3715" target="_blank">IMPALA-3715</a>]
+ The <span class="ph uicontrol">/memz</span> tab includes more information about
+ Impala memory usage.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3716" target="_blank">IMPALA-3716</a>]
+ The <span class="ph uicontrol">Details</span> page for a query now includes
+ a <span class="ph uicontrol">Memory</span> tab.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3499" target="_blank">IMPALA-3499</a>]
+ Scalability improvements to the catalog server. Impala handles internal communication
+ more efficiently for tables with large numbers of columns and partitions, where the
+ size of the metadata exceeds 2 GiB.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3677" target="_blank">IMPALA-3677</a>]
+ You can send a <code class="ph codeph">SIGUSR1</code> signal to any Impala-related daemon to write a
+ Breakpad minidump. For advanced troubleshooting, you can now produce a minidump
+ without triggering a crash. See <a class="xref" href="impala_breakpad.html#breakpad">Breakpad Minidumps for Impala (Impala 2.6 or higher only)</a> for
+ details about the Breakpad minidump feature.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3687" target="_blank">IMPALA-3687</a>]
+ The schema reconciliation rules for Avro tables have changed slightly
+ for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> columns. Now, if
+ the definition of such a column is changed in the Avro schema file,
+ the column retains its <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code>
+ type as specified in the SQL definition, but the column name and comment
+ from the Avro schema file take precedence.
+ See <a class="xref" href="impala_avro.html#avro_create_table">Creating Avro Tables</a> for details about
+ column definitions in Avro tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3575" target="_blank">IMPALA-3575</a>]
+ Some network
+            operations now have additional timeout and retry settings. The extra
+            configuration helps avoid query failures caused by transient network
+            problems, avoids hangs when a sender or receiver fails in the
+            middle of a network transmission, and makes cancellation requests
+            more reliable despite network issues.
+          </p>
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="new_features__new_features_260">
+
+ <h2 class="title topictitle2" id="ariaid-title9">New Features in <span class="keyword">Impala 2.6</span></h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Improvements to Impala support for the Amazon S3 filesystem:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala can now write to S3 tables through the <code class="ph codeph">INSERT</code>
+ or <code class="ph codeph">LOAD DATA</code> statements.
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for general information about
+ using Impala with S3.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new query option, <code class="ph codeph">S3_SKIP_INSERT_STAGING</code>, lets you
+ trade off between fast <code class="ph codeph">INSERT</code> performance and
+ slower <code class="ph codeph">INSERT</code>s that are more consistent if a
+ problem occurs during the statement. The new behavior is enabled by default.
+ See <a class="xref" href="impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details
+ about this option.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ Performance improvements for the runtime filtering feature:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The default for the <code class="ph codeph">RUNTIME_FILTER_MODE</code>
+ query option is changed to <code class="ph codeph">GLOBAL</code> (the highest setting).
+ See <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a> for
+ details about this option.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> setting is now only used
+ as a fallback if statistics are not available; otherwise, Impala
+ uses the statistics to estimate the appropriate size to use for each filter.
+ See <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a> for
+ details about this option.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ New query options <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code> and
+ <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code> let you fine-tune
+ the sizes of the Bloom filter structures used for runtime filtering.
+ If the filter size derived from Impala internal estimates or from
+                the <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> query option falls outside the size
+ range specified by these options, any too-small filter size is adjusted
+ to the minimum, and any too-large filter size is adjusted to the maximum.
+ See <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>
+ and <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>
+ for details about these options.
+ </p>
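+              <p class="p">
+                For example, a sketch of clamping the filter sizes within a session
+                (the byte values shown are illustrative):
+              </p>
+<pre class="pre codeblock"><code>-- Keep all runtime Bloom filters between 1 MB and 16 MB:
+SET RUNTIME_FILTER_MIN_SIZE=1048576;
+SET RUNTIME_FILTER_MAX_SIZE=16777216;</code></pre>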
+ </li>
+ <li class="li">
+ <p class="p">
+ Runtime filter propagation now applies to all the
+ operands of <code class="ph codeph">UNION</code> and <code class="ph codeph">UNION ALL</code>
+ operators.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Runtime filters can now be produced during join queries even
+ when the join processing activates the spill-to-disk mechanism.
+ </p>
+ </li>
+ </ul>
+ See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a> for
+ general information about the runtime filtering feature.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Admission control and dynamic resource pools are enabled by default.
+ See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details
+ about admission control.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+            You can now manually set column statistics in Impala,
+ using the <code class="ph codeph">ALTER TABLE</code> statement with a
+ <code class="ph codeph">SET COLUMN STATS</code> clause.
+            See <a class="xref" href="impala_perf_stats.html#perf_column_stats_manual">Setting Column Stats Manually through ALTER TABLE</a> for details.
+ </p>
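+          <p class="p">
+            For example, a sketch of supplying two statistics for a column without running
+            <code class="ph codeph">COMPUTE STATS</code> (the table, column, and values are illustrative):
+          </p>
+<pre class="pre codeblock"><code>-- Record the number of distinct values and the count of NULLs for a column:
+ALTER TABLE t1 SET COLUMN STATS c1 ('numDVs'='100','numNulls'='0');</code></pre>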
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala can now write lightweight <span class="q">"minidump"</span> files, rather
+ than large core files, to save diagnostic information when
+ any of the Impala-related daemons crash. This feature uses the
+ open source <code class="ph codeph">breakpad</code> framework.
+ See <a class="xref" href="impala_breakpad.html#breakpad">Breakpad Minidumps for Impala (Impala 2.6 or higher only)</a> for details.
+ </p>
+ </li>
+ <li class="li">
+ <div class="p">
+ New query options improve interoperability with Parquet files:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION</code> query option
+ lets Impala locate columns within Parquet files based on
+ column name rather than ordinal position.
+ This enhancement improves interoperability with applications
+ that write Parquet files with a different order or subset of
+ columns than are used in the Impala table.
+ See <a class="xref" href="impala_parquet_fallback_schema_resolution.html#parquet_fallback_schema_resolution">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</a>
+ for details.
+ </p>
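+              <p class="p">
+                For example, a sketch of switching to name-based column resolution
+                for the current session:
+              </p>
+<pre class="pre codeblock"><code>-- Match Parquet columns by name instead of ordinal position:
+SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;</code></pre>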
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">PARQUET_ANNOTATE_STRINGS_UTF8</code> query option
+ makes Impala include the <code class="ph codeph">UTF-8</code> annotation
+ metadata for <code class="ph codeph">STRING</code>, <code class="ph codeph">CHAR</code>,
+ and <code class="ph codeph">VARCHAR</code> columns in Parquet files created
+ by <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code>
+ statements.
+ See <a class="xref" href="impala_parquet_annotate_strings_utf8.html#parquet_annotate_strings_utf8">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</a>
+ for details.
+ </p>
+ </li>
+ </ul>
+ See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for general information about working
+ with Parquet files.
+ </div>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improvements to security and reduction in overhead for secure clusters:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Overall performance improvements for secure clusters.
+                (TPC-H queries on a secure cluster were benchmarked
+                at roughly 3 times the speed of the previous release.)
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala now recognizes the <code class="ph codeph">auth_to_local</code> setting,
+ specified through the HDFS configuration setting
+ <code class="ph codeph">hadoop.security.auth_to_local</code>.
+ This feature is disabled by default; to enable it,
+ specify <code class="ph codeph">--load_auth_to_local_rules=true</code>
+ in the <span class="keyword cmdname">impalad</span> configuration settings.
+ See <a class="xref" href="impala_kerberos.html#auth_to_local">Mapping Kerberos Principals to Short Names for Impala</a> for details.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Timing improvements in the mechanism for the <span class="keyword cmdname">impalad</span>
+ daemon to acquire Kerberos tickets. This feature spreads out the overhead
+ on the KDC during Impala startup, especially for large clusters.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ For Kerberized clusters, the Catalog service now uses
+                the Kerberos principal instead of the operating system user that runs
+ the <span class="keyword cmdname">catalogd</span> daemon.
+ This eliminates the requirement to configure a <code class="ph codeph">hadoop.user.group.static.mapping.overrides</code>
+ setting to put the OS user into the Sentry administrative group, on clusters where the principal
+ and the OS user name for this user are different.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ Overall performance improvements for join queries, by using a prefetching mechanism
+ while building the in-memory hash table to evaluate join predicates.
+ See <a class="xref" href="impala_prefetch_mode.html#prefetch_mode">PREFETCH_MODE Query Option (Impala 2.6 or higher only)</a> for the query option
+ to control this optimization.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <span class="keyword cmdname">impala-shell</span> interpreter has a new command,
+ <code class="ph codeph">SOURCE</code>, that lets you run a set of SQL statements
+ or other <span class="keyword cmdname">impala-shell</span> commands stored in a file.
+ You can run additional <code class="ph codeph">SOURCE</code> commands from inside
+ a file, to set up flexible sequences of statements for use cases
+ such as schema setup, ETL, or reporting.
+ See <a class="xref" href="impala_shell_commands.html#shell_commands">impala-shell Command Reference</a> for details
+ and <a class="xref" href="impala_shell_running_commands.html#shell_running_commands">Running Commands and SQL Statements in impala-shell</a>
+ for examples.
+ </p>
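+          <p class="p">
+            For example, a sketch of running a statement file from within the
+            <span class="keyword cmdname">impala-shell</span> interpreter (the file name is illustrative):
+          </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; source setup_schema.sql;</code></pre>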
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">millisecond()</code> built-in function lets you extract
+ the fractional seconds part of a <code class="ph codeph">TIMESTAMP</code> value.
+ See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details.
+ </p>
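+          <p class="p">
+            For example, a sketch of extracting the fractional seconds part of a
+            literal <code class="ph codeph">TIMESTAMP</code> value:
+          </p>
+<pre class="pre codeblock"><code>-- Returns the millisecond portion (123) of the timestamp:
+SELECT millisecond(CAST('2016-01-01 12:00:00.123' AS TIMESTAMP));</code></pre>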
+ </li>
+ <li class="li">
+ <p class="p">
+ If an Avro table is created without column definitions in the
+ <code class="ph codeph">CREATE TABLE</code> statement, and columns are later
+ added through <code class="ph codeph">ALTER TABLE</code>, the resulting
+ table is now queryable. Missing values from the newly added
+ columns now default to <code class="ph codeph">NULL</code>.
+ See <a class="xref" href="impala_avro.html#avro">Using the Avro File Format with Impala Tables</a> for general details about
+ working with Avro files.
+ </p>
+ </li>
+ <li class="li">
+ <div class="p">
+ The mechanism for interpreting <code class="ph codeph">DECIMAL</code> literals is
+ improved, no longer going through an intermediate conversion step
+ to <code class="ph codeph">DOUBLE</code>:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+                Casting a <code class="ph codeph">DECIMAL</code> value to <code class="ph codeph">TIMESTAMP</code>
+                produces a more precise
+                value for the <code class="ph codeph">TIMESTAMP</code> than formerly.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Certain function calls involving <code class="ph codeph">DECIMAL</code> literals
+ now succeed, when formerly they failed due to lack of a function
+ signature with a <code class="ph codeph">DOUBLE</code> argument.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Faster runtime performance for <code class="ph codeph">DECIMAL</code> constant
+ values, through improved native code generation for all combinations
+ of precision and scale.
+ </p>
+ </li>
+ </ul>
+ See <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 3.0 or higher only)</a> for details about the <code class="ph codeph">DECIMAL</code> type.
+ </div>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved type accuracy for <code class="ph codeph">CASE</code> return values.
+ If all <code class="ph codeph">WHEN</code> clauses of the <code class="ph codeph">CASE</code>
+ expression are of <code class="ph codeph">CHAR</code> type, the final result
+ is also <code class="ph codeph">CHAR</code> instead of being converted to
+ <code class="ph codeph">STRING</code>.
+ See <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a>
+ for details about the <code class="ph codeph">CASE</code> function.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Uncorrelated queries using the <code class="ph codeph">NOT EXISTS</code> operator
+ are now supported. Formerly, the <code class="ph codeph">NOT EXISTS</code>
+ operator was only available for correlated subqueries.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved performance for reading Parquet files.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved performance for <dfn class="term">top-N</dfn> queries, that is,
+ those including both <code class="ph codeph">ORDER BY</code> and
+ <code class="ph codeph">LIMIT</code> clauses.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala optionally skips an arbitrary number of header lines from text input
+ files on HDFS based on the <code class="ph codeph">skip.header.line.count</code> value
+ in the <code class="ph codeph">TBLPROPERTIES</code> field of the table metadata.
+ See <a class="xref" href="impala_txtfile.html#text_data_files">Data Files for Text Tables</a> for details.
+ </p>
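+          <p class="p">
+            For example, a sketch of marking an existing text table so that the first
+            line of each data file is ignored (the table name is illustrative):
+          </p>
+<pre class="pre codeblock"><code>-- Skip one header line at the start of each text data file:
+ALTER TABLE header_csv SET TBLPROPERTIES ('skip.header.line.count'='1');</code></pre>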
+ </li>
+ <li class="li">
+ <p class="p">
+ Trailing comments are now allowed in queries processed by
+ the <span class="keyword cmdname">impala-shell</span> options <code class="ph codeph">-q</code>
+ and <code class="ph codeph">-f</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala can run <code class="ph codeph">COUNT</code> queries for RCFile tables
+ that include complex type columns.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for
+ general information about working with complex types,
+ and <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>,
+ <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a>, and <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>
+ for syntax details of each type.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="new_features__new_features_250">
+
+ <h2 class="title topictitle2" id="ariaid-title10">New Features in <span class="keyword">Impala 2.5</span></h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Dynamic partition pruning. When a query refers to a partition key column in a <code class="ph codeph">WHERE</code>
+ clause, and the exact set of column values are not known until the query is executed,
+ Impala evaluates the predicate and skips the I/O for entire partitions that are not needed.
+ For example, if a table was partitioned by year, Impala would apply this technique to a query
+ such as <code class="ph codeph">SELECT c1 FROM partitioned_table WHERE year = (SELECT MAX(year) FROM other_table)</code>.
+ <span class="ph">See <a class="xref" href="impala_partitioning.html#dynamic_partition_pruning">Dynamic Partition Pruning</a> for details.</span>
+ </p>
+ <p class="p">
+ The dynamic partition pruning optimization technique lets Impala avoid reading
+ data files from partitions that are not part of the result set, even when
+ that determination cannot be made in advance. This technique is especially valuable
+ when performing join queries involving partitioned tables. For example, if a join
+ query includes an <code class="ph codeph">ON</code> clause and a <code class="ph codeph">WHERE</code> clause
+ that refer to the same columns, the query can find the set of column values that
+ match the <code class="ph codeph">WHERE</code> clause, and only scan the associated partitions
+ when evaluating the <code class="ph codeph">ON</code> clause.
+ </p>
+ <p class="p">
+ Dynamic partition pruning is controlled by the same settings as the runtime filtering feature.
+ By default, this feature is enabled at a medium level, because the maximum setting can use
+ slightly more memory for queries than in previous releases.
+ To fully enable this feature, set the query option <code class="ph codeph">RUNTIME_FILTER_MODE=GLOBAL</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Runtime filtering. This is a wide-ranging set of optimizations that are especially valuable for join queries.
+ Using the same technique as with dynamic partition pruning,
+ Impala uses the predicates from <code class="ph codeph">WHERE</code> and <code class="ph codeph">ON</code> clauses
+            to determine the subset of column values from one of the joined tables that could possibly be part of the
+ result set. Impala sends a compact representation of the filter condition to the hosts in the cluster,
+ instead of the full set of values or the entire table.
+ <span class="ph">See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a> for details.</span>
+ </p>
+ <p class="p">
+ By default, this feature is enabled at a medium level, because the maximum setting can use
+ slightly more memory for queries than in previous releases.
+ To fully enable this feature, set the query option <code class="ph codeph">RUNTIME_FILTER_MODE=GLOBAL</code>.
+ <span class="ph">See <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a> for details.</span>
+ </p>
+ <p class="p">
+ This feature involves some new query options:
+ <a class="xref" href="impala_runtime_filter_mode.html">RUNTIME_FILTER_MODE</a>,
+ <a class="xref" href="impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html">RUNTIME_BLOOM_FILTER_SIZE</a>,
+ <a class="xref" href="impala_runtime_filter_wait_time_ms.html">RUNTIME_FILTER_WAIT_TIME_MS</a>,
+ and <a class="xref" href="impala_disable_row_runtime_filtering.html">DISABLE_ROW_RUNTIME_FILTERING</a>.
+ <span class="ph">See
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE</a>,
+ <a class="xref" href="impala_max_num_runtime_filters.html#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE</a>,
+ <a class="xref" href="impala_runtime_filter_wait_time_ms.html#runtime_filter_wait_time_ms">RUNTIME_FILTER_WAIT_TIME_MS</a>, and
+ <a class="xref" href="impala_disable_row_runtime_filtering.html#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING</a>
+ for details.
+ </span>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ More efficient use of the HDFS caching feature, to avoid
+ hotspots and bottlenecks that could occur if heavily used
+ cached data blocks were always processed by the same host.
+ By default, Impala now randomizes which host processes each cached
+ HDFS data block, when cached replicas are available on multiple hosts.
+ (Remember to use the <code class="ph codeph">WITH REPLICATION</code> clause with the
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statement
+ when enabling HDFS caching for a table or partition, to cache the same
+ data blocks across multiple hosts.)
+            The new query option <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code>
+            lets you fine-tune the interaction with HDFS caching even more.
+ <span class="ph">See <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a> for details.</span>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">TRUNCATE TABLE</code> statement now accepts an <code class="ph codeph">IF EXISTS</code>
+ clause, making <code class="ph codeph">TRUNCATE TABLE</code> easier to use in setup or ETL scripts where the table might or
+ might not exist.
+ <span class="ph">See <a class="xref" href="impala_truncate_table.html#truncate_table">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a> for details.</span>
+ </p>
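+          <p class="p">
+            For example, a sketch of emptying a staging table in an ETL script without
+            failing when the table is absent (the table name is illustrative):
+          </p>
+<pre class="pre codeblock"><code>-- Succeeds whether or not staging_data exists:
+TRUNCATE TABLE IF EXISTS staging_data;</code></pre>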
+ </li>
+ <li class="li">
+ <div class="p">
+ Improved performance and reliability for the <code class="ph codeph">DECIMAL</code> data type:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Using <code class="ph codeph">DECIMAL</code> values in a <code class="ph codeph">GROUP BY</code> clause now
+ triggers the native code generation optimization, speeding up queries that
+ group by values such as prices.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Checking for overflow in <code class="ph codeph">DECIMAL</code>
+ multiplication is now substantially faster, making <code class="ph codeph">DECIMAL</code>
+ a more practical data type in some use cases where formerly <code class="ph codeph">DECIMAL</code>
+ was much slower than <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Multiplying a mixture of <code class="ph codeph">DECIMAL</code>
+ and <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code> values now returns the
+ <code class="ph codeph">DOUBLE</code> rather than <code class="ph codeph">DECIMAL</code>. This change avoids
+ some cases where an intermediate value would underflow or overflow and become
+ <code class="ph codeph">NULL</code> unexpectedly.
+ </p>
+ </li>
+ </ul>
+ <span class="ph">See <a class="xref" href="impala_decimal.html">DECIMAL Data Type (Impala 3.0 or higher only)</a> for details.</span>
+ </div>
+ </li>
+ <li class="li">
+ <p class="p">
+ For UDFs written in Java, or Hive UDFs reused for Impala,
+ Impala now allows parameters and return values to be primitive types.
+ Formerly, these things were required to be one of the <span class="q">"Writable"</span>
+ object types.
+ <span class="ph">See <a class="xref" href="impala_udf.html#udfs_hive">Using Hive UDFs with Impala</a> for details.</span>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Performance improvements for HDFS I/O. Impala now caches HDFS file handles to avoid the
+ overhead of repeatedly opening the same file.
+ </p>
+ </li>
+
+
+ <li class="li">
+ <p class="p">
+ Performance improvements for queries involving nested complex types.
+ Certain basic query types, such as counting the elements of a complex column,
+ now use an optimized code path.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Improvements to the memory reservation mechanism for the Impala
+ admission control feature. You can specify more settings, such
+ as the timeout period and maximum aggregate memory used, for each
+ resource pool instead of globally for the Impala instance. The
+ default limit for concurrent queries (the <span class="ph uicontrol">max requests</span>
+ setting) is now unlimited instead of 200.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Performance improvements related to code generation.
+ Even in queries where code generation is not performed
+ for some phases of execution (such as reading data from
+ Parquet tables), Impala can still use code generation in
+ other parts of the query, such as evaluating
+ functions in the <code class="ph codeph">WHERE</code> clause.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Performance improvements for queries using aggregation functions
+ on high-cardinality columns.
+ Formerly, Impala could do unnecessary extra work to produce intermediate
+ results for operations such as <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">GROUP BY</code>
+ on columns that were unique or had few duplicate values.
+ Now, Impala decides at run time whether it is more efficient to
+ do an initial aggregation phase and pass along a smaller set of intermediate data,
+            or to pass raw intermediate data back to the next phase of query processing to be aggregated there.
+ This feature is known as <dfn class="term">streaming pre-aggregation</dfn>.
+ In case of performance regression, this feature can be turned off
+ using the <code class="ph codeph">DISABLE_STREAMING_PREAGGREGATIONS</code> query option.
+ <span class="ph">See <a class="xref" href="impala_disable_streaming_preaggregations.html#disable_streaming_preaggregations">DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</a> for details.</span>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Spill-to-disk feature now always recommended. In earlier releases, the spill-to-disk feature
+ could be turned off using a pair of configuration settings,
+ <code class="ph codeph">enable_partitioned_aggregation=false</code> and
+ <code class="ph codeph">enable_partitioned_hash_join=false</code>.
+ The latest improvements in the spill-to-disk mechanism, and related features that
+ interact with it, make this feature robust enough that disabling it is now
+ no longer needed or supported. In particular, some new features in <span class="keyword">Impala 2.5</span>
+ and higher do not work when the spill-to-disk feature is disabled.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improvements to scripting capability for the <span class="keyword cmdname">impala-shell</span> command,
+ through user-specified substitution variables that can appear in statements processed
+ by <span class="keyword cmdname">impala-shell</span>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">--var</code> command-line option lets you pass key-value pairs to
+ <span class="keyword cmdname">impala-shell</span>. The shell can substitute the values
+ into queries before executing them, where the query text contains the notation
+ <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>. For example, you might prepare a SQL file
+ containing a set of DDL statements and queries containing variables for
+ database and table names, and then pass the applicable names as part of the
+ <code class="ph codeph">impala-shell -f <var class="keyword varname">filename</var></code> command.
+ <span class="ph">See <a class="xref" href="impala_shell_running_commands.html#shell_running_commands">Running Commands and SQL Statements in impala-shell</a> for details.</span>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">SET</code> and <code class="ph codeph">UNSET</code> commands within the
+ <span class="keyword cmdname">impala-shell</span> interpreter now work with user-specified
+ substitution variables, as well as the built-in query options.
+                The two kinds of variables are shown in separate groups in the <code class="ph codeph">SET</code> output.
+ As with variables defined by the <code class="ph codeph">--var</code> command-line option,
+ you refer to the user-specified substitution variables in queries by using
+ the notation <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>
+ in the query text. Because the substitution variables are processed by
+ <span class="keyword cmdname">impala-shell</span> instead of the <span class="keyword cmdname">impalad</span>
+ backend, you cannot define your own substitution variables through the
+ <code class="ph codeph">SET</code> statement in a JDBC or ODBC application.
+ <span class="ph">See <a class="xref" href="impala_set.html#set">SET Statement</a> for details.</span>
+ </p>
+ </li>
+ </ul>
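+          <p class="p">
+            For example, a sketch of parameterizing a query with a substitution
+            variable (the variable name, value, and table name are illustrative):
+          </p>
+<pre class="pre codeblock"><code>$ impala-shell --var=tname=sales -q 'SELECT COUNT(*) FROM ${var:tname}'</code></pre>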
+ </li>
+ <li class="li">
+ <p class="p">
+ Performance improvements for query startup. Impala better parallelizes certain work
+ when coordinating plan distribution between <span class="keyword cmdname">impalad</span> instances, which improves
+ startup time for queries involving tables with many partitions on large clusters,
+ or complicated queries with many plan fragments.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Performance and scalability improvements for tables with many partitions.
+ The memory requirements on the coordinator node are reduced, making it substantially
+ faster and less resource-intensive
+ to do joins involving several tables with thousands of partitions each.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Whitelisting for access to internal APIs. For applications that need direct access
+ to Impala APIs, without going through the HiveServer2 or Beeswax interfaces, you can
+ specify a list of Kerberos users who are allowed to call those APIs. By default, the
+ <code class="ph codeph">impala</code> and <code class="ph codeph">hdfs</code> users are the only ones authorized
+ for this kind of access.
+ Any users not explicitly authorized through the <code class="ph codeph">internal_principals_whitelist</code>
+ configuration setting are blocked from accessing the APIs. This setting applies to all the
+ Impala-related daemons, although currently it is primarily used for HDFS to control the
+ behavior of the catalog server.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improvements to Impala integration and usability for Hue. (The code changes
+ are actually on the Hue side.)
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The list of tables now refreshes dynamically.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ Usability improvements for case-insensitive queries.
+ You can now use the operators <code class="ph codeph">ILIKE</code> and <code class="ph codeph">IREGEXP</code>
+ to perform case-insensitive wildcard matches or regular expression matches,
+ rather than explicitly converting column values with <code class="ph codeph">UPPER</code>
+ or <code class="ph codeph">LOWER</code>.
+ <span class="ph">See <a class="xref" href="impala_operators.html#ilike">ILIKE Operator</a> and <a class="xref" href="impala_operators.html#iregexp">IREGEXP Operator</a> for details.</span>
+ </p>
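+ <p class="p">
+ For example, with a hypothetical table <code class="ph codeph">t1</code> containing a string
+ column <code class="ph codeph">s</code>:
+ </p>
+ <pre class="pre codeblock"><code>-- Matches 'banana', 'BANANA', 'BaNaNa', and so on.
+ select s from t1 where s ilike 'ban%';
+ -- Case-insensitive regular expression match.
+ select s from t1 where s iregexp '^ban';</code></pre>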
+ </li>
+ <li class="li">
+ <p class="p">
+ Performance and reliability improvements for DDL and insert operations on partitioned tables with a large
+ number of partitions. Impala only re-evaluates metadata for partitions that are affected by
+ a DDL operation, not all partitions in the table. While a DDL or insert statement is in progress,
+ other Impala statements that attempt to modify metadata for the same table wait until the first one
+ finishes.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Reliability improvements for the <code class="ph codeph">LOAD DATA</code> statement.
+ Previously, this statement would fail if the source HDFS directory
+ contained any subdirectories at all. Now, the statement ignores
+ any hidden subdirectories, for example <span class="ph filepath">_impala_insert_staging</span>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new operator, <code class="ph codeph">IS [NOT] DISTINCT FROM</code>, lets you compare values
+ and always get a <code class="ph codeph">true</code> or <code class="ph codeph">false</code> result,
+ even if one or both of the values are <code class="ph codeph">NULL</code>.
+ The <code class="ph codeph">IS NOT DISTINCT FROM</code> operator, or its equivalent
+ <code class="ph codeph"><=></code> notation, improves the efficiency of join queries that
+ treat key values that are <code class="ph codeph">NULL</code> in both tables as equal.
+ <span class="ph">See <a class="xref" href="impala_operators.html#is_distinct_from">IS DISTINCT FROM Operator</a> for details.</span>
+ </p>
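+ <p class="p">
+ For example, the following join (the tables and columns are hypothetical) matches rows
+ where the key is <code class="ph codeph">NULL</code> in both tables, which a plain
+ <code class="ph codeph">=</code> comparison would not:
+ </p>
+ <pre class="pre codeblock"><code>select t1.id, t2.id from t1 join t2
+   on t1.id is not distinct from t2.id;
+ -- Equivalent shorthand notation:
+ select t1.id, t2.id from t1 join t2 on t1.id &lt;=&gt; t2.id;</code></pre>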
+ </li>
+ <li class="li">
+ <p class="p">
+ Security enhancements for the <span class="keyword cmdname">impala-shell</span> command.
+ A new option, <code class="ph codeph">--ldap_password_cmd</code>, lets you specify
+ a command to retrieve the LDAP password. The resulting password is
+ then used to authenticate the <span class="keyword cmdname">impala-shell</span> command
+ with the LDAP server.
+ <span class="ph">See <a class="xref" href="impala_shell_options.html">impala-shell Configuration Options</a> for details.</span>
+ </p>
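+ <p class="p">
+ For example (the script path is illustrative; the command you specify must print the
+ password on standard output):
+ </p>
+ <pre class="pre codeblock"><code>impala-shell --ldap --ldap_password_cmd="cat /home/username/.ldap_password"</code></pre>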
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">CREATE TABLE AS SELECT</code> statement now accepts a
+ <code class="ph codeph">PARTITIONED BY</code> clause, which lets you create a
+ partitioned table and insert data into it with a single statement.
+ <span class="ph">See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for details.</span>
+ </p>
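+ <p class="p">
+ For example (the source table <code class="ph codeph">raw_events</code> and its columns are
+ illustrative; the partition key columns must come last in the select list):
+ </p>
+ <pre class="pre codeblock"><code>create table events_by_year partitioned by (year)
+   as select event_id, details, year from raw_events;</code></pre>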
+ </li>
+ <li class="li">
+ <p class="p">
+ User-defined functions (UDFs and UDAFs) written in C++ now persist automatically
+ when the <span class="keyword cmdname">catalogd</span> daemon is restarted. You no longer
+ have to run the <code class="ph codeph">CREATE FUNCTION</code> statements again after a restart.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ User-defined functions (UDFs) written in Java can now persist
+ when the <span class="keyword cmdname">catalogd</span> daemon is restarted, and can be shared
+ transparently between Impala and Hive. You must do a one-time operation to recreate these
+ UDFs using new <code class="ph codeph">CREATE FUNCTION</code> syntax, without a signature for arguments
+ or the return value. Afterwards, you no longer have to run the <code class="ph codeph">CREATE FUNCTION</code>
+ statements again after a restart.
+ Although Impala does not have visibility into the UDFs that implement the
+ Hive built-in functions, user-created Hive UDFs are now automatically available
+ for calling through Impala.
+ <span class="ph">See <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a> for details.</span>
+ </p>
+ </li>
+ <li class="li">
+
+ <p class="p">
+ Reliability enhancements for memory management. Some aggregation and join queries
+ that formerly might have failed with an out-of-memory error due to memory contention
+ can now succeed by using the spill-to-disk mechanism.
+ </p>
+ </li>
+ <li class="li">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW DATABASES</code> statement now returns two columns rather than one.
+ The second column includes the associated comment string, if any, for each database.
+ Adjust any application code that examines the list of databases and assumes the
+ result set contains only a single column.
+ <span class="ph">See <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a> for details.</span>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new optimization speeds up aggregation operations that involve only the partition key
+ columns of partitioned tables. For example, a query such as <code class="ph codeph">SELECT COUNT(DISTINCT k), MIN(k), MAX(k) FROM t1</code>
+ can avoid reading any data files if <code class="ph codeph">T1</code> is a partitioned table and <code class="ph codeph">K</code>
+ is one of the partition key columns. Because this technique can produce different results in cases
+ where HDFS files in a partition are manually deleted or are empty, you must enable the optimization
+ by setting the query option <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>.
+ <span class="ph">See <a class="xref" href="impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a> for details.</span>
+ </p>
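+ <p class="p">
+ For example, assuming <code class="ph codeph">t1</code> is partitioned by the column
+ <code class="ph codeph">k</code>:
+ </p>
+ <pre class="pre codeblock"><code>set optimize_partition_key_scans=1;
+ -- Can be answered from partition metadata, without reading any data files.
+ select count(distinct k), min(k), max(k) from t1;</code></pre>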
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DESCRIBE</code> statement can now display metadata about a database, using the
+ syntax <code class="ph codeph">DESCRIBE DATABASE <var class="keyword varname">db_name</var></code>.
+ <span class="ph">See <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a> for details.</span>
+ </p>
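+ <p class="p">
+ For example:
+ </p>
+ <pre class="pre codeblock"><code>describe database default;</code></pre>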
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">uuid()</code> built-in function generates an
+ alphanumeric value that you can use as a guaranteed unique identifier.
+ The uniqueness applies even across tables, for cases where an ascending
+ numeric sequence is not suitable.
+ <span class="ph">See <a class="xref" href="impala_misc_functions.html#misc_functions">Impala Miscellaneous Functions</a> for details.</span>
+ </p>
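+ <p class="p">
+ For example:
+ </p>
+ <pre class="pre codeblock"><code>-- Returns a different 36-character identifier on each call.
+ select uuid();</code></pre>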
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="new_features__new_features_240">
+
+ <h2 class="title topictitle2" id="ariaid-title11">New Features in <span class="keyword">Impala 2.4</span></h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala can be used on the DSSD D5 Storage Appliance.
+ From a user perspective, the Impala features are the same as in <span class="keyword">Impala 2.3</span>.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="new_features__new_features_230">
+
+ <h2 class="title topictitle2" id="ariaid-title12">New Features in <span class="keyword">Impala 2.3</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following are the major new features in Impala 2.3.x. This major release
+ contains improvements to SQL syntax (particularly new support for complex types), performance,
+ manageability, and security.
+ </p>
+
+ <ul class="ul">
+
+ <li class="li">
+ <p class="p">
+ Complex data types: <code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code>. These
+ types can encode multiple named fields, positional items, or key-value pairs within a single column.
+ You can combine these types to produce nested types with arbitrarily deep nesting,
+ such as an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> values,
+ a <code class="ph codeph">MAP</code> where each key-value pair is an <code class="ph codeph">ARRAY</code> of other <code class="ph codeph">MAP</code> values,
+ and so on. Currently, complex data types are only supported for the Parquet file format.
+ <span class="ph">See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for usage details and <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>, <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>, and <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a> for syntax.</span>
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Column-level authorization lets you define access to particular columns within a table,
+ rather than the entire table. This feature lets you reduce the reliance on creating views to
+ set up authorization schemes for subsets of information.
+ See <span class="xref">the documentation for Apache Sentry</span> for background details, and
+ <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a> and <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a> for Impala-specific syntax.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">TRUNCATE TABLE</code> statement removes all the data from a table without removing the table itself.
+ <span class="ph">See <a class="xref" href="impala_truncate_table.html#truncate_table">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a> for details.</span>
+ </p>
+ </li>
+
+ <li class="li" id="new_features_230__IMPALA-2015">
+ <p class="p">
+ Nested loop join queries. Some join queries that formerly required equality comparisons can now use
+ operators such as <code class="ph codeph"><</code> or <code class="ph codeph">>=</code>. This same join mechanism is used
+ internally to optimize queries that retrieve values from complex type columns.
+ <span class="ph">See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details about Impala join queries.</span>
+ </p>
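+ <p class="p">
+ For example, a non-equijoin such as the following (the tables and columns are
+ hypothetical) is now executed as a nested loop join:
+ </p>
+ <pre class="pre codeblock"><code>select t1.id, t2.id from t1 join t2 on t1.id &lt; t2.id;</code></pre>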
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Reduced memory usage and improved performance and robustness for spill-to-disk feature.
+ <span class="ph">See <a class="xref" href="impala_scalability.html#spill_to_disk">SQL Operations that Spill to Disk</a> for details about this feature.</span>
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Performance improvements for querying Parquet data files containing multiple row groups
+ and multiple data blocks:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p"> For files written by Hive, SparkSQL, and other Parquet MR writers
+ and spanning multiple HDFS blocks, Impala now scans the extra
+ data blocks locally when possible, rather than using remote
+ reads. </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala queries benefit from the improved alignment of row groups with HDFS blocks for Parquet
+ files written by Hive, MapReduce, and other components. (Impala itself never writes
+ multiblock Parquet files, so the alignment change does not apply to Parquet files produced by Impala.)
+ These Parquet writers now add padding to Parquet files that they write to align row groups with HDFS blocks.
+ The <code class="ph codeph">parquet.writer.max-padding</code> setting specifies the maximum number of bytes, by default
+ 8 megabytes, that can be added to the file between row groups to fill the gap at the end of one block
+ so that the next row group starts at the beginning of the next block.
+ If the gap is larger than this size, the writer attempts to fit another entire row group in the remaining space.
+ Include this setting in the <span class="ph filepath">hive-site</span> configuration file to influence Parquet files written by Hive,
+ or the <span class="ph filepath">hdfs-site</span> configuration file to influence Parquet files written by all non-Impala components.
+ </p>
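+ <p class="p">
+ For example, a <span class="ph filepath">hive-site.xml</span> entry along these lines
+ (the value shown is the 8 megabyte default) controls the padding:
+ </p>
+ <pre class="pre codeblock"><code>&lt;property&gt;
+   &lt;name&gt;parquet.writer.max-padding&lt;/name&gt;
+   &lt;value&gt;8388608&lt;/value&gt;
+ &lt;/property&gt;</code></pre>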
+ </li>
+ </ul>
+ <p class="p">
+ See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for instructions about using Parquet data files
+ with Impala.
+ </p>
+ </li>
+
+ <li class="li" id="new_features_230__IMPALA-1660">
+ <p class="p">
+ Many new built-in scalar functions, for convenience and enhanced portability of SQL that uses common industry extensions.
+ </p>
+
+ <p class="p">
+ Math functions<span class="ph"> (see <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a> for details)</span>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">ATAN2</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">COSH</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">COT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DCEIL</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DEXP</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DFLOOR</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DLOG10</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DPOW</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DROUND</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DSQRT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DTRUNC</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">FACTORIAL</code>, and corresponding <code class="ph codeph">!</code> operator
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">FPOW</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">RADIANS</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">RANDOM</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SINH</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">TANH</code>
+ </li>
+ </ul>
+
+ <p class="p">
+ String functions<span class="ph"> (see <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a> for details)</span>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">BTRIM</code>
+ </li>
+ <li class="li">
+ <code class="ph codeph">CHR</code>
+ </li>
+ <li class="li">
+ <code class="ph codeph">REGEXP_LIKE</code>
+ </li>
+ <li class="li">
+ <code class="ph codeph">SPLIT_PART</code>
+ </li>
+ </ul>
+
+ <p class="p">
+ Date and time functions<span class="ph"> (see <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details)</span>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">INT_MONTHS_BETWEEN</code>
+ </li>
+ <li class="li">
+ <code class="ph codeph">MONTHS_BETWEEN</code>
+ </li>
+ <li class="li">
+ <code class="ph codeph">TIMEOFDAY</code>
+ </li>
+ <li class="li">
+ <code class="ph codeph">TIMESTAMP_CMP</code>
+ </li>
+ </ul>
+
+ <p class="p">
+ Bit manipulation functions<span class="ph"> (see <a class="xref" href="impala_bit_functions.html#bit_functions">Impala Bit Functions</a> for details)</span>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">BITAND</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">BITNOT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">BITOR</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">BITXOR</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">COUNTSET</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">GETBIT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">ROTATELEFT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">ROTATERIGHT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SETBIT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SHIFTLEFT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SHIFTRIGHT</code>
+ </li>
+ </ul>
+ <p class="p">
+ Type conversion functions<span class="ph"> (see <a class="xref" href="impala_conversion_functions.html#conversion_functions">Impala Type Conversion Functions</a> for details)</span>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">TYPEOF</code>
+ </li>
+ </ul>
+ <p class="p">
+ The <code class="ph codeph">effective_user()</code> function<span class="ph"> (see <a class="xref" href="impala_misc_functions.html#misc_functions">Impala Miscellaneous Functions</a> for details)</span>.
+ </p>
+ </li>
+
+ <li class="li" id="new_features_230__IMPALA-2081">
+ <p class="p">
+ New built-in analytic
<TRUNCATED>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_components.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_components.html b/docs/build3x/html/topics/impala_components.html
new file mode 100644
index 0000000..eb6e0f6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_components.html
@@ -0,0 +1,227 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_components"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Components of the Impala Server</title></head><body id="intro_components"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Components of the Impala Server</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala server is a distributed, massively parallel processing (MPP) database engine. It consists of
+ different daemon processes that run on specific hosts within your <span class="keyword"></span> cluster.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_components__intro_impalad">
+
+ <h2 class="title topictitle2" id="ariaid-title2">The Impala Daemon</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The core Impala component is a daemon process that runs on each DataNode of the cluster, physically represented
+ by the <code class="ph codeph">impalad</code> process. It reads and writes to data files; accepts queries transmitted
+ from the <code class="ph codeph">impala-shell</code> command, Hue, JDBC, or ODBC; parallelizes the queries and
+ distributes work across the cluster; and transmits intermediate query results back to the
+ central coordinator node.
+ </p>
+
+ <p class="p">
+ You can submit a query to the Impala daemon running on any DataNode, and that instance of the daemon serves as the
+ <dfn class="term">coordinator node</dfn> for that query. The other nodes transmit partial results back to the
+ coordinator, which constructs the final result set for a query. When running experiments with functionality
+ through the <code class="ph codeph">impala-shell</code> command, you might always connect to the same Impala daemon for
+ convenience. For clusters running production workloads, you might load-balance by
+ submitting each query to a different Impala daemon in round-robin style, using the JDBC or ODBC interfaces.
+ </p>
+
+ <p class="p">
+ The Impala daemons are in constant communication with the <dfn class="term">statestore</dfn>, to confirm which nodes
+ are healthy and can accept new work.
+ </p>
+
+ <p class="p">
+ They also receive broadcast messages from the <span class="keyword cmdname">catalogd</span> daemon (introduced in Impala 1.2)
+ whenever any Impala node in the cluster creates, alters, or drops any type of object, or when an
+ <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> statement is processed through Impala. This
+ background communication minimizes the need for <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE
+ METADATA</code> statements that were needed to coordinate metadata across nodes prior to Impala 1.2.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, you can control which hosts act as query coordinators
+ and which act as query executors, to improve scalability for highly concurrent workloads on large clusters.
+ See <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong> <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>,
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, <a class="xref" href="impala_timeouts.html#impalad_timeout">Setting the Idle Query and Idle Session Timeouts for impalad</a>,
+ <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a>, <a class="xref" href="impala_proxy.html#proxy">Using Impala through a Proxy for High Availability</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_components__intro_statestore">
+
+ <h2 class="title topictitle2" id="ariaid-title3">The Impala Statestore</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala component known as the <dfn class="term">statestore</dfn> checks on the health of Impala daemons on all the
+ DataNodes in a cluster, and continuously relays its findings to each of those daemons. It is physically
+ represented by a daemon process named <code class="ph codeph">statestored</code>; you only need such a process on one
+ host in the cluster. If an Impala daemon goes offline due to hardware failure, network error, software issue,
+ or other reason, the statestore informs all the other Impala daemons so that future queries can avoid making
+ requests to the unreachable node.
+ </p>
+
+ <p class="p">
+ Because the statestore's purpose is to help when things go wrong, it is not critical to the normal
+ operation of an Impala cluster. If the statestore is not running or becomes unreachable, the Impala daemons
+ continue running and distributing work among themselves as usual; the cluster just becomes less robust if
+ other Impala daemons fail while the statestore is offline. When the statestore comes back online, it re-establishes
+ communication with the Impala daemons and resumes its monitoring function.
+ </p>
+
+ <p class="p">
+ Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon.
+ The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special
+ requirements for high availability, because problems with those daemons do not result in data loss.
+ If those daemons become unavailable due to an outage on a particular
+ host, you can stop the Impala service, delete the <span class="ph uicontrol">Impala StateStore</span> and
+ <span class="ph uicontrol">Impala Catalog Server</span> roles, add the roles on a different host, and restart the
+ Impala service.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_scalability.html#statestore_scalability">Scalability Considerations for the Impala Statestore</a>,
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>, <a class="xref" href="impala_processes.html#processes">Starting Impala</a>,
+ <a class="xref" href="impala_timeouts.html#statestore_timeout">Increasing the Statestore Timeout</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro_components__intro_catalogd">
+
+ <h2 class="title topictitle2" id="ariaid-title4">The Impala Catalog Service</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala component known as the <dfn class="term">catalog service</dfn> relays the metadata changes from Impala SQL
+ statements to all the Impala daemons in a cluster. It is physically represented by a daemon process named
+ <code class="ph codeph">catalogd</code>; you only need such a process on one host in the cluster. Because the requests
+ are passed through the statestore daemon, it makes sense to run the <span class="keyword cmdname">statestored</span> and
+ <span class="keyword cmdname">catalogd</span> services on the same host.
+ </p>
+
+ <p class="p">
+ The catalog service avoids the need to issue
+ <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements when the metadata changes are
+ performed by statements issued through Impala. When you create a table, load data, and so on through Hive,
+ you do need to issue <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE METADATA</code> on an Impala node
+ before executing a query there.
+ </p>
+
+ <p class="p">
+ This feature touches a number of aspects of Impala:
+ </p>
+
+
+
+ <ul class="ul" id="intro_catalogd__catalogd_xrefs">
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="impala_install.html#install">Installing Impala</a>, <a class="xref" href="impala_upgrading.html#upgrading">Upgrading Impala</a> and
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, for usage information for the
+ <span class="keyword cmdname">catalogd</span> daemon.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements are not needed
+ when the <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-changing or
+ data-changing operation is performed through Impala. These statements are still needed if such
+ operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the
+ statements only need to be issued on one Impala node rather than on all nodes. See
+ <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> and
+ <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest usage information for
+ those statements.
+ </p>
+ </li>
+ </ul>
+
+ <div class="p">
+ Use the <code class="ph codeph">--load_catalog_in_background</code> option to control when
+ the metadata of a table is loaded.
+ <ul class="ul">
+ <li class="li">
+ If set to <code class="ph codeph">false</code>, the metadata of a table is
+ loaded when it is referenced for the first time. This means that the
+ first run of a particular query can be slower than subsequent runs.
+ Starting in Impala 2.2, the default for
+ <code class="ph codeph">load_catalog_in_background</code> is
+ <code class="ph codeph">false</code>.
+ </li>
+ <li class="li">
+ If set to <code class="ph codeph">true</code>, the catalog service attempts to
+ load metadata for a table even if no query needs that metadata, so the
+ metadata might already be loaded when the first query that
+ needs it is run. However, for the following reasons, we
+ recommend not setting the option to <code class="ph codeph">true</code>:
+ <ul class="ul">
+ <li class="li">
+ Background load can interfere with query-specific metadata
+ loading. This can happen on startup or after invalidating
+ metadata, lasts for a period that depends on the amount of metadata,
+ and can lead to seemingly random long-running queries that are
+ difficult to diagnose.
+ </li>
+ <li class="li">
+ Impala might load metadata for tables that are never
+ used, potentially increasing catalog size and consequently memory
+ usage for both the catalog service and the Impala daemons.
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon.
+ The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special
+ requirements for high availability, because problems with those daemons do not result in data loss.
+ If those daemons become unavailable due to an outage on a particular
+ host, you can stop the Impala service, delete the <span class="ph uicontrol">Impala StateStore</span> and
+ <span class="ph uicontrol">Impala Catalog Server</span> roles, add the roles on a different host, and restart the
+ Impala service.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In Impala 1.2.4 and higher, you can specify a table name with <code class="ph codeph">INVALIDATE METADATA</code> after
+ the table is created in Hive, allowing you to make individual tables visible to Impala without doing a full
+ reload of the catalog metadata. Impala 1.2.4 also includes other changes to make the metadata broadcast
+ mechanism faster and more responsive, especially during Impala startup. See
+ <a class="xref" href="../shared/../topics/impala_new_features.html#new_features_124">New Features in Impala 1.2.4</a> for details.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong> <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>,
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a>
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_compression_codec.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_compression_codec.html b/docs/build3x/html/topics/impala_compression_codec.html
new file mode 100644
index 0000000..5933efa
--- /dev/null
+++ b/docs/build3x/html/topics/impala_compression_codec.html
@@ -0,0 +1,92 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="compression_codec"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</title></head><body id="compression_codec"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">COMPRESSION_CODEC Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+
+
+
+
+ <p class="p">
+
+ When Impala writes Parquet data files using the <code class="ph codeph">INSERT</code> statement, the underlying compression
+ is controlled by the <code class="ph codeph">COMPRESSION_CODEC</code> query option.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Prior to Impala 2.0, this option was named <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code>. In Impala 2.0 and
+ later, the <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code> name is not recognized. Use the more general name
+ <code class="ph codeph">COMPRESSION_CODEC</code> for new code.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SET COMPRESSION_CODEC=<var class="keyword varname">codec_name</var>;</code></pre>
+
+ <p class="p">
+ The allowed values for this query option are <code class="ph codeph">SNAPPY</code> (the default), <code class="ph codeph">GZIP</code>,
+ and <code class="ph codeph">NONE</code>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ A Parquet file created with <code class="ph codeph">COMPRESSION_CODEC=NONE</code> is still typically smaller than the
+ original data, due to encoding schemes such as run-length encoding and dictionary encoding that are applied
+ separately from compression.
+ </div>
+
+
+ <p class="p">
+ The option value is not case-sensitive.
+ </p>
+
+ <p class="p">
+ If the option is set to an unrecognized value, all kinds of queries will fail due to the invalid option
+ setting, not just queries involving Parquet tables. (The value <code class="ph codeph">BZIP2</code> is also recognized, but
+ is not compatible with Parquet tables.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">SNAPPY</code>
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>set compression_codec=gzip;
+insert into parquet_table_highly_compressed select * from t1;
+
+set compression_codec=snappy;
+insert into parquet_table_compression_plus_fast_queries select * from t1;
+
+set compression_codec=none;
+insert into parquet_table_no_compression select * from t1;
+
+set compression_codec=foo;
+select * from t1 limit 5;
+ERROR: Invalid compression codec: foo
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ For information about how compressing Parquet data files affects query performance, see
+ <a class="xref" href="impala_parquet.html#parquet_compression">Snappy and GZip Compression for Parquet Data Files</a>.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_compute_stats.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_compute_stats.html b/docs/build3x/html/topics/impala_compute_stats.html
new file mode 100644
index 0000000..407ba97
--- /dev/null
+++ b/docs/build3x/html/topics/impala_compute_stats.html
@@ -0,0 +1,637 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="compute_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COMPUTE STATS Statement</title></head><body id="compute_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">COMPUTE STATS Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement gathers information about the volume and
+      distribution of data in a table and all associated columns and partitions. The
+      information is stored in the metastore database, and used by Impala to
+      help optimize queries. For example, if Impala can determine that a table
+      is large or small, or has many or few distinct values, it can organize and
+      parallelize the work appropriately for a join query or insert operation.
+ For details about the kinds of information gathered by this statement, see
+ <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><span class="ph">COMPUTE STATS [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [ ( <var class="keyword varname">column_list</var> ) ] [TABLESAMPLE SYSTEM(<var class="keyword varname">percentage</var>) [REPEATABLE(<var class="keyword varname">seed</var>)]]</span>
+
+<var class="keyword varname">column_list</var> ::= <var class="keyword varname">column_name</var> [ , <var class="keyword varname">column_name</var>, ... ]
+
+COMPUTE INCREMENTAL STATS [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)]
+
+<var class="keyword varname">partition_spec</var> ::= <var class="keyword varname">simple_partition_spec</var> | <span class="ph"><var class="keyword varname">complex_partition_spec</var></span>
+
+<var class="keyword varname">simple_partition_spec</var> ::= <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var>
+
+<span class="ph"><var class="keyword varname">complex_partition_spec</var> ::= <var class="keyword varname">comparison_expression_on_partition_col</var></span>
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION</code> clause is only allowed in combination with the <code class="ph codeph">INCREMENTAL</code>
+ clause. It is optional for <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, and required for <code class="ph codeph">DROP
+ INCREMENTAL STATS</code>. Whenever you specify partitions through the <code class="ph codeph">PARTITION
+ (<var class="keyword varname">partition_spec</var>)</code> clause in a <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or
+ <code class="ph codeph">DROP INCREMENTAL STATS</code> statement, you must include all the partitioning columns in the
+ specification, and specify constant values for all the partition key columns.
+ </p>
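+
+    <p class="p">
+      For example, for a hypothetical table partitioned by <code class="ph codeph">year</code> and
+      <code class="ph codeph">month</code>, the <code class="ph codeph">PARTITION</code> clause must specify constant
+      values for both partition key columns:
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical table name; both partition key columns are required.
+compute incremental stats sales partition (year=2018, month=5);
+drop incremental stats sales partition (year=2018, month=5);</code></pre>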
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Originally, Impala relied on users to run the Hive <code class="ph codeph">ANALYZE
+ TABLE</code> statement, but that method of gathering statistics proved
+ unreliable and difficult to use. The Impala <code class="ph codeph">COMPUTE STATS</code>
+ statement was built to improve the reliability and user-friendliness of
+ this operation. <code class="ph codeph">COMPUTE STATS</code> does not require any setup
+ steps or special configuration. You only run a single Impala
+ <code class="ph codeph">COMPUTE STATS</code> statement to gather both table and column
+ statistics, rather than separate Hive <code class="ph codeph">ANALYZE TABLE</code>
+ statements for each kind of statistics.
+ </p>
+
+    <p class="p">
+      For a non-incremental <code class="ph codeph">COMPUTE STATS</code>
+      statement, the columns for which statistics are computed can be specified
+      with an optional comma-separated list of columns.
+    </p>
+
+ <p class="p">
+ If no column list is given, the <code class="ph codeph">COMPUTE STATS</code> statement
+ computes column-level statistics for all columns of the table. This adds
+ potentially unneeded work for columns whose stats are not needed by
+ queries. It can be especially costly for very wide tables and unneeded
+ large string fields.
+ </p>
+    <p class="p">
+      <code class="ph codeph">COMPUTE STATS</code> returns an error when a specified column
+      cannot be analyzed, such as when the column does not exist, the column is
+      of a type unsupported by <code class="ph codeph">COMPUTE STATS</code> (for example, a column
+      of a complex type), or the column is a partitioning column.
+    </p>
+ <p class="p">
+ If an empty column list is given, no column is analyzed by <code class="ph codeph">COMPUTE
+ STATS</code>.
+ </p>
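+
+    <p class="p">
+      For example, the following statements show the column list syntax against a hypothetical
+      table <code class="ph codeph">t1</code> with columns <code class="ph codeph">c1</code> and
+      <code class="ph codeph">c2</code>:
+    </p>
+
+<pre class="pre codeblock"><code>-- Compute table statistics, plus column statistics only for c1 and c2.
+compute stats t1 (c1, c2);
+
+-- An empty column list computes table statistics but no column statistics.
+compute stats t1 ();</code></pre>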
+
+ <p class="p">
+ In <span class="keyword">Impala 2.12</span> and
+ higher, an optional <code class="ph codeph">TABLESAMPLE</code> clause immediately after
+ a table reference specifies that the <code class="ph codeph">COMPUTE STATS</code>
+ operation only processes a specified percentage of the table data. For
+ tables that are so large that a full <code class="ph codeph">COMPUTE STATS</code>
+ operation is impractical, you can use <code class="ph codeph">COMPUTE STATS</code> with
+ a <code class="ph codeph">TABLESAMPLE</code> clause to extrapolate statistics from a
+      sample of the table data. See <a href="impala_perf_stats.html"><span class="keyword">Table and Column Statistics</span></a> for details about the
+ experimental stats extrapolation and sampling features.
+ </p>
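+
+    <p class="p">
+      For example, the following statement (against a hypothetical large table) computes
+      statistics from a sample of roughly 10 percent of the data; the optional
+      <code class="ph codeph">REPEATABLE</code> clause supplies a seed so that repeated runs
+      sample the same data:
+    </p>
+
+<pre class="pre codeblock"><code>compute stats huge_table tablesample system(10) repeatable(55);</code></pre>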
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> variation is a shortcut for partitioned tables that works on a
+ subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables
+ with many partitions, where a full <code class="ph codeph">COMPUTE STATS</code> operation takes too long to be practical
+      each time a partition is added or dropped. See <a class="xref" href="impala_perf_stats.html#perf_stats_incremental">Table and Column Statistics</a>
+ for full usage details.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ For a particular table, use either <code class="ph codeph">COMPUTE STATS</code> or
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, but never combine the two or
+ alternate between them. If you switch from <code class="ph codeph">COMPUTE STATS</code> to
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> during the lifetime of a table, or
+ vice versa, drop all statistics by running <code class="ph codeph">DROP STATS</code> before
+ making the switch.
+ </p>
+ <p class="p">
+ When you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on a table for the first time,
+ the statistics are computed again from scratch regardless of whether the table already
+ has statistics. Therefore, expect a one-time resource-intensive operation
+ for scanning the entire table when running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ for the first time on a given table.
+ </p>
+ <p class="p">
+ For a table with a huge number of partitions and many columns, the approximately 400 bytes
+ of metadata per column per partition can add up to significant memory overhead, as it must
+ be cached on the <span class="keyword cmdname">catalogd</span> host and on every <span class="keyword cmdname">impalad</span> host
+ that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB,
+ you might experience service downtime.
+ </p>
+ </div>
+
+ <p class="p">
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> only applies to partitioned tables. If you use the
+ <code class="ph codeph">INCREMENTAL</code> clause for an unpartitioned table, Impala automatically uses the original
+ <code class="ph codeph">COMPUTE STATS</code> statement. Such tables display <code class="ph codeph">false</code> under the
+ <code class="ph codeph">Incremental stats</code> column of the <code class="ph codeph">SHOW TABLE STATS</code> output.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <div class="p">
+ Because many of the most performance-critical and resource-intensive
+ operations rely on table and column statistics to construct accurate and
+ efficient plans, <code class="ph codeph">COMPUTE STATS</code> is an important step at
+ the end of your ETL process. Run <code class="ph codeph">COMPUTE STATS</code> on all
+ tables as your first step during performance tuning for slow queries, or
+ troubleshooting for out-of-memory conditions:
+ <ul class="ul">
+ <li class="li">
+ Accurate statistics help Impala construct an efficient query plan
+ for join queries, improving performance and reducing memory usage.
+ </li>
+ <li class="li">
+ Accurate statistics help Impala distribute the work effectively
+ for insert operations into Parquet tables, improving performance and
+ reducing memory usage.
+ </li>
+ <li class="li">
+ Accurate statistics help Impala estimate the memory
+ required for each query, which is important when you use resource
+ management features, such as admission control and the YARN resource
+ management framework. The statistics help Impala to achieve high
+ concurrency, full utilization of available memory, and avoid
+ contention with workloads from other Hadoop components.
+ </li>
+ <li class="li">
+ In <span class="keyword">Impala 2.8</span> and
+ higher, when you run the <code class="ph codeph">COMPUTE STATS</code> or
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement against a
+ Parquet table, Impala automatically applies the query option setting
+ <code class="ph codeph">MT_DOP=4</code> to increase the amount of intra-node
+ parallelism during this CPU-intensive operation. See <a class="xref" href="impala_mt_dop.html">MT_DOP Query Option</a> for details about what this query option does
+ and how to use it with CPU-intensive <code class="ph codeph">SELECT</code>
+ statements.
+ </li>
+ </ul>
+ </div>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Computing stats for groups of partitions:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.8</span> and higher, you can run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ on multiple partitions, instead of the entire table or one partition at a time. You include
+ comparison operators other than <code class="ph codeph">=</code> in the <code class="ph codeph">PARTITION</code> clause,
+ and the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement applies to all partitions that
+ match the comparison expression.
+ </p>
+
+ <p class="p">
+ For example, the <code class="ph codeph">INT_PARTITIONS</code> table contains 4 partitions.
+ The following <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statements affect some but not all
+ partitions, as indicated by the <code class="ph codeph">Updated <var class="keyword varname">n</var> partition(s)</code>
+ messages. The partitions that are affected depend on values in the partition key column <code class="ph codeph">X</code>
+ that match the comparison expression in the <code class="ph codeph">PARTITION</code> clause.
+ </p>
+
+<pre class="pre codeblock"><code>
+show partitions int_partitions;
++-------+-------+--------+------+--------------+-------------------+---------+...
+| x | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format |...
++-------+-------+--------+------+--------------+-------------------+---------+...
+| 99 | -1 | 0 | 0B | NOT CACHED | NOT CACHED | PARQUET |...
+| 120 | -1 | 0 | 0B | NOT CACHED | NOT CACHED | TEXT |...
+| 150 | -1 | 0 | 0B | NOT CACHED | NOT CACHED | TEXT |...
+| 200 | -1 | 0 | 0B | NOT CACHED | NOT CACHED | TEXT |...
+| Total | -1 | 0 | 0B | 0B | | |...
++-------+-------+--------+------+--------------+-------------------+---------+...
+
+compute incremental stats int_partitions partition (x < 100);
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x in (100, 150, 200));
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 2 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x between 100 and 175);
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 2 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x in (100, 150, 200) or x < 100);
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 3 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x != 150);
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 3 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ Currently, the statistics created by the <code class="ph codeph">COMPUTE STATS</code> statement do not include
+ information about complex type columns. The column stats metrics for complex columns are always shown
+ as -1. For queries involving complex type columns, Impala uses
+ heuristics to estimate the data distribution within such columns.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong>
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">COMPUTE STATS</code> works for HBase tables also. The statistics gathered for HBase tables are
+ somewhat different than for HDFS-backed tables, but that metadata is still used for optimization when HBase
+ tables are involved in join queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">COMPUTE STATS</code> also works for tables where data resides in the Amazon Simple Storage Service (S3).
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Performance considerations:</strong>
+ </p>
+
+    <p class="p">
+      The statistics collected by <code class="ph codeph">COMPUTE STATS</code> are used to optimize join queries,
+      <code class="ph codeph">INSERT</code> operations into Parquet tables, and other resource-intensive kinds of SQL statements.
+      See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details.
+    </p>
+
+ <p class="p">
+ For large tables, the <code class="ph codeph">COMPUTE STATS</code> statement itself might take a long time and you
+ might need to tune its performance. The <code class="ph codeph">COMPUTE STATS</code> statement does not work with the
+ <code class="ph codeph">EXPLAIN</code> statement, or the <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span>.
+ You can use the <code class="ph codeph">PROFILE</code> statement in <span class="keyword cmdname">impala-shell</span> to examine timing information
+ for the statement as a whole. If a basic <code class="ph codeph">COMPUTE STATS</code> statement takes a long time for a
+ partitioned table, consider switching to the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax so that only
+ newly added partitions are analyzed each time.
+ </p>
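+
+    <p class="p">
+      For example, in <span class="keyword cmdname">impala-shell</span> you can examine the timing of the
+      most recently executed statement (hypothetical table name shown):
+    </p>
+
+<pre class="pre codeblock"><code>compute stats big_table;
+profile;</code></pre>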
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+      This example shows two tables, <code class="ph codeph">T1</code> and <code class="ph codeph">T2</code>, with a small number of distinct
+      values linked by a parent-child relationship between <code class="ph codeph">T1.ID</code> and <code class="ph codeph">T2.PARENT</code>.
+      <code class="ph codeph">T1</code> is tiny, while <code class="ph codeph">T2</code> has approximately 100K rows. Initially, the statistics
+      include physical measurements such as the number of files, the total size, and size measurements for
+      fixed-length columns such as those of the <code class="ph codeph">INT</code> type. Unknown values are represented by -1. After
+ running <code class="ph codeph">COMPUTE STATS</code> for each table, much more information is available through the
+ <code class="ph codeph">SHOW STATS</code> statements. If you were running a join query involving both of these tables, you
+ would need statistics for both tables to get the most effective optimization for the query.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>[localhost:21000] > show table stats t1;
+Query: show table stats t1
++-------+--------+------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+------+--------+
+| -1 | 1 | 33B | TEXT |
++-------+--------+------+--------+
+Returned 1 row(s) in 0.02s
+[localhost:21000] > show table stats t2;
+Query: show table stats t2
++-------+--------+----------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+----------+--------+
+| -1 | 28 | 960.00KB | TEXT |
++-------+--------+----------+--------+
+Returned 1 row(s) in 0.01s
+[localhost:21000] > show column stats t1;
+Query: show column stats t1
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| id | INT | -1 | -1 | 4 | 4 |
+| s | STRING | -1 | -1 | -1 | -1 |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 1.71s
+[localhost:21000] > show column stats t2;
+Query: show column stats t2
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| parent | INT | -1 | -1 | 4 | 4 |
+| s | STRING | -1 | -1 | -1 | -1 |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.01s
+[localhost:21000] > compute stats t1;
+Query: compute stats t1
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 5.30s
+[localhost:21000] > show table stats t1;
+Query: show table stats t1
++-------+--------+------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+------+--------+
+| 3 | 1 | 33B | TEXT |
++-------+--------+------+--------+
+Returned 1 row(s) in 0.01s
+[localhost:21000] > show column stats t1;
+Query: show column stats t1
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| id | INT | 3 | -1 | 4 | 4 |
+| s | STRING | 3 | -1 | -1 | -1 |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.02s
+[localhost:21000] > compute stats t2;
+Query: compute stats t2
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 5.70s
+[localhost:21000] > show table stats t2;
+Query: show table stats t2
++-------+--------+----------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+----------+--------+
+| 98304 | 1 | 960.00KB | TEXT |
++-------+--------+----------+--------+
+Returned 1 row(s) in 0.03s
+[localhost:21000] > show column stats t2;
+Query: show column stats t2
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| parent | INT | 3 | -1 | 4 | 4 |
+| s | STRING | 6 | -1 | 14 | 9.3 |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.01s</code></pre>
+
+ <p class="p">
+ The following example shows how to use the <code class="ph codeph">INCREMENTAL</code> clause, available in Impala 2.1.0 and
+ higher. The <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax lets you collect statistics for newly added or
+ changed partitions, without rescanning the entire table.
+ </p>
+
+<pre class="pre codeblock"><code>-- Initially the table has no incremental stats, as indicated
+-- 'false' under Incremental stats.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books | -1 | 1 | 223.74KB | NOT CACHED | PARQUET | false
+| Children | -1 | 1 | 230.05KB | NOT CACHED | PARQUET | false
+| Electronics | -1 | 1 | 232.67KB | NOT CACHED | PARQUET | false
+| Home | -1 | 1 | 232.56KB | NOT CACHED | PARQUET | false
+| Jewelry | -1 | 1 | 223.72KB | NOT CACHED | PARQUET | false
+| Men | -1 | 1 | 231.25KB | NOT CACHED | PARQUET | false
+| Music | -1 | 1 | 237.90KB | NOT CACHED | PARQUET | false
+| Shoes | -1 | 1 | 234.90KB | NOT CACHED | PARQUET | false
+| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false
+| Women | -1 | 1 | 226.27KB | NOT CACHED | PARQUET | false
+| Total | -1 | 10 | 2.25MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- After the first COMPUTE INCREMENTAL STATS,
+-- all partitions have stats. The first
+-- COMPUTE INCREMENTAL STATS scans the whole
+-- table, discarding any previous stats from
+-- a traditional COMPUTE STATS statement.
+compute incremental stats item_partitioned;
++-------------------------------------------+
+| summary |
++-------------------------------------------+
+| Updated 10 partition(s) and 21 column(s). |
++-------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true
+| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true
+| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true
+| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true
+| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true
+| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true
+| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true
+| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true
+| Sports | 1783 | 1 | 227.97KB | NOT CACHED | PARQUET | true
+| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true
+| Total | 17957 | 10 | 2.25MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- Add a new partition...
+alter table item_partitioned add partition (i_category='Camping');
+-- Add or replace files in HDFS outside of Impala,
+-- rendering the stats for a partition obsolete.
+!import_data_into_sports_partition.sh
+refresh item_partitioned;
+drop incremental stats item_partitioned partition (i_category='Sports');
+-- Now some partitions have incremental stats
+-- and some do not.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true
+| Camping | -1 | 1 | 408.02KB | NOT CACHED | PARQUET | false
+| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true
+| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true
+| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true
+| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true
+| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true
+| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true
+| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true
+| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false
+| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true
+| Total | 17957 | 11 | 2.65MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- After another COMPUTE INCREMENTAL STATS,
+-- all partitions have incremental stats, and only the 2
+-- partitions without incremental stats were scanned.
+compute incremental stats item_partitioned;
++------------------------------------------+
+| summary |
++------------------------------------------+
+| Updated 2 partition(s) and 21 column(s). |
++------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true
+| Camping | 5328 | 1 | 408.02KB | NOT CACHED | PARQUET | true
+| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true
+| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true
+| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true
+| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true
+| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true
+| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true
+| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true
+| Sports | 1783 | 1 | 227.97KB | NOT CACHED | PARQUET | true
+| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true
+| Total | 17957 | 11 | 2.65MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+------------------
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">File format considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with tables created with any of the file formats supported
+ by Impala. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details about working with the
+ different file formats. The following considerations apply to <code class="ph codeph">COMPUTE STATS</code> depending on the
+ file format of the table.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with text tables with no restrictions. These tables can be
+ created through either Impala or Hive.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with Parquet tables. These tables can be created through
+ either Impala or Hive.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with Avro tables without restriction in <span class="keyword">Impala 2.2</span>
+ and higher. In earlier releases, <code class="ph codeph">COMPUTE STATS</code> worked only for Avro tables created through Hive,
+ and required the <code class="ph codeph">CREATE TABLE</code> statement to use SQL-style column names and types rather than an
+ Avro-style schema specification.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with RCFile tables with no restrictions. These tables can
+ be created through either Impala or Hive.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with SequenceFile tables with no restrictions. These
+ tables can be created through either Impala or Hive.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with partitioned tables, whether all the partitions use
+ the same file format, or some partitions are defined through <code class="ph codeph">ALTER TABLE</code> to use different
+ file formats.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Certain multi-stage statements (<code class="ph codeph">CREATE TABLE AS SELECT</code> and
+ <code class="ph codeph">COMPUTE STATS</code>) can be cancelled during some stages, when running <code class="ph codeph">INSERT</code>
+ or <code class="ph codeph">SELECT</code> operations internally. To cancel this statement, use Ctrl-C from the
+ <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
+ <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
+ in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
+ (port 25000).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span> Prior to Impala 1.4.0,
+ <code class="ph codeph">COMPUTE STATS</code> counted the number of
+ <code class="ph codeph">NULL</code> values in each column and recorded that figure
+ in the metastore database. Because Impala does not currently use the
+ <code class="ph codeph">NULL</code> count during query planning, Impala 1.4.0 and
+ higher speeds up the <code class="ph codeph">COMPUTE STATS</code> statement by
+ skipping this <code class="ph codeph">NULL</code> counting. </div>
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong>
+ </p>
+ <p class="p">
+ Behind the scenes, the <code class="ph codeph">COMPUTE STATS</code> statement
+ executes two statements: one to count the rows of each partition
+ in the table (or the entire table if unpartitioned) through the
+ <code class="ph codeph">COUNT(*)</code> function,
+ and another to count the approximate number of distinct values
+ in each column through the <code class="ph codeph">NDV()</code> function.
+ You might see these queries in your monitoring and diagnostic displays.
+ The same factors that affect the performance, scalability, and
+ execution of other queries (such as parallel execution, memory usage,
+ admission control, and timeouts) also apply to the queries run by the
+ <code class="ph codeph">COMPUTE STATS</code> statement.
+ </p>
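+ <p class="p">
+ For example, for a table <code class="ph codeph">t1</code> with columns
+ <code class="ph codeph">c1</code> and <code class="ph codeph">c2</code>, the internal queries
+ are conceptually similar to the following. (This is a simplified sketch,
+ not the exact SQL that Impala generates.)
+ </p>
+<pre class="pre codeblock"><code>-- One query counts the rows, per partition for partitioned tables:
+select count(*) from t1;
+-- Another estimates the number of distinct values in each column:
+select ndv(c1), ndv(c2) from t1;
+</code></pre>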
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read
+ permission for all affected files in the source directory:
+ all the files in the table (whether partitioned or not) in the
+ case of <code class="ph codeph">COMPUTE STATS</code>;
+ or all the files in partitions without incremental stats in
+ the case of <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+ It must also have read and execute permissions for all
+ relevant directories holding the data files.
+ (Essentially, <code class="ph codeph">COMPUTE STATS</code> requires the
+ same permissions as the underlying <code class="ph codeph">SELECT</code> queries it runs
+ against the table.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement applies to Kudu tables.
+ Impala does not compute the number of rows for each partition for
+ Kudu tables. Therefore, you do not need to re-run the operation when
+ you see -1 in the <code class="ph codeph"># Rows</code> column of the output from
+ <code class="ph codeph">SHOW TABLE STATS</code>. That column always shows -1 for
+ all Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_drop_stats.html#drop_stats">DROP STATS Statement</a>, <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>,
+ <a class="xref" href="impala_show.html#show_column_stats">SHOW COLUMN STATS Statement</a>, <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html b/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html
new file mode 100644
index 0000000..03d21e2
--- /dev/null
+++ b/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html
@@ -0,0 +1,23 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="compute_stats_sample_min_sample_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COMPUTE_STATS_MIN_SAMPLE_SIZE Query Option</title></head><body id="compute_stats_sample_min_sample_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+ <h1 class="title topictitle1" id="ariaid-title1">COMPUTE_STATS_MIN_SAMPLE_SIZE Query Option</h1>
+
+
+ <div class="body conbody">
+ <p class="p">The <code class="ph codeph">COMPUTE_STATS_MIN_SAMPLE_SIZE</code> query option specifies
+ the minimum number of bytes that will be scanned in <code class="ph codeph">COMPUTE STATS
+ TABLESAMPLE</code>, regardless of the user-supplied sampling percent.
+ This query option effectively disables sampling for very small tables,
+ where accurate stats can be obtained cheaply by scanning the whole table,
+ because a minimum amount of data must be scanned to produce meaningful
+ stats.</p>
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+ <p class="p"><strong class="ph b">Default:</strong> 1GB</p>
+ <p class="p"><strong class="ph b">Added in</strong>: <span class="keyword">Impala 2.12</span></p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
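+ <p class="p">
+ For example, to let a 10 percent sample take effect on tables smaller than
+ the 1GB default, you could lower the minimum before running
+ <code class="ph codeph">COMPUTE STATS TABLESAMPLE</code>. (The table name and the
+ 128MB value below are for illustration only.)
+ </p>
+<pre class="pre codeblock"><code>-- Lower the minimum sample size to 128MB, specified in bytes:
+set compute_stats_min_sample_size=134217728;
+compute stats t1 tablesample system(10);
+</code></pre>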
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_concepts.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_concepts.html b/docs/build3x/html/topics/impala_concepts.html
new file mode 100644
index 0000000..b98e4ce
--- /dev/null
+++ b/docs/build3x/html/topics/impala_concepts.html
@@ -0,0 +1,48 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_components.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_development.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hadoop.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="concepts"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Concepts and Architecture</title></head><body id="concepts"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Concepts and Architecture</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following sections provide background information to help you become productive using Impala and
+ its features. Where appropriate, the explanations include context to help understand how aspects of Impala
+ relate to other technologies you might already be familiar with, such as relational database management
+ systems and data warehouses, or other Hadoop components such as Hive, HDFS, and HBase.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_components.html">Components of the Impala Server</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_development.html">Developing Impala Applications</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hadoop.html">How Impala Fits Into the Hadoop Ecosystem</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_conditional_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_conditional_functions.html b/docs/build3x/html/topics/impala_conditional_functions.html
new file mode 100644
index 0000000..476fb82
--- /dev/null
+++ b/docs/build3x/html/topics/impala_conditional_functions.html
@@ -0,0 +1,611 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="conditional_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Conditional Functions</title></head><body id="conditional_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Conditional Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala supports the following conditional functions for testing equality, comparison operators, and nullity:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="conditional_functions__case">
+ <code class="ph codeph">CASE a WHEN b THEN c [WHEN d THEN e]... [ELSE f] END</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Compares an expression to one or more possible values, and returns a corresponding result
+ when a match is found.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In this form of the <code class="ph codeph">CASE</code> expression, the initial value <code class="ph codeph">A</code>
+ being evaluated for each row is typically a column reference, or an expression involving
+ a column. This form can only compare against a set of specified values, not ranges,
+ multi-value comparisons such as <code class="ph codeph">BETWEEN</code> or <code class="ph codeph">IN</code>,
+ regular expressions, or <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ Although this example is split across multiple lines, you can put any or all parts of a <code class="ph codeph">CASE</code> expression
+ on a single line, with no punctuation or other separators between the <code class="ph codeph">WHEN</code>,
+ <code class="ph codeph">ELSE</code>, and <code class="ph codeph">END</code> clauses.
+ </p>
+<pre class="pre codeblock"><code>select case x
+ when 1 then 'one'
+ when 2 then 'two'
+ when 0 then 'zero'
+ else 'out of range'
+ end
+ from t1;
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__case2">
+ <code class="ph codeph">CASE WHEN a THEN b [WHEN c THEN d]... [ELSE e] END</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests whether any of a sequence of expressions is true, and returns a corresponding
+ result for the first true expression.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ <code class="ph codeph">CASE</code> expressions without an initial test value have more flexibility.
+ For example, they can test different columns in different <code class="ph codeph">WHEN</code> clauses,
+ or use comparison operators such as <code class="ph codeph">BETWEEN</code>, <code class="ph codeph">IN</code> and <code class="ph codeph">IS NULL</code>
+ rather than comparing against discrete values.
+ </p>
+ <p class="p">
+ <code class="ph codeph">CASE</code> expressions are often the foundation of long queries that
+ summarize and format results for easy-to-read reports. For example, you might
+ use a <code class="ph codeph">CASE</code> function call to turn values from a numeric column
+ into category strings corresponding to integer values, or labels such as <span class="q">"Small"</span>,
+ <span class="q">"Medium"</span> and <span class="q">"Large"</span> based on ranges. Then subsequent parts of the
+ query might aggregate based on the transformed values, such as how many
+ values are classified as small, medium, or large. You can also use <code class="ph codeph">CASE</code>
+ to signal problems with out-of-bounds values, <code class="ph codeph">NULL</code> values,
+ and so on.
+ </p>
+ <p class="p">
+ By using operators such as <code class="ph codeph">OR</code>, <code class="ph codeph">IN</code>,
+ <code class="ph codeph">REGEXP</code>, and so on in <code class="ph codeph">CASE</code> expressions,
+ you can build extensive tests and transformations into a single query.
+ Therefore, applications that construct SQL statements often rely heavily on <code class="ph codeph">CASE</code>
+ calls in the generated SQL code.
+ </p>
+ <p class="p">
+ Because this flexible form of the <code class="ph codeph">CASE</code> expression allows you to perform
+ many comparisons and call multiple functions when evaluating each row, be careful applying
+ elaborate <code class="ph codeph">CASE</code> expressions to queries that process large amounts of data.
+ For example, when practical, evaluate and transform values through <code class="ph codeph">CASE</code>
+ after applying operations such as aggregations that reduce the size of the result set;
+ transform numbers to strings after performing joins with the original numeric values.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ Although this example is split across multiple lines, you can put any or all parts of a <code class="ph codeph">CASE</code> expression
+ on a single line, with no punctuation or other separators between the <code class="ph codeph">WHEN</code>,
+ <code class="ph codeph">ELSE</code>, and <code class="ph codeph">END</code> clauses.
+ </p>
+<pre class="pre codeblock"><code>select case
+ when dayname(now()) in ('Saturday','Sunday') then 'result undefined on weekends'
+ when x > y then 'x greater than y'
+ when x = y then 'x and y are equal'
+ when x is null or y is null then 'one of the columns is null'
+ else null
+ end
+ from t1;
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__coalesce">
+ <code class="ph codeph">coalesce(type v1, type v2, ...)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the first specified argument that is not <code class="ph codeph">NULL</code>, or
+ <code class="ph codeph">NULL</code> if all arguments are <code class="ph codeph">NULL</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
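+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example returns the first non-<code class="ph codeph">NULL</code>
+ value among two columns and a literal fallback. (The table and column
+ names are for illustration only.)
+ </p>
+<pre class="pre codeblock"><code>-- Returns mobile_phone if it is not NULL, else home_phone,
+-- else the literal 'none on file' if both are NULL.
+select coalesce(mobile_phone, home_phone, 'none on file') from contacts;
+</code></pre>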
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__decode">
+ <code class="ph codeph">decode(type expression, type search1, type result1 [, type search2, type result2 ...] [, type
+ default] )</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Compares an expression to one or more possible values, and returns a corresponding result
+ when a match is found.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Can be used as shorthand for a <code class="ph codeph">CASE</code> expression.
+ </p>
+ <p class="p">
+ The original expression and the search expressions must be of the same type or convertible types. The
+ result expression can be a different type, but all result expressions must be of the same type.
+ </p>
+ <p class="p">
+ Returns a successful match if the original expression is <code class="ph codeph">NULL</code> and a search expression
+ is also <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ Returns <code class="ph codeph">NULL</code> if the final <code class="ph codeph">default</code> value is omitted and none of the
+ search expressions match the original expression.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example translates numeric day values into descriptive names:
+ </p>
+<pre class="pre codeblock"><code>SELECT event, decode(day_of_week, 1, "Monday", 2, "Tuesday", 3, "Wednesday",
+ 4, "Thursday", 5, "Friday", 6, "Saturday", 7, "Sunday", "Unknown day")
+ FROM calendar;
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__if">
+ <code class="ph codeph">if(boolean condition, type ifTrue, type ifFalseOrNull)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests an expression and returns a corresponding result depending on whether the result is
+ true, false, or <code class="ph codeph">NULL</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the <code class="ph codeph">ifTrue</code> argument value
+ </p>
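+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example labels each row based on a single test. Because the
+ condition evaluates to <code class="ph codeph">NULL</code> when <code class="ph codeph">x</code>
+ is <code class="ph codeph">NULL</code>, those rows also produce the third argument:
+ </p>
+<pre class="pre codeblock"><code>-- Rows where x > 0 evaluates to false or NULL produce the second label.
+select if(x > 0, 'positive', 'zero, negative, or null') from t1;
+</code></pre>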
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__ifnull">
+ <code class="ph codeph">ifnull(type a, type ifNull)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Alias for the <code class="ph codeph">isnull()</code> function, with the same behavior, provided
+ to simplify porting SQL with vendor extensions to Impala.
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.3.0
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__isfalse">
+ <code class="ph codeph">isfalse(<var class="keyword varname">boolean</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if a Boolean expression is <code class="ph codeph">false</code> or not.
+ Returns <code class="ph codeph">true</code> if so.
+ If the argument is <code class="ph codeph">NULL</code>, returns <code class="ph codeph">false</code>.
+ Identical to <code class="ph codeph">isnottrue()</code>, except it returns the opposite value for a <code class="ph codeph">NULL</code> argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use
+ the operators <code class="ph codeph">IS [NOT] TRUE</code> and
+ <code class="ph codeph">IS [NOT] FALSE</code> as equivalents for the built-in
+ functions <code class="ph codeph">istrue()</code>, <code class="ph codeph">isnottrue()</code>,
+ <code class="ph codeph">isfalse()</code>, and <code class="ph codeph">isnotfalse()</code>.
+ </p>
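+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>-- isfalse() returns true only for false; a NULL argument yields false.
+select isfalse(false), isfalse(true), isfalse(null);
+-- Results: true, false, false
+</code></pre>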
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__isnotfalse">
+ <code class="ph codeph">isnotfalse(<var class="keyword varname">boolean</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if a Boolean expression is not <code class="ph codeph">false</code> (that is, either <code class="ph codeph">true</code> or <code class="ph codeph">NULL</code>).
+ Returns <code class="ph codeph">true</code> if so.
+ If the argument is <code class="ph codeph">NULL</code>, returns <code class="ph codeph">true</code>.
+ Identical to <code class="ph codeph">istrue()</code>, except it returns the opposite value for a <code class="ph codeph">NULL</code> argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Primarily for compatibility with code containing industry extensions to SQL.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use
+ the operators <code class="ph codeph">IS [NOT] TRUE</code> and
+ <code class="ph codeph">IS [NOT] FALSE</code> as equivalents for the built-in
+ functions <code class="ph codeph">istrue()</code>, <code class="ph codeph">isnottrue()</code>,
+ <code class="ph codeph">isfalse()</code>, and <code class="ph codeph">isnotfalse()</code>.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__isnottrue">
+ <code class="ph codeph">isnottrue(<var class="keyword varname">boolean</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if a Boolean expression is not <code class="ph codeph">true</code> (that is, either <code class="ph codeph">false</code> or <code class="ph codeph">NULL</code>).
+ Returns <code class="ph codeph">true</code> if so.
+ If the argument is <code class="ph codeph">NULL</code>, returns <code class="ph codeph">true</code>.
+ Identical to <code class="ph codeph">isfalse()</code>, except it returns the opposite value for a <code class="ph codeph">NULL</code> argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use
+ the operators <code class="ph codeph">IS [NOT] TRUE</code> and
+ <code class="ph codeph">IS [NOT] FALSE</code> as equivalents for the built-in
+ functions <code class="ph codeph">istrue()</code>, <code class="ph codeph">isnottrue()</code>,
+ <code class="ph codeph">isfalse()</code>, and <code class="ph codeph">isnotfalse()</code>.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__isnull">
+ <code class="ph codeph">isnull(type a, type ifNull)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if an expression is <code class="ph codeph">NULL</code>, and returns the expression result value
+ if not. If the first argument is <code class="ph codeph">NULL</code>, returns the second argument.
+ <p class="p">
+ <strong class="ph b">Compatibility notes:</strong> Equivalent to the <code class="ph codeph">nvl()</code> function from Oracle Database or
+ <code class="ph codeph">ifnull()</code> from MySQL. The <code class="ph codeph">nvl()</code> and <code class="ph codeph">ifnull()</code>
+ functions are also available in Impala.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the first argument value
+ </p>
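+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>-- The second argument is returned only when the first is NULL.
+select isnull(null, 'fallback'); -- returns 'fallback'
+select isnull('data', 'fallback'); -- returns 'data'
+</code></pre>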
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__istrue">
+ <code class="ph codeph">istrue(<var class="keyword varname">boolean</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if a Boolean expression is <code class="ph codeph">true</code> or not.
+ Returns <code class="ph codeph">true</code> if so.
+ If the argument is <code class="ph codeph">NULL</code>, returns <code class="ph codeph">false</code>.
+ Identical to <code class="ph codeph">isnotfalse()</code>, except it returns the opposite value for a <code class="ph codeph">NULL</code> argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Primarily for compatibility with code containing industry extensions to SQL.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use
+ the operators <code class="ph codeph">IS [NOT] TRUE</code> and
+ <code class="ph codeph">IS [NOT] FALSE</code> as equivalents for the built-in
+ functions <code class="ph codeph">istrue()</code>, <code class="ph codeph">isnottrue()</code>,
+ <code class="ph codeph">isfalse()</code>, and <code class="ph codeph">isnotfalse()</code>.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__nonnullvalue">
+ <code class="ph codeph">nonnullvalue(<var class="keyword varname">expression</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if an expression (of any type) is <code class="ph codeph">NULL</code> or not.
+ Returns <code class="ph codeph">false</code> if so.
+ The converse of <code class="ph codeph">nullvalue()</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Primarily for compatibility with code containing industry extensions to SQL.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__nullif">
+ <code class="ph codeph">nullif(<var class="keyword varname">expr1</var>,<var class="keyword varname">expr2</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns <code class="ph codeph">NULL</code> if the two specified arguments are equal. If the specified
+ arguments are not equal, returns the value of <var class="keyword varname">expr1</var>. The data types of the expressions
+ must be compatible, according to the conversion rules from <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a>.
+ You cannot use an expression that evaluates to <code class="ph codeph">NULL</code> for <var class="keyword varname">expr1</var>; that
+ way, you can distinguish a return value of <code class="ph codeph">NULL</code> from an argument value of
+ <code class="ph codeph">NULL</code>, which would never match <var class="keyword varname">expr2</var>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> This function is effectively shorthand for a <code class="ph codeph">CASE</code> expression of
+ the form:
+ </p>
+<pre class="pre codeblock"><code>CASE
+ WHEN <var class="keyword varname">expr1</var> = <var class="keyword varname">expr2</var> THEN NULL
+ ELSE <var class="keyword varname">expr1</var>
+END</code></pre>
+ <p class="p">
+ It is commonly used in division expressions, to produce a <code class="ph codeph">NULL</code> result instead of a
+ divide-by-zero error when the divisor is equal to zero:
+ </p>
+<pre class="pre codeblock"><code>select 1.0 / nullif(c1,0) as reciprocal from t1;</code></pre>
+ <p class="p">
+ You might also use it for compatibility with other database systems that support the same
+ <code class="ph codeph">NULLIF()</code> function.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.3.0
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__nullifzero">
+ <code class="ph codeph">nullifzero(<var class="keyword varname">numeric_expr</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns <code class="ph codeph">NULL</code> if the numeric expression evaluates to 0, otherwise returns
+ the result of the expression.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Used to avoid error conditions such as divide-by-zero in numeric calculations.
+ Serves as shorthand for a more elaborate <code class="ph codeph">CASE</code> expression, to simplify porting SQL with
+ vendor extensions to Impala.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.3.0
+ </p>
+ </dd>
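+ For illustration, the following sketch (assuming a table <code class="ph codeph">t1</code> with a numeric column <code class="ph codeph">c1</code>, as in the <code class="ph codeph">nullif()</code> example above) shows <code class="ph codeph">nullifzero()</code> guarding a division, plus its behavior on literal arguments:

```sql
-- Returns NULL instead of a divide-by-zero error for rows where c1 = 0.
select 1.0 / nullifzero(c1) as reciprocal from t1;

-- A zero argument yields NULL; any other value passes through.
select nullifzero(0) as always_null, nullifzero(5) as unchanged;
```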
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__nullvalue">
+ <code class="ph codeph">nullvalue(<var class="keyword varname">expression</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if an expression (of any type) is <code class="ph codeph">NULL</code> or not.
+ Returns <code class="ph codeph">true</code> if so.
+ The converse of <code class="ph codeph">nonnullvalue()</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Primarily for compatibility with code containing industry extensions to SQL.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ </dd>
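+ A minimal sketch with literal arguments (chosen here purely for illustration) shows the boolean results:

```sql
-- nullvalue() is true only for a NULL argument.
select nullvalue(null) as a, nullvalue('text') as b;
-- a = true, b = false
```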
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__nvl">
+ <code class="ph codeph">nvl(type a, type ifNull)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Alias for the <code class="ph codeph">isnull()</code> function. Tests if an expression is
+ <code class="ph codeph">NULL</code>, and returns the expression result value if not. If the first argument is
+ <code class="ph codeph">NULL</code>, returns the second argument. Equivalent to the <code class="ph codeph">nvl()</code> function
+ from Oracle Database or <code class="ph codeph">ifnull()</code> from MySQL.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the first argument value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.1
+ </p>
+ </dd>
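+ For example, with illustrative literal arguments:

```sql
-- The second argument is returned only when the first is NULL.
select nvl(null, 'fallback') as a, nvl('value', 'fallback') as b;
-- a = 'fallback', b = 'value'
```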
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__nvl2">
+ <code class="ph codeph">nvl2(type a, type ifNull, type ifNotNull)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Enhanced variant of the <code class="ph codeph">nvl()</code> function. Tests an expression
+ and returns different result values depending on whether it is <code class="ph codeph">NULL</code> or not.
+ If the first argument is <code class="ph codeph">NULL</code>, returns the second argument.
+ If the first argument is not <code class="ph codeph">NULL</code>, returns the third argument.
+ Equivalent to the <code class="ph codeph">nvl2()</code> function from Oracle Database.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the first argument value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how a query can use special indicator values
+ to represent null and not-null expression values. The first example tests
+ an <code class="ph codeph">INT</code> column and so uses special integer values.
+ The second example tests a <code class="ph codeph">STRING</code> column and so uses
+ special string values.
+ </p>
+<pre class="pre codeblock"><code>
+select x, nvl2(x, 999, 0) from nvl2_demo;
++------+---------------------------+
+| x | if(x is not null, 999, 0) |
++------+---------------------------+
+| NULL | 0 |
+| 1 | 999 |
+| NULL | 0 |
+| 2 | 999 |
++------+---------------------------+
+
+select s, nvl2(s, 'is not null', 'is null') from nvl2_demo;
++------+---------------------------------------------+
+| s | if(s is not null, 'is not null', 'is null') |
++------+---------------------------------------------+
+| NULL | is null |
+| one | is not null |
+| NULL | is null |
+| two | is not null |
++------+---------------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__zeroifnull">
+ <code class="ph codeph">zeroifnull(<var class="keyword varname">numeric_expr</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns 0 if the numeric expression evaluates to <code class="ph codeph">NULL</code>, otherwise returns
+ the result of the expression.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Used to avoid unexpected results due to unexpected propagation of
+ <code class="ph codeph">NULL</code> values in numeric calculations. Serves as shorthand for a more elaborate
+ <code class="ph codeph">CASE</code> expression, to simplify porting SQL with vendor extensions to Impala.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.3.0
+ </p>
+ </dd>
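+ As a sketch (the table <code class="ph codeph">t1</code> and column <code class="ph codeph">price</code> are hypothetical), <code class="ph codeph">zeroifnull()</code> lets rows with missing values count as 0 instead of being skipped:

```sql
-- Count rows with a missing price as 0 in an average, rather than
-- letting AVG() skip them.
select avg(zeroifnull(price)) as avg_price from t1;

-- Literal behavior: NULL becomes 0; other values pass through.
select zeroifnull(null) as zero, zeroifnull(42) as unchanged;
```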
+
+
+ </dl>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_config.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_config.html b/docs/build3x/html/topics/impala_config.html
new file mode 100644
index 0000000..c2686d8
--- /dev/null
+++ b/docs/build3x/html/topics/impala_config.html
@@ -0,0 +1,48 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config_performance.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_odbc.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_jdbc.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="config"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Managing Impala</title></head><body id="config"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Managing Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section explains how to configure Impala to accept connections from applications that use popular
+ programming APIs:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_config_performance.html#config_performance">Post-Installation Configuration for Impala</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_odbc.html#impala_odbc">Configuring Impala to Work with ODBC</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a>
+ </li>
+ </ul>
+
+ <p class="p">
+ This type of configuration is especially useful when using Impala in combination with Business Intelligence
+ tools, which use these standard interfaces to query different kinds of database and Big Data systems.
+ </p>
+
+ <p class="p">
+ You can also configure these other aspects of Impala:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_security.html#security">Impala Security</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>
+ </li>
+ </ul>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_config_performance.html">Post-Installation Configuration for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_odbc.html">Configuring Impala to Work with ODBC</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_jdbc.html">Configuring Impala to Work with JDBC</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_config_options.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_config_options.html b/docs/build3x/html/topics/impala_config_options.html
new file mode 100644
index 0000000..12af2bc
--- /dev/null
+++ b/docs/build3x/html/topics/impala_config_options.html
@@ -0,0 +1,389 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_processes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="config_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Modifying Impala Startup Options</title></head><body id="config_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Modifying Impala Startup Options</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The configuration options for the Impala-related daemons let you choose which hosts and
+ ports to use for the services that run on a single host, specify directories for logging,
+ control resource usage and security, and specify other aspects of the Impala software.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_processes.html">Starting Impala</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="config_options__config_options_noncm">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Configuring Impala Startup Options through the Command Line</h2>
+
+ <div class="body conbody">
+
+ <p class="p"> The Impala server, statestore, and catalog services start up using values provided in a
+ defaults file, <span class="ph filepath">/etc/default/impala</span>. </p>
+
+ <p class="p">
+ This file includes information about many resources used by Impala. Most of the defaults
+ included in this file should be effective in most cases. For example, typically you
+ would not change the definition of the <code class="ph codeph">CLASSPATH</code> variable, but you
+ would always set the address used by the statestore server. Some of the content you
+ might modify includes:
+ </p>
+
+
+
+<pre class="pre codeblock"><code>IMPALA_STATE_STORE_HOST=127.0.0.1
+IMPALA_STATE_STORE_PORT=24000
+IMPALA_BACKEND_PORT=22000
+IMPALA_LOG_DIR=/var/log/impala
+IMPALA_CATALOG_SERVICE_HOST=...
+IMPALA_STATE_STORE_HOST=...
+
+export IMPALA_STATE_STORE_ARGS=${IMPALA_STATE_STORE_ARGS:- \
+ -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}}
+IMPALA_SERVER_ARGS=" \
+-log_dir=${IMPALA_LOG_DIR} \
+-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
+-state_store_port=${IMPALA_STATE_STORE_PORT} \
+-state_store_host=${IMPALA_STATE_STORE_HOST} \
+-be_port=${IMPALA_BACKEND_PORT}"
+export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</code></pre>
+
+ <p class="p">
+ To use alternate values, edit the defaults file, then restart all the Impala-related
+ services so that the changes take effect. Restart the Impala server using the following
+ commands:
+ </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-server restart
+Stopping Impala Server: [ OK ]
+Starting Impala Server: [ OK ]</code></pre>
+
+ <p class="p">
+ Restart the Impala statestore using the following commands:
+ </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-state-store restart
+Stopping Impala State Store Server: [ OK ]
+Starting Impala State Store Server: [ OK ]</code></pre>
+
+ <p class="p">
+ Restart the Impala catalog service using the following commands:
+ </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-catalog restart
+Stopping Impala Catalog Server: [ OK ]
+Starting Impala Catalog Server: [ OK ]</code></pre>
+
+ <p class="p">
+ Some common settings to change include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Statestore address. Where practical, put the statestore on a separate host not
+ running the <span class="keyword cmdname">impalad</span> daemon. In that recommended configuration,
+ the <span class="keyword cmdname">impalad</span> daemon cannot refer to the statestore server using
+ the loopback address. If the statestore is hosted on a machine with an IP address of
+ 192.168.0.27, change:
+ </p>
+<pre class="pre codeblock"><code>IMPALA_STATE_STORE_HOST=127.0.0.1</code></pre>
+ <p class="p">
+ to:
+ </p>
+<pre class="pre codeblock"><code>IMPALA_STATE_STORE_HOST=192.168.0.27</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Catalog server address (including both the hostname and the port number). Update the
+ value of the <code class="ph codeph">IMPALA_CATALOG_SERVICE_HOST</code> variable. Where
+ practical, run the catalog server on the same host as the statestore. In that
+ recommended configuration, the <span class="keyword cmdname">impalad</span> daemon cannot refer to the
+ catalog server using the loopback address. If the catalog service is hosted on a
+ machine with an IP address of 192.168.0.27, add the following line:
+ </p>
+<pre class="pre codeblock"><code>IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000</code></pre>
+ <p class="p">
+ The <span class="ph filepath">/etc/default/impala</span> defaults file currently does not define
+ an <code class="ph codeph">IMPALA_CATALOG_ARGS</code> environment variable, but if you add one it
+ will be recognized by the service startup/shutdown script. Add a definition for this
+ variable to <span class="ph filepath">/etc/default/impala</span> and add the option
+ <code class="ph codeph">-catalog_service_host=<var class="keyword varname">hostname</var></code>. If the port is
+ different than the default 26000, also add the option
+ <code class="ph codeph">-catalog_service_port=<var class="keyword varname">port</var></code>.
+ </p>
+ </li>
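+ Putting the steps above together, the added lines in <span class="ph filepath">/etc/default/impala</span> might look like the following sketch (192.168.0.27 and 26000 are example values; substitute your own host and port):

```shell
# Example values only; substitute your catalog service host and port.
IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000

# Not present by default, but recognized by the service startup/shutdown
# script if you add it.
export IMPALA_CATALOG_ARGS=${IMPALA_CATALOG_ARGS:- \
    -log_dir=${IMPALA_LOG_DIR} \
    -catalog_service_host=192.168.0.27 \
    -catalog_service_port=26000}
```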
+
+ <li class="li" id="config_options_noncm__mem_limit">
+ <p class="p">
+ Memory limits. You can limit the amount of memory available to Impala. For example,
+ to allow Impala to use no more than 70% of system memory, change:
+ </p>
+
+<pre class="pre codeblock"><code>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
+ -log_dir=${IMPALA_LOG_DIR} \
+ -state_store_port=${IMPALA_STATE_STORE_PORT} \
+ -state_store_host=${IMPALA_STATE_STORE_HOST} \
+ -be_port=${IMPALA_BACKEND_PORT}}</code></pre>
+ <p class="p">
+ to:
+ </p>
+<pre class="pre codeblock"><code>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
+ -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \
+ -state_store_host=${IMPALA_STATE_STORE_HOST} \
+ -be_port=${IMPALA_BACKEND_PORT} -mem_limit=70%}</code></pre>
+ <p class="p">
+ You can specify the memory limit using absolute notation such as
+ <code class="ph codeph">500m</code> or <code class="ph codeph">2G</code>, or as a percentage of physical memory
+ such as <code class="ph codeph">60%</code>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Queries that exceed the specified memory limit are aborted. Percentage limits are
+ based on the physical memory of the machine and do not consider cgroups.
+ </div>
+ </li>
+
+ <li class="li">
+ <p class="p"> Core dump enablement. To enable core dumps, change: </p>
+<pre class="pre codeblock"><code>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</code></pre>
+ <p class="p">
+ to:
+ </p>
+<pre class="pre codeblock"><code>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-true}</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The location of core dump files may vary according to your operating system configuration.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Other security settings may prevent Impala from writing core dumps even when this option is enabled.
+ </p>
+ </li>
+ </ul>
+ </div>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Authorization using the open source Sentry plugin. Specify the
+ <code class="ph codeph">-server_name</code> and <code class="ph codeph">-authorization_policy_file</code>
+ options as part of the <code class="ph codeph">IMPALA_SERVER_ARGS</code> and
+ <code class="ph codeph">IMPALA_STATE_STORE_ARGS</code> settings to enable the core Impala support
+ for authentication. See <a class="xref" href="impala_authorization.html#secure_startup">Starting the impalad Daemon with Sentry Authorization Enabled</a> for
+ details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Auditing for successful or blocked Impala queries, another aspect of security.
+ Specify the <code class="ph codeph">-audit_event_log_dir=<var class="keyword varname">directory_path</var></code>
+ option and optionally the
+ <code class="ph codeph">-max_audit_event_log_file_size=<var class="keyword varname">number_of_queries</var></code>
+ and <code class="ph codeph">-abort_on_failed_audit_event</code> options as part of the
+ <code class="ph codeph">IMPALA_SERVER_ARGS</code> settings, for each Impala node, to enable and
+ customize auditing. See <a class="xref" href="impala_auditing.html#auditing">Auditing Impala Operations</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Password protection for the Impala web UI, which listens on port 25000 by default.
+ This feature involves adding some or all of the
+ <code class="ph codeph">--webserver_password_file</code>,
+ <code class="ph codeph">--webserver_authentication_domain</code>, and
+ <code class="ph codeph">--webserver_certificate_file</code> options to the
+ <code class="ph codeph">IMPALA_SERVER_ARGS</code> and <code class="ph codeph">IMPALA_STATE_STORE_ARGS</code>
+ settings. See <a class="xref" href="impala_security_guidelines.html#security_guidelines">Security Guidelines for Impala</a> for
+ details.
+ </p>
+ </li>
+
+ <li class="li" id="config_options_noncm__default_query_options">
+ <div class="p">
+ Another setting you might add to <code class="ph codeph">IMPALA_SERVER_ARGS</code> is a
+ comma-separated list of query options and values:
+<pre class="pre codeblock"><code>-default_query_options='<var class="keyword varname">option</var>=<var class="keyword varname">value</var>,<var class="keyword varname">option</var>=<var class="keyword varname">value</var>,...'
+</code></pre>
+ These options control the behavior of queries performed by this
+ <span class="keyword cmdname">impalad</span> instance. The option values you specify here override the
+ default values for <a class="xref" href="impala_query_options.html#query_options">Impala query
+ options</a>, as shown by the <code class="ph codeph">SET</code> statement in
+ <span class="keyword cmdname">impala-shell</span>.
+ </div>
+ </li>
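+ For instance, an illustrative combination of two query options (the values here are only examples; any option shown by the <code class="ph codeph">SET</code> statement can appear in the list):

```shell
# Fragment appended to IMPALA_SERVER_ARGS; mem_limit and sync_ddl are
# standard Impala query options, with example values.
-default_query_options='mem_limit=2g,sync_ddl=true'
```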
+
+
+
+ <li class="li">
+ <p class="p">
+ During troubleshooting, <span class="keyword">the appropriate support channel</span> might direct you to change other values,
+ particularly for <code class="ph codeph">IMPALA_SERVER_ARGS</code>, to work around issues or
+ gather debugging information.
+ </p>
+ </li>
+ </ul>
+
+
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ These startup options for the <span class="keyword cmdname">impalad</span> daemon are different from the
+ command-line options for the <span class="keyword cmdname">impala-shell</span> command. For the
+ <span class="keyword cmdname">impala-shell</span> options, see
+ <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a>.
+ </p>
+ </div>
+
+
+
+ </div>
+
+
+
+
+
+
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="config_options__config_options_checking">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Checking the Values of Impala Configuration Options</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can check the current runtime value of all these settings through the Impala web
+ interface, available by default at
+ <code class="ph codeph">http://<var class="keyword varname">impala_hostname</var>:25000/varz</code> for the
+ <span class="keyword cmdname">impalad</span> daemon,
+ <code class="ph codeph">http://<var class="keyword varname">impala_hostname</var>:25010/varz</code> for the
+ <span class="keyword cmdname">statestored</span> daemon, or
+ <code class="ph codeph">http://<var class="keyword varname">impala_hostname</var>:25020/varz</code> for the
+ <span class="keyword cmdname">catalogd</span> daemon.
+ </p>
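+ For example, you can fetch the same pages from the command line (the hostname is a placeholder; the ports are the defaults listed above):

```shell
curl http://impala-host:25000/varz   # impalad flags
curl http://impala-host:25010/varz   # statestored flags
curl http://impala-host:25020/varz   # catalogd flags
```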
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="config_options__config_options_impalad">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Startup Options for impalad Daemon</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">impalad</code> daemon implements the main Impala service, which performs
+ query processing and reads and writes the data files.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="config_options__config_options_statestored">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Startup Options for statestored Daemon</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <span class="keyword cmdname">statestored</span> daemon implements the Impala statestore service,
+ which monitors the availability of Impala services across the cluster, and handles
+ situations such as nodes becoming unavailable or becoming available again.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="config_options__config_options_catalogd">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Startup Options for catalogd Daemon</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <span class="keyword cmdname">catalogd</span> daemon implements the Impala catalog service, which
+ broadcasts metadata changes to all the Impala nodes when Impala creates a table, inserts
+ data, or performs other kinds of DDL and DML operations.
+ </p>
+
+ <div class="p">
+ Use the <code class="ph codeph">--load_catalog_in_background</code> option to control when
+ the metadata of a table is loaded.
+ <ul class="ul">
+ <li class="li">
+ If set to <code class="ph codeph">false</code>, the metadata of a table is
+ loaded when it is referenced for the first time. This means that the
+ first run of a particular query can be slower than subsequent runs.
+ Starting in Impala 2.2, the default for
+ <code class="ph codeph">load_catalog_in_background</code> is
+ <code class="ph codeph">false</code>.
+ </li>
+ <li class="li">
+ If set to <code class="ph codeph">true</code>, the catalog service attempts to
+ load metadata for a table even if no query needed that metadata. So
+ metadata will possibly be already loaded when the first query that
+ would need it is run. However, for the following reasons, we
+ recommend not to set the option to <code class="ph codeph">true</code>.
+ <ul class="ul">
+ <li class="li">
+ Background load can interfere with query-specific metadata
+ loading. This can happen on startup or after invalidating
+ metadata, with a duration depending on the amount of metadata,
+ and can lead to seemingly random long-running queries that are
+ difficult to diagnose.
+ </li>
+ <li class="li">
+ Impala may load metadata for tables that are never
+ used, potentially increasing catalog size and, consequently, memory
+ usage for both the catalog service and the Impala daemons.
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
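+ Given these trade-offs, one way to pin the recommended value explicitly is a sketch like the following in <span class="ph filepath">/etc/default/impala</span> (assuming an <code class="ph codeph">IMPALA_CATALOG_ARGS</code> definition has been added, since the defaults file does not define one out of the box):

```shell
export IMPALA_CATALOG_ARGS=${IMPALA_CATALOG_ARGS:- \
    -log_dir=${IMPALA_LOG_DIR} \
    -load_catalog_in_background=false}
```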
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_config_performance.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_config_performance.html b/docs/build3x/html/topics/impala_config_performance.html
new file mode 100644
index 0000000..ad91a39
--- /dev/null
+++ b/docs/build3x/html/topics/impala_config_performance.html
@@ -0,0 +1,149 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="config_performance"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Post-Installation Configuration for Impala</title></head><body id="config_performance"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Post-Installation Configuration for Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p" id="config_performance__p_24">
+ This section describes the mandatory and recommended configuration settings for Impala. If Impala is
+ installed using cluster management software, some of these configurations might be completed automatically; you must still
+ configure short-circuit reads manually. If you want to customize your environment, consider making the changes described in this topic.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ You must enable short-circuit reads, whether or not Impala was installed with cluster
+ management software. This setting goes in the Impala configuration settings, not the Hadoop-wide settings.
+ </li>
+
+ <li class="li">
+ You must enable block location tracking, and you can optionally enable native checksumming for optimal performance.
+ </li>
+ </ul>
+
+ <section class="section" id="config_performance__section_fhq_wyv_ls"><h2 class="title sectiontitle">Mandatory: Short-Circuit Reads</h2>
+
+ <p class="p"> Enabling short-circuit reads allows Impala to read local data directly
+ from the file system. This removes the need to communicate through the
+ DataNodes, improving performance. This setting also minimizes the number
+ of additional copies of data. Short-circuit reads require
+ <code class="ph codeph">libhadoop.so</code>
+ (the Hadoop Native Library) to be accessible to both the server and the
+ client. <code class="ph codeph">libhadoop.so</code> is not available if you have
+ installed from a tarball. You must install from an
+ <code class="ph codeph">.rpm</code>, <code class="ph codeph">.deb</code>, or parcel to use
+ short-circuit local reads.
+ </p>
+ <p class="p">
+ <strong class="ph b">To configure DataNodes for short-circuit reads:</strong>
+ </p>
+ <ol class="ol" id="config_performance__ol_qlq_wyv_ls">
+ <li class="li" id="config_performance__copy_config_files"> Copy the client
+ <code class="ph codeph">core-site.xml</code> and <code class="ph codeph">hdfs-site.xml</code>
+ configuration files from the Hadoop configuration directory to the
+ Impala configuration directory. The default Impala configuration
+ location is <code class="ph codeph">/etc/impala/conf</code>. </li>
+ <li class="li">
+
+
+
+ On all Impala nodes, configure the following properties in
+
+ Impala's copy of <code class="ph codeph">hdfs-site.xml</code> as shown: <pre class="pre codeblock"><code><property>
+ <name>dfs.client.read.shortcircuit</name>
+ <value>true</value>
+</property>
+
+<property>
+ <name>dfs.domain.socket.path</name>
+ <value>/var/run/hdfs-sockets/dn</value>
+</property>
+
+<property>
+ <name>dfs.client.file-block-storage-locations.timeout.millis</name>
+ <value>10000</value>
+</property></code></pre>
+
+
+ </li>
+ <li class="li">
+ <p class="p"> If <code class="ph codeph">/var/run/hadoop-hdfs/</code> is group-writable, make
+ sure its group is <code class="ph codeph">root</code>. </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span> If you are also going to enable block location tracking, you
+ can skip copying configuration files and restarting DataNodes and go
+ straight to <a class="xref" href="#config_performance__block_location_tracking">Optional: Block Location Tracking</a>.
+ Configuring short-circuit reads and block location tracking require
+ the same process of copying files and restarting services, so you
+ can complete that process once when you have completed all
+ configuration changes. Whether you copy files and restart services
+ now or during configuring block location tracking, short-circuit
+ reads are not enabled until you complete those final steps. </div>
+ </li>
+ <li class="li" id="config_performance__restart_all_datanodes"> After applying these changes, restart
+ all DataNodes. </li>
+ </ol>
+ </section>
+
+ <section class="section" id="config_performance__block_location_tracking"><h2 class="title sectiontitle">Mandatory: Block Location Tracking</h2>
+
+
+
+ <p class="p">
+ Enabling block location metadata allows Impala to know which disks data blocks are located on, allowing
+ better utilization of the underlying disks. Impala will not start unless this setting is enabled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To enable block location tracking:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ For each DataNode, add the following to the <code class="ph codeph">hdfs-site.xml</code> file:
+<pre class="pre codeblock"><code><property>
+ <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
+ <value>true</value>
+</property> </code></pre>
+ </li>
+
+ <li class="li"> Copy the client
+ <code class="ph codeph">core-site.xml</code> and <code class="ph codeph">hdfs-site.xml</code>
+ configuration files from the Hadoop configuration directory to the
+ Impala configuration directory. The default Impala configuration
+ location is <code class="ph codeph">/etc/impala/conf</code>. </li>
+
+ <li class="li"> After applying these changes, restart
+ all DataNodes. </li>
+ </ol>
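+
+        <p class="p">
+          For example, on a cluster where the Hadoop configuration files live in
+          <code class="ph codeph">/etc/hadoop/conf</code> (this path varies by distribution and is
+          shown here only as an illustration), the copy step might look like the following:
+        </p>
+<pre class="pre codeblock"><code># Run as root or as a user with write access to the Impala configuration directory.
+$ cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml /etc/impala/conf/
+</code></pre>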
+ </section>
+
+ <section class="section" id="config_performance__native_checksumming"><h2 class="title sectiontitle">Optional: Native Checksumming</h2>
+
+
+
+ <p class="p">
+ Enabling native checksumming causes Impala to use an optimized native library for computing checksums, if
+ that library is available.
+ </p>
+
+ <p class="p" id="config_performance__p_29">
+ <strong class="ph b">To enable native checksumming:</strong>
+ </p>
+
+ <p class="p">
+        If you installed <span class="keyword"></span> from packages, the native checksumming library is installed and set up correctly. In
+ such a case, no additional steps are required. Conversely, if you installed by other means, such as with
+ tarballs, native checksumming may not be available due to missing shared objects. Finding the message
+ "<code class="ph codeph">Unable to load native-hadoop library for your platform... using builtin-java classes where
+ applicable</code>" in the Impala logs indicates native checksumming may be unavailable. To enable native
+ checksumming, you must build and install <code class="ph codeph">libhadoop.so</code> (the
+
+
+ Hadoop Native Library).
+ </p>
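+      <p class="p">
+        To check whether the native library is visible to Hadoop on a particular host, you can
+        run the <code class="ph codeph">hadoop checknative</code> command; a
+        <code class="ph codeph">hadoop: true</code> line in its output indicates that
+        <code class="ph codeph">libhadoop.so</code> was found:
+      </p>
+<pre class="pre codeblock"><code>$ hadoop checknative -a
+</code></pre>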
+ </section>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_config.html">Managing Impala</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_connecting.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_connecting.html b/docs/build3x/html/topics/impala_connecting.html
new file mode 100644
index 0000000..1411525
--- /dev/null
+++ b/docs/build3x/html/topics/impala_connecting.html
@@ -0,0 +1,187 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="connecting"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Connecting to impalad through impala-shell</title></head><body id="connecting"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Connecting to impalad through impala-shell</h1>
+
+
+
+ <div class="body conbody">
+
+
+
+ <div class="p">
+ Within an <span class="keyword cmdname">impala-shell</span> session, you can only issue queries while connected to an instance
+ of the <span class="keyword cmdname">impalad</span> daemon. You can specify the connection information:
+ <ul class="ul">
+ <li class="li">
+ Through command-line options when you run the <span class="keyword cmdname">impala-shell</span> command.
+ </li>
+ <li class="li">
+ Through a configuration file that is read when you run the <span class="keyword cmdname">impala-shell</span> command.
+ </li>
+ <li class="li">
+ During an <span class="keyword cmdname">impala-shell</span> session, by issuing a <code class="ph codeph">CONNECT</code> command.
+ </li>
+ </ul>
+ See <a class="xref" href="impala_shell_options.html">impala-shell Configuration Options</a> for the command-line and configuration file options you can use.
+ </div>
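+
+    <div class="p">
+      For example, connection information can be placed in the
+      <code class="ph codeph">$HOME/.impalarc</code> configuration file that
+      <span class="keyword cmdname">impala-shell</span> reads at startup. A minimal file
+      specifying a default coordinator host and port (the hostname shown here is
+      illustrative) might look like this:
+<pre class="pre codeblock"><code>[impala]
+impalad=coordinator01.example.com:21000
+</code></pre>
+    </div>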
+
+ <p class="p">
+ You can connect to any DataNode where an instance of <span class="keyword cmdname">impalad</span> is running,
+ and that host coordinates the execution of all queries sent to it.
+ </p>
+
+ <p class="p">
+ For simplicity during development, you might always connect to the same host, perhaps running <span class="keyword cmdname">impala-shell</span> on
+ the same host as <span class="keyword cmdname">impalad</span> and specifying the hostname as <code class="ph codeph">localhost</code>.
+ </p>
+
+ <p class="p">
+      In a production environment, you might enable load balancing, in which you connect to a specific host/port combination
+ but queries are forwarded to arbitrary hosts. This technique spreads the overhead of acting as the coordinator
+ node among all the DataNodes in the cluster. See <a class="xref" href="impala_proxy.html">Using Impala through a Proxy for High Availability</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To connect the Impala shell during shell startup:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Locate the hostname of a DataNode within the cluster that is running an instance of the
+ <span class="keyword cmdname">impalad</span> daemon. If that DataNode uses a non-default port (something
+ other than port 21000) for <span class="keyword cmdname">impala-shell</span> connections, find out the
+ port number also.
+ </li>
+
+ <li class="li">
+ Use the <code class="ph codeph">-i</code> option to the
+ <span class="keyword cmdname">impala-shell</span> interpreter to specify the connection information for
+ that instance of <span class="keyword cmdname">impalad</span>:
+<pre class="pre codeblock"><code># When you are logged into the same machine running impalad.
+# The prompt will reflect the current hostname.
+$ impala-shell
+
+# When you are logged into the same machine running impalad.
+# The host will reflect the hostname 'localhost'.
+$ impala-shell -i localhost
+
+# When you are logged onto a different host, perhaps a client machine
+# outside the Hadoop cluster.
+$ impala-shell -i <var class="keyword varname">some.other.hostname</var>
+
+# When you are logged onto a different host, and impalad is listening
+# on a non-default port. Perhaps a load balancer is forwarding requests
+# to a different host/port combination behind the scenes.
+$ impala-shell -i <var class="keyword varname">some.other.hostname</var>:<var class="keyword varname">port_number</var>
+</code></pre>
+ </li>
+ </ol>
+
+ <p class="p">
+ <strong class="ph b">To connect the Impala shell after shell startup:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Start the Impala shell with no connection:
+<pre class="pre codeblock"><code>$ impala-shell</code></pre>
+ <p class="p">
+ You should see a prompt like the following:
+ </p>
+<pre class="pre codeblock"><code>Welcome to the Impala shell. Press TAB twice to see a list of available commands.
+...
+<span class="ph">(Shell
+ build version: Impala Shell v3.0.x (<var class="keyword varname">hash</var>) built on
+ <var class="keyword varname">date</var>)</span>
+[Not connected] > </code></pre>
+ </li>
+
+ <li class="li">
+ Locate the hostname of a DataNode within the cluster that is running an instance of the
+ <span class="keyword cmdname">impalad</span> daemon. If that DataNode uses a non-default port (something
+ other than port 21000) for <span class="keyword cmdname">impala-shell</span> connections, find out the
+ port number also.
+ </li>
+
+ <li class="li">
+ Use the <code class="ph codeph">connect</code> command to connect to an Impala instance. Enter a command of the form:
+<pre class="pre codeblock"><code>[Not connected] > connect <var class="keyword varname">impalad-host</var>
+[<var class="keyword varname">impalad-host</var>:21000] ></code></pre>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Replace <var class="keyword varname">impalad-host</var> with the hostname you have configured for any DataNode running
+ Impala in your environment. The changed prompt indicates a successful connection.
+ </div>
+ </li>
+ </ol>
+
+ <p class="p">
+ <strong class="ph b">To start <span class="keyword cmdname">impala-shell</span> in a specific database:</strong>
+ </p>
+
+ <p class="p">
+ You can use all the same connection options as in previous examples.
+ For simplicity, these examples assume that you are logged into one of
+ the DataNodes that is running the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Find the name of the database containing the relevant tables, views, and so
+ on that you want to operate on.
+ </li>
+
+ <li class="li">
+ Use the <code class="ph codeph">-d</code> option to the
+ <span class="keyword cmdname">impala-shell</span> interpreter to connect and immediately
+ switch to the specified database, without the need for a <code class="ph codeph">USE</code>
+ statement or fully qualified names:
+<pre class="pre codeblock"><code># Subsequent queries with unqualified names operate on
+# tables, views, and so on inside the database named 'staging'.
+$ impala-shell -i localhost -d staging
+
+# It is common during development, ETL, benchmarking, and so on
+# to have different databases containing the same table names
+# but with different contents or layouts.
+$ impala-shell -i localhost -d parquet_snappy_compression
+$ impala-shell -i localhost -d parquet_gzip_compression
+</code></pre>
+ </li>
+ </ol>
+
+ <p class="p">
+ <strong class="ph b">To run one or several statements in non-interactive mode:</strong>
+ </p>
+
+ <p class="p">
+ You can use all the same connection options as in previous examples.
+ For simplicity, these examples assume that you are logged into one of
+ the DataNodes that is running the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Construct a statement, or a file containing a sequence of statements,
+ that you want to run in an automated way, without typing or copying
+ and pasting each time.
+ </li>
+
+ <li class="li">
+ Invoke <span class="keyword cmdname">impala-shell</span> with the <code class="ph codeph">-q</code> option to run a single statement, or
+ the <code class="ph codeph">-f</code> option to run a sequence of statements from a file.
+ The <span class="keyword cmdname">impala-shell</span> command returns immediately, without going into
+ the interactive interpreter.
+<pre class="pre codeblock"><code># A utility command that you might run while developing shell scripts
+# to manipulate HDFS files.
+$ impala-shell -i localhost -d database_of_interest -q 'show tables'
+
+# A sequence of CREATE TABLE, CREATE VIEW, and similar DDL statements
+# can go into a file to make the setup process repeatable.
+$ impala-shell -i localhost -d database_of_interest -f recreate_tables.sql
+</code></pre>
+ </li>
+ </ol>
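+
+    <p class="p">
+      When the output of a non-interactive query is consumed by other scripts, you can combine
+      <code class="ph codeph">-q</code> with options such as <code class="ph codeph">-B</code>
+      (delimited rather than pretty-printed output) and <code class="ph codeph">-o</code>
+      (write results to a file). For example (the table and file names here are illustrative):
+    </p>
+<pre class="pre codeblock"><code># Write delimited query results to a file for further processing.
+$ impala-shell -i localhost -B -q 'select count(*) from t1' -o row_count.txt
+</code></pre>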
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_conversion_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_conversion_functions.html b/docs/build3x/html/topics/impala_conversion_functions.html
new file mode 100644
index 0000000..5532c8e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_conversion_functions.html
@@ -0,0 +1,288 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="conversion_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Type Conversion Functions</title></head><body id="conversion_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Type Conversion Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Conversion functions are usually used in combination with other functions, to explicitly pass the expected
+ data types. Impala has strict rules regarding data types for function parameters. For example, Impala does
+ not automatically convert a <code class="ph codeph">DOUBLE</code> value to <code class="ph codeph">FLOAT</code>, a
+ <code class="ph codeph">BIGINT</code> value to <code class="ph codeph">INT</code>, or other conversion where precision could be lost or
+ overflow could occur. Also, for reporting or dealing with loosely defined schemas in big data contexts,
+ you might frequently need to convert values to or from the <code class="ph codeph">STRING</code> type.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Although in <span class="keyword">Impala 2.3</span>, the <code class="ph codeph">SHOW FUNCTIONS</code> output for
+ database <code class="ph codeph">_IMPALA_BUILTINS</code> contains some function signatures
+ matching the pattern <code class="ph codeph">castto*</code>, these functions are not intended
+      for public use and are expected to be hidden in the future.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Function reference:</strong>
+ </p>
+
+ <p class="p">
+ Impala supports the following type conversion functions:
+ </p>
+
+<dl class="dl">
+
+
+<dt class="dt dlterm" id="conversion_functions__cast">
+<code class="ph codeph">cast(<var class="keyword varname">expr</var> AS <var class="keyword varname">type</var>)</code>
+</dt>
+
+<dd class="dd">
+
+<strong class="ph b">Purpose:</strong> Converts the value of an expression to any other type.
+If the expression value is of a type that cannot be converted to the target type, the result is <code class="ph codeph">NULL</code>.
+<p class="p"><strong class="ph b">Usage notes:</strong>
+Use <code class="ph codeph">CAST</code> when passing a column value or literal to a function that
+expects a parameter with a different type.
+Frequently used in SQL operations such as <code class="ph codeph">CREATE TABLE AS SELECT</code>
+and <code class="ph codeph">INSERT ... VALUES</code> to ensure that values from various sources
+are of the appropriate type for the destination columns.
+Where practical, do a one-time <code class="ph codeph">CAST()</code> operation during the ingestion process
+to make each column into the appropriate type, rather than using many <code class="ph codeph">CAST()</code>
+operations in each query; doing type conversions for each row during each query can be expensive
+for tables with millions or billions of rows.
+</p>
+ <p class="p">
+ The way this function deals with time zones when converting to or from <code class="ph codeph">TIMESTAMP</code>
+ values is affected by the <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions</code> startup flag for the
+ <span class="keyword cmdname">impalad</span> daemon. See <a class="xref" href="../shared/../topics/impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details about
+ how Impala handles time zone considerations for the <code class="ph codeph">TIMESTAMP</code> data type.
+ </p>
+
+<p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select concat('Here are the first ',10,' results.'); -- Fails
+select concat('Here are the first ',cast(10 as string),' results.'); -- Succeeds
+</code></pre>
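+<p class="p">
+As noted above, a conversion that cannot succeed produces <code class="ph codeph">NULL</code>
+rather than an error:
+</p>
+<pre class="pre codeblock"><code>select cast('not_a_number' as int) as x;
++------+
+| x    |
++------+
+| NULL |
++------+
+</code></pre>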
+<p class="p">
+The following example starts with a text table where every column has a type of <code class="ph codeph">STRING</code>,
+which might be how you ingest data of unknown schema until you can verify the cleanliness of the underlying values.
+Then it uses <code class="ph codeph">CAST()</code> to create a new Parquet table with the same data, but using specific
+numeric data types for the columns with numeric data. Using numeric types of appropriate sizes can result in
+substantial space savings on disk and in memory, and performance improvements in queries,
+over using strings or larger-than-necessary numeric types.
+</p>
+<pre class="pre codeblock"><code>create table t1 (name string, x string, y string, z string);
+
+create table t2 stored as parquet
+as select
+ name,
+ cast(x as bigint) x,
+ cast(y as timestamp) y,
+ cast(z as smallint) z
+from t1;
+
+describe t2;
++------+-----------+---------+
+| name | type      | comment |
++------+-----------+---------+
+| name | string    |         |
+| x    | bigint    |         |
+| y    | timestamp |         |
+| z    | smallint  |         |
++------+-----------+---------+
+</code></pre>
+<p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+<p class="p">
+
+ For details of casts from each kind of data type, see the description of
+ the appropriate type:
+ <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+ <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>,
+ <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>,
+ <a class="xref" href="impala_float.html#float">FLOAT Data Type</a>,
+ <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a>,
+ <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 3.0 or higher only)</a>,
+ <a class="xref" href="impala_string.html#string">STRING Data Type</a>,
+ <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>,
+ <a class="xref" href="impala_boolean.html#boolean">BOOLEAN Data Type</a>
+</p>
+</dd>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+<dt class="dt dlterm" id="conversion_functions__typeof">
+<code class="ph codeph">typeof(type value)</code>
+</dt>
+<dd class="dd">
+
+<strong class="ph b">Purpose:</strong> Returns the name of the data type corresponding to an expression. For types with
+extra attributes, such as length for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>,
+or precision and scale for <code class="ph codeph">DECIMAL</code>, includes the full specification of the type.
+
+<p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p>
+<p class="p"><strong class="ph b">Usage notes:</strong> Typically used in interactive exploration of a schema, or in application code that programmatically generates schema definitions such as <code class="ph codeph">CREATE TABLE</code> statements.
+For example, previously, to understand the type of an expression such as
+<code class="ph codeph">col1 / col2</code> or <code class="ph codeph">concat(col1, col2, col3)</code>,
+you might have created a dummy table with a single row, using syntax such as <code class="ph codeph">CREATE TABLE foo AS SELECT 5 / 3.0</code>,
+and then doing a <code class="ph codeph">DESCRIBE</code> to see the type of the resulting column.
+Or you might have done a <code class="ph codeph">CREATE TABLE AS SELECT</code> operation to create a table and
+copy data into it, only learning the types of the columns by doing a <code class="ph codeph">DESCRIBE</code> afterward.
+This technique is especially useful for arithmetic expressions involving <code class="ph codeph">DECIMAL</code> types,
+because the precision and scale of the result is typically different than that of the operands.
+</p>
+<p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+<p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<p class="p">
+These examples show how to check the type of a simple literal or function value.
+Notice how adding even tiny integers together changes the data type of the result to
+avoid overflow, and how the results of arithmetic operations on <code class="ph codeph">DECIMAL</code> values
+have specific precision and scale attributes.
+</p>
+<pre class="pre codeblock"><code>select typeof(2)
++-----------+
+| typeof(2) |
++-----------+
+| TINYINT |
++-----------+
+
+select typeof(2+2)
++---------------+
+| typeof(2 + 2) |
++---------------+
+| SMALLINT |
++---------------+
+
+select typeof('xyz')
++---------------+
+| typeof('xyz') |
++---------------+
+| STRING |
++---------------+
+
+select typeof(now())
++---------------+
+| typeof(now()) |
++---------------+
+| TIMESTAMP |
++---------------+
+
+select typeof(5.3 / 2.1)
++-------------------+
+| typeof(5.3 / 2.1) |
++-------------------+
+| DECIMAL(6,4) |
++-------------------+
+
+select typeof(5.30001 / 2342.1);
++--------------------------+
+| typeof(5.30001 / 2342.1) |
++--------------------------+
+| DECIMAL(13,11) |
++--------------------------+
+
+select typeof(typeof(2+2))
++-----------------------+
+| typeof(typeof(2 + 2)) |
++-----------------------+
+| STRING |
++-----------------------+
+</code></pre>
+
+<p class="p">
+This example shows how even if you do not have a record of the type of a column,
+for example because the type was changed by <code class="ph codeph">ALTER TABLE</code> after the
+original <code class="ph codeph">CREATE TABLE</code>, you can still find out the type in a
+more compact form than examining the full <code class="ph codeph">DESCRIBE</code> output.
+Remember to use <code class="ph codeph">LIMIT 1</code> in such cases, to avoid an identical
+result value for every row in the table.
+</p>
+<pre class="pre codeblock"><code>create table typeof_example (a int, b tinyint, c smallint, d bigint);
+
+/* Empty result set if there is no data in the table. */
+select typeof(a) from typeof_example;
+
+/* OK, now we have some data but the type of column A is being changed. */
+insert into typeof_example values (1, 2, 3, 4);
+alter table typeof_example change a a bigint;
+
+/* We can always find out the current type of that column without doing a full DESCRIBE. */
+select typeof(a) from typeof_example limit 1;
++-----------+
+| typeof(a) |
++-----------+
+| BIGINT |
++-----------+
+</code></pre>
+<p class="p">
+This example shows how you might programmatically generate a <code class="ph codeph">CREATE TABLE</code> statement
+with the appropriate column definitions to hold the result values of arbitrary expressions.
+The <code class="ph codeph">typeof()</code> function lets you construct a detailed <code class="ph codeph">CREATE TABLE</code> statement
+without actually creating the table, as opposed to <code class="ph codeph">CREATE TABLE AS SELECT</code> operations
+where you create the destination table but only learn the column data types afterward through <code class="ph codeph">DESCRIBE</code>.
+</p>
+<pre class="pre codeblock"><code>describe typeof_example;
++------+----------+---------+
+| name | type | comment |
++------+----------+---------+
+| a | bigint | |
+| b | tinyint | |
+| c | smallint | |
+| d | bigint | |
++------+----------+---------+
+
+/* An ETL or business intelligence tool might create variations on a table with different file formats,
+ different sets of columns, and so on. TYPEOF() lets an application introspect the types of the original columns. */
+select concat('create table derived_table (a ', typeof(a), ', b ', typeof(b), ', c ',
+ typeof(c), ', d ', typeof(d), ') stored as parquet;')
+ as 'create table statement'
+from typeof_example limit 1;
++-------------------------------------------------------------------------------------------+
+| create table statement |
++-------------------------------------------------------------------------------------------+
+| create table derived_table (a BIGINT, b TINYINT, c SMALLINT, d BIGINT) stored as parquet; |
++-------------------------------------------------------------------------------------------+
+</code></pre>
+</dd>
+
+
+</dl>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
[50/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
Posted by mi...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_adls.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_adls.html b/docs/build3x/html/topics/impala_adls.html
new file mode 100644
index 0000000..4353825
--- /dev/null
+++ b/docs/build3x/html/topics/impala_adls.html
@@ -0,0 +1,638 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="adls"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala with the Azure Data Lake Store (ADLS)</title></head><body id="adls"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Impala with the Azure Data Lake Store (ADLS)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You can use Impala to query data residing on the Azure Data Lake Store (ADLS) filesystem.
+ This capability allows convenient access to a storage system that is remotely managed,
+ accessible from anywhere, and integrated with various cloud-based services. Impala can
+ query files in any supported file format from ADLS. The ADLS storage location
+ can be for an entire table, or individual partitions in a partitioned table.
+ </p>
+
+ <p class="p">
+ The default Impala tables use data files stored on HDFS, which are ideal for bulk loads and queries using
+ full-table scans. In contrast, queries against ADLS data are less performant, making ADLS suitable for holding
+ <span class="q">"cold"</span> data that is only queried occasionally, while more frequently accessed <span class="q">"hot"</span> data resides in
+ HDFS. In a partitioned table, you can set the <code class="ph codeph">LOCATION</code> attribute for individual partitions
+ to put some partitions on HDFS and others on ADLS, typically depending on the age of the data.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="adls__prereqs">
+ <h2 class="title topictitle2" id="ariaid-title2">Prerequisites</h2>
+ <div class="body conbody">
+ <p class="p">
+ These procedures presume that you have already set up an Azure account,
+ configured an ADLS store, and configured your Hadoop cluster with appropriate
+ credentials to be able to access ADLS. See the following resources for information:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-portal" target="_blank">Get started with Azure Data Lake Store using the Azure Portal</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="https://hadoop.apache.org/docs/current/hadoop-azure-datalake/index.html" target="_blank">Hadoop Azure Data Lake Support</a>
+ </p>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="adls__sql">
+ <h2 class="title topictitle2" id="ariaid-title3">How Impala SQL Statements Work with ADLS</h2>
+ <div class="body conbody">
+ <p class="p">
+ Impala SQL statements work with data on ADLS as follows:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>
+ or <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> statements
+ can specify that a table resides on the ADLS filesystem by
+ encoding an <code class="ph codeph">adl://</code> prefix for the <code class="ph codeph">LOCATION</code>
+ property. <code class="ph codeph">ALTER TABLE</code> can also set the <code class="ph codeph">LOCATION</code>
+ property for an individual partition, so that some data in a table resides on
+ ADLS and other data in the same table resides on HDFS.
+ </p>
+ <div class="p">
+ The full format of the location URI is typically:
+<pre class="pre codeblock"><code>
+adl://<var class="keyword varname">your_account</var>.azuredatalakestore.net/<var class="keyword varname">rest_of_directory_path</var>
+</code></pre>
+ </div>
+ </li>
+ <li class="li">
+ <p class="p">
+ Once a table or partition is designated as residing on ADLS, the <a class="xref" href="impala_select.html#select">SELECT Statement</a>
+ statement transparently accesses the data files from the appropriate storage layer.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ If the ADLS table is an internal table, the <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> statement
+ removes the corresponding data files from ADLS when the table is dropped.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_truncate_table.html#truncate_table">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a> statement always removes the corresponding
+ data files from ADLS when the table is truncated.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> can move data files residing in HDFS into
+ an ADLS table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>, or the <code class="ph codeph">CREATE TABLE AS SELECT</code>
+ form of the <code class="ph codeph">CREATE TABLE</code> statement, can copy data from an HDFS table or another ADLS
+ table into an ADLS table.
+ </p>
+ </li>
+ </ul>
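+      <p class="p">
+        For example, a table whose data files reside entirely on ADLS could be created like
+        this (the account name and directory path are placeholders for your own values):
+      </p>
+<pre class="pre codeblock"><code>create table sales_adls (id bigint, amount decimal(10,2))
+  stored as parquet
+  location 'adl://your_account.azuredatalakestore.net/sales_data';
+</code></pre>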
+ <p class="p">
+ For usage information about Impala SQL statements with ADLS tables, see <a class="xref" href="impala_adls.html#ddl">Creating Impala Databases, Tables, and Partitions for Data Stored on ADLS</a>
+ and <a class="xref" href="impala_adls.html#dml">Using Impala DML Statements for ADLS Data</a>.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="adls__creds">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Specifying Impala Credentials to Access Data in ADLS</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To allow Impala to access data in ADLS, specify values for the following configuration settings in your
+ <span class="ph filepath">core-site.xml</span> file:
+ </p>
+
+<pre class="pre codeblock"><code>
+<property>
+ <name>dfs.adls.oauth2.access.token.provider.type</name>
+ <value>ClientCredential</value>
+</property>
+<property>
+ <name>dfs.adls.oauth2.client.id</name>
+ <value><varname>your_client_id</varname></value>
+</property>
+<property>
+ <name>dfs.adls.oauth2.credential</name>
+ <value><varname>your_client_secret</varname></value>
+</property>
+<property>
+ <name>dfs.adls.oauth2.refresh.url</name>
+ <value><varname>refresh_URL</varname></value>
+</property>
+
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Check if your Hadoop distribution or cluster management tool includes support for
+ filling in and distributing credentials across the cluster in an automated way.
+ </p>
+ </div>
+
+ <p class="p">
+        After specifying the credentials, restart both the Impala and
+        Hive services. (Restarting Hive is required because Impala queries, <code class="ph codeph">CREATE TABLE</code> statements, and so on go
+        through the Hive metastore.)
+ </p>
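+
+      <p class="p">
+        To verify the credentials before running Impala queries, you can list an ADLS location from the
+        command line. (The store name here is a placeholder.)
+      </p>
+
+<pre class="pre codeblock"><code>$ hadoop fs -ls adl://your_store.azuredatalakestore.net/
+</code></pre>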
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="adls__etl">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Loading Data into ADLS for Impala Queries</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If your ETL pipeline involves moving data into ADLS and then querying through Impala,
+ you can either use Impala DML statements to create, move, or copy the data, or
+ use the same data loading techniques as you would for non-Impala data.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="etl__dml">
+ <h3 class="title topictitle3" id="ariaid-title6">Using Impala DML Statements for ADLS Data</h3>
+ <div class="body conbody">
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Azure Data Lake Store (ADLS).
+ The syntax of the DML statements is the same as for any other tables, because the ADLS location for tables and
+ partitions is specified by an <code class="ph codeph">adl://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the ADLS data.
+ </p>
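+        <p class="p">
+          For example, the following hypothetical session (the store name, paths, and table names are
+          placeholders) writes to ADLS with both <code class="ph codeph">INSERT</code> and
+          <code class="ph codeph">CREATE TABLE AS SELECT</code>:
+        </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1_on_adls (x int)
+                 >   location 'adl://your_store.azuredatalakestore.net/dir1/t1';
+[localhost:21000] > insert into t1_on_adls values (1), (2), (3);
+[localhost:21000] > create table t2_on_adls
+                 >   location 'adl://your_store.azuredatalakestore.net/dir1/t2'
+                 >   as select x * 10 as x from t1_on_adls;
+</code></pre>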
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="etl__manual_etl">
+ <h3 class="title topictitle3" id="ariaid-title7">Manually Loading Data into Impala Tables on ADLS</h3>
+ <div class="body conbody">
+ <p class="p">
+ As an alternative, you can use the Microsoft-provided methods to bring data files
+ into ADLS for querying through Impala. See
+ <a class="xref" href="https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-copy-data-azure-storage-blob" target="_blank">the Microsoft ADLS documentation</a>
+ for details.
+ </p>
+
+ <p class="p">
+ After you upload data files to a location already mapped to an Impala table or partition, or if you delete
+ files in ADLS from such a location, issue the <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+ statement to make Impala aware of the new set of data files.
+ </p>
+
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="adls__ddl">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Creating Impala Databases, Tables, and Partitions for Data Stored on ADLS</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala reads data for a table or partition from ADLS based on the <code class="ph codeph">LOCATION</code> attribute for the
+ table or partition. Specify the ADLS details in the <code class="ph codeph">LOCATION</code> clause of a <code class="ph codeph">CREATE
+ TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statement. The notation for the <code class="ph codeph">LOCATION</code>
+ clause is <code class="ph codeph">adl://<var class="keyword varname">store</var>/<var class="keyword varname">path/to/file</var></code>.
+ </p>
+
+ <p class="p">
+ For a partitioned table, either specify a separate <code class="ph codeph">LOCATION</code> clause for each new partition,
+ or specify a base <code class="ph codeph">LOCATION</code> for the table and set up a directory structure in ADLS to mirror
+ the way Impala partitioned tables are structured in HDFS. Although, strictly speaking, ADLS filenames do not
+ have directory paths, Impala treats ADLS filenames with <code class="ph codeph">/</code> characters the same as HDFS
+ pathnames that include directories.
+ </p>
+
+ <p class="p">
+ To point a nonpartitioned table or an individual partition at ADLS, specify a single directory
+        path in ADLS, which can be any arbitrary directory. Replicating the structure of an entire Impala
+        partitioned table or database in ADLS requires more care, with directories and subdirectories nested and
+ named to match the equivalent directory tree in HDFS. Consider setting up an empty staging area if
+ necessary in HDFS, and recording the complete directory structure so that you can replicate it in ADLS.
+ </p>
+
+ <p class="p">
+ For example, the following session creates a partitioned table where only a single partition resides on ADLS.
+ The partitions for years 2013 and 2014 are located on HDFS. The partition for year 2015 includes a
+ <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">adl://</code> URL, and so refers to data residing on
+ ADLS, under a specific path underneath the store <code class="ph codeph">impalademo</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create database db_on_hdfs;
+[localhost:21000] > use db_on_hdfs;
+[localhost:21000] > create table mostly_on_hdfs (x int) partitioned by (year int);
+[localhost:21000] > alter table mostly_on_hdfs add partition (year=2013);
+[localhost:21000] > alter table mostly_on_hdfs add partition (year=2014);
+[localhost:21000] > alter table mostly_on_hdfs add partition (year=2015)
+ > location 'adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3/t1';
+</code></pre>
+
+ <p class="p">
+ For convenience when working with multiple tables with data files stored in ADLS, you can create a database
+ with a <code class="ph codeph">LOCATION</code> attribute pointing to an ADLS path.
+ Specify a URL of the form <code class="ph codeph">adl://<var class="keyword varname">store</var>/<var class="keyword varname">root/path/for/database</var></code>
+ for the <code class="ph codeph">LOCATION</code> attribute of the database.
+ Any tables created inside that database
+ automatically create directories underneath the one specified by the database
+ <code class="ph codeph">LOCATION</code> attribute.
+ </p>
+
+ <p class="p">
+ The following session creates a database and two partitioned tables residing entirely on ADLS, one
+ partitioned by a single column and the other partitioned by multiple columns. Because a
+ <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">adl://</code> URL is specified for the database, the
+ tables inside that database are automatically created on ADLS underneath the database directory. To see the
+ names of the associated subdirectories, including the partition key values, we use an ADLS client tool to
+ examine how the directory structure is organized on ADLS. For example, Impala partition directories such as
+ <code class="ph codeph">month=1</code> do not include leading zeroes, which sometimes appear in partition directories created
+ through Hive.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create database db_on_adls location 'adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3';
+[localhost:21000] > use db_on_adls;
+
+[localhost:21000] > create table partitioned_on_adls (x int) partitioned by (year int);
+[localhost:21000] > alter table partitioned_on_adls add partition (year=2013);
+[localhost:21000] > alter table partitioned_on_adls add partition (year=2014);
+[localhost:21000] > alter table partitioned_on_adls add partition (year=2015);
+
+[localhost:21000] > ! hadoop fs -ls -R adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3;
+2015-03-17 13:56:34 0 dir1/dir2/dir3/
+2015-03-17 16:43:28 0 dir1/dir2/dir3/partitioned_on_adls/
+2015-03-17 16:43:49 0 dir1/dir2/dir3/partitioned_on_adls/year=2013/
+2015-03-17 16:43:53 0 dir1/dir2/dir3/partitioned_on_adls/year=2014/
+2015-03-17 16:43:58 0 dir1/dir2/dir3/partitioned_on_adls/year=2015/
+
+[localhost:21000] > create table partitioned_multiple_keys (x int)
+ > partitioned by (year smallint, month tinyint, day tinyint);
+[localhost:21000] > alter table partitioned_multiple_keys
+ > add partition (year=2015,month=1,day=1);
+[localhost:21000] > alter table partitioned_multiple_keys
+ > add partition (year=2015,month=1,day=31);
+[localhost:21000] > alter table partitioned_multiple_keys
+ > add partition (year=2015,month=2,day=28);
+
+[localhost:21000] > ! hadoop fs -ls -R adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3;
+2015-03-17 13:56:34 0 dir1/dir2/dir3/
+2015-03-17 16:47:13 0 dir1/dir2/dir3/partitioned_multiple_keys/
+2015-03-17 16:47:44 0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=1/day=1/
+2015-03-17 16:47:50 0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=1/day=31/
+2015-03-17 16:47:57 0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=2/day=28/
+2015-03-17 16:43:28 0 dir1/dir2/dir3/partitioned_on_adls/
+2015-03-17 16:43:49 0 dir1/dir2/dir3/partitioned_on_adls/year=2013/
+2015-03-17 16:43:53 0 dir1/dir2/dir3/partitioned_on_adls/year=2014/
+2015-03-17 16:43:58 0 dir1/dir2/dir3/partitioned_on_adls/year=2015/
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">CREATE DATABASE</code> and <code class="ph codeph">CREATE TABLE</code> statements create the associated
+ directory paths if they do not already exist. You can specify multiple levels of directories, and the
+ <code class="ph codeph">CREATE</code> statement creates all appropriate levels, similar to using <code class="ph codeph">mkdir
+ -p</code>.
+ </p>
+
+ <p class="p">
+ Use the standard ADLS file upload methods to actually put the data files into the right locations. You can
+ also put the directory paths and data files in place before creating the associated Impala databases or
+ tables, and Impala automatically uses the data from the appropriate location after the associated databases
+ and tables are created.
+ </p>
+
+ <p class="p">
+ You can switch whether an existing table or partition points to data in HDFS or ADLS. For example, if you
+ have an Impala table or partition pointing to data files in HDFS or ADLS, and you later transfer those data
+ files to the other filesystem, use an <code class="ph codeph">ALTER TABLE</code> statement to adjust the
+ <code class="ph codeph">LOCATION</code> attribute of the corresponding table or partition to reflect that change. Because
+ Impala does not have an <code class="ph codeph">ALTER DATABASE</code> statement, this location-switching technique is not
+ practical for entire databases that have a custom <code class="ph codeph">LOCATION</code> attribute.
+ </p>
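+
+      <p class="p">
+        For example, these hypothetical statements (the store name, table names, and paths are
+        placeholders) repoint a table and a single partition after the underlying data files have been
+        transferred:
+      </p>
+
+<pre class="pre codeblock"><code>-- Data files for the whole table moved from HDFS to ADLS.
+alter table t1 set location 'adl://your_store.azuredatalakestore.net/dir1/t1';
+refresh t1;
+
+-- One partition's data files moved from ADLS back to HDFS.
+alter table part_tbl partition (year=2015)
+  set location '/user/impala/part_tbl/year=2015';
+refresh part_tbl;
+</code></pre>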
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="adls__internal_external">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Internal and External Tables Located on ADLS</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Just as with tables located on HDFS storage, you can designate ADLS-based tables as either internal (managed
+ by Impala) or external, by using the syntax <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">CREATE EXTERNAL
+ TABLE</code> respectively. When you drop an internal table, the files associated with the table are
+ removed, even if they are on ADLS storage. When you drop an external table, the files associated with the
+ table are left alone, and are still available for access by other tools or components. See
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> for details.
+ </p>
+
+ <p class="p">
+ If the data on ADLS is intended to be long-lived and accessed by other tools in addition to Impala, create
+ any associated ADLS tables with the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax, so that the files are not
+ deleted from ADLS when the table is dropped.
+ </p>
+
+ <p class="p">
+ If the data on ADLS is only needed for querying by Impala and can be safely discarded once the Impala
+ workflow is complete, create the associated ADLS tables using the <code class="ph codeph">CREATE TABLE</code> syntax, so
+ that dropping the table also deletes the corresponding data files on ADLS.
+ </p>
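+
+      <p class="p">
+        For example, a long-lived dataset shared with other tools could be declared with
+        <code class="ph codeph">CREATE EXTERNAL TABLE</code>, so that dropping the table leaves the ADLS
+        files in place. (The store name, path, and columns are placeholders.)
+      </p>
+
+<pre class="pre codeblock"><code>create external table shared_on_adls (id bigint, name string)
+  stored as parquet
+  location 'adl://your_store.azuredatalakestore.net/shared/dataset1';
+</code></pre>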
+
+ <p class="p">
+ For example, this session creates a table in ADLS with the same column layout as a table in HDFS, then
+ examines the ADLS table and queries some data from it. The table in ADLS works the same as a table in HDFS as
+ far as the expected file format of the data, table and column statistics, and other table properties. The
+ only indication that it is not an HDFS table is the <code class="ph codeph">adl://</code> URL in the
+ <code class="ph codeph">LOCATION</code> property. Many data files can reside in the ADLS directory, and their combined
+ contents form the table data. Because the data in this example is uploaded after the table is created, a
+ <code class="ph codeph">REFRESH</code> statement prompts Impala to update its cached information about the data files.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table usa_cities_adls like usa_cities location 'adl://impalademo.azuredatalakestore.net/usa_cities';
+[localhost:21000] > desc usa_cities_adls;
++-------+----------+---------+
+| name | type | comment |
++-------+----------+---------+
+| id | smallint | |
+| city | string | |
+| state | string | |
++-------+----------+---------+
+
+-- Now from a web browser, upload the same data file(s) to ADLS as in the HDFS table,
+-- under the relevant store and path. If you already have the data in ADLS, you would
+-- point the table LOCATION at an existing path.
+
+[localhost:21000] > refresh usa_cities_adls;
+[localhost:21000] > select count(*) from usa_cities_adls;
++----------+
+| count(*) |
++----------+
+| 289 |
++----------+
+[localhost:21000] > select distinct state from usa_cities_adls limit 5;
++----------------------+
+| state |
++----------------------+
+| Louisiana |
+| Minnesota |
+| Georgia |
+| Alaska |
+| Ohio |
++----------------------+
+[localhost:21000] > desc formatted usa_cities_adls;
++------------------------------+----------------------------------------------------+---------+
+| name | type | comment |
++------------------------------+----------------------------------------------------+---------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| id | smallint | NULL |
+| city | string | NULL |
+| state | string | NULL |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | adls_testing | NULL |
+| Owner: | jrussell | NULL |
+| CreateTime: | Mon Mar 16 11:36:25 PDT 2017 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | adl://impalademo.azuredatalakestore.net/usa_cities | NULL |
+| Table Type: | MANAGED_TABLE | NULL |
+...
++------------------------------+----------------------------------------------------+---------+
+</code></pre>
+
+ <p class="p">
+ In this case, we have already uploaded a Parquet file with a million rows of data to the
+ <code class="ph codeph">sample_data</code> directory underneath the <code class="ph codeph">impalademo</code> store on ADLS. This
+ session creates a table with matching column settings pointing to the corresponding location in ADLS, then
+ queries the table. Because the data is already in place on ADLS when the table is created, no
+ <code class="ph codeph">REFRESH</code> statement is required.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table sample_data_adls
+                 > (id bigint, val int, zerofill string,
+ > name string, assertion boolean, city string, state string)
+ > stored as parquet location 'adl://impalademo.azuredatalakestore.net/sample_data';
+[localhost:21000] > select count(*) from sample_data_adls;
++----------+
+| count(*) |
++----------+
+| 1000000 |
++----------+
+[localhost:21000] > select count(*) howmany, assertion from sample_data_adls group by assertion;
++---------+-----------+
+| howmany | assertion |
++---------+-----------+
+| 667149 | true |
+| 332851 | false |
++---------+-----------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="adls__queries">
+
+ <h2 class="title topictitle2" id="ariaid-title10">Running and Tuning Impala Queries for Data Stored on ADLS</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Once the appropriate <code class="ph codeph">LOCATION</code> attributes are set up at the table or partition level, you
+ query data stored in ADLS exactly the same as data stored on HDFS or in HBase:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Queries against ADLS data support all the same file formats as for HDFS data.
+ </li>
+
+ <li class="li">
+ Tables can be unpartitioned or partitioned. For partitioned tables, either manually construct paths in ADLS
+ corresponding to the HDFS directories representing partition key values, or use <code class="ph codeph">ALTER TABLE ...
+ ADD PARTITION</code> to set up the appropriate paths in ADLS.
+ </li>
+
+ <li class="li">
+ HDFS, Kudu, and HBase tables can be joined to ADLS tables, or ADLS tables can be joined with each other.
+ </li>
+
+ <li class="li">
+ Authorization using the Sentry framework to control access to databases, tables, or columns works the
+ same whether the data is in HDFS or in ADLS.
+ </li>
+
+ <li class="li">
+ The <span class="keyword cmdname">catalogd</span> daemon caches metadata for both HDFS and ADLS tables. Use
+ <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> for ADLS tables in the same situations
+ where you would issue those statements for HDFS tables.
+ </li>
+
+ <li class="li">
+ Queries against ADLS tables are subject to the same kinds of admission control and resource management as
+ HDFS tables.
+ </li>
+
+ <li class="li">
+ Metadata about ADLS tables is stored in the same metastore database as for HDFS tables.
+ </li>
+
+ <li class="li">
+ You can set up views referring to ADLS tables, the same as for HDFS tables.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">COMPUTE STATS</code>, <code class="ph codeph">SHOW TABLE STATS</code>, and <code class="ph codeph">SHOW COLUMN
+ STATS</code> statements work for ADLS tables also.
+ </li>
+ </ul>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="queries__performance">
+
+ <h3 class="title topictitle3" id="ariaid-title11">Understanding and Tuning Impala Query Performance for ADLS Data</h3>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Although Impala queries for data stored in ADLS might be less performant than queries against the
+ equivalent data stored in HDFS, you can still do some tuning. Here are techniques you can use to
+ interpret explain plans and profiles for queries against ADLS data, and tips to achieve the best
+ performance possible for such queries.
+ </p>
+
+ <p class="p">
+ All else being equal, performance is expected to be lower for queries running against data on ADLS rather
+ than HDFS. The actual mechanics of the <code class="ph codeph">SELECT</code> statement are somewhat different when the
+ data is in ADLS. Although the work is still distributed across the datanodes of the cluster, Impala might
+ parallelize the work for a distributed query differently for data on HDFS and ADLS. ADLS does not have the
+ same block notion as HDFS, so Impala uses heuristics to determine how to split up large ADLS files for
+ processing in parallel. Because all hosts can access any ADLS data file with equal efficiency, the
+ distribution of work might be different than for HDFS data, where the data blocks are physically read
+ using short-circuit local reads by hosts that contain the appropriate block replicas. Although the I/O to
+ read the ADLS data might be spread evenly across the hosts of the cluster, the fact that all data is
+ initially retrieved across the network means that the overall query performance is likely to be lower for
+ ADLS data than for HDFS data.
+ </p>
+
+ <p class="p">
+ Because ADLS does not expose the block sizes of data files the way HDFS does,
+ any Impala <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> statements
+ use the <code class="ph codeph">PARQUET_FILE_SIZE</code> query option setting to define the size of
+ Parquet data files. (Using a large block size is more important for Parquet tables than
+ for tables that use other file formats.)
+ </p>
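+
+        <p class="p">
+          For example, to produce larger Parquet files for an ADLS table, you could set the query option
+          before the write operation. (The table names are illustrative.)
+        </p>
+
+<pre class="pre codeblock"><code>set PARQUET_FILE_SIZE=256m;
+insert overwrite parquet_on_adls select * from staging_table;
+</code></pre>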
+
+ <p class="p">
+          When optimizing aspects of complex queries such as the join order, Impala treats tables on HDFS and
+ ADLS the same way. Therefore, follow all the same tuning recommendations for ADLS tables as for HDFS ones,
+ such as using the <code class="ph codeph">COMPUTE STATS</code> statement to help Impala construct accurate estimates of
+ row counts and cardinality. See <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> for details.
+ </p>
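+
+        <p class="p">
+          For example, the statistics statements work the same as on HDFS; here they run against the
+          <code class="ph codeph">sample_data_adls</code> table shown earlier:
+        </p>
+
+<pre class="pre codeblock"><code>compute stats sample_data_adls;
+show table stats sample_data_adls;
+show column stats sample_data_adls;
+</code></pre>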
+
+ <p class="p">
+ In query profile reports, the numbers for <code class="ph codeph">BytesReadLocal</code>,
+ <code class="ph codeph">BytesReadShortCircuit</code>, <code class="ph codeph">BytesReadDataNodeCached</code>, and
+ <code class="ph codeph">BytesReadRemoteUnexpected</code> are blank because those metrics come from HDFS.
+ If you do see any indications that a query against an ADLS table performed <span class="q">"remote read"</span>
+ operations, do not be alarmed. That is expected because, by definition, all the I/O for ADLS tables involves
+ remote reads.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="adls__restrictions">
+
+ <h2 class="title topictitle2" id="ariaid-title12">Restrictions on Impala Support for ADLS</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala requires that the default filesystem for the cluster be HDFS. You cannot use ADLS as the only
+ filesystem in the cluster.
+ </p>
+
+ <p class="p">
+ Although ADLS is often used to store JSON-formatted data, the current Impala support for ADLS does not include
+ directly querying JSON data. For Impala queries, use data files in one of the file formats listed in
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>. If you have data in JSON format, you can prepare a
+ flattened version of that data for querying by Impala as part of your ETL cycle.
+ </p>
+
+ <p class="p">
+ You cannot use the <code class="ph codeph">ALTER TABLE ... SET CACHED</code> statement for tables or partitions that are
+ located in ADLS.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="adls__best_practices">
+ <h2 class="title topictitle2" id="ariaid-title13">Best Practices for Using Impala with ADLS</h2>
+
+ <div class="body conbody">
+ <p class="p">
+ The following guidelines represent best practices derived from testing and real-world experience with Impala on ADLS:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Any reference to an ADLS location must be fully qualified. (This rule applies when
+ ADLS is not designated as the default filesystem.)
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Set any appropriate configuration settings for <span class="keyword cmdname">impalad</span>.
+ </p>
+ </li>
+ </ul>
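+
+      <p class="p">
+        For example, refer to ADLS data with a fully qualified URL that includes the
+        <code class="ph codeph">adl://</code> scheme and the store name, rather than a bare path.
+        (The store name below is a placeholder.)
+      </p>
+
+<pre class="pre codeblock"><code>create table t1 (x int)
+  location 'adl://your_store.azuredatalakestore.net/dir1/t1';
+</code></pre>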
+
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_admin.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_admin.html b/docs/build3x/html/topics/impala_admin.html
new file mode 100644
index 0000000..7c76987
--- /dev/null
+++ b/docs/build3x/html/topics/impala_admin.html
@@ -0,0 +1,52 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admission.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_resource_management.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_timeouts.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_proxy.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disk_space.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="admin"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Administration</title></head><body id="admin"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Administration</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ As an administrator, you monitor Impala's use of resources and take action when necessary to keep Impala
+ running smoothly and avoid conflicts with other Hadoop components running on the same cluster. When you
+ detect that an issue has happened or could happen in the future, you reconfigure Impala or other components
+ such as HDFS or even the hardware of the cluster itself to resolve or avoid problems.
+ </p>
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ <strong class="ph b">Related tasks:</strong>
+ </p>
+
+ <p class="p">
+ As an administrator, you can expect to perform installation, upgrade, and configuration tasks for Impala on
+ all machines in a cluster. See <a class="xref" href="impala_install.html#install">Installing Impala</a>,
+ <a class="xref" href="impala_upgrading.html#upgrading">Upgrading Impala</a>, and <a class="xref" href="impala_config.html#config">Managing Impala</a> for details.
+ </p>
+
+ <p class="p">
+ For security tasks typically performed by administrators, see <a class="xref" href="impala_security.html#security">Impala Security</a>.
+ </p>
+
+ <div class="p">
+ Administrators also decide how to allocate cluster resources so that all Hadoop components can run smoothly
+ together. For Impala, this task primarily involves:
+ <ul class="ul">
+ <li class="li">
+ Deciding how many Impala queries can run concurrently and with how much memory, through the admission
+ control feature. See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details.
+ </li>
+
+ <li class="li">
+ Dividing cluster resources such as memory between Impala and other components, using YARN for overall
+ resource management, and Llama to mediate resource requests from Impala to YARN. See
+ <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a> for details.
+ </li>
+ </ul>
+ </div>
+
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_admission.html">Admission Control and Query Queuing</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_resource_management.html">Resource Management for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_timeouts.html">Setting Timeout Periods for Daemons, Queries, and Sessions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_proxy.html">Using Impala through a Proxy for High Availability</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disk_space.html">Managing Disk Space for Impala Data</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_admission.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_admission.html b/docs/build3x/html/topics/impala_admission.html
new file mode 100644
index 0000000..9eff7ea
--- /dev/null
+++ b/docs/build3x/html/topics/impala_admission.html
@@ -0,0 +1,822 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="admission_control"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Admission Control and Query Queuing</title></head><body id="admission_control"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Admission Control and Query Queuing</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p" id="admission_control__admission_control_intro">
+ Admission control is an Impala feature that imposes limits on concurrent SQL queries, to avoid resource usage
+ spikes and out-of-memory conditions on busy clusters.
+ It is a form of <span class="q">"throttling"</span>.
+ New queries are accepted and executed until
+ certain conditions are met, such as too many queries or too much
+ total memory used across the cluster.
+ When one of these thresholds is reached,
+ incoming queries wait to begin execution. These queries are
+ queued and are admitted (that is, begin executing) when the resources become available.
+ </p>
+ <p class="p">
+ In addition to the threshold values for currently executing queries,
+ you can place limits on the maximum number of queries that are
+ queued (waiting) and a limit on the amount of time they might wait
+ before returning with an error. These queue settings let you ensure that queries do
+ not wait indefinitely, so that you can detect and correct <span class="q">"starvation"</span> scenarios.
+ </p>
+ <p class="p">
+ Enable this feature if your cluster is
+ underutilized at some times and overutilized at others. Overutilization is indicated by performance
+ bottlenecks and queries being cancelled due to out-of-memory conditions, when those same queries are
+ successful and perform well during times with less concurrent load. Admission control works as a safeguard to
+ avoid out-of-memory conditions during heavy concurrent usage.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The use of the Llama component for integrated resource management within YARN
+ is no longer supported with <span class="keyword">Impala 2.3</span> and higher.
+ The Llama support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher.
+ </p>
+ <p class="p">
+ For clusters running Impala alongside
+ other data management components, you define static service pools to define the resources
+ available to Impala and other components. Then within the area allocated for Impala,
+ you can create dynamic service pools, each with its own settings for the Impala admission control feature.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="admission_control__admission_intro">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of Impala Admission Control</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ On a busy cluster, you might find there is an optimal number of Impala queries that run concurrently.
+ For example, when the I/O capacity is fully utilized by I/O-intensive queries,
+ you might not find any throughput benefit in running more concurrent queries.
+ By allowing some queries to run at full speed while others wait, rather than having
+ all queries contend for resources and run slowly, admission control can result in higher overall throughput.
+ </p>
+
+ <p class="p">
+ For another example, consider a memory-bound workload such as many large joins or aggregation queries.
+ Each such query could briefly use many gigabytes of memory to process intermediate results.
+ Because Impala by default cancels queries that exceed the specified memory limit,
+ running multiple large-scale queries at once might require
+ re-running some queries that are cancelled. In this case, admission control improves the
+ reliability and stability of the overall workload by only allowing as many concurrent queries
+ as the overall memory of the cluster can accommodate.
+ </p>
+
+ <p class="p">
+ The admission control feature lets you set an upper limit on the number of concurrent Impala
+ queries and on the memory used by those queries. Any additional queries are queued until the earlier ones
+ finish, rather than being cancelled or running slowly and causing contention. As other queries finish, the
+ queued queries are allowed to proceed.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can specify these limits and thresholds for each
+ pool rather than globally. That way, you can balance the resource usage and throughput
+ between steady well-defined workloads, rare resource-intensive queries, and ad hoc
+ exploratory queries.
+ </p>
+
+ <p class="p">
+ For details on the internal workings of admission control, see
+ <a class="xref" href="impala_admission.html#admission_architecture">How Impala Schedules and Enforces Limits on Concurrent Queries</a>.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="admission_control__admission_concurrency">
+ <h2 class="title topictitle2" id="ariaid-title3">Concurrent Queries and Admission Control</h2>
+ <div class="body conbody">
+ <p class="p">
+ One way to limit resource usage through admission control is to set an upper limit
+ on the number of concurrent queries. This is the initial technique you might use
+ when you do not have extensive information about memory usage for your workload.
+ This setting can be specified separately for each dynamic resource pool.
+ </p>
+ <p class="p">
+ You can combine this setting with the memory-based approach described in
+ <a class="xref" href="impala_admission.html#admission_memory">Memory Limits and Admission Control</a>. If either the maximum number of concurrent queries
+ or their expected memory usage is exceeded, subsequent queries
+ are queued until the concurrent workload falls below the threshold again.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="admission_control__admission_memory">
+ <h2 class="title topictitle2" id="ariaid-title4">Memory Limits and Admission Control</h2>
+ <div class="body conbody">
+ <p class="p">
+ Each dynamic resource pool can have an upper limit on the cluster-wide memory used by queries executing in that pool.
+ This is the technique to use once you have a stable workload with well-understood memory requirements.
+ </p>
+ <p class="p">
+ Always specify the <span class="ph uicontrol">Default Query Memory Limit</span> for the expected maximum amount of RAM
+ that a query might require on each host, which is equivalent to setting the <code class="ph codeph">MEM_LIMIT</code>
+ query option for every query run in that pool. That value affects the execution of each query, preventing it
+ from overallocating memory on each host, and potentially activating the spill-to-disk mechanism or cancelling
+ the query when necessary.
+ </p>
+ <p class="p">
+ Optionally, specify the <span class="ph uicontrol">Max Memory</span> setting, a cluster-wide limit that determines
+ how many queries can be safely run concurrently, based on the upper memory limit per host multiplied by the
+ number of Impala nodes in the cluster.
+ </p>
+ <div class="p">
+ For example, consider the following scenario:
+ <ul class="ul">
+ <li class="li"> The cluster is running <span class="keyword cmdname">impalad</span> daemons on five
+ DataNodes. </li>
+ <li class="li"> A dynamic resource pool has <span class="ph uicontrol">Max Memory</span> set
+ to 100 GB. </li>
+ <li class="li"> The <span class="ph uicontrol">Default Query Memory Limit</span> for the
+ pool is 10 GB. Therefore, any query running in this pool could use
+ up to 50 GB of memory (default query memory limit * number of Impala
+ nodes). </li>
+ <li class="li"> The maximum number of queries that Impala executes concurrently
+ within this dynamic resource pool is two, which is the most that
+ could be accommodated within the 100 GB <span class="ph uicontrol">Max
+ Memory</span> cluster-wide limit. </li>
+ <li class="li"> There is no memory penalty if queries use less memory than the
+ <span class="ph uicontrol">Default Query Memory Limit</span> per-host setting
+ or the <span class="ph uicontrol">Max Memory</span> cluster-wide limit. These
+ values are only used to estimate how many queries can be run
+ concurrently within the resource constraints for the pool. </li>
+ </ul>
+ </div>
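<p class="p">
The arithmetic behind the scenario above can be sketched as follows. The function name is illustrative, not an Impala API; the sketch assumes each admitted query reserves its full per-host memory limit on every node.
</p>

```python
# Hypothetical helper (not an Impala API): estimate how many queries a pool
# can admit concurrently under its Max Memory setting.

def max_concurrent_queries(max_memory_gb, per_query_mem_limit_gb, num_nodes):
    # Each admitted query reserves its per-host limit on every node, so its
    # cluster-wide cost is per_query_mem_limit_gb * num_nodes.
    per_query_cluster_cost_gb = per_query_mem_limit_gb * num_nodes
    return max_memory_gb // per_query_cluster_cost_gb

# The scenario above: 5 nodes, Max Memory = 100 GB, per-query limit = 10 GB.
print(max_concurrent_queries(100, 10, 5))  # 2
```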
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span> If you specify <span class="ph uicontrol">Max
+ Memory</span> for an Impala dynamic resource pool, you must also
+ specify the <span class="ph uicontrol">Default Query Memory Limit</span>.
+ <span class="ph uicontrol">Max Memory</span> relies on the <span class="ph uicontrol">Default
+ Query Memory Limit</span> to produce a reliable estimate of
+ overall memory consumption for a query. </div>
+ <p class="p">
+ You can combine the memory-based settings with the upper limit on concurrent queries described in
+ <a class="xref" href="impala_admission.html#admission_concurrency">Concurrent Queries and Admission Control</a>. If either the maximum number of concurrent queries
+ or their expected memory usage is exceeded, subsequent queries
+ are queued until the concurrent workload falls below the threshold again.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="admission_control__admission_yarn">
+
+ <h2 class="title topictitle2" id="ariaid-title5">How Impala Admission Control Relates to Other Resource Management Tools</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The admission control feature is similar in some ways to the YARN resource management framework. These features
+ can be used separately or together. This section describes some similarities and differences, to help you
+ decide which combination of resource management features to use for Impala.
+ </p>
+
+ <p class="p">
+ Admission control is a lightweight, decentralized system that is suitable for workloads consisting
+ primarily of Impala queries and other SQL statements. It sets <span class="q">"soft"</span> limits that smooth out Impala
+ memory usage during times of heavy load, rather than taking an all-or-nothing approach that cancels jobs
+ that are too resource-intensive.
+ </p>
+
+ <p class="p">
+ Because the admission control system does not interact with other Hadoop workloads such as MapReduce jobs, you
+ might use YARN with static service pools on clusters where resources are shared between
+ Impala and other Hadoop components. This configuration is recommended when using Impala in a
+ <dfn class="term">multitenant</dfn> cluster. Devote a percentage of cluster resources to Impala, and allocate another
+ percentage for MapReduce and other batch-style workloads. Let admission control handle the concurrency and
+ memory usage for the Impala work within the cluster, and let YARN manage the work for other components within the
+ cluster. In this scenario, Impala's resources are not managed by YARN.
+ </p>
+
+ <p class="p">
+ The Impala admission control feature uses the same configuration mechanism as the YARN resource manager to map users to
+ pools and authenticate them.
+ </p>
+
+ <p class="p">
+ Although the Impala admission control feature uses a <code class="ph codeph">fair-scheduler.xml</code> configuration file
+ behind the scenes, this file does not depend on which scheduler is used for YARN. You still use this file
+ even when YARN is using the capacity scheduler.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="admission_control__admission_architecture">
+
+ <h2 class="title topictitle2" id="ariaid-title6">How Impala Schedules and Enforces Limits on Concurrent Queries</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The admission control system is decentralized, embedded in each Impala daemon and communicating through the
+ statestore mechanism. Although the limits you set for memory usage and number of concurrent queries apply
+ cluster-wide, each Impala daemon makes its own decisions about whether to allow each query to run
+ immediately or to queue it for a less-busy time. These decisions are fast, meaning the admission control
+ mechanism is low-overhead, but might be imprecise during times of heavy load across many coordinators. There could be times when
+ more queries are queued (in aggregate across the cluster) than the specified limit, or when the number of admitted queries
+ exceeds the expected number. Thus, you typically err on the
+ high side for the size of the queue, because there is not a big penalty for having a large number of queued
+ queries; and you typically err on the low side for configuring memory resources, to leave some headroom in case more
+ queries are admitted than expected, without running out of memory and being cancelled as a result.
+ </p>
+
+
+
+ <p class="p">
+ To avoid a large backlog of queued requests, you can set an upper limit on the size of the queue for
+ queries that are queued. When the number of queued queries exceeds this limit, further queries are
+ cancelled rather than being queued. You can also configure a timeout period per pool, after which queued queries are
+ cancelled, to avoid indefinite waits. If a cluster reaches this state where queries are cancelled due to
+ too many concurrent requests or long waits for query execution to begin, that is a signal for an
+ administrator to take action, either by provisioning more resources, scheduling work on the cluster to
+ smooth out the load, or by doing <a class="xref" href="impala_performance.html#performance">Impala performance
+ tuning</a> to enable higher throughput.
+ </p>
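<p class="p">
The admit/queue/reject decision described above can be illustrated with a simplified model. This is a sketch, not Impala's implementation; real coordinators work from an eventually consistent view of cluster state propagated through the statestore, which is why the limits behave as soft limits.
</p>

```python
# Simplified model (not Impala code) of a per-pool admission decision:
# admit while concurrency and memory thresholds allow, queue up to a limit,
# and reject once the queue is full.
from collections import deque

class AdmissionController:
    def __init__(self, max_requests, max_queued, pool_mem_limit):
        self.max_requests = max_requests      # cf. default_pool_max_requests
        self.max_queued = max_queued          # cf. default_pool_max_queued
        self.pool_mem_limit = pool_mem_limit  # cf. default_pool_mem_limit
        self.running = 0
        self.mem_reserved = 0
        self.queue = deque()

    def submit(self, query_id, mem_estimate):
        # Admit immediately if both the concurrency and memory thresholds allow.
        if (self.running < self.max_requests
                and self.mem_reserved + mem_estimate <= self.pool_mem_limit):
            self.running += 1
            self.mem_reserved += mem_estimate
            return "ADMITTED"
        # Otherwise queue the query, unless the queue itself is full.
        if len(self.queue) >= self.max_queued:
            return "REJECTED"
        self.queue.append((query_id, mem_estimate))
        return "QUEUED"

ac = AdmissionController(max_requests=2, max_queued=1, pool_mem_limit=100)
print([ac.submit(q, 50) for q in ("q1", "q2", "q3", "q4")])
# ['ADMITTED', 'ADMITTED', 'QUEUED', 'REJECTED']
```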
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="admission_control__admission_jdbc_odbc">
+
+ <h2 class="title topictitle2" id="ariaid-title7">How Admission Control Works with Impala Clients (JDBC, ODBC, HiveServer2)</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Most aspects of admission control work transparently with client interfaces such as JDBC and ODBC:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If a SQL statement is put into a queue rather than running immediately, the API call blocks until the
+ statement is dequeued and begins execution. At that point, the client program can request to fetch
+ results, which might also block until results become available.
+ </li>
+
+ <li class="li">
+ If a SQL statement is cancelled because it has been queued for too long or because it exceeded the memory
+ limit during execution, the error is returned to the client program with a descriptive error message.
+ </li>
+
+ </ul>
+
+ <p class="p">
+ In Impala 2.0 and higher, you can submit
+ a SQL <code class="ph codeph">SET</code> statement from the client application
+ to change the <code class="ph codeph">REQUEST_POOL</code> query option.
+ This option lets you submit queries to different resource pools,
+ as described in <a class="xref" href="impala_request_pool.html#request_pool">REQUEST_POOL Query Option</a>.
+
+ </p>
+
+ <p class="p">
+ At any time, the set of queued queries could include queries submitted through multiple different Impala
+ daemon hosts. All the queries submitted through a particular host will be executed in order, so a
+ <code class="ph codeph">CREATE TABLE</code> followed by an <code class="ph codeph">INSERT</code> on the same table would succeed.
+ Queries submitted through different hosts are not guaranteed to be executed in the order they were
+ received. Therefore, if you are using load-balancing or other round-robin scheduling where different
+ statements are submitted through different hosts, set up all table structures ahead of time so that the
+ statements controlled by the queuing system are primarily queries, where order is not significant. Or, if a
+ sequence of statements needs to happen in strict order (such as an <code class="ph codeph">INSERT</code> followed by a
+ <code class="ph codeph">SELECT</code>), submit all those statements through a single session, while connected to the same
+ Impala daemon host.
+ </p>
+
+ <p class="p">
+ Admission control has the following limitations or special behavior when used with JDBC or ODBC
+ applications:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The other resource-related query options,
+ <code class="ph codeph">RESERVATION_REQUEST_TIMEOUT</code> and <code class="ph codeph">V_CPU_CORES</code>, are no longer used. Those query options only
+ applied to using Impala with Llama, which is no longer supported.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="admission_control__admission_schema_config">
+ <h2 class="title topictitle2" id="ariaid-title8">SQL and Schema Considerations for Admission Control</h2>
+ <div class="body conbody">
+ <p class="p">
+ When queries complete quickly and are tuned for optimal memory usage, there is less chance of
+ performance or capacity problems during times of heavy load. Before setting up admission control,
+ tune your Impala queries to ensure that the query plans are efficient and the memory estimates
+ are accurate. Understanding the nature of your workload, and which queries are the most
+ resource-intensive, helps you to plan how to divide the queries into different pools and
+ decide what limits to define for each pool.
+ </p>
+ <p class="p">
+ For large tables, especially those involved in join queries, keep their statistics up to date
+ after loading substantial amounts of new data or adding new partitions.
+ Use the <code class="ph codeph">COMPUTE STATS</code> statement for unpartitioned tables, and
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> for partitioned tables.
+ </p>
+ <p class="p">
+ When you use dynamic resource pools with a <span class="ph uicontrol">Max Memory</span> setting enabled,
+ you typically override the memory estimates that Impala makes based on the statistics from the
+ <code class="ph codeph">COMPUTE STATS</code> statement.
+ You can set the <code class="ph codeph">MEM_LIMIT</code> query option within a particular session to
+ set an upper memory limit for queries within that session, set a default <code class="ph codeph">MEM_LIMIT</code>
+ for all queries processed by the <span class="keyword cmdname">impalad</span> instance, or
+ set a default <code class="ph codeph">MEM_LIMIT</code> for all queries assigned to a particular
+ dynamic resource pool. By designating a consistent memory limit for a set of similar queries
+ that use the same resource pool, you avoid unnecessary query queuing or out-of-memory conditions
+ that can arise during high-concurrency workloads when memory estimates for some queries are inaccurate.
+ </p>
+ <p class="p">
+ Follow other steps from <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> to tune your queries.
+ </p>
+ </div>
+ </article>
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="admission_control__admission_config">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Configuring Admission Control</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The configuration options for admission control range from the simple (a single resource pool with a single
+ set of options) to the complex (multiple resource pools with different options, each pool handling queries
+ for a different set of users and groups).
+ </p>
+
+ <section class="section" id="admission_config__admission_flags"><h3 class="title sectiontitle">Impala Service Flags for Admission Control (Advanced)</h3>
+
+
+
+ <p class="p">
+ The following Impala configuration options let you adjust the settings of the admission control feature. When supplying the
+ options on the <span class="keyword cmdname">impalad</span> command line, prepend the option name with <code class="ph codeph">--</code>.
+ </p>
+
+ <dl class="dl" id="admission_config__admission_control_option_list">
+
+ <dt class="dt dlterm" id="admission_config__queue_wait_timeout_ms">
+ <code class="ph codeph">queue_wait_timeout_ms</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Maximum amount of time (in milliseconds) that a
+ request waits to be admitted before timing out.
+ <p class="p">
+ <strong class="ph b">Type:</strong> <code class="ph codeph">int64</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">60000</code>
+ </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__default_pool_max_requests">
+ <code class="ph codeph">default_pool_max_requests</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Maximum number of concurrent outstanding requests
+ allowed to run before incoming requests are queued. Because this
+ limit applies cluster-wide, but each Impala node makes independent
+ decisions to run queries immediately or queue them, it is a soft
+ limit; the overall number of concurrent queries might be slightly
+ higher during times of heavy load. A negative value indicates no
+ limit. Ignored if <code class="ph codeph">fair_scheduler_config_path</code> and
+ <code class="ph codeph">llama_site_path</code> are set. <p class="p">
+ <strong class="ph b">Type:</strong>
+ <code class="ph codeph">int64</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ <span class="ph">-1, meaning unlimited (prior to <span class="keyword">Impala 2.5</span> the default was 200)</span>
+ </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__default_pool_max_queued">
+ <code class="ph codeph">default_pool_max_queued</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Maximum number of requests allowed to be queued
+ before rejecting requests. Because this limit applies
+ cluster-wide, but each Impala node makes independent decisions to
+ run queries immediately or queue them, it is a soft limit; the
+ overall number of queued queries might be slightly higher during
+ times of heavy load. A negative value or 0 indicates requests are
+ always rejected once the maximum concurrent requests are
+ executing. Ignored if <code class="ph codeph">fair_scheduler_config_path</code>
+ and <code class="ph codeph">llama_site_path</code> are set. <p class="p">
+ <strong class="ph b">Type:</strong>
+ <code class="ph codeph">int64</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ <span class="ph">unlimited</span>
+ </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__default_pool_mem_limit">
+ <code class="ph codeph">default_pool_mem_limit</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Maximum amount of memory (across the entire
+ cluster) that all outstanding requests in this pool can use before
+ new requests to this pool are queued. Specified in bytes,
+ megabytes, or gigabytes by a number followed by the suffix
+ <code class="ph codeph">b</code> (optional), <code class="ph codeph">m</code>, or
+ <code class="ph codeph">g</code>, either uppercase or lowercase. You can
+ specify floating-point values for megabytes and gigabytes, to
+ represent fractional numbers such as <code class="ph codeph">1.5</code>. You can
+ also specify it as a percentage of the physical memory by
+ specifying the suffix <code class="ph codeph">%</code>. 0 or no setting
+ indicates no limit. Defaults to bytes if no unit is given. Because
+ this limit applies cluster-wide, but each Impala node makes
+ independent decisions to run queries immediately or queue them, it
+ is a soft limit; the overall memory used by concurrent queries
+ might be slightly higher during times of heavy load. Ignored if
+ <code class="ph codeph">fair_scheduler_config_path</code> and
+ <code class="ph codeph">llama_site_path</code> are set. <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Impala relies on the statistics produced by the <code class="ph codeph">COMPUTE STATS</code> statement to estimate memory
+ usage for each query. See <a class="xref" href="../shared/../topics/impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for guidelines
+ about how and when to use this statement.
+ </div>
+ <p class="p">
+ <strong class="ph b">Type:</strong> string
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ <code class="ph codeph">""</code> (empty string, meaning unlimited) </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__disable_pool_max_requests">
+ <code class="ph codeph">disable_pool_max_requests</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Disables all per-pool limits on the maximum number
+ of running requests. <p class="p">
+ <strong class="ph b">Type:</strong> Boolean </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ <code class="ph codeph">false</code>
+ </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__disable_pool_mem_limits">
+ <code class="ph codeph">disable_pool_mem_limits</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Disables all per-pool mem limits. <p class="p">
+ <strong class="ph b">Type:</strong> Boolean </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ <code class="ph codeph">false</code>
+ </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__fair_scheduler_allocation_path">
+ <code class="ph codeph">fair_scheduler_allocation_path</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Path to the fair scheduler allocation file
+ (<code class="ph codeph">fair-scheduler.xml</code>). <p class="p">
+ <strong class="ph b">Type:</strong> string
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ <code class="ph codeph">""</code> (empty string) </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Admission control only uses a small subset
+ of the settings that can go in this file, as described below.
+ For details about all the Fair Scheduler configuration settings,
+ see the <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Apache wiki</a>. </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__llama_site_path">
+ <code class="ph codeph">llama_site_path</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Path to the configuration file used by admission control
+ (<code class="ph codeph">llama-site.xml</code>). If set,
+ <code class="ph codeph">fair_scheduler_allocation_path</code> must also be set.
+ <p class="p">
+ <strong class="ph b">Type:</strong> string
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">""</code> (empty string) </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Admission control only uses a few
+ of the settings that can go in this file, as described below.
+ </p>
+ </dd>
+
+ </dl>
+ </section>
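<p class="p">
The memory-specification format accepted by <code class="ph codeph">default_pool_mem_limit</code> can be sketched as a small parser. This is only an illustration of the format described above, not Impala's actual parsing code.
</p>

```python
# Illustrative parser for memory-limit strings: optional case-insensitive
# b/m/g (or mb/gb) suffix, floats allowed for megabytes and gigabytes,
# '%' for a percentage of physical memory, and bytes if no unit is given.
def parse_mem_spec(spec, physical_mem_bytes=None):
    spec = spec.strip().lower()
    if spec.endswith("%"):
        return int(float(spec[:-1]) / 100 * physical_mem_bytes)
    # Check two-letter suffixes before single-letter ones.
    for suffix, mult in (("mb", 1024 ** 2), ("gb", 1024 ** 3),
                         ("m", 1024 ** 2), ("g", 1024 ** 3), ("b", 1)):
        if spec.endswith(suffix):
            return int(float(spec[: -len(suffix)]) * mult)
    return int(spec)  # no unit: bytes

print(parse_mem_spec("1.5g"))    # 1610612736
print(parse_mem_spec("12345"))   # 12345
```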
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="admission_config__admission_config_manual">
+
+ <h3 class="title topictitle3" id="ariaid-title10">Configuring Admission Control Using the Command Line</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To configure admission control, use a combination of startup options for the Impala daemon and edit
+ or create the configuration files <span class="ph filepath">fair-scheduler.xml</span> and
+ <span class="ph filepath">llama-site.xml</span>.
+ </p>
+
+ <p class="p">
+ For a straightforward configuration using a single resource pool named <code class="ph codeph">default</code>, you can
+ specify configuration options on the command line and skip the <span class="ph filepath">fair-scheduler.xml</span>
+ and <span class="ph filepath">llama-site.xml</span> configuration files.
+ </p>
+
+ <p class="p">
+ For an advanced configuration with multiple resource pools using different settings, set up the
+ <span class="ph filepath">fair-scheduler.xml</span> and <span class="ph filepath">llama-site.xml</span> configuration files
+ manually. Provide the paths to each one using the <span class="keyword cmdname">impalad</span> command-line options,
+ <code class="ph codeph">--fair_scheduler_allocation_path</code> and <code class="ph codeph">--llama_site_path</code> respectively.
+ </p>
+
+ <p class="p">
+ The Impala admission control feature only uses the Fair Scheduler configuration settings to determine how
+ to map users and groups to different resource pools. For example, you might set up different resource
+ pools with separate memory limits, and maximum number of concurrent and queued queries, for different
+ categories of users within your organization. For details about all the Fair Scheduler configuration
+ settings, see the
+ <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Apache
+ wiki</a>.
+ </p>
+
+ <p class="p">
+ The Impala admission control feature only uses a small subset of possible settings from the
+ <span class="ph filepath">llama-site.xml</span> configuration file:
+ </p>
+
+<pre class="pre codeblock"><code>llama.am.throttling.maximum.placed.reservations.<var class="keyword varname">queue_name</var>
+llama.am.throttling.maximum.queued.reservations.<var class="keyword varname">queue_name</var>
+<span class="ph">impala.admission-control.pool-default-query-options.<var class="keyword varname">queue_name</var>
+impala.admission-control.pool-queue-timeout-ms.<var class="keyword varname">queue_name</var></span>
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">impala.admission-control.pool-queue-timeout-ms</code>
+ setting specifies the timeout value for this pool, in milliseconds.
+ The <code class="ph codeph">impala.admission-control.pool-default-query-options</code>
+ setting designates the default query options for all queries that run
+ in this pool. Its argument value is a comma-delimited string of
+ 'key=value' pairs, for example, <code class="ph codeph">'key1=val1,key2=val2'</code>.
+ This is where you might set a default memory limit
+ for all queries in the pool, using an argument such as <code class="ph codeph">MEM_LIMIT=5G</code>.
+ </p>
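<p class="p">
The comma-delimited 'key=value' format can be illustrated with a short parser. This sketch assumes only the format described above; it is not Impala's own option-parsing code.
</p>

```python
# Illustrative parser for a pool-default-query-options value such as
# 'MEM_LIMIT=5G,QUERY_TIMEOUT_S=30'.
def parse_pool_options(value):
    options = {}
    for pair in value.split(","):
        key, _, val = pair.partition("=")
        # Query option names are case-insensitive, so normalize the key.
        options[key.strip().upper()] = val.strip()
    return options

print(parse_pool_options("mem_limit=128m,query_timeout_s=20"))
# {'MEM_LIMIT': '128m', 'QUERY_TIMEOUT_S': '20'}
```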
+
+ <p class="p">
+ The <code class="ph codeph">impala.admission-control.*</code> configuration settings are available in
+ <span class="keyword">Impala 2.5</span> and higher.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="admission_config__admission_examples">
+
+ <h3 class="title topictitle3" id="ariaid-title11">Example of Admission Control Configuration</h3>
+
+ <div class="body conbody">
+
+ <p class="p"> Here are sample <span class="ph filepath">fair-scheduler.xml</span> and
+ <span class="ph filepath">llama-site.xml</span> files that define resource pools
+ <code class="ph codeph">root.default</code>, <code class="ph codeph">root.development</code>, and
+ <code class="ph codeph">root.production</code>. These sample files are stripped down: in a real
+ deployment they might contain other settings for use with various aspects of the YARN
+ component. The settings shown here are the significant ones for the Impala admission
+ control feature. </p>
+
+ <p class="p">
+ <strong class="ph b">fair-scheduler.xml:</strong>
+ </p>
+
+ <p class="p">
+ Although Impala does not use the <code class="ph codeph">vcores</code> value, you must still specify it to satisfy
+ YARN requirements for the file contents.
+ </p>
+
+ <p class="p">
+ Each <code class="ph codeph"><aclSubmitApps></code> tag (other than the one for <code class="ph codeph">root</code>) contains
+ a comma-separated list of users, then a space, then a comma-separated list of groups; these are the
+ users and groups allowed to submit Impala statements to the corresponding resource pool.
+ </p>
+
+ <p class="p">
+ If you leave the <code class="ph codeph"><aclSubmitApps></code> element empty for a pool, nobody can submit
+ directly to that pool; child pools can specify their own <code class="ph codeph"><aclSubmitApps></code> values
+ to authorize users and groups to submit to those pools.
+ </p>
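<p class="p">
The <code class="ph codeph"><aclSubmitApps></code> value format (a comma-separated user list, a space, then a comma-separated group list) can be illustrated with a short parser. This is a sketch based on the description above, not YARN or Impala code.
</p>

```python
# Illustrative parser for an <aclSubmitApps> value, e.g.
# "user1,user2 dev,ops,admin": users before the first space, groups after it.
def parse_acl(acl):
    users_part, _, groups_part = acl.partition(" ")
    users = [u for u in users_part.split(",") if u]
    groups = [g for g in groups_part.split(",") if g]
    return users, groups

print(parse_acl("user1,user2 dev,ops,admin"))
# (['user1', 'user2'], ['dev', 'ops', 'admin'])
print(parse_acl(" ops,admin"))  # leading space: no users, only groups
# ([], ['ops', 'admin'])
```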
+
+ <pre class="pre codeblock"><code><allocations>
+
+ <queue name="root">
+ <aclSubmitApps> </aclSubmitApps>
+ <queue name="default">
+ <maxResources>50000 mb, 0 vcores</maxResources>
+ <aclSubmitApps>*</aclSubmitApps>
+ </queue>
+ <queue name="development">
+ <maxResources>200000 mb, 0 vcores</maxResources>
+ <aclSubmitApps>user1,user2 dev,ops,admin</aclSubmitApps>
+ </queue>
+ <queue name="production">
+ <maxResources>1000000 mb, 0 vcores</maxResources>
+ <aclSubmitApps> ops,admin</aclSubmitApps>
+ </queue>
+ </queue>
+ <queuePlacementPolicy>
+ <rule name="specified" create="false"/>
+ <rule name="default" />
+ </queuePlacementPolicy>
+</allocations>
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">llama-site.xml:</strong>
+ </p>
+
+ <pre class="pre codeblock"><code>
+<?xml version="1.0" encoding="UTF-8"?>
+<configuration>
+ <property>
+ <name>llama.am.throttling.maximum.placed.reservations.root.default</name>
+ <value>10</value>
+ </property>
+ <property>
+ <name>llama.am.throttling.maximum.queued.reservations.root.default</name>
+ <value>50</value>
+ </property>
+ <property>
+ <name>impala.admission-control.pool-default-query-options.root.default</name>
+ <value>mem_limit=128m,query_timeout_s=20,max_io_buffers=10</value>
+ </property>
+ <property>
+ <name>impala.admission-control.pool-queue-timeout-ms.root.default</name>
+ <value>30000</value>
+ </property>
+ <property>
+ <name>llama.am.throttling.maximum.placed.reservations.root.development</name>
+ <value>50</value>
+ </property>
+ <property>
+ <name>llama.am.throttling.maximum.queued.reservations.root.development</name>
+ <value>100</value>
+ </property>
+ <property>
+ <name>impala.admission-control.pool-default-query-options.root.development</name>
+ <value>mem_limit=256m,query_timeout_s=30,max_io_buffers=10</value>
+ </property>
+ <property>
+ <name>impala.admission-control.pool-queue-timeout-ms.root.development</name>
+ <value>15000</value>
+ </property>
+ <property>
+ <name>llama.am.throttling.maximum.placed.reservations.root.production</name>
+ <value>100</value>
+ </property>
+ <property>
+ <name>llama.am.throttling.maximum.queued.reservations.root.production</name>
+ <value>200</value>
+ </property>
+<!--
+ Default query options for the 'root.production' pool.
+ THIS IS A NEW PARAMETER in Impala 2.5.
+ Note that the MEM_LIMIT query option still shows up in here even though it is a
+ separate box in the UI. We do that because it is the most important query option
+ that people will need (everything else is somewhat advanced).
+
+ MEM_LIMIT takes a per-node memory limit which is specified using one of the following:
+ - '<int>[bB]?' -> bytes (default if no unit given)
+ - '<float>[mM(bB)]' -> megabytes
+ - '<float>[gG(bB)]' -> gigabytes
+ E.g. 'MEM_LIMIT=12345' (no unit) means 12345 bytes, and you can append m or g
+ to specify megabytes or gigabytes, though that is not required.
+-->
+ <property>
+ <name>impala.admission-control.pool-default-query-options.root.production</name>
+ <value>mem_limit=386m,query_timeout_s=30,max_io_buffers=10</value>
+ </property>
+<!--
+ Default queue timeout (ms) for the pool 'root.production'.
+ If this isn’t set, the process-wide flag is used.
+ THIS IS A NEW PARAMETER in Impala 2.5.
+-->
+ <property>
+ <name>impala.admission-control.pool-queue-timeout-ms.root.production</name>
+ <value>30000</value>
+ </property>
+</configuration>
+
+</code></pre>
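The MEM_LIMIT unit rules spelled out in the comment above (a bare integer is bytes, an m/mb suffix means megabytes, g/gb means gigabytes) can be sketched as a small parser. This is illustrative only, under the assumption that the comment's grammar is complete; it is not Impala's actual parsing code:

```python
import re

def parse_mem_limit(value):
    """Convert a MEM_LIMIT string to bytes, following the unit rules in
    the llama-site.xml comment: bare number = bytes, m/mb = megabytes,
    g/gb = gigabytes (a trailing 'b' alone also means bytes)."""
    scale = {"": 1, "b": 1, "m": 2**20, "mb": 2**20, "g": 2**30, "gb": 2**30}
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([a-zA-Z]*)", value.strip())
    if not m or m.group(2).lower() not in scale:
        raise ValueError("bad MEM_LIMIT value: %r" % value)
    return int(float(m.group(1)) * scale[m.group(2).lower()])

print(parse_mem_limit("12345"))  # bare number: bytes
print(parse_mem_limit("128m"))   # the root.default pool limit above
print(parse_mem_limit("1g"))
```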
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="admission_config__admission_guidelines">
+
+ <h3 class="title topictitle3" id="ariaid-title12">Guidelines for Using Admission Control</h3>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ To see how admission control works for particular queries, examine the profile output for the query. This
+ information is available through the <code class="ph codeph">PROFILE</code> statement in <span class="keyword cmdname">impala-shell</span>
+ immediately after running a query in the shell, on the <span class="ph uicontrol">queries</span> page of the Impala
+ debug web UI, or in the Impala log file (basic information at log level 1, more detailed information at log
+ level 2). The profile output contains details about the admission decision, such as whether the query was
+ queued or not and which resource pool it was assigned to. It also includes the estimated and actual memory
+ usage for the query, so you can fine-tune the configuration for the memory limits of the resource pools.
+ </p>
+
+ <p class="p">
+ Remember that the limits imposed by admission control are <span class="q">"soft"</span> limits.
+ The decentralized nature of this mechanism means that each Impala node makes its own decisions about whether
+ to allow queries to run immediately or to queue them. These decisions rely on information passed back and forth
+ between nodes by the statestore service. If a sudden surge in requests causes more queries than anticipated to run
+ concurrently, then throughput could decrease due to queries spilling to disk or contending for resources;
+ or queries could be cancelled if they exceed the <code class="ph codeph">MEM_LIMIT</code> setting while running.
+ </p>
+
+
+
+ <p class="p">
+ In <span class="keyword cmdname">impala-shell</span>, you can also specify which resource pool to direct queries to by
+ setting the <code class="ph codeph">REQUEST_POOL</code> query option.
+ </p>
+
+ <p class="p">
+ The statements affected by the admission control feature are primarily queries, but also include statements
+ that write data such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>. Most write
+ operations in Impala are not resource-intensive, but inserting into a Parquet table can require substantial
+ memory due to buffering intermediate data before writing out each Parquet data block. See
+ <a class="xref" href="impala_parquet.html#parquet_etl">Loading Data into Parquet Tables</a> for instructions about inserting data efficiently into
+ Parquet tables.
+ </p>
+
+ <p class="p">
+ Although admission control does not scrutinize memory usage for other kinds of DDL statements, if a query
+ is queued due to a limit on concurrent queries or memory usage, subsequent statements in the same session
+ are also queued so that they are processed in the correct order:
+ </p>
+
+<pre class="pre codeblock"><code>-- This query could be queued to avoid out-of-memory at times of heavy load.
+select * from huge_table join enormous_table using (id);
+-- If so, this subsequent statement in the same session is also queued
+-- until the previous statement completes.
+drop table huge_table;
+</code></pre>
+
+ <p class="p">
+ If you set up different resource pools for different users and groups, consider reusing any classifications
+ you developed for use with Sentry security. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details.
+ </p>
+
+ <p class="p">
+ For details about all the Fair Scheduler configuration settings, see
+ <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Fair Scheduler Configuration</a>, in particular the tags such as <code class="ph codeph"><queue></code> and
+ <code class="ph codeph"><aclSubmitApps></code> to map users and groups to particular resource pools (queues).
+ </p>
+
+
+ </div>
+ </article>
+</article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_s3.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_s3.html b/docs/build3x/html/topics/impala_s3.html
new file mode 100644
index 0000000..33aa361
--- /dev/null
+++ b/docs/build3x/html/topics/impala_s3.html
@@ -0,0 +1,775 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="s3"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala with the Amazon S3 Filesystem</title></head><body id="s3"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Impala with the Amazon S3 Filesystem</h1>
+
+
+
+ <div class="body conbody">
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala supports both queries (<code class="ph codeph">SELECT</code>)
+ and DML (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, <code class="ph codeph">CREATE TABLE AS SELECT</code>)
+ for data residing on Amazon S3. With the inclusion of write support,
+
+ the Impala support for S3 is now considered ready for production use.
+ </p>
+ </div>
+
+ <p class="p">
+
+
+
+ You can use Impala to query data residing on the Amazon S3 filesystem. This capability allows convenient
+ access to a storage system that is remotely managed, accessible from anywhere, and integrated with various
+ cloud-based services. Impala can query files in any supported file format from S3. The S3 storage location
+ can be for an entire table, or individual partitions in a partitioned table.
+ </p>
+
+ <p class="p">
+ The default Impala tables use data files stored on HDFS, which are ideal for bulk loads and queries using
+ full-table scans. In contrast, queries against S3 data are less performant, making S3 suitable for holding
+ <span class="q">"cold"</span> data that is only queried occasionally, while more frequently accessed <span class="q">"hot"</span> data resides in
+ HDFS. In a partitioned table, you can set the <code class="ph codeph">LOCATION</code> attribute for individual partitions
+ to put some partitions on HDFS and others on S3, typically depending on the age of the data.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="s3__s3_sql">
+ <h2 class="title topictitle2" id="ariaid-title2">How Impala SQL Statements Work with S3</h2>
+ <div class="body conbody">
+ <p class="p">
+ Impala SQL statements work with data on S3 as follows:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>
+ or <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> statements
+ can specify that a table resides on the S3 filesystem by
+ encoding an <code class="ph codeph">s3a://</code> prefix for the <code class="ph codeph">LOCATION</code>
+ property. <code class="ph codeph">ALTER TABLE</code> can also set the <code class="ph codeph">LOCATION</code>
+ property for an individual partition, so that some data in a table resides on
+ S3 and other data in the same table resides on HDFS.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Once a table or partition is designated as residing on S3, the <a class="xref" href="impala_select.html#select">SELECT Statement</a>
+ statement transparently accesses the data files from the appropriate storage layer.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ If the S3 table is an internal table, the <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> statement
+ removes the corresponding data files from S3 when the table is dropped.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_truncate_table.html#truncate_table">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a> statement always removes the corresponding
+ data files from S3 when the table is truncated.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> can move data files residing in HDFS into
+ an S3 table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> statement, or the <code class="ph codeph">CREATE TABLE AS SELECT</code>
+ form of the <code class="ph codeph">CREATE TABLE</code> statement, can copy data from an HDFS table or another S3
+ table into an S3 table. The <a class="xref" href="impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a>
+ query option chooses whether or not to use a fast code path for these write operations to S3,
+ with the tradeoff of potential inconsistency in the case of a failure during the statement.
+ </p>
+ </li>
+ </ul>
+ <p class="p">
+ For usage information about Impala SQL statements with S3 tables, see <a class="xref" href="impala_s3.html#s3_ddl">Creating Impala Databases, Tables, and Partitions for Data Stored on S3</a>
+ and <a class="xref" href="impala_s3.html#s3_dml">Using Impala DML Statements for S3 Data</a>.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="s3__s3_creds">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Specifying Impala Credentials to Access Data in S3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+ To allow Impala to access data in S3, specify values for the following configuration settings in your
+ <span class="ph filepath">core-site.xml</span> file:
+ </p>
+
+
+<pre class="pre codeblock"><code>
+<property>
+<name>fs.s3a.access.key</name>
+<value><var class="keyword varname">your_access_key</var></value>
+</property>
+<property>
+<name>fs.s3a.secret.key</name>
+<value><var class="keyword varname">your_secret_key</var></value>
+</property>
+</code></pre>
+
+ <p class="p">
+        After specifying the credentials, restart both the Impala and
+        Hive services. (Restarting Hive is required because Impala statements such as
+        <code class="ph codeph">CREATE TABLE</code> store and retrieve table metadata
+        through the Hive metastore.)
+      </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+
+ <p class="p">
+ Although you can specify the access key ID and secret key as part of the <code class="ph codeph">s3a://</code> URL in the
+ <code class="ph codeph">LOCATION</code> attribute, doing so makes this sensitive information visible in many places, such
+ as <code class="ph codeph">DESCRIBE FORMATTED</code> output and Impala log files. Therefore, specify this information
+ centrally in the <span class="ph filepath">core-site.xml</span> file, and restrict read access to that file to only
+ trusted users.
+ </p>
+
+
+
+ </div>
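Because the keys belong in the core-site.xml file rather than in SQL, generating the two property entries programmatically can avoid copy/paste mistakes and unescaped characters. A minimal sketch (a hypothetical helper, not a tool shipped with Impala):

```python
from xml.sax.saxutils import escape

def s3a_credential_properties(access_key, secret_key):
    """Render the fs.s3a.* <property> entries for core-site.xml,
    XML-escaping the values (secret keys may contain '+' or '/')."""
    entries = [("fs.s3a.access.key", access_key),
               ("fs.s3a.secret.key", secret_key)]
    parts = []
    for name, value in entries:
        parts.append(
            "<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>"
            % (escape(name), escape(value)))
    return "\n".join(parts)

print(s3a_credential_properties("AKIAEXAMPLE", "secret/Example+Key"))
```

Remember that the generated file should be readable only by trusted users, for the reasons given in the note above.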
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="s3__s3_etl">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Loading Data into S3 for Impala Queries</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If your ETL pipeline involves moving data into S3 and then querying through Impala,
+ you can either use Impala DML statements to create, move, or copy the data, or
+ use the same data loading techniques as you would for non-Impala data.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="s3_etl__s3_dml">
+ <h3 class="title topictitle3" id="ariaid-title5">Using Impala DML Statements for S3 Data</h3>
+ <div class="body conbody">
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Amazon Simple Storage Service (S3).
+ The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and
+ partitions is specified by an <code class="ph codeph">s3a://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into S3 using the normal S3 transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the S3 data.
+ </p>
+ <p class="p">
+ Because of differences between S3 and traditional filesystems, DML operations
+ for S3 tables can take longer than for tables on HDFS. For example, both the
+ <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+ to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+ the files are moved from a temporary staging directory to the final destination directory.)
+ Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+ actually copies the data files from one location to another and then removes the original files.
+ In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+ to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+ that a problem during statement execution could leave data in an inconsistent state.
+ It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+ See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="s3_etl__s3_manual_etl">
+ <h3 class="title topictitle3" id="ariaid-title6">Manually Loading Data into Impala Tables on S3</h3>
+ <div class="body conbody">
+ <p class="p">
+ As an alternative, or on earlier Impala releases without DML support for S3,
+ you can use the Amazon-provided methods to bring data files into S3 for querying through Impala. See
+ <a class="xref" href="http://aws.amazon.com/s3/" target="_blank">the Amazon S3 web site</a> for
+ details.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <div class="p">
+ For best compatibility with the S3 write support in <span class="keyword">Impala 2.6</span>
+ and higher:
+ <ul class="ul">
+ <li class="li">Use native Hadoop techniques to create data files in S3 for querying through Impala.</li>
+ <li class="li">Use the <code class="ph codeph">PURGE</code> clause of <code class="ph codeph">DROP TABLE</code> when dropping internal (managed) tables.</li>
+ </ul>
+ By default, when you drop an internal (managed) table, the data files are
+ moved to the HDFS trashcan. This operation is expensive for tables that
+ reside on the Amazon S3 filesystem. Therefore, for S3 tables, prefer to use
+ <code class="ph codeph">DROP TABLE <var class="keyword varname">table_name</var> PURGE</code> rather than the default <code class="ph codeph">DROP TABLE</code> statement.
+ The <code class="ph codeph">PURGE</code> clause makes Impala delete the data files immediately,
+ skipping the HDFS trashcan.
+ For the <code class="ph codeph">PURGE</code> clause to work effectively, you must originally create the
+ data files on S3 using one of the tools from the Hadoop ecosystem, such as
+ <code class="ph codeph">hadoop fs -cp</code>, or <code class="ph codeph">INSERT</code> in Impala or Hive.
+ </div>
+ </div>
+
+ <p class="p">
+ Alternative file creation techniques (less compatible with the <code class="ph codeph">PURGE</code> clause) include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <a class="xref" href="https://console.aws.amazon.com/s3/home" target="_blank">Amazon AWS / S3
+ web interface</a> to upload from a web browser.
+ </li>
+
+ <li class="li">
+ The <a class="xref" href="http://aws.amazon.com/cli/" target="_blank">Amazon AWS CLI</a> to
+ manipulate files from the command line.
+ </li>
+
+ <li class="li">
+ Other S3-enabled software, such as
+ <a class="xref" href="http://s3tools.org/s3cmd" target="_blank">the S3Tools client software</a>.
+ </li>
+ </ul>
+
+ <p class="p">
+ After you upload data files to a location already mapped to an Impala table or partition, or if you delete
+ files in S3 from such a location, issue the <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+ statement to make Impala aware of the new set of data files.
+ </p>
+
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="s3__s3_ddl">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Creating Impala Databases, Tables, and Partitions for Data Stored on S3</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala reads data for a table or partition from S3 based on the <code class="ph codeph">LOCATION</code> attribute for the
+ table or partition. Specify the S3 details in the <code class="ph codeph">LOCATION</code> clause of a <code class="ph codeph">CREATE
+ TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statement. The notation for the <code class="ph codeph">LOCATION</code>
+ clause is <code class="ph codeph">s3a://<var class="keyword varname">bucket_name</var>/<var class="keyword varname">path/to/file</var></code>. The
+ filesystem prefix is always <code class="ph codeph">s3a://</code> because Impala does not support the <code class="ph codeph">s3://</code> or
+ <code class="ph codeph">s3n://</code> prefixes.
+ </p>
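A quick sanity check for LOCATION URLs, reflecting the rule above that only the s3a:// prefix is supported, might look like the following. This is illustrative only; Impala performs its own validation:

```python
from urllib.parse import urlparse

def check_impala_s3_location(location):
    """Validate a LOCATION URL per the rule above: the scheme must be
    s3a (s3:// and s3n:// are not supported by Impala), and a bucket
    name must be present. Returns (bucket, path)."""
    parsed = urlparse(location)
    if parsed.scheme in ("s3", "s3n"):
        raise ValueError("Impala requires s3a://, not %s://" % parsed.scheme)
    if parsed.scheme != "s3a" or not parsed.netloc:
        raise ValueError("expected s3a://bucket/path, got %r" % location)
    return parsed.netloc, parsed.path.lstrip("/")

print(check_impala_s3_location("s3a://impala-demo/dir1/dir2/dir3"))
```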
+
+ <p class="p">
+ For a partitioned table, either specify a separate <code class="ph codeph">LOCATION</code> clause for each new partition,
+ or specify a base <code class="ph codeph">LOCATION</code> for the table and set up a directory structure in S3 to mirror
+ the way Impala partitioned tables are structured in HDFS. Although, strictly speaking, S3 filenames do not
+ have directory paths, Impala treats S3 filenames with <code class="ph codeph">/</code> characters the same as HDFS
+ pathnames that include directories.
+ </p>
+
+ <p class="p">
+ You point a nonpartitioned table or an individual partition at S3 by specifying a single directory
+ path in S3, which could be any arbitrary directory. To replicate the structure of an entire Impala
+ partitioned table or database in S3 requires more care, with directories and subdirectories nested and
+ named to match the equivalent directory tree in HDFS. Consider setting up an empty staging area if
+ necessary in HDFS, and recording the complete directory structure so that you can replicate it in S3.
+
+ </p>
+
+ <p class="p">
+ For convenience when working with multiple tables with data files stored in S3, you can create a database
+ with a <code class="ph codeph">LOCATION</code> attribute pointing to an S3 path.
+ Specify a URL of the form <code class="ph codeph">s3a://<var class="keyword varname">bucket</var>/<var class="keyword varname">root/path/for/database</var></code>
+ for the <code class="ph codeph">LOCATION</code> attribute of the database.
+ Any tables created inside that database
+ automatically create directories underneath the one specified by the database
+ <code class="ph codeph">LOCATION</code> attribute.
+ </p>
+
+ <p class="p">
+ For example, the following session creates a partitioned table where only a single partition resides on S3.
+ The partitions for years 2013 and 2014 are located on HDFS. The partition for year 2015 includes a
+ <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">s3a://</code> URL, and so refers to data residing on
+ S3, under a specific path underneath the bucket <code class="ph codeph">impala-demo</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create database db_on_hdfs;
+[localhost:21000] > use db_on_hdfs;
+[localhost:21000] > create table mostly_on_hdfs (x int) partitioned by (year int);
+[localhost:21000] > alter table mostly_on_hdfs add partition (year=2013);
+[localhost:21000] > alter table mostly_on_hdfs add partition (year=2014);
+[localhost:21000] > alter table mostly_on_hdfs add partition (year=2015)
+ > location 's3a://impala-demo/dir1/dir2/dir3/t1';
+</code></pre>
+
+ <p class="p">
+ The following session creates a database and two partitioned tables residing entirely on S3, one
+ partitioned by a single column and the other partitioned by multiple columns. Because a
+ <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">s3a://</code> URL is specified for the database, the
+ tables inside that database are automatically created on S3 underneath the database directory. To see the
+ names of the associated subdirectories, including the partition key values, we use an S3 client tool to
+ examine how the directory structure is organized on S3. For example, Impala partition directories such as
+ <code class="ph codeph">month=1</code> do not include leading zeroes, which sometimes appear in partition directories created
+ through Hive.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create database db_on_s3 location 's3a://impala-demo/dir1/dir2/dir3';
+[localhost:21000] > use db_on_s3;
+
+[localhost:21000] > create table partitioned_on_s3 (x int) partitioned by (year int);
+[localhost:21000] > alter table partitioned_on_s3 add partition (year=2013);
+[localhost:21000] > alter table partitioned_on_s3 add partition (year=2014);
+[localhost:21000] > alter table partitioned_on_s3 add partition (year=2015);
+
+[localhost:21000] > !aws s3 ls s3://impala-demo/dir1/dir2/dir3 --recursive;
+2015-03-17 13:56:34 0 dir1/dir2/dir3/
+2015-03-17 16:43:28 0 dir1/dir2/dir3/partitioned_on_s3/
+2015-03-17 16:43:49 0 dir1/dir2/dir3/partitioned_on_s3/year=2013/
+2015-03-17 16:43:53 0 dir1/dir2/dir3/partitioned_on_s3/year=2014/
+2015-03-17 16:43:58 0 dir1/dir2/dir3/partitioned_on_s3/year=2015/
+
+[localhost:21000] > create table partitioned_multiple_keys (x int)
+ > partitioned by (year smallint, month tinyint, day tinyint);
+[localhost:21000] > alter table partitioned_multiple_keys
+ > add partition (year=2015,month=1,day=1);
+[localhost:21000] > alter table partitioned_multiple_keys
+ > add partition (year=2015,month=1,day=31);
+[localhost:21000] > alter table partitioned_multiple_keys
+ > add partition (year=2015,month=2,day=28);
+
+[localhost:21000] > !aws s3 ls s3://impala-demo/dir1/dir2/dir3 --recursive;
+2015-03-17 13:56:34 0 dir1/dir2/dir3/
+2015-03-17 16:47:13 0 dir1/dir2/dir3/partitioned_multiple_keys/
+2015-03-17 16:47:44 0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=1/day=1/
+2015-03-17 16:47:50 0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=1/day=31/
+2015-03-17 16:47:57 0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=2/day=28/
+2015-03-17 16:43:28 0 dir1/dir2/dir3/partitioned_on_s3/
+2015-03-17 16:43:49 0 dir1/dir2/dir3/partitioned_on_s3/year=2013/
+2015-03-17 16:43:53 0 dir1/dir2/dir3/partitioned_on_s3/year=2014/
+2015-03-17 16:43:58 0 dir1/dir2/dir3/partitioned_on_s3/year=2015/
+</code></pre>
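When staging files with S3 tools instead of ALTER TABLE, the key prefixes must match the layout shown in the listing above: key=value segments, with integer values rendered without leading zeroes (month=1, not month=01). A small sketch of building such prefixes (a hypothetical helper, not part of Impala):

```python
def partition_key_path(base, **partition_values):
    """Build an S3 key prefix in the layout Impala uses for partitioned
    tables: one key=value segment per partition column, in order.
    Integers keep Python's default rendering, so no leading zeroes."""
    segments = ["%s=%s" % (k, v) for k, v in partition_values.items()]
    return "/".join([base.rstrip("/")] + segments) + "/"

# Matches one of the directories in the listing above.
print(partition_key_path("dir1/dir2/dir3/partitioned_multiple_keys",
                         year=2015, month=1, day=31))
```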
+
+ <p class="p">
+ The <code class="ph codeph">CREATE DATABASE</code> and <code class="ph codeph">CREATE TABLE</code> statements create the associated
+ directory paths if they do not already exist. You can specify multiple levels of directories, and the
+ <code class="ph codeph">CREATE</code> statement creates all appropriate levels, similar to using <code class="ph codeph">mkdir
+ -p</code>.
+ </p>
+
+ <p class="p">
+ Use the standard S3 file upload methods to actually put the data files into the right locations. You can
+ also put the directory paths and data files in place before creating the associated Impala databases or
+ tables, and Impala automatically uses the data from the appropriate location after the associated databases
+ and tables are created.
+ </p>
+
+ <p class="p">
+ You can switch whether an existing table or partition points to data in HDFS or S3. For example, if you
+ have an Impala table or partition pointing to data files in HDFS or S3, and you later transfer those data
+ files to the other filesystem, use an <code class="ph codeph">ALTER TABLE</code> statement to adjust the
+ <code class="ph codeph">LOCATION</code> attribute of the corresponding table or partition to reflect that change. Because
+ Impala does not have an <code class="ph codeph">ALTER DATABASE</code> statement, this location-switching technique is not
+ practical for entire databases that have a custom <code class="ph codeph">LOCATION</code> attribute.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="s3__s3_internal_external">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Internal and External Tables Located on S3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Just as with tables located on HDFS storage, you can designate S3-based tables as either internal (managed
+ by Impala) or external, by using the syntax <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">CREATE EXTERNAL
+ TABLE</code> respectively. When you drop an internal table, the files associated with the table are
+ removed, even if they are on S3 storage. When you drop an external table, the files associated with the
+ table are left alone, and are still available for access by other tools or components. See
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> for details.
+ </p>
+
+ <p class="p">
+ If the data on S3 is intended to be long-lived and accessed by other tools in addition to Impala, create
+ any associated S3 tables with the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax, so that the files are not
+ deleted from S3 when the table is dropped.
+ </p>
+
+ <p class="p">
+ If the data on S3 is only needed for querying by Impala and can be safely discarded once the Impala
+ workflow is complete, create the associated S3 tables using the <code class="ph codeph">CREATE TABLE</code> syntax, so
+ that dropping the table also deletes the corresponding data files on S3.
+ </p>
+
+ <p class="p">
+ For example, this session creates a table in S3 with the same column layout as a table in HDFS, then
+ examines the S3 table and queries some data from it. The table in S3 works the same as a table in HDFS as
+ far as the expected file format of the data, table and column statistics, and other table properties. The
+ only indication that it is not an HDFS table is the <code class="ph codeph">s3a://</code> URL in the
+ <code class="ph codeph">LOCATION</code> property. Many data files can reside in the S3 directory, and their combined
+ contents form the table data. Because the data in this example is uploaded after the table is created, a
+ <code class="ph codeph">REFRESH</code> statement prompts Impala to update its cached information about the data files.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table usa_cities_s3 like usa_cities location 's3a://impala-demo/usa_cities';
+[localhost:21000] > desc usa_cities_s3;
++-------+----------+---------+
+| name | type | comment |
++-------+----------+---------+
+| id | smallint | |
+| city | string | |
+| state | string | |
++-------+----------+---------+
+
+-- Now from a web browser, upload the same data file(s) to S3 as in the HDFS table,
+-- under the relevant bucket and path. If you already have the data in S3, you would
+-- point the table LOCATION at an existing path.
+
+[localhost:21000] > refresh usa_cities_s3;
+[localhost:21000] > select count(*) from usa_cities_s3;
++----------+
+| count(*) |
++----------+
+| 289 |
++----------+
+[localhost:21000] > select distinct state from usa_cities_s3 limit 5;
++----------------------+
+| state |
++----------------------+
+| Louisiana |
+| Minnesota |
+| Georgia |
+| Alaska |
+| Ohio |
++----------------------+
+[localhost:21000] > desc formatted usa_cities_s3;
++------------------------------+------------------------------+---------+
+| name | type | comment |
++------------------------------+------------------------------+---------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| id | smallint | NULL |
+| city | string | NULL |
+| state | string | NULL |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | s3_testing | NULL |
+| Owner: | jrussell | NULL |
+| CreateTime: | Mon Mar 16 11:36:25 PDT 2015 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | s3a://impala-demo/usa_cities | NULL |
+| Table Type: | MANAGED_TABLE | NULL |
+...
++------------------------------+------------------------------+---------+
+</code></pre>
+
+
+
+ <p class="p">
+ In this case, we have already uploaded a Parquet file with a million rows of data to the
+ <code class="ph codeph">sample_data</code> directory underneath the <code class="ph codeph">impala-demo</code> bucket on S3. This
+ session creates a table with matching column settings pointing to the corresponding location in S3, then
+ queries the table. Because the data is already in place on S3 when the table is created, no
+ <code class="ph codeph">REFRESH</code> statement is required.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table sample_data_s3
+ > (id bigint, val int, zerofill string,
+ > name string, assertion boolean, city string, state string)
+ > stored as parquet location 's3a://impala-demo/sample_data';
+[localhost:21000] > select count(*) from sample_data_s3;
++----------+
+| count(*) |
++----------+
+| 1000000 |
++----------+
+[localhost:21000] > select count(*) howmany, assertion from sample_data_s3 group by assertion;
++---------+-----------+
+| howmany | assertion |
++---------+-----------+
+| 667149 | true |
+| 332851 | false |
++---------+-----------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="s3__s3_queries">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Running and Tuning Impala Queries for Data Stored on S3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Once the appropriate <code class="ph codeph">LOCATION</code> attributes are set up at the table or partition level, you
+ query data stored in S3 exactly the same as data stored on HDFS or in HBase:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Queries against S3 data support all the same file formats as for HDFS data.
+ </li>
+
+ <li class="li">
+ Tables can be unpartitioned or partitioned. For partitioned tables, either manually construct paths in S3
+ corresponding to the HDFS directories representing partition key values, or use <code class="ph codeph">ALTER TABLE ...
+ ADD PARTITION</code> to set up the appropriate paths in S3.
+ </li>
+
+ <li class="li">
+ HDFS and HBase tables can be joined to S3 tables, or S3 tables can be joined with each other.
+ </li>
+
+ <li class="li">
+ Authorization using the Sentry framework to control access to databases, tables, or columns works the
+ same whether the data is in HDFS or in S3.
+ </li>
+
+ <li class="li">
+ The <span class="keyword cmdname">catalogd</span> daemon caches metadata for both HDFS and S3 tables. Use
+ <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> for S3 tables in the same situations
+ where you would issue those statements for HDFS tables.
+ </li>
+
+ <li class="li">
+ Queries against S3 tables are subject to the same kinds of admission control and resource management as
+ HDFS tables.
+ </li>
+
+ <li class="li">
+ Metadata about S3 tables is stored in the same metastore database as for HDFS tables.
+ </li>
+
+ <li class="li">
+ You can set up views referring to S3 tables, the same as for HDFS tables.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">COMPUTE STATS</code>, <code class="ph codeph">SHOW TABLE STATS</code>, and <code class="ph codeph">SHOW COLUMN
+ STATS</code> statements work for S3 tables also.
+ </li>
+ </ul>
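As a sketch of the partitioned-table point above, the S3 paths might be wired up as follows (the table name, bucket, and partition values are hypothetical):

```sql
-- Hypothetical partitioned table whose data lives under an S3 bucket.
CREATE TABLE sales_s3 (id BIGINT, amount DECIMAL(9,2))
  PARTITIONED BY (year INT)
  LOCATION 's3a://impala-demo/sales';

-- Point each partition at the corresponding path in S3.
ALTER TABLE sales_s3 ADD PARTITION (year=2015)
  LOCATION 's3a://impala-demo/sales/year=2015';

-- Make files already present at that path visible to Impala.
REFRESH sales_s3;
```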
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="s3_queries__s3_performance">
+
+ <h3 class="title topictitle3" id="ariaid-title10">Understanding and Tuning Impala Query Performance for S3 Data</h3>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Although Impala queries for data stored in S3 might be less performant than queries against the
+ equivalent data stored in HDFS, you can still do some tuning. Here are techniques you can use to
+ interpret explain plans and profiles for queries against S3 data, and tips to achieve the best
+ performance possible for such queries.
+ </p>
+
+ <p class="p">
+ All else being equal, performance is expected to be lower for queries running against data on S3 rather
+ than HDFS. The actual mechanics of the <code class="ph codeph">SELECT</code> statement are somewhat different when the
+ data is in S3. Although the work is still distributed across the datanodes of the cluster, Impala might
+ parallelize the work for a distributed query differently for data on HDFS and S3. S3 does not have the
+ same block notion as HDFS, so Impala uses heuristics to determine how to split up large S3 files for
+ processing in parallel. Because all hosts can access any S3 data file with equal efficiency, the
+ distribution of work might be different from that for HDFS data, where the data blocks are physically read
+ using short-circuit local reads by hosts that contain the appropriate block replicas. Although the I/O to
+ read the S3 data might be spread evenly across the hosts of the cluster, the fact that all data is
+ initially retrieved across the network means that the overall query performance is likely to be lower for
+ S3 data than for HDFS data.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+ For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+ Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+ in the <span class="ph filepath">core-site.xml</span> configuration file determines
+ how Impala divides the I/O work of reading the data files. This configuration
+ setting is specified in bytes. By default, this
+ value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+ as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+ Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+ Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 268435456 (256 MB) to match the row group size produced by Impala.
+ </p>
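As a sketch, the setting might appear in <span class="ph filepath">core-site.xml</span> as follows; the 134217728 value shown assumes your S3 Parquet files were written by Hive or MapReduce:

```xml
<property>
  <name>fs.s3a.block.size</name>
  <!-- 128 MB, expressed in bytes, to match Hive/MapReduce Parquet row group size.
       Use 268435456 (256 MB) instead if most files were written by Impala. -->
  <value>134217728</value>
</property>
```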
+
+ <p class="p">
+ Because of differences between S3 and traditional filesystems, DML operations
+ for S3 tables can take longer than for tables on HDFS. For example, both the
+ <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+ to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+ the files are moved from a temporary staging directory to the final destination directory.)
+ Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+ actually copies the data files from one location to another and then removes the original files.
+ In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+ to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+ that a problem during statement execution could leave data in an inconsistent state.
+ It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+ See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+ </p>
+
+ <p class="p">
+ When optimizing aspects of complex queries, such as the join order, Impala treats tables on HDFS and
+ S3 the same way. Therefore, follow all the same tuning recommendations for S3 tables as for HDFS ones,
+ such as using the <code class="ph codeph">COMPUTE STATS</code> statement to help Impala construct accurate estimates of
+ row counts and cardinality. See <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> for details.
+ </p>
+
+ <p class="p">
+ In query profile reports, the numbers for <code class="ph codeph">BytesReadLocal</code>,
+ <code class="ph codeph">BytesReadShortCircuit</code>, <code class="ph codeph">BytesReadDataNodeCached</code>, and
+ <code class="ph codeph">BytesReadRemoteUnexpected</code> are blank because those metrics come from HDFS.
+ If you do see any indications that a query against an S3 table performed <span class="q">"remote read"</span>
+ operations, do not be alarmed. That is expected because, by definition, all the I/O for S3 tables involves
+ remote reads.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="s3__s3_restrictions">
+
+ <h2 class="title topictitle2" id="ariaid-title11">Restrictions on Impala Support for S3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala requires that the default filesystem for the cluster be HDFS. You cannot use S3 as the only
+ filesystem in the cluster.
+ </p>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.6</span>, Impala could not perform DML operations (<code class="ph codeph">INSERT</code>,
+ <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS SELECT</code>) where the destination is a table
+ or partition located on an S3 filesystem. This restriction is lifted in <span class="keyword">Impala 2.6</span> and higher.
+ </p>
+
+ <p class="p">
+ Impala does not support the old <code class="ph codeph">s3://</code> block-based and <code class="ph codeph">s3n://</code> filesystem
+ schemes, only <code class="ph codeph">s3a://</code>.
+ </p>
+
+ <p class="p">
+ Although S3 is often used to store JSON-formatted data, the current Impala support for S3 does not include
+ directly querying JSON data. For Impala queries, use data files in one of the file formats listed in
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>. If you have data in JSON format, you can prepare a
+ flattened version of that data for querying by Impala as part of your ETL cycle.
+ </p>
+
+ <p class="p">
+ You cannot use the <code class="ph codeph">ALTER TABLE ... SET CACHED</code> statement for tables or partitions that are
+ located in S3.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="s3__s3_best_practices">
+ <h2 class="title topictitle2" id="ariaid-title12">Best Practices for Using Impala with S3</h2>
+
+ <div class="body conbody">
+ <p class="p">
+ The following guidelines represent best practices derived from testing and field experience with Impala on S3:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Any reference to an S3 location must be fully qualified. (This rule applies when
+ S3 is not designated as the default filesystem.)
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Set the safety valve <code class="ph codeph">fs.s3a.connection.maximum</code> to 1500 for <span class="keyword cmdname">impalad</span>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Set safety valve <code class="ph codeph">fs.s3a.block.size</code> to 134217728
+ (128 MB in bytes) if most Parquet files queried by Impala were written by Hive
+ or ParquetMR jobs. Set the block size to 268435456 (256 MB in bytes) if most Parquet
+ files queried by Impala were written by Impala.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">DROP TABLE .. PURGE</code> is much faster than the default <code class="ph codeph">DROP TABLE</code>.
+ The same applies to <code class="ph codeph">ALTER TABLE ... DROP PARTITION PURGE</code>
+ versus the default <code class="ph codeph">DROP PARTITION</code> operation.
+ However, due to the eventually consistent nature of S3, the files for that
+ table or partition could remain for some unbounded time when using <code class="ph codeph">PURGE</code>.
+ The default <code class="ph codeph">DROP TABLE/PARTITION</code> is slow because Impala copies the files to the HDFS trash folder,
+ and Impala waits until all the data is moved. <code class="ph codeph">DROP TABLE/PARTITION .. PURGE</code> is a
+ fast delete operation, and the Impala statement finishes quickly even though the change might not
+ have propagated fully throughout S3.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">INSERT</code> statements are faster than <code class="ph codeph">INSERT OVERWRITE</code> for S3.
+ The query option <code class="ph codeph">S3_SKIP_INSERT_STAGING</code>, which is set to <code class="ph codeph">true</code> by default,
+ skips the staging step for regular <code class="ph codeph">INSERT</code> (but not <code class="ph codeph">INSERT OVERWRITE</code>).
+ This makes the operation much faster, but consistency is not guaranteed: if a node fails during execution, the
+ table could end up with inconsistent data. Set this option to <code class="ph codeph">false</code> if stronger
+ consistency is required; however, this setting makes <code class="ph codeph">INSERT</code> operations slower.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Too many files in a table can make metadata loading and updating slow on S3.
+ If too many requests are made to S3, S3 has a back-off mechanism and
+ responds slower than usual. You might have many small files because of:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Too many partitions due to over-granular partitioning. Prefer partitions with
+ many megabytes of data, so that even a query against a single partition can
+ be parallelized effectively.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Many small <code class="ph codeph">INSERT</code> queries. Prefer bulk
+ <code class="ph codeph">INSERT</code>s so that more data is written to fewer
+ files.
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
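A minimal illustration of the <code class="ph codeph">PURGE</code> variants mentioned above (the table and partition names are hypothetical):

```sql
-- Fast delete: skips the copy to the HDFS trash folder.
DROP TABLE sales_s3 PURGE;

-- Likewise for an individual partition of a hypothetical partitioned table.
ALTER TABLE sales_partitioned_s3 DROP PARTITION (year=2015) PURGE;
```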
+
+ </div>
+ </article>
+
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_s3_skip_insert_staging.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_s3_skip_insert_staging.html b/docs/build3x/html/topics/impala_s3_skip_insert_staging.html
new file mode 100644
index 0000000..72e4be8
--- /dev/null
+++ b/docs/build3x/html/topics/impala_s3_skip_insert_staging.html
@@ -0,0 +1,78 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="s3_skip_insert_staging"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</title></head><body id="s3_skip_insert_staging"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">S3_SKIP_INSERT_STAGING Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ </p>
+
+ <p class="p">
+ Speeds up <code class="ph codeph">INSERT</code> operations on tables or partitions residing on the
+ Amazon S3 filesystem. The tradeoff is the possibility of inconsistent data left behind
+ if an error occurs partway through the operation.
+ </p>
+
+ <p class="p">
+ By default, Impala write operations to S3 tables and partitions involve a two-stage process.
+ Impala writes intermediate files to S3, then (because S3 does not provide a <span class="q">"rename"</span>
+ operation) those intermediate files are copied to their final location, making the process
+ more expensive than on a filesystem that supports renaming or moving files.
+ This query option makes Impala skip the intermediate files, and instead write the
+ new data directly to the final destination.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ If a host that is participating in the <code class="ph codeph">INSERT</code> operation fails partway through
+ the query, you might be left with a table or partition that contains some but not all of the
+ expected data files. Therefore, this option is most appropriate for a development or test
+ environment where you have the ability to reconstruct the table if a problem during
+ <code class="ph codeph">INSERT</code> leaves the data in an inconsistent state.
+ </p>
+ </div>
+
+ <p class="p">
+ The timing of file deletion during an <code class="ph codeph">INSERT OVERWRITE</code> operation
+ makes it impractical to write new files to S3 and delete the old files in a single operation.
+ Therefore, this query option only affects regular <code class="ph codeph">INSERT</code> statements that add
+ to the existing data in a table, not <code class="ph codeph">INSERT OVERWRITE</code> statements.
+ Use <code class="ph codeph">TRUNCATE TABLE</code> if you need to remove all contents from an S3 table
+ before performing a fast <code class="ph codeph">INSERT</code> with this option enabled.
+ </p>
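A sketch of the sequence described above, using a hypothetical table name:

```sql
-- INSERT OVERWRITE does not skip staging, so clear the table separately first.
TRUNCATE TABLE sales_s3;

-- With the option enabled (the default), INSERT writes directly to the destination.
SET S3_SKIP_INSERT_STAGING=true;
INSERT INTO sales_s3 SELECT * FROM sales_staging;
```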
+
+ <p class="p">
+ Performance improvements with this option enabled can be substantial. The speed increase
+ might be more noticeable for non-partitioned tables than for partitioned tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">true</code> (shown as 1 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_distinct.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_distinct.html b/docs/build3x/html/topics/impala_distinct.html
new file mode 100644
index 0000000..08d6232
--- /dev/null
+++ b/docs/build3x/html/topics/impala_distinct.html
@@ -0,0 +1,81 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="distinct"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISTINCT Operator</title></head><body id="distinct"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISTINCT Operator</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">DISTINCT</code> operator in a <code class="ph codeph">SELECT</code> statement filters the result set to
+ remove duplicates:
+ </p>
+
+<pre class="pre codeblock"><code>-- Returns the unique values from one column.
+-- NULL is included in the set of values if any rows have a NULL in this column.
+select distinct c_birth_country from customer;
+-- Returns the unique combinations of values from multiple columns.
+select distinct c_salutation, c_last_name from customer;</code></pre>
+
+ <p class="p">
+ You can use <code class="ph codeph">DISTINCT</code> in combination with an aggregation function, typically
+ <code class="ph codeph">COUNT()</code>, to find how many different values a column contains:
+ </p>
+
+<pre class="pre codeblock"><code>-- Counts the unique values from one column.
+-- NULL is not included as a distinct value in the count.
+select count(distinct c_birth_country) from customer;
+-- Counts the unique combinations of values from multiple columns.
+select count(distinct c_salutation, c_last_name) from customer;</code></pre>
+
+ <p class="p">
+ One construct that Impala SQL does <em class="ph i">not</em> support is using <code class="ph codeph">DISTINCT</code> in more than one
+ aggregation function in the same query. For example, you could not have a single query with both
+ <code class="ph codeph">COUNT(DISTINCT c_first_name)</code> and <code class="ph codeph">COUNT(DISTINCT c_last_name)</code> in the
+ <code class="ph codeph">SELECT</code> list.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Zero-length strings:</strong> For purposes of clauses such as <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">GROUP
+ BY</code>, Impala considers zero-length strings (<code class="ph codeph">""</code>), <code class="ph codeph">NULL</code>, and space
+ to all be different values.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ By default, Impala only allows a single <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">columns</var>)</code>
+ expression in each query.
+ </p>
+ <p class="p">
+ If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by
+ specifying <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>; a query can contain multiple instances of
+ <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>. To make Impala automatically rewrite
+ <code class="ph codeph">COUNT(DISTINCT)</code> expressions to <code class="ph codeph">NDV()</code>, enable the
+ <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option.
+ </p>
+ <p class="p">
+ To produce the same result as multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions, you can use the
+ following technique for queries involving a single table:
+ </p>
+<pre class="pre codeblock"><code>select v1.c1 result1, v2.c1 result2 from
+ (select count(distinct col1) as c1 from t1) v1
+ cross join
+ (select count(distinct col2) as c1 from t1) v2;
+</code></pre>
+ <p class="p">
+ Because <code class="ph codeph">CROSS JOIN</code> is an expensive operation, prefer to use the <code class="ph codeph">NDV()</code>
+ technique wherever practical.
+ </p>
+ </div>
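For example, the <code class="ph codeph">NDV()</code> approach for estimating several distinct counts in one query might look like this sketch, reusing the <code class="ph codeph">customer</code> table from the examples above:

```sql
-- Estimates rather than exact counts; multiple NDV() calls are allowed per query.
SELECT NDV(c_first_name), NDV(c_last_name) FROM customer;

-- Alternatively, let Impala rewrite COUNT(DISTINCT) to NDV() automatically.
SET APPX_COUNT_DISTINCT=true;
SELECT COUNT(DISTINCT c_first_name), COUNT(DISTINCT c_last_name) FROM customer;
```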
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In contrast with some database systems that always return <code class="ph codeph">DISTINCT</code> values in sorted order,
+ Impala does not do any ordering of <code class="ph codeph">DISTINCT</code> values. Always include an <code class="ph codeph">ORDER
+ BY</code> clause if you need the values in alphabetical or numeric sorted order.
+ </p>
+ </div>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_dml.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_dml.html b/docs/build3x/html/topics/impala_dml.html
new file mode 100644
index 0000000..4fb1296
--- /dev/null
+++ b/docs/build3x/html/topics/impala_dml.html
@@ -0,0 +1,82 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="dml"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DML Statements</title></head><body id="dml"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DML Statements</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ DML refers to <span class="q">"Data Manipulation Language"</span>, a subset of SQL statements that modify the data stored in
+ tables. Because Impala focuses on query performance and leverages the append-only nature of HDFS storage,
+ currently Impala only supports a small set of DML statements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_delete.html">DELETE Statement (Impala 2.8 or higher only)</a>. Works for Kudu tables only.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_insert.html">INSERT Statement</a>.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_load_data.html">LOAD DATA Statement</a>. Does not apply for HBase or Kudu tables.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_update.html">UPDATE Statement (Impala 2.8 or higher only)</a>. Works for Kudu tables only.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_upsert.html">UPSERT Statement (Impala 2.8 or higher only)</a>. Works for Kudu tables only.
+ </li>
+ </ul>
+
+ <p class="p">
+ <code class="ph codeph">INSERT</code> in Impala is primarily optimized for inserting large volumes of data in a single
+ statement, to make effective use of the multi-megabyte HDFS blocks. This is the way in Impala to create new
+ data files. If you intend to insert one or a few rows at a time, such as using the <code class="ph codeph">INSERT ...
+ VALUES</code> syntax, that technique is much more efficient for Impala tables stored in HBase. See
+ <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a> for details.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">LOAD DATA</code> moves existing data files into the directory for an Impala table, making them
+ immediately available for Impala queries. This is one way in Impala to work with data files produced by other
+ Hadoop components. (<code class="ph codeph">CREATE EXTERNAL TABLE</code> is the other alternative; with external tables,
+ you can query existing data files, while the files remain in their original location.)
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.8</span> and higher, Impala does support the <code class="ph codeph">UPDATE</code>, <code class="ph codeph">DELETE</code>,
+ and <code class="ph codeph">UPSERT</code> statements for Kudu tables.
+ For HDFS or S3 tables, to simulate the effects of an <code class="ph codeph">UPDATE</code> or <code class="ph codeph">DELETE</code> statement
+ in other database systems, typically you use <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> to copy data
+ from one table to another, filtering out or changing the appropriate rows during the copy operation.
+ </p>
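As a sketch of that copy-and-filter technique (the table and column names are hypothetical):

```sql
-- Simulate DELETE: copy every row except the ones to remove.
CREATE TABLE t1_new AS
  SELECT * FROM t1 WHERE id <> 12345;

-- Simulate UPDATE: rewrite the changed column during the copy.
CREATE TABLE t2_new AS
  SELECT id, CASE WHEN state = 'CA' THEN 'California' ELSE state END AS state
  FROM t2;
```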
+
+ <p class="p">
+ You can also achieve a result similar to <code class="ph codeph">UPDATE</code> by using Impala tables stored in HBase.
+ When you insert a row into an HBase table, and the table
+ already contains a row with the same value for the key column, the older row is hidden, effectively the same
+ as a single-row <code class="ph codeph">UPDATE</code>.
+ </p>
+
+ <p class="p">
+ Impala can perform DML operations for tables or partitions stored in the Amazon S3 filesystem
+ with <span class="keyword">Impala 2.6</span> and higher. See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The other major classifications of SQL statements are data definition language (see
+ <a class="xref" href="impala_ddl.html#ddl">DDL Statements</a>) and queries (see <a class="xref" href="impala_select.html#select">SELECT Statement</a>).
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_double.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_double.html b/docs/build3x/html/topics/impala_double.html
new file mode 100644
index 0000000..afff3cf
--- /dev/null
+++ b/docs/build3x/html/topics/impala_double.html
@@ -0,0 +1,157 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="double"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DOUBLE Data Type</title></head><body id="double"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DOUBLE Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A double precision floating-point data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER
+ TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> DOUBLE</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> 4.94065645841246544e-324d .. 1.79769313486231570e+308, positive or negative
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Precision:</strong> 15 to 17 significant digits, depending on usage. The number of significant digits does
+ not depend on the position of the decimal point.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Representation:</strong> The values are stored in 8 bytes, using
+ <a class="xref" href="https://en.wikipedia.org/wiki/Double-precision_floating-point_format" target="_blank">IEEE 754 Double Precision Binary Floating Point</a> format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong> Impala does not automatically convert <code class="ph codeph">DOUBLE</code> to any other type. You can
+ use <code class="ph codeph">CAST()</code> to convert <code class="ph codeph">DOUBLE</code> values to <code class="ph codeph">FLOAT</code>,
+ <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>,
+ <code class="ph codeph">STRING</code>, <code class="ph codeph">TIMESTAMP</code>, or <code class="ph codeph">BOOLEAN</code>. You can use exponential
+ notation in <code class="ph codeph">DOUBLE</code> literals or when casting from <code class="ph codeph">STRING</code>, for example
+ <code class="ph codeph">1.0e6</code> to represent one million.
+ <span class="ph">
+ Casting an integer or floating-point value <code class="ph codeph">N</code> to
+ <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+ date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+ If the setting <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+ the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.
+ </span>
+ </p>
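+
+ <p class="p">
+ For example, the following illustrative queries (hypothetical, not drawn from the Impala test
+ suite) show exponential notation in a literal and an epoch-based cast to
+ <code class="ph codeph">TIMESTAMP</code>; the timestamp result assumes the default UTC conversion behavior:
+ </p>
+
+<pre class="pre codeblock"><code>-- Exponential notation representing one million.
+SELECT CAST(1.0e6 AS INT);
+-- One million seconds past the start of the epoch: 1970-01-12 13:46:40 UTC.
+SELECT CAST(1.0e6 AS TIMESTAMP);
+</code></pre>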
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The data type <code class="ph codeph">REAL</code> is an alias for <code class="ph codeph">DOUBLE</code>.
+ </p>
+
+
+ <p class="p">
+ Impala does not evaluate NaN (not a number) as equal to any other numeric values,
+ including other NaN values. For example, the following statement, which evaluates equality
+ between two NaN values, returns <code class="ph codeph">false</code>:
+ </p>
+
+<pre class="pre codeblock"><code>
+SELECT CAST('nan' AS DOUBLE)=CAST('nan' AS DOUBLE);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x DOUBLE);
+SELECT CAST(1000.5 AS DOUBLE);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> Because fractional values of this type are not always represented precisely, when this
+ type is used for a partition key column, the underlying HDFS directories might not be named exactly as you
+ expect. Prefer to partition on a <code class="ph codeph">DECIMAL</code> column instead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as an 8-byte value.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+
+
+ <p class="p">
+ Due to the way arithmetic on <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns uses
+ high-performance hardware instructions, and distributed queries can perform these operations in a different
+ order for each query, results can vary slightly for aggregate function calls such as <code class="ph codeph">SUM()</code>
+ and <code class="ph codeph">AVG()</code> for <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns, particularly on
+ large data sets where millions or billions of values are summed or averaged. For perfect consistency and
+ repeatability, use the <code class="ph codeph">DECIMAL</code> data type for such operations instead of
+ <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+ </p>
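+
+ <p class="p">
+ As an illustrative sketch (the table and column names here are hypothetical), casting to
+ <code class="ph codeph">DECIMAL</code> before aggregating trades some performance for repeatable results:
+ </p>
+
+<pre class="pre codeblock"><code>-- On a large DOUBLE column, this result can vary slightly from run to run.
+SELECT SUM(price) FROM sales;
+-- Aggregating DECIMAL values produces the same result on every run.
+SELECT SUM(CAST(price AS DECIMAL(18,4))) FROM sales;
+</code></pre>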
+
+ <p class="p">
+ The inability to exactly represent certain floating-point values means that
+ <code class="ph codeph">DECIMAL</code> is sometimes a better choice than <code class="ph codeph">DOUBLE</code>
+ or <code class="ph codeph">FLOAT</code> when precision is critical, particularly when
+ transferring data from other database systems that use different representations
+ or file formats.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">FLOAT</code>,
+ and <code class="ph codeph">DOUBLE</code> cannot be used for primary key columns in Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>,
+ <a class="xref" href="impala_float.html#float">FLOAT Data Type</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_drop_database.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_drop_database.html b/docs/build3x/html/topics/impala_drop_database.html
new file mode 100644
index 0000000..9bbda27
--- /dev/null
+++ b/docs/build3x/html/topics/impala_drop_database.html
@@ -0,0 +1,193 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_database"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP DATABASE Statement</title></head><body id="drop_database"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DROP DATABASE Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Removes a database from the system. The physical operations involve removing the metadata for the database
+ from the metastore, and deleting the corresponding <code class="ph codeph">*.db</code> directory from HDFS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DROP (DATABASE|SCHEMA) [IF EXISTS] <var class="keyword varname">database_name</var> <span class="ph">[RESTRICT | CASCADE]</span>;</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ By default, the database must be empty before it can be dropped, to avoid losing any data.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, you can include the <code class="ph codeph">CASCADE</code>
+ clause to make Impala drop all tables and other objects in the database before dropping the database itself.
+ The <code class="ph codeph">RESTRICT</code> clause enforces the original requirement that the database be empty
+ before being dropped. Because the <code class="ph codeph">RESTRICT</code> behavior is still the default, this
+ clause is optional.
+ </p>
+
+ <p class="p">
+ The automatic dropping resulting from the <code class="ph codeph">CASCADE</code> clause follows the same rules as the
+ corresponding <code class="ph codeph">DROP TABLE</code>, <code class="ph codeph">DROP VIEW</code>, and <code class="ph codeph">DROP FUNCTION</code> statements.
+ In particular, the HDFS directories and data files for any external tables are left behind when the
+ tables are removed.
+ </p>
+
+ <p class="p">
+ When you do not use the <code class="ph codeph">CASCADE</code> clause, drop or move all the objects inside the database manually
+ before dropping the database itself:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Use the <code class="ph codeph">SHOW TABLES</code> statement to locate all tables and views in the database,
+ and issue <code class="ph codeph">DROP TABLE</code> and <code class="ph codeph">DROP VIEW</code> statements to remove them all.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Use the <code class="ph codeph">SHOW FUNCTIONS</code> and <code class="ph codeph">SHOW AGGREGATE FUNCTIONS</code> statements
+ to locate all user-defined functions in the database, and issue <code class="ph codeph">DROP FUNCTION</code>
+ and <code class="ph codeph">DROP AGGREGATE FUNCTION</code> statements to remove them all.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ To keep tables or views contained by a database while removing the database itself, use
+ <code class="ph codeph">ALTER TABLE</code> and <code class="ph codeph">ALTER VIEW</code> to move the relevant
+ objects to a different database before dropping the original database.
+ </p>
+ </li>
+ </ul>
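+
+ <p class="p">
+ For example (an illustrative sketch using hypothetical database, table, and view names), the
+ manual cleanup steps above might look like this:
+ </p>
+
+<pre class="pre codeblock"><code>-- Remove the objects inside the database.
+drop view my_db.v1;
+drop table my_db.t1;
+-- Preserve a table by moving it to a different database.
+alter table my_db.keep_me rename to other_db.keep_me;
+-- Now the empty database can be dropped.
+drop database my_db;
+</code></pre>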
+
+ <p class="p">
+ You cannot drop the current database, that is, the database that your session is connected to,
+ either through the <code class="ph codeph">USE</code> statement or the <code class="ph codeph">-d</code> option of <span class="keyword cmdname">impala-shell</span>.
+ Issue a <code class="ph codeph">USE</code> statement to switch to a different database first.
+ Because the <code class="ph codeph">default</code> database is always available, issuing
+ <code class="ph codeph">USE default</code> is a convenient way to leave the current database
+ before dropping it.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Hive considerations:</strong>
+ </p>
+
+ <p class="p">
+ When you drop a database in Impala, the database can no longer be used by Hive.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+
+
+ <p class="p">
+ See <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a> for examples covering <code class="ph codeph">CREATE
+ DATABASE</code>, <code class="ph codeph">USE</code>, and <code class="ph codeph">DROP DATABASE</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+ <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+ <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+ as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create the folders yourself and point
+ Impala databases, tables, or partitions at them, and manually remove folders when no longer needed.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have write
+ permission for the directory associated with the database.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <pre class="pre codeblock"><code>create database first_db;
+use first_db;
+create table t1 (x int);
+
+create database second_db;
+use second_db;
+-- Each database has its own namespace for tables.
+-- You can reuse the same table names in each database.
+create table t1 (s string);
+
+create database temp;
+
+-- You can either USE a database after creating it,
+-- or qualify all references to the table name with the name of the database.
+-- Here, tables T2 and T3 are both created in the TEMP database.
+
+create table temp.t2 (x int, y int);
+use temp;
+create table t3 (s string);
+
+-- You cannot drop a database while it is selected by the USE statement.
+drop database temp;
+<em class="ph i">ERROR: AnalysisException: Cannot drop current default database: temp</em>
+
+-- The always-available database 'default' is a convenient one to USE
+-- before dropping a database you created.
+use default;
+
+-- Before dropping a database, first drop all the tables inside it,
+<span class="ph">-- or in <span class="keyword">Impala 2.3</span> and higher use the CASCADE clause.</span>
+drop database temp;
+ERROR: ImpalaRuntimeException: Error making 'dropDatabase' RPC to Hive Metastore:
+CAUSED BY: InvalidOperationException: Database temp is not empty
+show tables in temp;
++------+
+| name |
++------+
+| t3 |
++------+
+
+<span class="ph">-- <span class="keyword">Impala 2.3</span> and higher:</span>
+<span class="ph">drop database temp cascade;</span>
+
+-- Earlier releases:
+drop table temp.t3;
+drop database temp;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_databases.html#databases">Overview of Impala Databases</a>, <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a>,
+ <a class="xref" href="impala_use.html#use">USE Statement</a>, <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>, <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_drop_function.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_drop_function.html b/docs/build3x/html/topics/impala_drop_function.html
new file mode 100644
index 0000000..a398e94
--- /dev/null
+++ b/docs/build3x/html/topics/impala_drop_function.html
@@ -0,0 +1,136 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_function"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP FUNCTION Statement</title></head><body id="drop_function"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DROP FUNCTION Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Removes a user-defined function (UDF), so that it is not available for execution during Impala
+ <code class="ph codeph">SELECT</code> or <code class="ph codeph">INSERT</code> operations.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ To drop C++ UDFs and UDAs:
+ </p>
+
+<pre class="pre codeblock"><code>DROP [AGGREGATE] FUNCTION [IF EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var>(<var class="keyword varname">type</var>[, <var class="keyword varname">type</var>...])</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The preceding syntax, which includes the function signature, also applies to Java UDFs that were created
+ using the corresponding <code class="ph codeph">CREATE FUNCTION</code> syntax that includes the argument and return types.
+ After upgrading to <span class="keyword">Impala 2.5</span> or higher, consider re-creating all Java UDFs with the
+ <code class="ph codeph">CREATE FUNCTION</code> syntax that does not include the function signature. Java UDFs created this
+ way are now persisted in the metastore database and do not need to be re-created after an Impala restart.
+ </p>
+ </div>
+
+ <p class="p">
+ To drop Java UDFs (created using the <code class="ph codeph">CREATE FUNCTION</code> syntax with no function signature):
+ </p>
+
+<pre class="pre codeblock"><code>DROP FUNCTION [IF EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var></code></pre>
+
+
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the same function name could be overloaded with different argument signatures, you specify the
+ argument types to identify the exact function to drop.
+ </p>
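+
+ <p class="p">
+ For example (the function names here are hypothetical), only the overload matching the
+ specified argument types is removed:
+ </p>
+
+<pre class="pre codeblock"><code>-- Drops my_length(STRING) but leaves any my_length(INT) overload in place.
+DROP FUNCTION IF EXISTS my_db.my_length(STRING);
+-- Dropping a UDA requires the AGGREGATE keyword.
+DROP AGGREGATE FUNCTION IF EXISTS my_db.my_avg(DOUBLE);
+</code></pre>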
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
+ Java UDFs are also persisted, if they were created with the new <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs,
+ where the Java function argument and return types are omitted.
+ Java-based UDFs created with the old <code class="ph codeph">CREATE FUNCTION</code> syntax do not persist across restarts
+ because they are held in the memory of the <span class="keyword cmdname">catalogd</span> daemon.
+ Until you re-create such Java UDFs using the new <code class="ph codeph">CREATE FUNCTION</code> syntax,
+ you must reload those Java-based UDFs by running the original <code class="ph codeph">CREATE FUNCTION</code> statements again each time
+ you restart the <span class="keyword cmdname">catalogd</span> daemon.
+ Prior to <span class="keyword">Impala 2.5</span> the requirement to reload functions after a restart applied to both C++ and Java functions.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, does not need any
+ particular HDFS permissions to perform this statement.
+ All read and write operations are on the metastore database,
+ not HDFS files and directories.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example shows how to drop Java functions created with the signatureless
+ <code class="ph codeph">CREATE FUNCTION</code> syntax in <span class="keyword">Impala 2.5</span> and higher.
+ Issuing <code class="ph codeph">DROP FUNCTION <var class="keyword varname">function_name</var></code> removes all the
+ overloaded functions under that name.
+ (See <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a> for a longer example
+ showing how to set up such functions in the first place.)
+ </p>
+<pre class="pre codeblock"><code>
+create function my_func location '/user/impala/udfs/udf-examples.jar'
+ symbol='org.apache.impala.TestUdf';
+
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT | my_func(BIGINT) | JAVA | true |
+| BOOLEAN | my_func(BOOLEAN) | JAVA | true |
+| BOOLEAN | my_func(BOOLEAN, BOOLEAN) | JAVA | true |
+...
+| BIGINT | testudf(BIGINT) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN, BOOLEAN) | JAVA | true |
+...
+
+drop function my_func;
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT | testudf(BIGINT) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN, BOOLEAN) | JAVA | true |
+...
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>, <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_drop_role.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_drop_role.html b/docs/build3x/html/topics/impala_drop_role.html
new file mode 100644
index 0000000..53a5c73
--- /dev/null
+++ b/docs/build3x/html/topics/impala_drop_role.html
@@ -0,0 +1,71 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_role"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP ROLE Statement (Impala 2.0 or higher only)</title></head><body id="drop_role"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DROP ROLE Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ The <code class="ph codeph">DROP ROLE</code> statement removes a role from the metastore database. Once dropped, the role
+ is revoked for all users to whom it was previously assigned, and all privileges granted to that role are
+ revoked. Queries that are already executing are not affected. Impala verifies the role information
+ approximately every 60 seconds, so the effect of <code class="ph codeph">DROP ROLE</code> might not be visible to new
+ Impala queries for a brief period.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DROP ROLE <var class="keyword varname">role_name</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Required privileges:</strong>
+ </p>
+
+ <p class="p">
+ Only administrative users (initially, a predefined set of users specified in the Sentry service configuration
+ file) can use this statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <p class="p">
+ Impala makes use of any roles and privileges specified by the <code class="ph codeph">GRANT</code> and
+ <code class="ph codeph">REVOKE</code> statements in Hive, and Hive makes use of any roles and privileges specified by the
+ <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Impala. The Impala <code class="ph codeph">GRANT</code>
+ and <code class="ph codeph">REVOKE</code> statements for privileges do not require the <code class="ph codeph">ROLE</code> keyword to be
+ repeated before each role name, unlike the equivalent Hive statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_drop_stats.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_drop_stats.html b/docs/build3x/html/topics/impala_drop_stats.html
new file mode 100644
index 0000000..2175f20
--- /dev/null
+++ b/docs/build3x/html/topics/impala_drop_stats.html
@@ -0,0 +1,285 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP STATS Statement</title></head><body id="drop_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DROP STATS Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Removes the specified statistics from a table or partition. The statistics were originally created by the
+ <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DROP STATS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var>
+DROP INCREMENTAL STATS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var> PARTITION (<var class="keyword varname">partition_spec</var>)
+
+<var class="keyword varname">partition_spec</var> ::= <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var>[, <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var> ...]
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION</code> clause is only allowed in combination with the <code class="ph codeph">INCREMENTAL</code>
+ clause. It is optional for <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, and required for <code class="ph codeph">DROP
+ INCREMENTAL STATS</code>. Whenever you specify partitions through the <code class="ph codeph">PARTITION
+ (<var class="keyword varname">partition_spec</var>)</code> clause in a <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or
+ <code class="ph codeph">DROP INCREMENTAL STATS</code> statement, you must include all the partitioning columns in the
+ specification, and specify constant values for all the partition key columns.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">DROP STATS</code> removes all statistics from the table, whether created by <code class="ph codeph">COMPUTE
+ STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">DROP INCREMENTAL STATS</code> only affects incremental statistics for a single partition, specified
+ through the <code class="ph codeph">PARTITION</code> clause. The incremental stats are marked as outdated, so that they are
+ recomputed by the next <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement.
+ </p>
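+
+ <p class="p">
+ For instance, with a table partitioned by <code class="ph codeph">i_category</code> (such as the
+ <code class="ph codeph">item_partitioned</code> table in the example later in this topic), you might
+ refresh the statistics for a single changed partition like this:
+ </p>
+
+<pre class="pre codeblock"><code>-- Mark the incremental stats for one partition as outdated...
+DROP INCREMENTAL STATS item_partitioned PARTITION (i_category='Books');
+-- ...then recompute statistics for just that partition.
+COMPUTE INCREMENTAL STATS item_partitioned PARTITION (i_category='Books');
+</code></pre>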
+
+
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ You typically use this statement when the statistics for a table or a partition have become stale due to data
+ files being added to or removed from the associated HDFS data directories, whether by manual HDFS operations
+ or <code class="ph codeph">INSERT</code>, <code class="ph codeph">INSERT OVERWRITE</code>, or <code class="ph codeph">LOAD DATA</code> statements, or
+ adding or dropping partitions.
+ </p>
+
+ <p class="p">
+ When a table or partition has no associated statistics, Impala treats it as essentially zero-sized when
+ constructing the execution plan for a query. In particular, the statistics influence the order in which
+ tables are joined in a join query. To ensure proper query planning and good query performance and
+ scalability, make sure to run <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on
+ the table or partition after removing any stale statistics.
+ </p>
+
+ <p class="p">
+ Dropping the statistics is not required for an unpartitioned table or a partitioned table covered by the
+ original type of statistics. A subsequent <code class="ph codeph">COMPUTE STATS</code> statement replaces any existing
+ statistics with new ones, for all partitions, regardless of whether the old ones were outdated. Therefore,
+ this statement was rarely used before the introduction of incremental statistics.
+ </p>
+
+ <p class="p">
+ Dropping the statistics is required for a partitioned table containing incremental statistics, to make a
+ subsequent <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement rescan an existing partition. See
+ <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for information about incremental statistics, a feature
+ available in Impala 2.1.0 and higher.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, does not need any
+ particular HDFS permissions to perform this statement.
+ All read and write operations are on the metastore database,
+ not HDFS files and directories.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows a partitioned table that has associated statistics produced by the
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement, and how the situation evolves as statistics are dropped
+ from specific partitions, then the entire table.
+ </p>
+
+ <p class="p">
+ Initially, all table and column statistics are filled in.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+-----------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+-----------------
+| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true
+| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true
+| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true
+| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true
+| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true
+| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true
+| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true
+| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true
+| Sports | 1783 | 1 | 227.97KB | NOT CACHED | PARQUET | true
+| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true
+| Total | 17957 | 10 | 2.25MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+-----------------
+show column stats item_partitioned;
++------------------+-----------+------------------+--------+----------+--------------
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size
++------------------+-----------+------------------+--------+----------+--------------
+| i_item_sk | INT | 19443 | -1 | 4 | 4
+| i_item_id | STRING | 9025 | -1 | 16 | 16
+| i_rec_start_date | TIMESTAMP | 4 | -1 | 16 | 16
+| i_rec_end_date | TIMESTAMP | 3 | -1 | 16 | 16
+| i_item_desc | STRING | 13330 | -1 | 200 | 100.302803039
+| i_current_price | FLOAT | 2807 | -1 | 4 | 4
+| i_wholesale_cost | FLOAT | 2105 | -1 | 4 | 4
+| i_brand_id | INT | 965 | -1 | 4 | 4
+| i_brand | STRING | 725 | -1 | 22 | 16.1776008605
+| i_class_id | INT | 16 | -1 | 4 | 4
+| i_class | STRING | 101 | -1 | 15 | 7.76749992370
+| i_category_id | INT | 10 | -1 | 4 | 4
+| i_manufact_id | INT | 1857 | -1 | 4 | 4
+| i_manufact | STRING | 1028 | -1 | 15 | 11.3295001983
+| i_size | STRING | 8 | -1 | 11 | 4.33459997177
+| i_formulation | STRING | 12884 | -1 | 20 | 19.9799995422
+| i_color | STRING | 92 | -1 | 10 | 5.38089990615
+| i_units | STRING | 22 | -1 | 7 | 4.18690013885
+| i_container | STRING | 2 | -1 | 7 | 6.99259996414
+| i_manager_id | INT | 105 | -1 | 4 | 4
+| i_product_name | STRING | 19094 | -1 | 25 | 18.0233001708
+| i_category | STRING | 10 | 0 | -1 | -1
++------------------+-----------+------------------+--------+----------+--------------
+</code></pre>
+
+ <p class="p">
+ To remove statistics for particular partitions, use the <code class="ph codeph">DROP INCREMENTAL STATS</code> statement.
+ After removing statistics for two partitions, the table-level statistics reflect that change in the
+ <code class="ph codeph">#Rows</code> and <code class="ph codeph">Incremental stats</code> fields. The counts, maximums, and averages of
+ the column-level statistics are unaffected.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The row count might be preserved after a <code class="ph codeph">DROP INCREMENTAL
+ STATS</code> statement in a future release. Check the resolution of the issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1615" target="_blank">IMPALA-1615</a>.
+ </div>
+
+<pre class="pre codeblock"><code>drop incremental stats item_partitioned partition (i_category='Sports');
+drop incremental stats item_partitioned partition (i_category='Electronics');
+
+show table stats item_partitioned
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+-----------------
+| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true
+| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true
+| Electronics | -1 | 1 | 232.67KB | NOT CACHED | PARQUET | false
+| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true
+| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true
+| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true
+| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true
+| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true
+| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false
+| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true
+| Total | 17957 | 10 | 2.25MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+-----------------
+show column stats item_partitioned
++------------------+-----------+------------------+--------+----------+--------------
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size
++------------------+-----------+------------------+--------+----------+--------------
+| i_item_sk | INT | 19443 | -1 | 4 | 4
+| i_item_id | STRING | 9025 | -1 | 16 | 16
+| i_rec_start_date | TIMESTAMP | 4 | -1 | 16 | 16
+| i_rec_end_date | TIMESTAMP | 3 | -1 | 16 | 16
+| i_item_desc | STRING | 13330 | -1 | 200 | 100.302803039
+| i_current_price | FLOAT | 2807 | -1 | 4 | 4
+| i_wholesale_cost | FLOAT | 2105 | -1 | 4 | 4
+| i_brand_id | INT | 965 | -1 | 4 | 4
+| i_brand | STRING | 725 | -1 | 22 | 16.1776008605
+| i_class_id | INT | 16 | -1 | 4 | 4
+| i_class | STRING | 101 | -1 | 15 | 7.76749992370
+| i_category_id | INT | 10 | -1 | 4 | 4
+| i_manufact_id | INT | 1857 | -1 | 4 | 4
+| i_manufact | STRING | 1028 | -1 | 15 | 11.3295001983
+| i_size | STRING | 8 | -1 | 11 | 4.33459997177
+| i_formulation | STRING | 12884 | -1 | 20 | 19.9799995422
+| i_color | STRING | 92 | -1 | 10 | 5.38089990615
+| i_units | STRING | 22 | -1 | 7 | 4.18690013885
+| i_container | STRING | 2 | -1 | 7 | 6.99259996414
+| i_manager_id | INT | 105 | -1 | 4 | 4
+| i_product_name | STRING | 19094 | -1 | 25 | 18.0233001708
+| i_category | STRING | 10 | 0 | -1 | -1
++------------------+-----------+------------------+--------+----------+--------------
+</code></pre>
+
+ <p class="p">
+ To remove all statistics from the table, whether produced by <code class="ph codeph">COMPUTE STATS</code> or
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, use the <code class="ph codeph">DROP STATS</code> statement without the
+ <code class="ph codeph">INCREMENTAL</code> clause. Now, both table-level and column-level statistics are reset.
+ </p>
+
+<pre class="pre codeblock"><code>drop stats item_partitioned;
+
+show table stats item_partitioned
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books | -1 | 1 | 223.74KB | NOT CACHED | PARQUET | false
+| Children | -1 | 1 | 230.05KB | NOT CACHED | PARQUET | false
+| Electronics | -1 | 1 | 232.67KB | NOT CACHED | PARQUET | false
+| Home | -1 | 1 | 232.56KB | NOT CACHED | PARQUET | false
+| Jewelry | -1 | 1 | 223.72KB | NOT CACHED | PARQUET | false
+| Men | -1 | 1 | 231.25KB | NOT CACHED | PARQUET | false
+| Music | -1 | 1 | 237.90KB | NOT CACHED | PARQUET | false
+| Shoes | -1 | 1 | 234.90KB | NOT CACHED | PARQUET | false
+| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false
+| Women | -1 | 1 | 226.27KB | NOT CACHED | PARQUET | false
+| Total | -1 | 10 | 2.25MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+------------------
+show column stats item_partitioned
++------------------+-----------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++------------------+-----------+------------------+--------+----------+----------+
+| i_item_sk | INT | -1 | -1 | 4 | 4 |
+| i_item_id | STRING | -1 | -1 | -1 | -1 |
+| i_rec_start_date | TIMESTAMP | -1 | -1 | 16 | 16 |
+| i_rec_end_date | TIMESTAMP | -1 | -1 | 16 | 16 |
+| i_item_desc | STRING | -1 | -1 | -1 | -1 |
+| i_current_price | FLOAT | -1 | -1 | 4 | 4 |
+| i_wholesale_cost | FLOAT | -1 | -1 | 4 | 4 |
+| i_brand_id | INT | -1 | -1 | 4 | 4 |
+| i_brand | STRING | -1 | -1 | -1 | -1 |
+| i_class_id | INT | -1 | -1 | 4 | 4 |
+| i_class | STRING | -1 | -1 | -1 | -1 |
+| i_category_id | INT | -1 | -1 | 4 | 4 |
+| i_manufact_id | INT | -1 | -1 | 4 | 4 |
+| i_manufact | STRING | -1 | -1 | -1 | -1 |
+| i_size | STRING | -1 | -1 | -1 | -1 |
+| i_formulation | STRING | -1 | -1 | -1 | -1 |
+| i_color | STRING | -1 | -1 | -1 | -1 |
+| i_units | STRING | -1 | -1 | -1 | -1 |
+| i_container | STRING | -1 | -1 | -1 | -1 |
+| i_manager_id | INT | -1 | -1 | 4 | 4 |
+| i_product_name | STRING | -1 | -1 | -1 | -1 |
+| i_category | STRING | 10 | 0 | -1 | -1 |
++------------------+-----------+------------------+--------+----------+----------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>, <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>,
+ <a class="xref" href="impala_show.html#show_column_stats">SHOW COLUMN STATS Statement</a>, <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_drop_table.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_drop_table.html b/docs/build3x/html/topics/impala_drop_table.html
new file mode 100644
index 0000000..ff98d9c
--- /dev/null
+++ b/docs/build3x/html/topics/impala_drop_table.html
@@ -0,0 +1,192 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_table"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP TABLE Statement</title></head><body id="drop_table"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DROP TABLE Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Removes an Impala table. Also removes the underlying HDFS data files for internal tables, although not for
+ external tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DROP TABLE [IF EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> <span class="ph">[PURGE]</span></code></pre>
+
+ <p class="p">
+ <strong class="ph b">IF EXISTS clause:</strong>
+ </p>
+
+ <p class="p">
+ The optional <code class="ph codeph">IF EXISTS</code> clause makes the statement succeed whether or not the table exists.
+ If the table does exist, it is dropped; if it does not exist, the statement has no effect. This capability is
+ useful in standardized setup scripts that remove existing schema objects and create new ones. By using some
+ combination of <code class="ph codeph">IF EXISTS</code> for the <code class="ph codeph">DROP</code> statements and <code class="ph codeph">IF NOT
+ EXISTS</code> clauses for the <code class="ph codeph">CREATE</code> statements, the script can run successfully the first
+ time you run it (when the objects do not exist yet) and subsequent times (when some or all of the objects do
+ already exist).
+ </p>
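+
+ <p class="p">
+ For example, a rerunnable setup script might combine the two clauses as follows. (The table name and
+ columns are illustrative.)
+ </p>
+
+<pre class="pre codeblock"><code>-- Safe to run whether or not the table already exists.
+drop table if exists staging_events;
+-- Safe to run whether or not a table with this name already exists.
+create table if not exists staging_events (id bigint, payload string);
+</code></pre>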
+
+ <p class="p">
+ <strong class="ph b">PURGE clause:</strong>
+ </p>
+
+ <p class="p"> The optional <code class="ph codeph">PURGE</code> keyword, available in
+ <span class="keyword">Impala 2.3</span> and higher, causes Impala to remove the associated
+ HDFS data files immediately, rather than going through the HDFS trashcan
+ mechanism. Use this keyword when dropping a table if it is crucial to
+ remove the data as quickly as possible to free up space, or if there is a
+ problem with the trashcan, such as the trashcan not being configured or
+ being in a different HDFS encryption zone than the data files. </p>
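+
+ <p class="p">
+ For example, to delete the data files immediately when dropping a hypothetical internal table:
+ </p>
+
+<pre class="pre codeblock"><code>-- Skip the HDFS trashcan; the data files are removed immediately.
+drop table large_staging_table purge;
+</code></pre>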
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ By default, Impala removes the associated HDFS directory and data files for the table. If you issue a
+ <code class="ph codeph">DROP TABLE</code> and the data files are not deleted, it might be for the following reasons:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If the table was created with the
+ <code class="ph codeph"><a class="xref" href="impala_tables.html#external_tables">EXTERNAL</a></code> clause, Impala leaves all
+ files and directories untouched. Use external tables when the data is under the control of other Hadoop
+ components, and Impala is only used to query the data files from their original locations.
+ </li>
+
+ <li class="li">
+ Impala might leave the data files behind unintentionally, if there is no HDFS location available to hold
+ the HDFS trashcan for the <code class="ph codeph">impala</code> user. See
+ <a class="xref" href="impala_prereqs.html#prereqs_account">User Account Requirements</a> for the procedure to set up the required HDFS home
+ directory.
+ </li>
+ </ul>
+
+ <p class="p">
+ Make sure that you are in the correct database before dropping a table, either by issuing a
+ <code class="ph codeph">USE</code> statement first or by using a fully qualified name
+ <code class="ph codeph"><var class="keyword varname">db_name</var>.<var class="keyword varname">table_name</var></code>.
+ </p>
+
+ <p class="p">
+ If you intend to issue a <code class="ph codeph">DROP DATABASE</code> statement, first issue <code class="ph codeph">DROP TABLE</code>
+ statements to remove all the tables in that database.
+ </p>
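+
+ <p class="p">
+ For example, assuming a database <code class="ph codeph">scratch</code> containing two tables (the names are
+ illustrative):
+ </p>
+
+<pre class="pre codeblock"><code>-- Remove all tables first, then the database itself.
+drop table scratch.t1;
+drop table scratch.t2;
+drop database scratch;
+</code></pre>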
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>create database temporary;
+use temporary;
+create table unimportant (x int);
+create table trivial (s string);
+-- Drop a table in the current database.
+drop table unimportant;
+-- Switch to a different database.
+use default;
+-- To drop a table in a different database...
+drop table trivial;
+<em class="ph i">ERROR: AnalysisException: Table does not exist: default.trivial</em>
+-- ...use a fully qualified name.
+drop table temporary.trivial;</code></pre>
+
+ <p class="p">
+ For other tips about managing and reclaiming Impala disk space, see
+ <a class="xref" href="../shared/../topics/impala_disk_space.html#disk_space">Managing Disk Space for Impala Data</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">DROP TABLE</code> statement can remove data files from S3
+ if the associated S3 table is an internal table.
+ In <span class="keyword">Impala 2.6</span> and higher, as part of improved support for writing
+ to S3, Impala also removes the associated folder when dropping an internal table
+ that resides on S3.
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about working with S3 tables.
+ </p>
+
+ <div class="p">
+ For best compatibility with the S3 write support in <span class="keyword">Impala 2.6</span>
+ and higher:
+ <ul class="ul">
+ <li class="li">Use native Hadoop techniques to create data files in S3 for querying through Impala.</li>
+ <li class="li">Use the <code class="ph codeph">PURGE</code> clause of <code class="ph codeph">DROP TABLE</code> when dropping internal (managed) tables.</li>
+ </ul>
+ By default, when you drop an internal (managed) table, the data files are
+ moved to the HDFS trashcan. This operation is expensive for tables that
+ reside on the Amazon S3 filesystem. Therefore, for S3 tables, prefer to use
+ <code class="ph codeph">DROP TABLE <var class="keyword varname">table_name</var> PURGE</code> rather than the default <code class="ph codeph">DROP TABLE</code> statement.
+ The <code class="ph codeph">PURGE</code> clause makes Impala delete the data files immediately,
+ skipping the HDFS trashcan.
+ For the <code class="ph codeph">PURGE</code> clause to work effectively, you must originally create the
+ data files on S3 using one of the tools from the Hadoop ecosystem, such as
+ <code class="ph codeph">hadoop fs -cp</code>, or <code class="ph codeph">INSERT</code> in Impala or Hive.
+ </div>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+ <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+ <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+ as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+ Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ For an internal table, the user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have write
+ permission for all the files and directories that make up the table.
+ </p>
+ <p class="p">
+ For an external table, dropping the table only involves changes to metadata in the metastore database.
+ Because Impala does not remove any HDFS files or directories when external tables are dropped,
+ no particular permissions are needed for the associated HDFS files or directories.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Kudu tables can be managed or external, the same as with HDFS-based
+ tables. For a managed table, the underlying Kudu table and its data
+ are removed by <code class="ph codeph">DROP TABLE</code>. For an external table,
+ the underlying Kudu table and its data remain after a
+ <code class="ph codeph">DROP TABLE</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>,
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>, <a class="xref" href="impala_tables.html#internal_tables">Internal Tables</a>,
+ <a class="xref" href="impala_tables.html#external_tables">External Tables</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_drop_view.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_drop_view.html b/docs/build3x/html/topics/impala_drop_view.html
new file mode 100644
index 0000000..523e50a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_drop_view.html
@@ -0,0 +1,80 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_view"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP VIEW Statement</title></head><body id="drop_view"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DROP VIEW Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Removes the specified view, which was originally created by the <code class="ph codeph">CREATE VIEW</code> statement.
+ Because a view is purely a logical construct (an alias for a query) with no physical data behind it,
+ <code class="ph codeph">DROP VIEW</code> only involves changes to metadata in the metastore database, not any data files in
+ HDFS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DROP VIEW [IF EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">view_name</var></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="p">
+ The following example creates a series of views and then drops them. These examples illustrate how views
+ are associated with a particular database, and both the view definitions and the view names for
+ <code class="ph codeph">CREATE VIEW</code> and <code class="ph codeph">DROP VIEW</code> can refer to a view in the current database or
+ a fully qualified view name.
+<pre class="pre codeblock"><code>
+-- Create and drop a view in the current database.
+CREATE VIEW few_rows_from_t1 AS SELECT * FROM t1 LIMIT 10;
+DROP VIEW few_rows_from_t1;
+
+-- Create and drop a view referencing a table in a different database.
+CREATE VIEW table_from_other_db AS SELECT x FROM db1.foo WHERE x IS NOT NULL;
+DROP VIEW table_from_other_db;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Switch into the other database and drop the view.
+USE db2;
+DROP VIEW v1;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Drop a view in the other database.
+DROP VIEW db2.v1;
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_views.html#views">Overview of Impala Views</a>, <a class="xref" href="impala_create_view.html#create_view">CREATE VIEW Statement</a>,
+ <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_exec_single_node_rows_threshold.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_exec_single_node_rows_threshold.html b/docs/build3x/html/topics/impala_exec_single_node_rows_threshold.html
new file mode 100644
index 0000000..9ca982a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_exec_single_node_rows_threshold.html
@@ -0,0 +1,89 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="exec_single_node_rows_threshold"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (Impala 2.1 or higher only)</title></head><body id="exec_single_node_rows_threshold"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (<span class="keyword">Impala 2.1</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ This setting controls the cutoff point (in terms of number of rows scanned) below which Impala treats a query
+ as a <span class="q">"small"</span> query, turning off optimizations such as parallel execution and native code generation. The
+ overhead of these optimizations is worthwhile for queries involving substantial amounts of data, but it
+ makes sense to skip them for queries involving tiny amounts of data. Reducing the overhead for small queries
+ allows Impala to complete them more quickly, keeping YARN resources, admission control slots, and so on
+ available for data-intensive queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=<var class="keyword varname">number_of_rows</var></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 100
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Typically, you increase the default value to make this optimization apply to more queries.
+ If incorrect or corrupted table and column statistics cause Impala to apply this optimization
+ incorrectly to queries that actually involve substantial work, you might see the queries being slower as a
+ result of remote reads. In that case, recompute statistics with the <code class="ph codeph">COMPUTE STATS</code>
+ or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement. If there is a problem collecting accurate
+ statistics, you can turn this feature off by setting the value to -1.
+ </p>
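+
+ <p class="p">
+ For example, to turn off the small-query optimization entirely for the current session:
+ </p>
+
+<pre class="pre codeblock"><code>-- A value of -1 disables the small-query optimization.
+SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=-1;
+</code></pre>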
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong>
+ </p>
+
+ <p class="p">
+ This setting applies to query fragments where the amount of data to scan can be accurately determined, either
+ through table and column statistics, or by the presence of a <code class="ph codeph">LIMIT</code> clause. If Impala cannot
+ accurately estimate the size of the input data, this setting does not apply.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, where Impala supports the complex data types <code class="ph codeph">STRUCT</code>,
+ <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code>, if a query refers to any column of those types,
+ the small-query optimization is turned off for that query regardless of the
+ <code class="ph codeph">EXEC_SINGLE_NODE_ROWS_THRESHOLD</code> setting.
+ </p>
+
+ <p class="p">
+ For a query that is determined to be <span class="q">"small"</span>, all work is performed on the coordinator node. This might
+ result in some I/O being performed by remote reads. The savings from not distributing the query work and not
+ generating native code are expected to outweigh any overhead from the remote reads.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.1</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ A common use case is to query just a few rows from a table to inspect typical data values. In this example,
+ Impala does not parallelize the query or perform native code generation because the result set is guaranteed
+ to be smaller than the threshold value from this query option:
+ </p>
+
+<pre class="pre codeblock"><code>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=500;
+SELECT * FROM enormous_table LIMIT 300;
+</code></pre>
+
+
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_exec_time_limit_s.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_exec_time_limit_s.html b/docs/build3x/html/topics/impala_exec_time_limit_s.html
new file mode 100644
index 0000000..df2d28a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_exec_time_limit_s.html
@@ -0,0 +1,70 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="exec_time_limit_s"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>EXEC_TIME_LIMIT_S Query Option (Impala 2.12 or higher only)</title></head><body id="exec_time_limit_s"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">EXEC_TIME_LIMIT_S Query Option (<span class="keyword">Impala 2.12</span> or higher only)</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">EXEC_TIME_LIMIT_S</code> query option sets a time limit on query execution.
+ If a query is still executing when the time limit expires, it is automatically cancelled. The
+ option is intended to prevent runaway queries that execute for much longer than intended.
+ </p>
+
+ <p class="p">
+ For example, an Impala administrator could set a default value of
+ <code class="ph codeph">EXEC_TIME_LIMIT_S=3600</code> for a resource pool to automatically kill queries
+ that execute for longer than one hour (see
+ <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for information about default query
+ options). Then, if a user accidentally runs a large query that executes for more than one
+ hour, it will be automatically killed after the time limit expires to free up resources.
+ Users can override the default value per query or per session if they do not want the
+ default <code class="ph codeph">EXEC_TIME_LIMIT_S</code> value to apply to a specific query or a
+ session.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The time limit only starts once the query is executing. Time spent planning the query,
+ scheduling the query, or in admission control is not counted towards the execution time
+ limit. <code class="ph codeph">SELECT</code> statements are eligible for automatic cancellation until
+ the client has fetched all result rows. DML queries are eligible for automatic
+ cancellation until the DML statement has finished.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SET EXEC_TIME_LIMIT_S=<var class="keyword varname">seconds</var>;</code></pre>
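+ <p class="p">
+ For example, to cap queries in the current session at ten minutes (a sketch; the table
+ name is hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>SET EXEC_TIME_LIMIT_S=600;
+-- Canceled automatically if still executing after 600 seconds.
+SELECT COUNT(*) FROM huge_fact_table;
+</code></pre>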
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (no time limit)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong>
+ <span class="keyword">Impala 2.12</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_timeouts.html#timeouts">Setting Timeout Periods for Daemons, Queries, and Sessions</a>
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_authorization.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_authorization.html b/docs/build3x/html/topics/impala_authorization.html
new file mode 100644
index 0000000..79c8cec
--- /dev/null
+++ b/docs/build3x/html/topics/impala_authorization.html
@@ -0,0 +1,1176 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="authorization"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Enabling Sentry Authorization for Impala</title></head><body id="authorization"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Enabling Sentry Authorization for Impala</h1>
+
+
+ <div class="body conbody" id="authorization__sentry">
+
+ <p class="p">
+ Authorization determines which users are allowed to access which resources, and what operations they are
+ allowed to perform. In Impala 1.1 and higher, you use Apache Sentry for
+ authorization. Sentry adds a fine-grained authorization framework for Hadoop. By default (when authorization
+ is not enabled), Impala does all read and write operations with the privileges of the <code class="ph codeph">impala</code>
+ user, which is suitable for a development/test environment but not for a secure production environment. When
+ authorization is enabled, Impala uses the OS user ID of the user who runs <span class="keyword cmdname">impala-shell</span> or
+ other client program, and associates various privileges with each user.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Sentry is typically used in conjunction with Kerberos authentication, which defines which hosts are allowed
+ to connect to each server. Using the combination of Sentry and Kerberos prevents malicious users from being
+ able to connect by creating a named account on an untrusted machine. See
+ <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for details about Kerberos authentication.
+ </div>
+
+ <p class="p toc inpage">
+ See the following sections for details about using the Impala authorization features:
+ </p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="authorization__sentry_priv_model">
+
+ <h2 class="title topictitle2" id="ariaid-title2">The Sentry Privilege Model</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Privileges can be granted on different objects in the schema. Any privilege that can be granted is
+ associated with a level in the object hierarchy. If a privilege is granted on a container object in the
+ hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other
+ database systems such as MySQL.
+ </p>
+
+ <p class="p">
+ The object hierarchy for Impala covers Server, URI, Database, Table, and Column. (The Table privileges apply to views as well;
+ anywhere you specify a table name, you can specify a view name instead.)
+ Column-level authorization is available in <span class="keyword">Impala 2.3</span> and higher.
+ Previously, you constructed views to query specific columns and assigned privilege based on
+ the views rather than the base tables. Now, you can use Impala's <a class="xref" href="impala_grant.html">GRANT Statement (Impala 2.0 or higher only)</a> and
+ <a class="xref" href="impala_revoke.html">REVOKE Statement (Impala 2.0 or higher only)</a> statements to assign and revoke privileges from specific columns
+ in a table.
+ </p>
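+ <p class="p">
+ For example, the following statements (a sketch; the role, group, table, and column
+ names are hypothetical) expose a single column of a table to a group:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE ROLE masked_reader;
+GRANT ROLE masked_reader TO GROUP interns;
+-- Grant access to the name column only, not the whole customers table.
+GRANT SELECT(name) ON TABLE sales_db.customers TO ROLE masked_reader;
+</code></pre>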
+
+ <p class="p">
+ A restricted set of privileges determines what you can do with each object:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="sentry_priv_model__select_priv">
+ SELECT privilege
+ </dt>
+
+ <dd class="dd">
+ Lets you read data from a table or view, for example with the <code class="ph codeph">SELECT</code> statement, the
+ <code class="ph codeph">INSERT...SELECT</code> syntax, or <code class="ph codeph">CREATE TABLE...LIKE</code>. Also required to
+ issue the <code class="ph codeph">DESCRIBE</code> statement or the <code class="ph codeph">EXPLAIN</code> statement for a query
+ against a particular table. Only objects for which a user has this privilege are shown in the output
+ for <code class="ph codeph">SHOW DATABASES</code> and <code class="ph codeph">SHOW TABLES</code> statements. The
+ <code class="ph codeph">REFRESH</code> statement and <code class="ph codeph">INVALIDATE METADATA</code> statements only access
+ metadata for tables for which the user has this privilege.
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="sentry_priv_model__insert_priv">
+ INSERT privilege
+ </dt>
+
+ <dd class="dd">
+ Lets you write data to a table. Applies to the <code class="ph codeph">INSERT</code> and <code class="ph codeph">LOAD DATA</code>
+ statements.
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="sentry_priv_model__all_priv">
+ ALL privilege
+ </dt>
+
+ <dd class="dd">
+ Lets you create or modify the object. Required to run DDL statements such as <code class="ph codeph">CREATE
+ TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, or <code class="ph codeph">DROP TABLE</code> for a table,
+ <code class="ph codeph">CREATE DATABASE</code> or <code class="ph codeph">DROP DATABASE</code> for a database, or <code class="ph codeph">CREATE
+ VIEW</code>, <code class="ph codeph">ALTER VIEW</code>, or <code class="ph codeph">DROP VIEW</code> for a view. Also required for
+ the URI of the <span class="q">"location"</span> parameter for the <code class="ph codeph">CREATE EXTERNAL TABLE</code> and
+ <code class="ph codeph">LOAD DATA</code> statements.
+
+ </dd>
+
+
+ </dl>
+
+ <p class="p">
+ Privileges can be specified for a table or view before that object actually exists. If you do not have
+ sufficient privilege to perform an operation, the error message does not disclose if the object exists or
+ not.
+ </p>
+
+ <p class="p">
+ Originally, privileges were encoded in a policy file, stored in HDFS. This mode of operation is still an
+ option, but the emphasis of privilege management is moving towards being SQL-based. In addition to its
+ own <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements (added in Impala 2.0), Impala can make use of
+ privileges assigned through <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements issued through
+ Hive. The mode of operation with <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements instead of
+ the policy file requires that a special Sentry service be enabled; this service stores, retrieves, and
+ manipulates privilege information stored inside the metastore database.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="authorization__secure_startup">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Starting the impalad Daemon with Sentry Authorization Enabled</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ To run the <span class="keyword cmdname">impalad</span> daemon with authorization enabled, you add one or more options to the
+ <code class="ph codeph">IMPALA_SERVER_ARGS</code> declaration in the <span class="ph filepath">/etc/default/impala</span>
+ configuration file:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">-server_name</code> option turns on Sentry authorization for Impala. The authorization
+ rules refer to a symbolic server name, and you specify the name to use as the argument to the
+ <code class="ph codeph">-server_name</code> option.
+ </li>
+
+ <li class="li">
+ If you specify just <code class="ph codeph">-server_name</code>, Impala uses the Sentry service for authorization,
+ relying on the results of <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements issued through
+ Hive. (This mode of operation is available in Impala 1.4.0 and higher.) Prior to Impala 1.4.0, or if you
+ want to continue storing privilege rules in the policy file, also specify the
+ <code class="ph codeph">-authorization_policy_file</code> option as in the following item.
+ </li>
+
+ <li class="li">
+ Specifying the <code class="ph codeph">-authorization_policy_file</code> option in addition to
+ <code class="ph codeph">-server_name</code> makes Impala read privilege information from a policy file, rather than
+ from the metastore database. The argument to the <code class="ph codeph">-authorization_policy_file</code> option
+ specifies the HDFS path to the policy file that defines the privileges on different schema objects.
+ </li>
+ </ul>
+
+ <p class="p">
+ For example, you might adapt your <span class="ph filepath">/etc/default/impala</span> configuration to contain lines
+ like the following. To use the Sentry service rather than the policy file:
+ </p>
+
+<pre class="pre codeblock"><code>IMPALA_SERVER_ARGS=" \
+-server_name=server1 \
+...
+</code></pre>
+
+ <p class="p">
+ Or to use the policy file, as in releases prior to Impala 1.4:
+ </p>
+
+<pre class="pre codeblock"><code>IMPALA_SERVER_ARGS=" \
+-authorization_policy_file=/user/hive/warehouse/auth-policy.ini \
+-server_name=server1 \
+...
+</code></pre>
+
+ <p class="p">
+ The preceding examples set up a symbolic name of <code class="ph codeph">server1</code> to refer to the current instance
+ of Impala. This symbolic name is used in the following ways:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Specify the <code class="ph codeph">server1</code> value for the <code class="ph codeph">sentry.hive.server</code> property in the
+ <span class="ph filepath">sentry-site.xml</span> configuration file for Hive, as well as in the
+ <code class="ph codeph">-server_name</code> option for <span class="keyword cmdname">impalad</span>.
+ </p>
+ <p class="p">
+ If the <span class="keyword cmdname">impalad</span> daemon is not already running, start it as described in
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a>. If it is already running, restart it with the command
+ <code class="ph codeph">sudo /etc/init.d/impala-server restart</code>. Run the appropriate commands on all the nodes
+ where <span class="keyword cmdname">impalad</span> normally runs.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use the mode of operation using the policy file, the rules in the <code class="ph codeph">[roles]</code>
+ section of the policy file refer to this same <code class="ph codeph">server1</code> name. For example, the following
+ rule sets up a role <code class="ph codeph">report_generator</code> that lets users with that role query any table in
+ a database named <code class="ph codeph">reporting_db</code> on a node where the <span class="keyword cmdname">impalad</span> daemon
+ was started up with the <code class="ph codeph">-server_name=server1</code> option:
+ </p>
+<pre class="pre codeblock"><code>[roles]
+report_generator = server=server1->db=reporting_db->table=*->action=SELECT
+</code></pre>
+ </li>
+ </ul>
+
+ <p class="p">
+ When <span class="keyword cmdname">impalad</span> is started with one or both of the <code class="ph codeph">-server_name=server1</code>
+ and <code class="ph codeph">-authorization_policy_file</code> options, Impala authorization is enabled. If Impala detects
+ any errors or inconsistencies in the authorization settings or the policy file, the daemon refuses to
+ start.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="authorization__sentry_service">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Using Impala with the Sentry Service (<span class="keyword">Impala 1.4</span> or higher only)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When you use the Sentry service rather than the policy file, you set up privileges through
+ <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statement in either Impala or Hive, then both components
+ use those same privileges automatically. (Impala added the <code class="ph codeph">GRANT</code> and
+ <code class="ph codeph">REVOKE</code> statements in <span class="keyword">Impala 2.0</span>.)
+ </p>
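+ <p class="p">
+ For example, the following statements (a sketch; the role, group, and table names are
+ hypothetical) could be issued through either Impala or Hive, and the resulting
+ privileges are enforced by both components:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE ROLE report_reader;
+GRANT ROLE report_reader TO GROUP analysts;
+GRANT SELECT ON TABLE reporting_db.orders TO ROLE report_reader;
+</code></pre>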
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="authorization__security_policy_file">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Using Impala with the Sentry Policy File</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The policy file is a file that you put in a designated location in HDFS, and is read during the startup of
+ the <span class="keyword cmdname">impalad</span> daemon when you specify both the <code class="ph codeph">-server_name</code> and
+ <code class="ph codeph">-authorization_policy_file</code> startup options. It controls which objects (databases, tables,
+ and HDFS directory paths) can be accessed by the user who connects to <span class="keyword cmdname">impalad</span>, and what
+ operations that user can perform on the objects.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The Sentry service, as described in <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>, stores
+ authorization metadata in a relational database. This means you can manage user privileges for Impala tables
+ using traditional <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> SQL statements, rather than the
+ policy file approach described here. If you are still using policy files, migrate to the
+ database-backed service whenever practical.
+ </p>
+ </div>
+
+ <p class="p">
+ The location of the policy file is listed in the <span class="ph filepath">auth-site.xml</span> configuration file. To
+ minimize overhead, the security information from this file is cached by each <span class="keyword cmdname">impalad</span>
+ daemon and refreshed automatically, with a default interval of 5 minutes. After making a substantial change
+ to security policies, restart all Impala daemons to pick up the changes immediately.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="security_policy_file__security_policy_file_details">
+
+ <h3 class="title topictitle3" id="ariaid-title6">Policy File Location and Format</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The policy file uses the familiar <code class="ph codeph">.ini</code> format, divided into the major sections
+ <code class="ph codeph">[groups]</code> and <code class="ph codeph">[roles]</code>. There is also an optional
+ <code class="ph codeph">[databases]</code> section, which allows you to specify a specific policy file for a particular
+ database, as explained in <a class="xref" href="#security_multiple_policy_files">Using Multiple Policy Files for Different Databases</a>. Another optional section,
+ <code class="ph codeph">[users]</code>, allows you to override the OS-level mapping of users to groups; that is an
+ advanced technique primarily for testing and debugging, and is beyond the scope of this document.
+ </p>
+
+ <p class="p">
+ In the <code class="ph codeph">[groups]</code> section, you define various categories of users and select which roles
+ are associated with each category.
+ </p>
+
+ <p class="p">
+ The group and usernames in the <code class="ph codeph">[groups]</code> section correspond to Linux groups and users on
+ the server where the <span class="keyword cmdname">impalad</span> daemon runs. When you access Impala through the
+ <span class="keyword cmdname">impala-shell</span> interpreter, for purposes of authorization, the user is the logged-in Linux
+ user and the groups are the Linux groups that user is a member of. When you access Impala through the
+ ODBC or JDBC interfaces, the user and password specified through the connection string are used as login
+ credentials for the Linux server, and authorization is based on that username and the associated Linux
+ group membership.
+ </p>
+
+ <div class="p">
+ In the <code class="ph codeph">[roles]</code> section, you define a set of roles. For each role, you specify precisely the set
+ of privileges that is available. That is, which objects users with that role can access, and what operations
+ they can perform on those objects. This is the lowest-level category of security information; the other
+ sections in the policy file map the privileges to higher-level divisions of groups and users. In the
+ <code class="ph codeph">[groups]</code> section, you specify which roles are associated with which groups. The
+ privileges are specified using patterns like:
+<pre class="pre codeblock"><code>server=<var class="keyword varname">server_name</var>->db=<var class="keyword varname">database_name</var>->table=<var class="keyword varname">table_name</var>->action=SELECT
+server=<var class="keyword varname">server_name</var>->db=<var class="keyword varname">database_name</var>->table=<var class="keyword varname">table_name</var>->action=CREATE
+server=<var class="keyword varname">server_name</var>->db=<var class="keyword varname">database_name</var>->table=<var class="keyword varname">table_name</var>->action=ALL
+</code></pre>
+ For the <var class="keyword varname">server_name</var> value, substitute the same symbolic name you specify with the
+ <span class="keyword cmdname">impalad</span> <code class="ph codeph">-server_name</code> option. You can use <code class="ph codeph">*</code> wildcard
+ characters at each level of the privilege specification to allow access to all such objects. For example:
+<pre class="pre codeblock"><code>server=impala-host.example.com->db=default->table=t1->action=SELECT
+server=impala-host.example.com->db=*->table=*->action=CREATE
+server=impala-host.example.com->db=*->table=audit_log->action=SELECT
+server=impala-host.example.com->db=default->table=t1->action=*
+</code></pre>
+ </div>
+
+ <p class="p">
+ When authorization is enabled, Impala uses the policy file as a <em class="ph i">whitelist</em>, representing every
+ privilege available to any user on any object. That is, only operations specified for the appropriate
+ combination of object, role, group, and user are allowed; all other operations are not allowed. If a
+ group or role is defined multiple times in the policy file, the last definition takes precedence.
+ </p>
+
+ <p class="p">
+ To understand the notion of whitelisting, set up a minimal policy file that does not provide any
+ privileges for any object. When you connect to an Impala node where this policy file is in effect, you
+ get no results for <code class="ph codeph">SHOW DATABASES</code>, and an error when you issue any <code class="ph codeph">SHOW
+ TABLES</code>, <code class="ph codeph">USE <var class="keyword varname">database_name</var></code>, <code class="ph codeph">DESCRIBE
+<var class="keyword varname">table_name</var></code>, <code class="ph codeph">SELECT</code>, or other statements that expect to
+ access databases or tables, even if the corresponding databases and tables exist.
+ </p>
+
+ <p class="p">
+ The contents of the policy file are cached, to avoid a performance penalty for each query. The policy
+ file is re-checked by each <span class="keyword cmdname">impalad</span> node every 5 minutes. When you make a
+ non-time-sensitive change such as adding new privileges or new users, you can let the change take effect
+ automatically a few minutes later. If you remove or reduce privileges, and want the change to take effect
+ immediately, restart the <span class="keyword cmdname">impalad</span> daemon on all nodes, again specifying the
+ <code class="ph codeph">-server_name</code> and <code class="ph codeph">-authorization_policy_file</code> options so that the rules
+ from the updated policy file are applied.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="security_policy_file__security_examples">
+
+ <h3 class="title topictitle3" id="ariaid-title7">Examples of Policy File Rules for Security Scenarios</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following examples show rules that might go in the policy file to deal with various
+ authorization-related scenarios. For illustration purposes, this section shows several very small policy
+ files with only a few rules each. In your environment, typically you would define many roles to cover all
+ the scenarios involving your own databases, tables, and applications, and a smaller number of groups,
+ whose members are given the privileges from one or more roles.
+ </p>
+
+ <div class="example" id="security_examples__sec_ex_unprivileged"><h4 class="title sectiontitle">A User with No Privileges</h4>
+
+
+
+ <p class="p">
+ If a user has no privileges at all, that user cannot access any schema objects in the system. The error
+ messages do not disclose the names or existence of objects that the user is not authorized to read.
+ </p>
+
+ <p class="p">
+
+ This is the experience you want a user to have if they somehow log into a system where they are not an
+ authorized Impala user. In a real deployment with a filled-in policy file, a user might have no
+ privileges because they are not a member of any of the relevant groups mentioned in the policy file.
+ </p>
+
+
+
+ </div>
+
+ <div class="example" id="security_examples__sec_ex_superuser"><h4 class="title sectiontitle">Examples of Privileges for Administrative Users</h4>
+
+
+
+ <p class="p">
+ When an administrative user has broad access to tables or databases, the associated rules in the
+ <code class="ph codeph">[roles]</code> section typically use wildcards and/or inheritance. For example, in the
+ following sample policy file, <code class="ph codeph">db=*</code> refers to all databases and
+ <code class="ph codeph">db=*->table=*</code> refers to all tables in all databases.
+ </p>
+
+ <p class="p">
+ Omitting the rightmost portion of a rule means that the privileges apply to all the objects that could
+ be specified there. For example, in the following sample policy file, the
+ <code class="ph codeph">all_databases</code> role has all privileges for all tables in all databases, while the
+ <code class="ph codeph">one_database</code> role has all privileges for all tables in one specific database. The
+ <code class="ph codeph">all_databases</code> role does not grant privileges on URIs, so a group with that role could
+ not issue a <code class="ph codeph">CREATE TABLE</code> statement with a <code class="ph codeph">LOCATION</code> clause. The
+ <code class="ph codeph">entire_server</code> role has all privileges on both databases and URIs within the server.
+ </p>
+
+<pre class="pre codeblock"><code>[groups]
+supergroup = all_databases
+
+[roles]
+read_all_tables = server=server1->db=*->table=*->action=SELECT
+all_tables = server=server1->db=*->table=*
+all_databases = server=server1->db=*
+one_database = server=server1->db=test_db
+entire_server = server=server1
+</code></pre>
+
+ </div>
+
+ <div class="example" id="security_examples__sec_ex_detailed"><h4 class="title sectiontitle">A User with Privileges for Specific Databases and Tables</h4>
+
+
+
+ <p class="p">
+ If a user has privileges for specific tables in specific databases, the user can access those things
+ but nothing else. They can see the tables and their parent databases in the output of <code class="ph codeph">SHOW
+ TABLES</code> and <code class="ph codeph">SHOW DATABASES</code>, <code class="ph codeph">USE</code> the appropriate databases,
+ and perform the relevant actions (<code class="ph codeph">SELECT</code> and/or <code class="ph codeph">INSERT</code>) based on the
+ table privileges. To actually create a table requires the <code class="ph codeph">ALL</code> privilege at the
+ database level, so you might define separate roles for the user that sets up a schema and other users
+ or applications that perform day-to-day operations on the tables.
+ </p>
+
+ <p class="p">
+ The following sample policy file shows some of the syntax that is appropriate as the policy file grows,
+ such as the <code class="ph codeph">#</code> comment syntax, <code class="ph codeph">\</code> continuation syntax, and comma
+ separation for roles assigned to groups or privileges assigned to roles.
+ </p>
+
+<pre class="pre codeblock"><code>[groups]
+employee = training_sysadmin, instructor
+visitor = student
+
+[roles]
+training_sysadmin = server=server1->db=training, \
+server=server1->db=instructor_private, \
+server=server1->db=lesson_development
+instructor = server=server1->db=training->table=*->action=*, \
+server=server1->db=instructor_private->table=*->action=*, \
+server=server1->db=lesson_development->table=lesson*
+# This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP.
+student = server=server1->db=training->table=lesson_*->action=SELECT
+</code></pre>
+
+ </div>
+
+
+
+ <div class="example" id="security_examples__sec_ex_external_files"><h4 class="title sectiontitle">Privileges for Working with External Data Files</h4>
+
+
+
+ <p class="p">
+ When data is being inserted through the <code class="ph codeph">LOAD DATA</code> statement, or is referenced from an
+ HDFS location outside the normal Impala database directories, the user also needs appropriate
+ permissions on the URIs corresponding to those HDFS locations.
+ </p>
+
+ <p class="p">
+ In this sample policy file:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">external_table</code> role lets us insert into and query the Impala table,
+ <code class="ph codeph">external_table.sample</code>.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">staging_dir</code> role lets us specify the HDFS path
+ <span class="ph filepath">/user/username/external_data</span> with the <code class="ph codeph">LOAD DATA</code> statement.
+ Remember, when Impala queries or loads data files, it operates on all the files in that directory,
+ not just a single file, so any Impala <code class="ph codeph">LOCATION</code> parameters refer to a directory
+ rather than an individual file.
+ </li>
+
+ <li class="li">
+ We included the IP address and port of the Hadoop name node in the HDFS URI of the
+ <code class="ph codeph">staging_dir</code> rule. We found those details in
+ <span class="ph filepath">/etc/hadoop/conf/core-site.xml</span>, under the <code class="ph codeph">fs.default.name</code>
+ element. That is what we use in any roles that specify URIs (that is, the locations of directories in
+ HDFS).
+ </li>
+
+ <li class="li">
+ We start this example after the table <code class="ph codeph">external_table.sample</code> is already created. In
+ the policy file for the example, we have already taken away the <code class="ph codeph">external_table_admin</code>
+ role from the <code class="ph codeph">username</code> group, and replaced it with the lesser-privileged
+ <code class="ph codeph">external_table</code> role.
+ </li>
+
+ <li class="li">
+ We assign privileges to a subdirectory underneath <span class="ph filepath">/user/username</span> in HDFS,
+ because such privileges also apply to any subdirectories underneath. If we had assigned privileges to
+ the parent directory <span class="ph filepath">/user/username</span>, it would be too likely to mess up other
+ files by specifying a wrong location by mistake.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">username</code> under the <code class="ph codeph">[groups]</code> section refers to the
+ <code class="ph codeph">username</code> group. (In this example, there is a <code class="ph codeph">username</code> user
+ that is a member of a <code class="ph codeph">username</code> group.)
+ </li>
+ </ul>
+
+ <p class="p">
+ Policy file:
+ </p>
+
+<pre class="pre codeblock"><code>[groups]
+username = external_table, staging_dir
+
+[roles]
+external_table_admin = server=server1->db=external_table
+external_table = server=server1->db=external_table->table=sample->action=*
+staging_dir = server=server1->uri=hdfs://127.0.0.1:8020/user/username/external_data->action=*
+</code></pre>
+
+ <p class="p">
+ <span class="keyword cmdname">impala-shell</span> session:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > use external_table;
+Query: use external_table
+[localhost:21000] > show tables;
+Query: show tables
+Query finished, fetching results ...
++--------+
+| name |
++--------+
+| sample |
++--------+
+Returned 1 row(s) in 0.02s
+
+[localhost:21000] > select * from sample;
+Query: select * from sample
+Query finished, fetching results ...
++-----+
+| x |
++-----+
+| 1 |
+| 5 |
+| 150 |
++-----+
+Returned 3 row(s) in 1.04s
+
+[localhost:21000] > load data inpath '/user/username/external_data' into table sample;
+Query: load data inpath '/user/username/external_data' into table sample
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 2 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.26s
+[localhost:21000] > select * from sample;
+Query: select * from sample
+Query finished, fetching results ...
++-------+
+| x |
++-------+
+| 2 |
+| 4 |
+| 6 |
+| 8 |
+| 64738 |
+| 49152 |
+| 1 |
+| 5 |
+| 150 |
++-------+
+Returned 9 row(s) in 0.22s
+
+[localhost:21000] > load data inpath '/user/username/unauthorized_data' into table sample;
+Query: load data inpath '/user/username/unauthorized_data' into table sample
+ERROR: AuthorizationException: User 'username' does not have privileges to access: hdfs://127.0.0.1:8020/user/username/unauthorized_data
+</code></pre>
+
+ </div>
+
+
+
+ <div class="example" id="security_examples__sec_sysadmin"><h4 class="title sectiontitle">Separating Administrator Responsibility from Read and Write Privileges</h4>
+
+
+
+ <p class="p">
+          Remember that creating a database requires full privilege on that database, while day-to-day
+          operations on tables within that database can be performed with lower levels of privilege on specific
+          tables. Thus, you might set up separate roles for each database or application: an administrative one
+          that could create or drop the database, and a user-level one that can access only the relevant tables.
+ </p>
+
+ <p class="p">
+          For example, this policy file divides responsibilities among users in three different groups:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Members of the <code class="ph codeph">supergroup</code> group have the <code class="ph codeph">training_sysadmin</code> role and
+ so can set up a database named <code class="ph codeph">training</code>.
+ </li>
+
+ <li class="li"> Members of the <code class="ph codeph">employee</code> group have the
+ <code class="ph codeph">instructor</code> role and so can create, insert into,
+ and query any tables in the <code class="ph codeph">training</code> database,
+ but cannot create or drop the database itself. </li>
+
+ <li class="li">
+ Members of the <code class="ph codeph">visitor</code> group have the <code class="ph codeph">student</code> role and so can query
+ those tables in the <code class="ph codeph">training</code> database.
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>[groups]
+supergroup = training_sysadmin
+employee = instructor
+visitor = student
+
+[roles]
+training_sysadmin = server=server1->db=training
+instructor = server=server1->db=training->table=*->action=*
+student = server=server1->db=training->table=*->action=SELECT
+</code></pre>
+
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="security_policy_file__security_multiple_policy_files">
+
+ <h3 class="title topictitle3" id="ariaid-title8">Using Multiple Policy Files for Different Databases</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For an Impala cluster with many databases being accessed by many users and applications, it might be
+ cumbersome to update the security policy file for each privilege change or each new database, table, or
+ view. You can allow security to be managed separately for individual databases, by setting up a separate
+ policy file for each database:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Add the optional <code class="ph codeph">[databases]</code> section to the main policy file.
+ </li>
+
+ <li class="li">
+ Add entries in the <code class="ph codeph">[databases]</code> section for each database that has its own policy file.
+ </li>
+
+ <li class="li">
+ For each listed database, specify the HDFS path of the appropriate policy file.
+ </li>
+ </ul>
+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>[databases]
+# Defines the location of the per-DB policy files for the 'customers' and 'sales' databases.
+customers = hdfs://ha-nn-uri/etc/access/customers.ini
+sales = hdfs://ha-nn-uri/etc/access/sales.ini
+</code></pre>
+
+ <p class="p">
+ To enable URIs in per-DB policy files, the Java configuration option <code class="ph codeph">sentry.allow.uri.db.policyfile</code>
+ must be set to <code class="ph codeph">true</code>. For example:
+ </p>
+
+<pre class="pre codeblock"><code>JAVA_TOOL_OPTIONS="-Dsentry.allow.uri.db.policyfile=true"
+</code></pre>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+        Enabling URIs in per-DB policy files introduces a security risk: the owner of a db-level
+        policy file can grant themselves load privileges to anything the <code class="ph codeph">impala</code> user has
+        read permission for in HDFS (including data in other databases controlled by different db-level policy
+        files).
+ </div>
+ </div>
+ </article>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="authorization__security_schema">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Setting Up Schema Objects for a Secure Impala Deployment</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Remember that in your role definitions, you specify privileges at the level of individual databases and
+ tables, or all databases or all tables within a database. To simplify the structure of these rules, plan
+ ahead of time how to name your schema objects so that data with different authorization requirements is
+ divided into separate databases.
+ </p>
+
+ <p class="p">
+ If you are adding security on top of an existing Impala deployment, remember that you can rename tables or
+ even move them between databases using the <code class="ph codeph">ALTER TABLE</code> statement. In Impala, creating new
+ databases is a relatively inexpensive operation, basically just creating a new directory in HDFS.
+ </p>
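+
+      <p class="p">
+        For example, a sequence like the following (with hypothetical database and table names) moves a
+        table into a database whose privileges are managed separately:
+      </p>
+
+<pre class="pre codeblock"><code>-- Group sensitive tables into their own database so the policy
+-- file can grant privileges on that database as a unit.
+CREATE DATABASE sensitive_data;
+ALTER TABLE default.customer_accounts RENAME TO sensitive_data.customer_accounts;
+</code></pre>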
+
+ <p class="p">
+ You can also plan the security scheme and set up the policy file before the actual schema objects named in
+ the policy file exist. Because the authorization capability is based on whitelisting, a user can only
+ create a new database or table if the required privilege is already in the policy file: either by listing
+        the exact name of the object being created, or by using a <code class="ph codeph">*</code> wildcard to match all the applicable
+ objects within the appropriate container.
+ </p>
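+
+      <p class="p">
+        For example, a rule like the following (with a hypothetical role and database name) can be added
+        to the policy file before the database exists; once the <code class="ph codeph">new_app</code> database
+        is created, members of the associated group can immediately create and drop tables within it:
+      </p>
+
+<pre class="pre codeblock"><code>[roles]
+new_app_admin = server=server1->db=new_app
+</code></pre>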
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="authorization__security_privileges">
+
+ <h2 class="title topictitle2" id="ariaid-title10">Privilege Model and Object Hierarchy</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Privileges can be granted on different objects in the schema. Any privilege that can be granted is
+ associated with a level in the object hierarchy. If a privilege is granted on a container object in the
+ hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other
+ database systems such as MySQL.
+ </p>
+
+ <p class="p">
+ The kinds of objects in the schema hierarchy are:
+ </p>
+
+<pre class="pre codeblock"><code>Server
+URI
+Database
+ Table
+</code></pre>
+
+ <p class="p">
+ The server name is specified by the <code class="ph codeph">-server_name</code> option when <span class="keyword cmdname">impalad</span>
+ starts. Specify the same name for all <span class="keyword cmdname">impalad</span> nodes in the cluster.
+ </p>
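+
+      <p class="p">
+        For example, you might start each daemon with an option such as the following, so that the name
+        matches the <code class="ph codeph">server=</code> element used in the policy file rules:
+      </p>
+
+<pre class="pre codeblock"><code>impalad -server_name=server1 ...
+</code></pre>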
+
+ <p class="p">
+ URIs represent the HDFS paths you specify as part of statements such as <code class="ph codeph">CREATE EXTERNAL
+ TABLE</code> and <code class="ph codeph">LOAD DATA</code>. Typically, you specify what look like UNIX paths, but these
+ locations can also be prefixed with <code class="ph codeph">hdfs://</code> to make clear that they are really URIs. To
+ set privileges for a URI, specify the name of a directory, and the privilege applies to all the files in
+ that directory and any directories underneath it.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, you can specify privileges for individual columns.
+ Formerly, to specify read privileges at this level, you created a view that queried specific columns
+ and/or partitions from a base table, and gave <code class="ph codeph">SELECT</code> privilege on the view but not
+ the underlying table. Now, you can use Impala's <a class="xref" href="impala_grant.html">GRANT Statement (Impala 2.0 or higher only)</a> and
+ <a class="xref" href="impala_revoke.html">REVOKE Statement (Impala 2.0 or higher only)</a> statements to assign and revoke privileges from specific columns
+ in a table.
+ </p>
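+
+      <p class="p">
+        For example, statements like the following (with hypothetical role, table, and column names) grant
+        and then revoke read access to a single column:
+      </p>
+
+<pre class="pre codeblock"><code>GRANT SELECT(salary) ON TABLE hr.employees TO ROLE analyst_role;
+REVOKE SELECT(salary) ON TABLE hr.employees FROM ROLE analyst_role;
+</code></pre>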
+
+ <div class="p">
+ URIs must start with either <code class="ph codeph">hdfs://</code> or <code class="ph codeph">file://</code>. If a URI starts with
+ anything else, it will cause an exception and the policy file will be invalid. When defining URIs for HDFS,
+ you must also specify the NameNode. For example:
+<pre class="pre codeblock"><code>data_read = server=server1->uri=file:///path/to/dir, \
+server=server1->uri=hdfs://namenode:port/path/to/dir
+</code></pre>
+ <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span>
+ <p class="p">
+ Because the NameNode host and port must be specified, enable High Availability (HA) to ensure
+ that the URI will remain constant even if the NameNode changes.
+ </p>
+<pre class="pre codeblock"><code>data_read = server=server1->uri=file:///path/to/dir,\
+server=server1->uri=hdfs://ha-nn-uri/path/to/dir
+</code></pre>
+ </div>
+ </div>
+
+
+
+
+
+ <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">Valid privilege types and objects they apply to</span></caption><colgroup><col style="width:33.33333333333333%"><col style="width:66.66666666666666%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="security_privileges__entry__1"><strong class="ph b">Privilege</strong></th>
+ <th class="entry nocellnorowborder" id="security_privileges__entry__2"><strong class="ph b">Object</strong></th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__1 ">INSERT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__2 ">DB, TABLE</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__1 ">SELECT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__2 ">DB, TABLE, COLUMN</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__1 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__2 ">SERVER, TABLE, DB, URI</td>
+ </tr>
+ </tbody></table>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Although this document refers to the <code class="ph codeph">ALL</code> privilege, currently if you use the policy file
+ mode, you do not use the actual keyword <code class="ph codeph">ALL</code> in the policy file. When you code role
+ entries in the policy file:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ To specify the <code class="ph codeph">ALL</code> privilege for a server, use a role like
+ <code class="ph codeph">server=<var class="keyword varname">server_name</var></code>.
+ </li>
+
+ <li class="li">
+ To specify the <code class="ph codeph">ALL</code> privilege for a database, use a role like
+ <code class="ph codeph">server=<var class="keyword varname">server_name</var>->db=<var class="keyword varname">database_name</var></code>.
+ </li>
+
+ <li class="li">
+ To specify the <code class="ph codeph">ALL</code> privilege for a table, use a role like
+ <code class="ph codeph">server=<var class="keyword varname">server_name</var>->db=<var class="keyword varname">database_name</var>->table=<var class="keyword varname">table_name</var>->action=*</code>.
+ </li>
+ </ul>
+ </div>
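+
+      <p class="p">
+        Putting those three forms together, a policy file fragment that grants the equivalent of
+        <code class="ph codeph">ALL</code> at each level might look like this (with hypothetical role and
+        object names):
+      </p>
+
+<pre class="pre codeblock"><code>[roles]
+entire_server_admin = server=server1
+sales_db_admin = server=server1->db=sales
+orders_table_admin = server=server1->db=sales->table=orders->action=*
+</code></pre>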
+ <table class="table"><caption></caption><colgroup><col style="width:29.241071428571423%"><col style="width:26.116071428571423%"><col style="width:22.32142857142857%"><col style="width:22.32142857142857%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="security_privileges__entry__9">
+ Operation
+ </th>
+ <th class="entry nocellnorowborder" id="security_privileges__entry__10">
+ Scope
+ </th>
+ <th class="entry nocellnorowborder" id="security_privileges__entry__11">
+ Privileges
+ </th>
+ <th class="entry nocellnorowborder" id="security_privileges__entry__12">
+ URI
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">EXPLAIN</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE; COLUMN</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">LOAD DATA</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">INSERT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 ">URI</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">CREATE DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">SERVER</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DROP DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">CREATE TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DROP TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DESCRIBE TABLE<p class="p">-Output shows <em class="ph i">all</em> columns if the
+                user has table-level privileges or <code class="ph codeph">SELECT</code>
+                privilege on at least one table column</p></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT/INSERT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. ADD COLUMNS</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. REPLACE COLUMNS</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. CHANGE column</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. RENAME</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET TBLPROPERTIES</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET FILEFORMAT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET LOCATION</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 ">URI</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. ADD PARTITION</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. ADD PARTITION location</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 ">URI</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. DROP PARTITION</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. PARTITION SET FILEFORMAT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET SERDEPROPERTIES</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">CREATE VIEW<p class="p">-This operation is allowed if you have
+ column-level <code class="ph codeph">SELECT</code> access to the columns
+ being used.</p></td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">DATABASE; SELECT on TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DROP VIEW</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">VIEW/TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row" id="security_privileges__alter_view_privs">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+ ALTER VIEW
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+ You need <code class="ph codeph">ALL</code> privilege on the named view <span class="ph">and the parent
+ database</span>, plus <code class="ph codeph">SELECT</code> privilege for any tables or views referenced by the
+ view query. Once the view is created or altered by a high-privileged system administrator, it can
+ be queried by a lower-privileged user who does not have full query privileges for the base tables.
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ ALL, SELECT
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET LOCATION</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 ">URI</td>
+ </tr>
+ <tr class="row" id="security_privileges__create_external_table_privs">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+ CREATE EXTERNAL TABLE
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+ Database (ALL), URI (SELECT)
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ ALL, SELECT
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">SELECT<p class="p">-You can grant the SELECT privilege on a view to
+ give users access to specific columns of a table they do not
+ otherwise have access to.</p><p class="p">-See
+ <span class="xref">the documentation for Apache Sentry</span>
+ for details on allowed column-level
+ operations.</p></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">VIEW/TABLE; COLUMN</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">USE <dbName></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">Any</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 "></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">CREATE FUNCTION</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">SERVER</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DROP FUNCTION</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">SERVER</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">REFRESH <table name> or REFRESH <table name> PARTITION (<partition_spec>)</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT/INSERT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">INVALIDATE METADATA</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">SERVER</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">INVALIDATE METADATA <table name></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT/INSERT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">COMPUTE STATS</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row" id="security_privileges__show_table_stats_privs">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+ SHOW TABLE STATS, SHOW PARTITIONS
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+ TABLE
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ SELECT/INSERT
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" id="security_privileges__show_column_stats_privs" headers="security_privileges__entry__9 ">
+ SHOW COLUMN STATS
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+ TABLE
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ SELECT/INSERT
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" id="security_privileges__show_functions_privs" headers="security_privileges__entry__9 ">
+ SHOW FUNCTIONS
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+ DATABASE
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ SELECT
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row" id="security_privileges__show_tables_privs">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+ SHOW TABLES
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 "></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ No special privileges needed to issue the statement, but only shows objects you are authorized for
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row" id="security_privileges__show_databases_privs">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+ SHOW DATABASES, SHOW SCHEMAS
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 "></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ No special privileges needed to issue the statement, but only shows objects you are authorized for
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ </tbody></table>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="authorization__sentry_debug">
+
+ <h2 class="title topictitle2" id="ariaid-title11"><span class="ph">Debugging Failed Sentry Authorization Requests</span></h2>
+
+ <div class="body conbody">
+
+ <div class="p">
+ Sentry logs all facts that lead up to authorization decisions at the debug level. If you do not understand
+ why Sentry is denying access, the best way to debug is to temporarily turn on debug logging:
+ <ul class="ul">
+ <li class="li">
+ Add <code class="ph codeph">log4j.logger.org.apache.sentry=DEBUG</code> to the <span class="ph filepath">log4j.properties</span>
+ file on each host in the cluster, in the appropriate configuration directory for each service.
+ </li>
+ </ul>
+ Specifically, look for exceptions and messages such as:
+<pre class="pre codeblock"><code>FilePermission server..., RequestPermission server...., result [true|false]</code></pre>
+ which indicate each evaluation Sentry makes. The <code class="ph codeph">FilePermission</code> is from the policy file,
+ while <code class="ph codeph">RequestPermission</code> is the privilege required for the query. A
+ <code class="ph codeph">RequestPermission</code> will iterate over all appropriate <code class="ph codeph">FilePermission</code>
+ settings until a match is found. If no matching privilege is found, Sentry returns <code class="ph codeph">false</code>
+          indicating <span class="q">"Access Denied"</span>.
+
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="authorization__sec_ex_default">
+
+ <h2 class="title topictitle2" id="ariaid-title12">The DEFAULT Database in a Secure Deployment</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Because of the extra emphasis on granular access controls in a secure deployment, you should move any
+ important or sensitive information out of the <code class="ph codeph">DEFAULT</code> database into a named database whose
+ privileges are specified in the policy file. Sometimes you might need to give privileges on the
+ <code class="ph codeph">DEFAULT</code> database for administrative reasons; for example, as a place you can reliably
+ specify with a <code class="ph codeph">USE</code> statement when preparing to drop a database.
+ </p>
+
+
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_avg.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_avg.html b/docs/build3x/html/topics/impala_avg.html
new file mode 100644
index 0000000..a63791f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_avg.html
@@ -0,0 +1,318 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="avg"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>AVG Function</title></head><body id="avg"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">AVG Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns the average value from a set of numbers or <code class="ph codeph">TIMESTAMP</code> values.
+      Its single argument can be a numeric column, or the numeric result of a function or expression applied to the
+ column value. Rows with a <code class="ph codeph">NULL</code> value for the specified column are ignored. If the table is empty,
+ or all the values supplied to <code class="ph codeph">AVG</code> are <code class="ph codeph">NULL</code>, <code class="ph codeph">AVG</code> returns
+ <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>AVG([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]
+</code></pre>
+
+ <p class="p">
+ When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of
+ grouping values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> for numeric values; <code class="ph codeph">TIMESTAMP</code> for
+ <code class="ph codeph">TIMESTAMP</code> values
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric aggregates such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">N_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+aggregates are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name | count | sum | avg | minimum | maximum | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 |
+| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 |
+| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 |
+| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 |
+| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- Average all the non-NULL values in a column.
+insert overwrite avg_t values (2),(4),(6),(null),(null);
+-- The average of the above values is 4: (2+4+6) / 3. The 2 NULL values are ignored.
+select avg(x) from avg_t;
+-- Average only certain values from the column.
+select avg(x) from t1 where month = 'January' and year = '2013';
+-- Apply a calculation to the value of the column before averaging.
+select avg(x/3) from t1;
+-- Apply a function to the value of the column before averaging.
+-- Here we are substituting a value of 0 for all NULLs in the column,
+-- so that those rows do factor into the return value.
+select avg(isnull(x,0)) from t1;
+-- Apply some number-returning function to a string column and average the results.
+-- If column s contains any NULLs, length(s) also returns NULL and those rows are ignored.
+select avg(length(s)) from t1;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, avg(page_visits) from web_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select avg(distinct x) from t1;
+-- Filter the output after performing the calculation.
+select avg(x) from t1 group by y having avg(x) between 1 and 20;
+</code></pre>
+
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">AVG()</code> in an analytic context. They use a table
+ containing integers from 1 to 10. Notice how the <code class="ph codeph">AVG()</code> is reported for each input value, as
+ opposed to the <code class="ph codeph">GROUP BY</code> clause which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, avg(x) over (partition by property) as avg from int_t where property in ('odd','even');
++----+----------+-----+
+| x | property | avg |
++----+----------+-----+
+| 2 | even | 6 |
+| 4 | even | 6 |
+| 6 | even | 6 |
+| 8 | even | 6 |
+| 10 | even | 6 |
+| 1 | odd | 5 |
+| 3 | odd | 5 |
+| 5 | odd | 5 |
+| 7 | odd | 5 |
+| 9 | odd | 5 |
++----+----------+-----+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">AVG()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to produce a running average of all the even values,
+then a running average of all the odd values. The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>;
+therefore, all of these examples produce the same results:
+<pre class="pre codeblock"><code>select x, property,
+ avg(x) over (partition by property <strong class="ph b">order by x</strong>) as 'cumulative average'
+ from int_t where property in ('odd','even');
++----+----------+--------------------+
+| x | property | cumulative average |
++----+----------+--------------------+
+| 2 | even | 2 |
+| 4 | even | 3 |
+| 6 | even | 4 |
+| 8 | even | 5 |
+| 10 | even | 6 |
+| 1 | odd | 1 |
+| 3 | odd | 2 |
+| 5 | odd | 3 |
+| 7 | odd | 4 |
+| 9 | odd | 5 |
++----+----------+--------------------+
+
+select x, property,
+ avg(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">range between unbounded preceding and current row</strong>
+ ) as 'cumulative average'
+from int_t where property in ('odd','even');
++----+----------+--------------------+
+| x | property | cumulative average |
++----+----------+--------------------+
+| 2 | even | 2 |
+| 4 | even | 3 |
+| 6 | even | 4 |
+| 8 | even | 5 |
+| 10 | even | 6 |
+| 1 | odd | 1 |
+| 3 | odd | 2 |
+| 5 | odd | 3 |
+| 7 | odd | 4 |
+| 9 | odd | 5 |
++----+----------+--------------------+
+
+select x, property,
+ avg(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">rows between unbounded preceding and current row</strong>
+ ) as 'cumulative average'
+ from int_t where property in ('odd','even');
++----+----------+--------------------+
+| x | property | cumulative average |
++----+----------+--------------------+
+| 2 | even | 2 |
+| 4 | even | 3 |
+| 6 | even | 4 |
+| 8 | even | 5 |
+| 10 | even | 6 |
+| 1 | odd | 1 |
+| 3 | odd | 2 |
+| 5 | odd | 3 |
+| 7 | odd | 4 |
+| 9 | odd | 5 |
++----+----------+--------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running average taking into account 1 row before
+and 1 row after the current row, within the same partition (all the even values or all the odd values).
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code>
+clause:
+<pre class="pre codeblock"><code>select x, property,
+ avg(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">rows between 1 preceding and 1 following</strong>
+ ) as 'moving average'
+ from int_t where property in ('odd','even');
++----+----------+----------------+
+| x | property | moving average |
++----+----------+----------------+
+| 2 | even | 3 |
+| 4 | even | 4 |
+| 6 | even | 6 |
+| 8 | even | 8 |
+| 10 | even | 9 |
+| 1 | odd | 2 |
+| 3 | odd | 3 |
+| 5 | odd | 5 |
+| 7 | odd | 7 |
+| 9 | odd | 8 |
++----+----------+----------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+ avg(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">range between 1 preceding and 1 following</strong>
+ ) as 'moving average'
+from int_t where property in ('odd','even');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+
+
+ <p class="p">
+      Because arithmetic on <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns uses
+      high-performance hardware instructions, and because distributed queries can perform these operations in a different
+      order for each query, results can vary slightly for aggregate function calls such as <code class="ph codeph">SUM()</code>
+ and <code class="ph codeph">AVG()</code> for <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns, particularly on
+ large data sets where millions or billions of values are summed or averaged. For perfect consistency and
+ repeatability, use the <code class="ph codeph">DECIMAL</code> data type for such operations instead of
+ <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+ </p>
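+
+    <div class="p">
+      For example, a sketch of this approach (the <code class="ph codeph">sales</code> table and its
+      <code class="ph codeph">price</code> column are hypothetical):
+<pre class="pre codeblock"><code>-- Results can vary slightly from run to run on large data sets:
+select avg(price) from sales;
+-- Casting the input to DECIMAL makes the result repeatable:
+select avg(cast(price as decimal(12,2))) from sales;
+</code></pre>
+    </div>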
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>, <a class="xref" href="impala_max.html#max">MAX Function</a>,
+ <a class="xref" href="impala_min.html#min">MIN Function</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_parquet_fallback_schema_resolution.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_parquet_fallback_schema_resolution.html b/docs/build3x/html/topics/impala_parquet_fallback_schema_resolution.html
new file mode 100644
index 0000000..74ec966
--- /dev/null
+++ b/docs/build3x/html/topics/impala_parquet_fallback_schema_resolution.html
@@ -0,0 +1,55 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_fallback_schema_resolution"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</title></head><body id="parquet_fallback_schema_resolution"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <div class="p">
+
+ The <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION</code> query option allows Impala to look
+ up columns within Parquet files by column name, rather than column order,
+ when necessary.
+ The allowed values are:
+ <ul class="ul">
+ <li class="li">
+ POSITION (0)
+ </li>
+ <li class="li">
+ NAME (1)
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ By default, Impala looks up columns within a Parquet file based on
+ the order of columns in the table.
+ The <code class="ph codeph">name</code> setting for this option enables behavior for
+      Impala queries similar to the Hive setting <code class="ph codeph">parquet.column.index.access=false</code>.
+ It also allows Impala to query Parquet files created by Hive with the
+ <code class="ph codeph">parquet.column.index.access=false</code> setting in effect.
+ </p>
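+
+    <div class="p">
+      For example, to switch the current session to name-based resolution
+      (either the string or the equivalent integer value is accepted):
+<pre class="pre codeblock"><code>set PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME;
+-- Equivalent, using the integer value:
+set PARQUET_FALLBACK_SCHEMA_RESOLUTION=1;
+</code></pre>
+    </div>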
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer or string
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_parquet.html#parquet_schema_evolution">Schema Evolution for Parquet Tables</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_parquet_file_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_parquet_file_size.html b/docs/build3x/html/topics/impala_parquet_file_size.html
new file mode 100644
index 0000000..b62341e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_parquet_file_size.html
@@ -0,0 +1,101 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_file_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_FILE_SIZE Query Option</title></head><body id="parquet_file_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">PARQUET_FILE_SIZE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Specifies the maximum size of each Parquet data file produced by Impala <code class="ph codeph">INSERT</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ Specify the size in bytes, or with a trailing <code class="ph codeph">m</code> or <code class="ph codeph">g</code> character to indicate
+ megabytes or gigabytes. For example:
+ </p>
+
+<pre class="pre codeblock"><code>-- 128 megabytes.
+set PARQUET_FILE_SIZE=134217728;
+INSERT OVERWRITE parquet_table SELECT * FROM text_table;
+
+-- 512 megabytes.
+set PARQUET_FILE_SIZE=512m;
+INSERT OVERWRITE parquet_table SELECT * FROM text_table;
+
+-- 1 gigabyte.
+set PARQUET_FILE_SIZE=1g;
+INSERT OVERWRITE parquet_table SELECT * FROM text_table;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ With tables that are small or finely partitioned, the default Parquet block size (formerly 1 GB, now 256 MB
+ in Impala 2.0 and later) could be much larger than needed for each data file. For <code class="ph codeph">INSERT</code>
+ operations into such tables, you can increase parallelism by specifying a smaller
+ <code class="ph codeph">PARQUET_FILE_SIZE</code> value, resulting in more HDFS blocks that can be processed by different
+ nodes.
+
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric, with optional unit specifier
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ Currently, the maximum value for this setting is 1 gigabyte (<code class="ph codeph">1g</code>).
+ Setting a value higher than 1 gigabyte could result in errors during
+ an <code class="ph codeph">INSERT</code> operation.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (produces files with a target size of 256 MB; files might be larger for very wide tables)
+ </p>
+
+ <p class="p">
+ Because ADLS does not expose the block sizes of data files the way HDFS does,
+ any Impala <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> statements
+ use the <code class="ph codeph">PARQUET_FILE_SIZE</code> query option setting to define the size of
+ Parquet data files. (Using a large block size is more important for Parquet tables than
+ for tables that use other file formats.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Isilon considerations:</strong>
+ </p>
+ <div class="p">
+ Because the EMC Isilon storage devices use a global value for the block size
+ rather than a configurable value for each file, the <code class="ph codeph">PARQUET_FILE_SIZE</code>
+ query option has no effect when Impala inserts data into a table or partition
+ residing on Isilon storage. Use the <code class="ph codeph">isi</code> command to set the
+ default block size globally on the Isilon device. For example, to set the
+ Isilon default block size to 256 MB, the recommended size for Parquet
+ data files for Impala, issue the following command:
+<pre class="pre codeblock"><code>isi hdfs settings modify --default-block-size=256MB</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ For information about the Parquet file format, and how the number and size of data files affects query
+ performance, see <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>.
+ </p>
+
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_partitioning.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_partitioning.html b/docs/build3x/html/topics/impala_partitioning.html
new file mode 100644
index 0000000..c99d10d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_partitioning.html
@@ -0,0 +1,801 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version"
content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="partitioning"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Partitioning for Impala Tables</title></head><body id="partitioning"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Partitioning for Impala Tables</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ By default, all the data files for a table are located in a single directory. Partitioning is a technique for physically dividing the
+ data during loading, based on values from one or more columns, to speed up queries that test those columns. For example, with a
+ <code class="ph codeph">school_records</code> table partitioned on a <code class="ph codeph">year</code> column, there is a separate data directory for each
+ different year value, and all the data for that year is stored in a data file in that directory. A query that includes a
+ <code class="ph codeph">WHERE</code> condition such as <code class="ph codeph">YEAR=1966</code>, <code class="ph codeph">YEAR IN (1989,1999)</code>, or <code class="ph codeph">YEAR BETWEEN
+ 1984 AND 1989</code> can examine only the data files from the appropriate directory or directories, greatly reducing the amount of
+ data to read and test.
+ </p>
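+
+    <div class="p">
+      For example, such a table might be declared as follows (a sketch; the non-partitioning
+      columns are hypothetical). Each distinct <code class="ph codeph">year</code> value gets its own
+      HDFS subdirectory named <code class="ph codeph">year=<var class="keyword varname">value</var></code>:
+<pre class="pre codeblock"><code>create table school_records (id int, name string)
+  partitioned by (year smallint);
+-- Rows with year=1966 are stored under a directory such as:
+--   .../school_records/year=1966/
+-- and this query reads only the files in that directory:
+select count(*) from school_records where year = 1966;
+</code></pre>
+    </div>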
+
+ <p class="p toc inpage"></p>
+
+ <p class="p">
+ See <a class="xref" href="impala_tutorial.html#tut_external_partition_data">Attaching an External Partitioned Table to an HDFS Directory Structure</a> for an example that illustrates the syntax for creating partitioned
+ tables, the underlying directory structure in HDFS, and how to attach a partitioned Impala external table to data files stored
+ elsewhere in HDFS.
+ </p>
+
+ <p class="p">
+ Parquet is a popular format for partitioned Impala tables because it is well suited to handle huge data volumes. See
+ <a class="xref" href="impala_parquet.html#parquet_performance">Query Performance for Impala Parquet Tables</a> for performance considerations for partitioned Parquet tables.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_literals.html#null">NULL</a> for details about how <code class="ph codeph">NULL</code> values are represented in partitioned tables.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about setting up tables where some or all partitions reside on the Amazon Simple
+ Storage Service (S3).
+ </p>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="partitioning__partitioning_choosing">
+
+ <h2 class="title topictitle2" id="ariaid-title2">When to Use Partitioned Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Partitioning is typically appropriate for:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Tables that are very large, where reading the entire data set takes an impractical amount of time.
+ </li>
+
+ <li class="li">
+ Tables that are always or almost always queried with conditions on the partitioning columns. In our example of a table partitioned
+ by year, <code class="ph codeph">SELECT COUNT(*) FROM school_records WHERE year = 1985</code> is efficient, only examining a small fraction of
+ the data; but <code class="ph codeph">SELECT COUNT(*) FROM school_records</code> has to process a separate data file for each year, resulting in
+ more overall work than in an unpartitioned table. You would probably not partition this way if you frequently queried the table
+ based on last name, student ID, and so on without testing the year.
+ </li>
+
+ <li class="li">
+ Columns that have reasonable cardinality (number of different values). If a column only has a small number of values, for example
+ <code class="ph codeph">Male</code> or <code class="ph codeph">Female</code>, you do not gain much efficiency by eliminating only about 50% of the data to
+ read for each query. If a column has only a few rows matching each value, the number of directories to process can become a
+ limiting factor, and the data file in each directory could be too small to take advantage of the Hadoop mechanism for transmitting
+ data in multi-megabyte blocks. For example, you might partition census data by year, store sales data by year and month, and web
+ traffic data by year, month, and day. (Some users with high volumes of incoming data might even partition down to the individual
+ hour and minute.)
+ </li>
+
+ <li class="li">
+ Data that already passes through an extract, transform, and load (ETL) pipeline. The values of the partitioning columns are
+ stripped from the original data files and represented by directory names, so loading data into a partitioned table involves some
+ sort of transformation or preprocessing.
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="partitioning__partition_sql">
+
+ <h2 class="title topictitle2" id="ariaid-title3">SQL Statements for Partitioned Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In terms of Impala SQL syntax, partitioning affects these statements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph"><a class="xref" href="impala_create_table.html#create_table">CREATE TABLE</a></code>: you specify a <code class="ph codeph">PARTITIONED
+ BY</code> clause when creating the table to identify names and data types of the partitioning columns. These columns are not
+ included in the main list of columns for the table.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 2.5</span> and higher, you can also use the <code class="ph codeph">PARTITIONED BY</code> clause in a <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement. This syntax lets you use a single statement to create a partitioned table, copy data into it, and
+ create new partitions based on the values in the inserted data.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE</a></code>: you can add or drop partitions, to work with
+ different portions of a huge data set. You can designate the HDFS directory that holds the data files for a specific partition.
+ With data partitioned by date values, you might <span class="q">"age out"</span> data that is no longer relevant.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If you are creating a partition for the first time and specifying its location, for maximum efficiency, use
+ a single <code class="ph codeph">ALTER TABLE</code> statement including both the <code class="ph codeph">ADD PARTITION</code> and
+ <code class="ph codeph">LOCATION</code> clauses, rather than separate statements with <code class="ph codeph">ADD PARTITION</code> and
+ <code class="ph codeph">SET LOCATION</code> clauses.
+ </div>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code>: When you insert data into a partitioned table, you identify
+        the partitioning columns. One or more values from each inserted row are not stored in data files, but instead determine the
+        directory where that row is stored. You can also specify which partition to load a set of data into, with <code class="ph codeph">INSERT
+ OVERWRITE</code> statements; you can replace the contents of a specific partition but you cannot append data to a specific
+ partition.
+ <p class="p">
+ By default, if an <code class="ph codeph">INSERT</code> statement creates any new subdirectories underneath a partitioned
+ table, those subdirectories are assigned default HDFS permissions for the <code class="ph codeph">impala</code> user. To
+ make each subdirectory have the same permissions as its parent directory in HDFS, specify the
+ <code class="ph codeph">--insert_inherit_permissions</code> startup option for the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
+ </li>
+
+ <li class="li">
+ Although the syntax of the <code class="ph codeph"><a class="xref" href="impala_select.html#select">SELECT</a></code> statement is the same whether or
+ not the table is partitioned, the way queries interact with partitioned tables can have a dramatic impact on performance and
+ scalability. The mechanism that lets queries skip certain partitions during a query is known as partition pruning; see
+ <a class="xref" href="impala_partitioning.html#partition_pruning">Partition Pruning for Queries</a> for details.
+ </li>
+
+ <li class="li">
+ In Impala 1.4 and later, there is a <code class="ph codeph">SHOW PARTITIONS</code> statement that displays information about each partition in a
+ table. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+ </li>
+ </ul>
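+
+      <p class="p">
+        As an illustrative sketch (the <code class="ph codeph">sales</code> table and the HDFS path here are hypothetical),
+        these statements might be combined as follows:
+      </p>
+
+<pre class="pre codeblock"><code>-- Partition key columns go in the PARTITIONED BY clause,
+-- not in the main column list.
+create table sales (id bigint, amount decimal(9,2))
+  partitioned by (year int, month int);
+
+-- Add a partition and set its location in a single ALTER TABLE statement.
+alter table sales add partition (year=2018, month=1)
+  location '/warehouse/sales/year=2018/month=1';
+
+-- In Impala 2.5 and higher, CREATE TABLE AS SELECT can create
+-- a partitioned table and its partitions in one statement.
+create table sales_by_year partitioned by (year) as
+  select id, amount, year from sales;
+</code></pre>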
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="partitioning__partition_static_dynamic">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Static and Dynamic Partitioning Clauses</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Specifying all the partition columns in a SQL statement is called <dfn class="term">static partitioning</dfn>, because the statement affects a
+ single predictable partition. For example, you use static partitioning with an <code class="ph codeph">ALTER TABLE</code> statement that affects
+ only one partition, or with an <code class="ph codeph">INSERT</code> statement that inserts all values into the same partition:
+ </p>
+
+<pre class="pre codeblock"><code>insert into t1 <strong class="ph b">partition(x=10, y='a')</strong> select c1 from some_other_table;
+</code></pre>
+
+ <p class="p">
+        When you specify some partition key columns in an <code class="ph codeph">INSERT</code> statement, but leave out the values, Impala determines
+        which partition to insert each row into, based on the values in the <code class="ph codeph">SELECT</code> list. This technique is called <dfn class="term">dynamic partitioning</dfn>:
+ </p>
+
+<pre class="pre codeblock"><code>insert into t1 <strong class="ph b">partition(x, y='b')</strong> select c1, c2 from some_other_table;
+-- Create new partition if necessary based on variable year, month, and day; insert a single row.
+insert into weather <strong class="ph b">partition (year, month, day)</strong> select 'cloudy',2014,4,21;
+-- Create new partition if necessary for specified year and month but variable day; insert a single row.
+insert into weather <strong class="ph b">partition (year=2014, month=04, day)</strong> select 'sunny',22;
+</code></pre>
+
+ <p class="p">
+ The more key columns you specify in the <code class="ph codeph">PARTITION</code> clause, the fewer columns you need in the <code class="ph codeph">SELECT</code>
+ list. The trailing columns in the <code class="ph codeph">SELECT</code> list are substituted in order for the partition key columns with no
+ specified value.
+ </p>
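+
+      <p class="p">
+        For example, if both partition key columns are left dynamic, the <code class="ph codeph">SELECT</code> list needs two trailing
+        columns, matched positionally to <code class="ph codeph">x</code> and <code class="ph codeph">y</code>:
+      </p>
+
+<pre class="pre codeblock"><code>-- c2 supplies values for x, and c3 supplies values for y.
+insert into t1 partition(x, y) select c1, c2, c3 from some_other_table;
+</code></pre>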
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="partitioning__partition_refresh">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Refreshing a Single Partition</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> statement is typically used with partitioned tables when new data files are loaded into a partition by
+ some non-Impala mechanism, such as a Hive or Spark job. The <code class="ph codeph">REFRESH</code> statement makes Impala aware of the new data
+ files so that they can be used in Impala queries. Because partitioned tables typically contain a high volume of data, the
+ <code class="ph codeph">REFRESH</code> operation for a full partitioned table can take significant time.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.7</span> and higher, you can include a <code class="ph codeph">PARTITION (<var class="keyword varname">partition_spec</var>)</code> clause in the
+ <code class="ph codeph">REFRESH</code> statement so that only a single partition is refreshed. For example, <code class="ph codeph">REFRESH big_table PARTITION
+ (year=2017, month=9, day=30)</code>. The partition spec must include all the partition key columns. See
+ <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for more details and examples of <code class="ph codeph">REFRESH</code> syntax and usage.
+ </p>
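+
+      <p class="p">
+        For example, after an external job adds files for one day to the <code class="ph codeph">big_table</code> example above:
+      </p>
+
+<pre class="pre codeblock"><code>-- Refreshes metadata for a single partition only.
+refresh big_table partition (year=2017, month=9, day=30);
+
+-- Refreshes metadata for every partition; can take much longer.
+refresh big_table;
+</code></pre>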
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="partitioning__partition_permissions">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Permissions for Partition Subdirectories</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ By default, if an <code class="ph codeph">INSERT</code> statement creates any new subdirectories underneath a partitioned
+ table, those subdirectories are assigned default HDFS permissions for the <code class="ph codeph">impala</code> user. To
+ make each subdirectory have the same permissions as its parent directory in HDFS, specify the
+ <code class="ph codeph">--insert_inherit_permissions</code> startup option for the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
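+
+      <p class="p">
+        For example, the option might be supplied on the <span class="keyword cmdname">impalad</span> command line
+        (any other startup options are omitted here):
+      </p>
+
+<pre class="pre codeblock"><code>impalad --insert_inherit_permissions=true</code></pre>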
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="partitioning__partition_pruning">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Partition Pruning for Queries</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Partition pruning refers to the mechanism where a query can skip reading the data files corresponding to one or more partitions. If
+ you can arrange for queries to prune large numbers of unnecessary partitions from the query execution plan, the queries use fewer
+ resources and are thus proportionally faster and more scalable.
+ </p>
+
+ <p class="p">
+ For example, if a table is partitioned by columns <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, and <code class="ph codeph">DAY</code>, then
+ <code class="ph codeph">WHERE</code> clauses such as <code class="ph codeph">WHERE year = 2013</code>, <code class="ph codeph">WHERE year < 2010</code>, or <code class="ph codeph">WHERE
+ year BETWEEN 1995 AND 1998</code> allow Impala to skip the data files in all partitions outside the specified range. Likewise,
+ <code class="ph codeph">WHERE year = 2013 AND month BETWEEN 1 AND 3</code> could prune even more partitions, reading the data files for only a
+ portion of one year.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="partition_pruning__partition_pruning_checking">
+
+ <h3 class="title topictitle3" id="ariaid-title8">Checking if Partition Pruning Happens for a Query</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+          To check the effectiveness of partition pruning for a query, examine the <code class="ph codeph">EXPLAIN</code> output for the query before
+          running it. The following example shows a table with 3 partitions, where the query reads only 1 of them. The notation
+ <code class="ph codeph">#partitions=1/3</code> in the <code class="ph codeph">EXPLAIN</code> plan confirms that Impala can do the appropriate partition
+ pruning.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > insert into census partition (year=2010) values ('Smith'),('Jones');
+[localhost:21000] > insert into census partition (year=2011) values ('Smith'),('Jones'),('Doe');
+[localhost:21000] > insert into census partition (year=2012) values ('Smith'),('Doe');
+[localhost:21000] > select name from census where year=2010;
++-------+
+| name |
++-------+
+| Smith |
+| Jones |
++-------+
+[localhost:21000] > explain select name from census <strong class="ph b">where year=2010</strong>;
++------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------+
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 1:EXCHANGE |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 1 |
+| UNPARTITIONED |
+| |
+| 0:SCAN HDFS |
+| table=predicate_propagation.census <strong class="ph b">#partitions=1/3</strong> size=12B |
++------------------------------------------------------------------+</code></pre>
+
+ <p class="p">
+ For a report of the volume of data that was actually read and processed at each stage of the query, check the output of the
+ <code class="ph codeph">SUMMARY</code> command immediately after running the query. For a more detailed analysis, look at the output of the
+ <code class="ph codeph">PROFILE</code> command; it includes this same summary report near the start of the profile output.
+ </p>
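+
+          <p class="p">
+            For example, in <span class="keyword cmdname">impala-shell</span>:
+          </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select name from census where year=2010;
+[localhost:21000] > summary;
+[localhost:21000] > profile;
+</code></pre>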
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="partition_pruning__partition_pruning_sql">
+
+ <h3 class="title topictitle3" id="ariaid-title9">What SQL Constructs Work with Partition Pruning</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala can even do partition pruning in cases where the partition key column is not directly compared to a constant, by applying
+ the transitive property to other parts of the <code class="ph codeph">WHERE</code> clause. This technique is known as predicate propagation, and
+ is available in Impala 1.2.2 and later. In this example, the census table includes another column indicating when the data was
+ collected, which happens in 10-year intervals. Even though the query does not compare the partition key column
+ (<code class="ph codeph">YEAR</code>) to a constant value, Impala can deduce that only the partition <code class="ph codeph">YEAR=2010</code> is required, and
+ again only reads 1 out of 3 partitions.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > drop table census;
+[localhost:21000] > create table census (name string, census_year int) partitioned by (year int);
+[localhost:21000] > insert into census partition (year=2010) values ('Smith',2010),('Jones',2010);
+[localhost:21000] > insert into census partition (year=2011) values ('Smith',2020),('Jones',2020),('Doe',2020);
+[localhost:21000] > insert into census partition (year=2012) values ('Smith',2020),('Doe',2020);
+[localhost:21000] > select name from census where year = census_year and census_year=2010;
++-------+
+| name |
++-------+
+| Smith |
+| Jones |
++-------+
+[localhost:21000] > explain select name from census <strong class="ph b">where year = census_year and census_year=2010</strong>;
++------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------+
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 1:EXCHANGE |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 1 |
+| UNPARTITIONED |
+| |
+| 0:SCAN HDFS |
+| table=predicate_propagation.census <strong class="ph b">#partitions=1/3</strong> size=22B |
+| predicates: census_year = 2010, year = census_year |
++------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+          If a view applies to a partitioned table, partition pruning considers the <code class="ph codeph">WHERE</code> clauses from both
+          the query in the <code class="ph codeph">CREATE VIEW</code> statement and the query that refers to the view.
+          Prior to Impala 1.4, only the <code class="ph codeph">WHERE</code> clauses from the
+          <code class="ph codeph">CREATE VIEW</code> statement were used for partition pruning.
+ </p>
+
+ <p class="p">
+ In queries involving both analytic functions and partitioned tables, partition pruning only occurs for columns named in the <code class="ph codeph">PARTITION BY</code>
+ clause of the analytic function call. For example, if an analytic function query has a clause such as <code class="ph codeph">WHERE year=2016</code>,
+ the way to make the query prune all other <code class="ph codeph">YEAR</code> partitions is to include <code class="ph codeph">PARTITION BY year</code> in the analytic function call;
+ for example, <code class="ph codeph">OVER (PARTITION BY year,<var class="keyword varname">other_columns</var> <var class="keyword varname">other_analytic_clauses</var>)</code>.
+
+ </p>
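+
+          <p class="p">
+            As a sketch (the <code class="ph codeph">sales_table</code> columns used here are hypothetical):
+          </p>
+
+<pre class="pre codeblock"><code>-- Because the OVER clause includes PARTITION BY year, the
+-- WHERE year=2016 predicate can prune the other YEAR partitions.
+select name, rank() over (partition by year order by amount desc) as rnk
+  from sales_table
+  where year = 2016;
+</code></pre>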
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="partition_pruning__dynamic_partition_pruning">
+
+ <h3 class="title topictitle3" id="ariaid-title10">Dynamic Partition Pruning</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+            The original mechanism used to prune partitions is <dfn class="term">static partition pruning</dfn>, in which the conditions in the
+ <code class="ph codeph">WHERE</code> clause are analyzed to determine in advance which partitions can be safely skipped. In <span class="keyword">Impala 2.5</span>
+ and higher, Impala can perform <dfn class="term">dynamic partition pruning</dfn>, where information about the partitions is collected during
+ the query, and Impala prunes unnecessary partitions in ways that were impractical to predict in advance.
+ </p>
+
+ <p class="p">
+ For example, if partition key columns are compared to literal values in a <code class="ph codeph">WHERE</code> clause, Impala can perform static
+ partition pruning during the planning phase to only read the relevant partitions:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The query only needs to read 3 partitions whose key values are known ahead of time.
+-- That's static partition pruning.
+SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
+</code></pre>
+
+ <p class="p">
+ Dynamic partition pruning involves using information only available at run time, such as the result of a subquery:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table yy (s string) partitioned by (year int) stored as parquet;
+insert into yy partition (year) values ('1999', 1999), ('2000', 2000),
+ ('2001', 2001), ('2010',2010);
+compute stats yy;
+
+create table yy2 (s string) partitioned by (year int) stored as parquet;
+insert into yy2 partition (year) values ('1999', 1999), ('2000', 2000),
+ ('2001', 2001);
+compute stats yy2;
+
+-- The query reads an unknown number of partitions, whose key values are only
+-- known at run time. The 'runtime filters' lines show how the information about
+-- the partitions is calculated in query fragment 02, and then used in query
+-- fragment 00 to decide which partitions to skip.
+explain select s from yy2 where year in (select year from yy where year between 2000 and 2005);
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=16.00MB VCores=2 |
+| |
+| 04:EXCHANGE [UNPARTITIONED] |
+| | |
+| 02:HASH JOIN [LEFT SEMI JOIN, BROADCAST] |
+| | hash predicates: year = year |
+| | <strong class="ph b">runtime filters: RF000 <- year</strong> |
+| | |
+| |--03:EXCHANGE [BROADCAST] |
+| | | |
+| | 01:SCAN HDFS [dpp.yy] |
+| | partitions=2/4 files=2 size=468B |
+| | |
+| 00:SCAN HDFS [dpp.yy2] |
+| partitions=2/3 files=2 size=468B |
+| <strong class="ph b">runtime filters: RF000 -> year</strong> |
++----------------------------------------------------------+
+</code></pre>
+
+
+
+ <p class="p">
+ In this case, Impala evaluates the subquery, sends the subquery results to all Impala nodes participating in the query, and then
+ each <span class="keyword cmdname">impalad</span> daemon uses the dynamic partition pruning optimization to read only the partitions with the
+ relevant key values.
+ </p>
+
+ <p class="p">
+ Dynamic partition pruning is especially effective for queries involving joins of several large partitioned tables. Evaluating the
+ <code class="ph codeph">ON</code> clauses of the join predicates might normally require reading data from all partitions of certain tables. If
+ the <code class="ph codeph">WHERE</code> clauses of the query refer to the partition key columns, Impala can now often skip reading many of the
+ partitions while evaluating the <code class="ph codeph">ON</code> clauses. The dynamic partition pruning optimization reduces the amount of I/O
+ and the amount of intermediate data stored and transmitted across the network during the query.
+ </p>
+
+ <p class="p">
+ When the spill-to-disk feature is activated for a join node within a query, Impala does not
+ produce any runtime filters for that join operation on that host. Other join nodes within
+ the query are not affected.
+ </p>
+
+ <p class="p">
+ Dynamic partition pruning is part of the runtime filtering feature, which applies to other kinds of queries in addition to queries
+ against partitioned tables. See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a> for full details about this feature.
+ </p>
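+
+          <p class="p">
+            Runtime filtering, and thus dynamic partition pruning, can be tuned per session through query options; for example:
+          </p>
+
+<pre class="pre codeblock"><code>-- OFF disables runtime filters; LOCAL applies them only within each host;
+-- GLOBAL (the default in Impala 2.6 and higher) transmits them across the network.
+set runtime_filter_mode=global;
+-- How long a scan node waits for filters to arrive before proceeding.
+set runtime_filter_wait_time_ms=1000;
+</code></pre>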
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="partitioning__partition_key_columns">
+
+ <h2 class="title topictitle2" id="ariaid-title11">Partition Key Columns</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The columns you choose as the partition keys should be ones that are frequently used to filter query results in important,
+ large-scale queries. Popular examples are some combination of year, month, and day when the data has associated time values, and
+ geographic region when the data is associated with some place.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ For time-based data, split out the separate parts into their own columns, because Impala cannot partition based on a
+ <code class="ph codeph">TIMESTAMP</code> column.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The data type of the partition columns does not have a significant effect on the storage required, because the values from those
+            columns are not stored in the data files; instead, they are represented as strings inside HDFS directory names.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can enable the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code> query option to speed up
+ queries that only refer to partition key columns, such as <code class="ph codeph">SELECT MAX(year)</code>. This setting is not enabled by
+ default because the query behavior is slightly different if the table contains partition directories without actual data inside.
+ See <a class="xref" href="impala_optimize_partition_key_scans.html#optimize_partition_key_scans">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Partitioned tables can contain complex type columns.
+ All the partition key columns must be scalar types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Remember that when Impala queries data stored in HDFS, it is most efficient to use multi-megabyte files to take advantage of the
+ HDFS block size. For Parquet tables, the block size (and ideal size of the data files) is <span class="ph">256 MB in
+ Impala 2.0 and later</span>. Therefore, avoid specifying too many partition key columns, which could result in individual
+ partitions containing only small amounts of data. For example, if you receive 1 GB of data per day, you might partition by year,
+ month, and day; while if you receive 5 GB of data per minute, you might partition by year, month, day, hour, and minute. If you
+ have data with a geographic component, you might partition based on postal code if you have many megabytes of data for each
+            postal code, but if not, you might partition by some larger region such as city, state, or country.
+ </p>
+ </li>
+ </ul>
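+
+      <p class="p">
+        For example, the separate time parts can be derived at load time; the
+        <code class="ph codeph">raw_events</code> table and its columns here are hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>-- A TIMESTAMP column cannot be a partition key, so derive
+-- year/month/day columns from it when populating the table.
+create table events_by_day partitioned by (year, month, day) as
+  select event_id, details,
+         year(event_ts) as year, month(event_ts) as month, day(event_ts) as day
+  from raw_events;
+</code></pre>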
+
+ <p class="p">
+ If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+ <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+ query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+ for the kinds of queries that this option applies to, and slight differences in how partitions are
+ evaluated when this query option is enabled.
+ </p>
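+
+      <p class="p">
+        For example, with the <code class="ph codeph">census</code> table used earlier in this topic:
+      </p>
+
+<pre class="pre codeblock"><code>set optimize_partition_key_scans=1;
+-- Can be answered from partition metadata alone, without scanning data files.
+select min(year), max(year), count(distinct year) from census;
+</code></pre>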
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="partitioning__mixed_format_partitions">
+
+ <h2 class="title topictitle2" id="ariaid-title12">Setting Different File Formats for Partitions</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Partitioned tables have the flexibility to use different file formats for different partitions. (For background information about
+ the different file formats Impala supports, see <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>.) For example, if you originally
+ received data in text format, then received new data in RCFile format, and eventually began receiving data in Parquet format, all
+ that data could reside in the same table for queries. You just need to ensure that the table is structured so that the data files
+ that use different file formats reside in separate partitions.
+ </p>
+
+ <p class="p">
+ For example, here is how you might switch from text to Parquet data as you receive data for different years:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table census (name string) partitioned by (year smallint);
+[localhost:21000] > alter table census add partition (year=2012); -- Stays in text format.
+
+[localhost:21000] > alter table census add partition (year=2013); -- Starts in text format...
+[localhost:21000] > alter table census partition (year=2013) set fileformat parquet; -- ...switched to Parquet before data is loaded.
+
+[localhost:21000] > insert into census partition (year=2012) values ('Smith'),('Jones'),('Lee'),('Singh');
+[localhost:21000] > insert into census partition (year=2013) values ('Flores'),('Bogomolov'),('Cooper'),('Appiah');</code></pre>
+
+ <p class="p">
+ At this point, the HDFS directory for <code class="ph codeph">year=2012</code> contains a text-format data file, while the HDFS directory for
+ <code class="ph codeph">year=2013</code> contains a Parquet data file. As always, when loading non-trivial data, you would use <code class="ph codeph">INSERT ...
+ SELECT</code> or <code class="ph codeph">LOAD DATA</code> to import data in large batches, rather than <code class="ph codeph">INSERT ... VALUES</code> which
+ produces small files that are inefficient for real-world queries.
+ </p>
+
+ <p class="p">
+ For other file types that Impala cannot create natively, you can switch into Hive and issue the <code class="ph codeph">ALTER TABLE ... SET
+ FILEFORMAT</code> statements and <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> statements there. After switching back to
+ Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> statement so that Impala recognizes any partitions or new
+ data added through Hive.
+ </p>
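+
+      <p class="p">
+        A sketch of that sequence, using SequenceFile as an example format and an illustrative HDFS path:
+      </p>
+
+<pre class="pre codeblock"><code>-- In Hive:
+ALTER TABLE census ADD PARTITION (year=2014);
+ALTER TABLE census PARTITION (year=2014) SET FILEFORMAT SEQUENCEFILE;
+LOAD DATA INPATH '/tmp/census_2014.seq' INTO TABLE census PARTITION (year=2014);
+
+-- Back in impala-shell:
+REFRESH census;
+</code></pre>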
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="partitioning__partition_management">
+
+ <h2 class="title topictitle2" id="ariaid-title13">Managing Partitions</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can add, drop, set the expected file format, or set the HDFS location of the data files for individual partitions within an
+ Impala table. See <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax details, and
+ <a class="xref" href="impala_partitioning.html#mixed_format_partitions">Setting Different File Formats for Partitions</a> for tips on managing tables containing partitions with different file
+ formats.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If you are creating a partition for the first time and specifying its location, for maximum efficiency, use
+ a single <code class="ph codeph">ALTER TABLE</code> statement including both the <code class="ph codeph">ADD PARTITION</code> and
+ <code class="ph codeph">LOCATION</code> clauses, rather than separate statements with <code class="ph codeph">ADD PARTITION</code> and
+ <code class="ph codeph">SET LOCATION</code> clauses.
+ </div>
+
+ <p class="p">
+ What happens to the data files when a partition is dropped depends on whether the partitioned table is designated as internal or
+ external. For an internal (managed) table, the data files are deleted. For example, if data in the partitioned table is a copy of
+ raw data files stored elsewhere, you might save disk space by dropping older partitions that are no longer required for reporting,
+ knowing that the original data is still available if needed later. For an external table, the data files are left alone. For
+ example, dropping a partition without deleting the associated files lets Impala consider a smaller set of partitions, improving
+ query efficiency and reducing overhead for DDL operations on the table; if the data is needed again later, you can add the partition
+ again. See <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> for details and examples.
+ </p>
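+
+      <p class="p">
+        For example, using the <code class="ph codeph">census</code> table from earlier in this topic:
+      </p>
+
+<pre class="pre codeblock"><code>-- For an internal table, this also deletes the partition's data files;
+-- for an external table, the files are left in place.
+alter table census drop partition (year=2010);
+</code></pre>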
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="partitioning__partition_kudu">
+
+ <h2 class="title topictitle2" id="ariaid-title14">Using Partitioning with Kudu Tables</h2>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Kudu tables use a more fine-grained partitioning scheme than tables containing HDFS data files. You specify a <code class="ph codeph">PARTITION
+ BY</code> clause with the <code class="ph codeph">CREATE TABLE</code> statement to identify how to divide the values from the partition key
+ columns.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_kudu.html#kudu_partitioning">Partitioning for Kudu Tables</a> for
+ details and examples of the partitioning techniques
+ for Kudu tables.
+ </p>
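+
+      <p class="p">
+        As a brief sketch (the table and column names here are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>-- Kudu tables distribute rows through hash and/or range clauses in
+-- PARTITION BY, rather than through separate partition key columns.
+create table metrics (host string, ts timestamp, value double,
+    primary key (host, ts))
+  partition by hash (host) partitions 4,
+  range (ts) (partition values < '2018-01-01')
+  stored as kudu;
+</code></pre>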
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="partitioning__partition_stats">
+ <h2 class="title topictitle2" id="ariaid-title15">Keeping Statistics Up to Date for Partitioned Tables</h2>
+ <div class="body conbody">
+
+ <p class="p">
+ Because the <code class="ph codeph">COMPUTE STATS</code> statement can be resource-intensive to run on a partitioned table
+ as new partitions are added, Impala includes a variation of this statement that allows computing statistics
+ on a per-partition basis such that stats can be incrementally updated when new partitions are added.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ For a particular table, use either <code class="ph codeph">COMPUTE STATS</code> or
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, but never combine the two or
+ alternate between them. If you switch from <code class="ph codeph">COMPUTE STATS</code> to
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> during the lifetime of a table, or
+ vice versa, drop all statistics by running <code class="ph codeph">DROP STATS</code> before
+ making the switch.
+ </p>
+ <p class="p">
+ When you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on a table for the first time,
+ the statistics are computed again from scratch regardless of whether the table already
+ has statistics. Therefore, expect a one-time resource-intensive operation
+ for scanning the entire table when running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ for the first time on a given table.
+ </p>
+ <p class="p">
+ For a table with a huge number of partitions and many columns, the approximately 400 bytes
+ of metadata per column per partition can add up to significant memory overhead, as it must
+ be cached on the <span class="keyword cmdname">catalogd</span> host and on every <span class="keyword cmdname">impalad</span> host
+ that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB,
+ you might experience service downtime.
+ </p>
+ </div>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> variation computes statistics only for partitions that were
+ added or changed since the last <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement, rather than the entire
+ table. It is typically used for tables where a full <code class="ph codeph">COMPUTE STATS</code>
+ operation takes too long to be practical each time a partition is added or dropped. See
+ <a class="xref" href="impala_perf_stats.html#perf_stats_incremental">impala_perf_stats.html#perf_stats_incremental</a> for full usage details.
+ </p>
+
+<pre class="pre codeblock"><code>-- Initially the table has no incremental stats, as indicated
+-- 'false' under Incremental stats.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books | -1 | 1 | 223.74KB | NOT CACHED | PARQUET | false
+| Children | -1 | 1 | 230.05KB | NOT CACHED | PARQUET | false
+| Electronics | -1 | 1 | 232.67KB | NOT CACHED | PARQUET | false
+| Home | -1 | 1 | 232.56KB | NOT CACHED | PARQUET | false
+| Jewelry | -1 | 1 | 223.72KB | NOT CACHED | PARQUET | false
+| Men | -1 | 1 | 231.25KB | NOT CACHED | PARQUET | false
+| Music | -1 | 1 | 237.90KB | NOT CACHED | PARQUET | false
+| Shoes | -1 | 1 | 234.90KB | NOT CACHED | PARQUET | false
+| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false
+| Women | -1 | 1 | 226.27KB | NOT CACHED | PARQUET | false
+| Total | -1 | 10 | 2.25MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- After the first COMPUTE INCREMENTAL STATS,
+-- all partitions have stats. The first
+-- COMPUTE INCREMENTAL STATS scans the whole
+-- table, discarding any previous stats from
+-- a traditional COMPUTE STATS statement.
+compute incremental stats item_partitioned;
++-------------------------------------------+
+| summary |
++-------------------------------------------+
+| Updated 10 partition(s) and 21 column(s). |
++-------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+-------------------+
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats |
++-------------+-------+--------+----------+--------------+---------+-------------------+
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true              |
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true              |
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true              |
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true              |
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true              |
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true              |
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true              |
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true              |
+| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true              |
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true              |
+| Total       | 17957 | 10     | 2.25MB   | 0B           |         |                   |
++-------------+-------+--------+----------+--------------+---------+-------------------+
+
+-- Add a new partition...
+alter table item_partitioned add partition (i_category='Camping');
+-- Add or replace files in HDFS outside of Impala,
+-- rendering the stats for a partition obsolete.
+!import_data_into_sports_partition.sh
+refresh item_partitioned;
+drop incremental stats item_partitioned partition (i_category='Sports');
+-- Now some partitions have incremental stats
+-- and some do not.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+-------------------+
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats |
++-------------+-------+--------+----------+--------------+---------+-------------------+
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true              |
+| Camping     | -1    | 1      | 408.02KB | NOT CACHED   | PARQUET | false             |
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true              |
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true              |
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true              |
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true              |
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true              |
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true              |
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true              |
+| Sports      | -1    | 1      | 227.97KB | NOT CACHED   | PARQUET | false             |
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true              |
+| Total       | 17957 | 11     | 2.65MB   | 0B           |         |                   |
++-------------+-------+--------+----------+--------------+---------+-------------------+
+
+-- After another COMPUTE INCREMENTAL STATS,
+-- all partitions have incremental stats, and only the 2
+-- partitions without incremental stats were scanned.
+compute incremental stats item_partitioned;
++------------------------------------------+
+| summary                                  |
++------------------------------------------+
+| Updated 2 partition(s) and 21 column(s). |
++------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+-------------------+
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats |
++-------------+-------+--------+----------+--------------+---------+-------------------+
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true              |
+| Camping     | 5328  | 1      | 408.02KB | NOT CACHED   | PARQUET | true              |
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true              |
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true              |
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true              |
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true              |
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true              |
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true              |
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true              |
+| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true              |
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true              |
+| Total       | 17957 | 11     | 2.65MB   | 0B           |         |                   |
++-------------+-------+--------+----------+--------------+---------+-------------------+
+</code></pre>
+
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_benchmarking.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_benchmarking.html b/docs/build3x/html/topics/impala_perf_benchmarking.html
new file mode 100644
index 0000000..0470e72
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_benchmarking.html
@@ -0,0 +1,27 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_benchmarks"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Benchmarking Impala Queries</title></head><body id="perf_benchmarks"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Benchmarking Impala Queries</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Because Impala, like other Hadoop components, is designed to handle large data volumes in a distributed
+ environment, conduct any performance tests using realistic data and cluster configurations. Use a multi-node
+ cluster rather than a single node; run queries against tables containing terabytes of data rather than tens
+ of gigabytes. The parallel processing techniques used by Impala are most appropriate for workloads that are
+ beyond the capacity of a single server.
+ </p>
+
+ <p class="p">
+ When you run queries returning large numbers of rows, the CPU time to pretty-print the output can be
+ substantial, giving an inaccurate measurement of the actual query time. Consider using the
+ <code class="ph codeph">-B</code> option on the <code class="ph codeph">impala-shell</code> command to turn off the pretty-printing, and
+ optionally the <code class="ph codeph">-o</code> option to store query results in a file rather than printing to the
+ screen. See <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a> for details.
+ </p>
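+ <p class="p">
+ For example, a benchmarking run might use an invocation such as the following, where the
+ query text and output file name are placeholders:
+ </p>
+<pre class="pre codeblock"><code>$ impala-shell -B -q 'SELECT COUNT(*) FROM benchmark_table' -o /tmp/benchmark_results.txt
+</code></pre>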
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_cookbook.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_cookbook.html b/docs/build3x/html/topics/impala_perf_cookbook.html
new file mode 100644
index 0000000..5e7c7ec
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_cookbook.html
@@ -0,0 +1,256 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_cookbook"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Performance Guidelines and Best Practices</title></head><body id="perf_cookbook"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Performance Guidelines and Best Practices</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here are performance guidelines and best practices that you can use during planning, experimentation, and
+ performance tuning for an Impala-enabled cluster. All of this information is also available in more
+ detail elsewhere in the Impala documentation; it is gathered together here to serve as a cookbook and
+ emphasize which performance techniques typically provide the highest return on investment.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_file_format"><h2 class="title sectiontitle">Choose the appropriate file format for the data.</h2>
+
+
+
+ <p class="p">
+ Typically, for large volumes of data (multiple gigabytes per table or partition), the Parquet file format
+ performs best because of its combination of columnar storage layout, large I/O request size, and
+ compression and encoding. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for comparisons of all
+ file formats supported by Impala, and <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for details about the
+ Parquet file format.
+ </p>
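+ <p class="p">
+ For example, a large table intended for analytic queries might be declared with the Parquet
+ format from the start (the table and column names here are illustrative):
+ </p>
+<pre class="pre codeblock"><code>CREATE TABLE sales_parquet (id BIGINT, amount DECIMAL(10,2), sale_date TIMESTAMP)
+  STORED AS PARQUET;
+</code></pre>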
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ For smaller volumes of data, a few gigabytes or less for each table or partition, you might not see
+ significant performance differences between file formats. At small data volumes, reduced I/O from an
+ efficient compressed file format can be counterbalanced by reduced opportunity for parallel execution. When
+ planning for a production deployment or conducting benchmarks, always use realistic data volumes to get a
+ true picture of performance and scalability.
+ </div>
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_small_files"><h2 class="title sectiontitle">Avoid data ingestion processes that produce many small files.</h2>
+
+
+
+ <p class="p">
+ When producing data files outside of Impala, prefer either text format or Avro, where you can build up the
+ files row by row. Once the data is in Impala, you can convert it to the more efficient Parquet format and
+ split into multiple data files using a single <code class="ph codeph">INSERT ... SELECT</code> statement. Or, if you have
+ the infrastructure to produce multi-megabyte Parquet files as part of your data preparation process, do
+ that and skip the conversion step inside Impala.
+ </p>
+
+ <p class="p">
+ Always use <code class="ph codeph">INSERT ... SELECT</code> to copy significant volumes of data from table to table
+ within Impala. Avoid <code class="ph codeph">INSERT ... VALUES</code> for any substantial volume of data or
+ performance-critical tables, because each such statement produces a separate tiny data file. See
+ <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for examples of the <code class="ph codeph">INSERT ... SELECT</code> syntax.
+ </p>
+
+ <p class="p">
+ For example, if you have thousands of partitions in a Parquet table, each with less than
+ <span class="ph">256 MB</span> of data, consider partitioning in a less granular way, such as by
+ year / month rather than year / month / day. If an inefficient data ingestion process produces thousands of
+ data files in the same table or partition, consider compacting the data by performing an <code class="ph codeph">INSERT ...
+ SELECT</code> to copy all the data to a different table; the data will be reorganized into a smaller
+ number of larger files by this process.
+ </p>
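+ <p class="p">
+ As an illustration, the following statements (with hypothetical table names) compact a table
+ made up of many small files into a new Parquet table containing fewer, larger files:
+ </p>
+<pre class="pre codeblock"><code>CREATE TABLE sales_compacted LIKE sales_many_small_files STORED AS PARQUET;
+INSERT INTO sales_compacted SELECT * FROM sales_many_small_files;
+</code></pre>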
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_partitioning"><h2 class="title sectiontitle">Choose partitioning granularity based on actual data volume.</h2>
+
+
+
+ <p class="p">
+ Partitioning is a technique that physically divides the data based on values of one or more columns, such
+ as by year, month, day, region, city, section of a web site, and so on. When you issue queries that request
+ a specific value or range of values for the partition key columns, Impala can avoid reading the irrelevant
+ data, potentially yielding a huge savings in disk I/O.
+ </p>
+
+ <p class="p">
+ When deciding which column(s) to use for partitioning, choose the right level of granularity. For example,
+ should you partition by year, month, and day, or only by year and month? Choose a partitioning strategy
+ that puts at least <span class="ph">256 MB</span> of data in each partition, to take advantage of
+ HDFS bulk I/O and Impala distributed queries.
+ </p>
+
+ <p class="p">
+ Over-partitioning can also cause query planning to take longer than necessary, as Impala prunes the
+ unnecessary partitions. Ideally, keep the number of partitions in the table under 30 thousand.
+ </p>
+
+ <p class="p">
+ When preparing data files to go in a partition directory, create several large files rather than many small
+ ones. If you receive data in the form of many small files and have no control over the input format,
+ consider using the <code class="ph codeph">INSERT ... SELECT</code> syntax to copy data from one table or partition to
+ another, which compacts the files into a relatively small number (based on the number of nodes in the
+ cluster).
+ </p>
+
+ <p class="p">
+ If you need to reduce the overall number of partitions and increase the amount of data in each partition,
+ first look for partition key columns that are rarely referenced or are referenced in non-critical queries
+ (not subject to an SLA). For example, your web site log data might be partitioned by year, month, day, and
+ hour, but if most queries roll up the results by day, perhaps you only need to partition by year, month,
+ and day.
+ </p>
+
+ <p class="p">
+ If you need to reduce the granularity even more, consider creating <span class="q">"buckets"</span>, computed values
+ corresponding to different sets of partition key values. For example, you can use the
+ <code class="ph codeph">TRUNC()</code> function with a <code class="ph codeph">TIMESTAMP</code> column to group date and time values
+ based on intervals such as week or quarter. See
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details.
+ </p>
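+ <p class="p">
+ For example, a query could group rows into quarterly buckets with an expression like the
+ following (the table and column names are placeholders):
+ </p>
+<pre class="pre codeblock"><code>SELECT TRUNC(event_ts, 'Q') AS quarter, COUNT(*)
+  FROM events
+  GROUP BY quarter;
+</code></pre>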
+
+ <p class="p">
+ See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for full details and performance considerations for
+ partitioning.
+ </p>
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_partition_keys"><h2 class="title sectiontitle">Use smallest appropriate integer types for partition key columns.</h2>
+
+
+
+ <p class="p">
+ Although it is tempting to use strings for partition key columns, since those values are turned into HDFS
+ directory names anyway, you can minimize memory usage by using numeric values for common partition key
+ fields such as <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, and <code class="ph codeph">DAY</code>. Use the smallest
+ integer type that holds the appropriate range of values, typically <code class="ph codeph">TINYINT</code> for
+ <code class="ph codeph">MONTH</code> and <code class="ph codeph">DAY</code>, and <code class="ph codeph">SMALLINT</code> for <code class="ph codeph">YEAR</code>.
+ Use the <code class="ph codeph">EXTRACT()</code> function to pull out individual date and time fields from a
+ <code class="ph codeph">TIMESTAMP</code> value, and <code class="ph codeph">CAST()</code> the return value to the appropriate integer
+ type.
+ </p>
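+ <p class="p">
+ For example, an <code class="ph codeph">INSERT</code> into a partitioned table might derive compact
+ integer partition keys like this (the table and column names are illustrative):
+ </p>
+<pre class="pre codeblock"><code>INSERT INTO logs_partitioned PARTITION (year, month, day)
+  SELECT msg,
+         CAST(EXTRACT(YEAR FROM event_ts) AS SMALLINT) AS year,
+         CAST(EXTRACT(MONTH FROM event_ts) AS TINYINT) AS month,
+         CAST(EXTRACT(DAY FROM event_ts) AS TINYINT) AS day
+  FROM logs_raw;
+</code></pre>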
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_parquet_block_size"><h2 class="title sectiontitle">Choose an appropriate Parquet block size.</h2>
+
+
+
+ <p class="p">
+ By default, the Impala <code class="ph codeph">INSERT ... SELECT</code> statement creates Parquet files with a 256 MB
+ block size. (This default was changed in Impala 2.0. Formerly, the limit was 1 GB, but Impala made
+ conservative estimates about compression, resulting in files that were smaller than 1 GB.)
+ </p>
+
+ <p class="p">
+ Each Parquet file written by Impala is a single block, allowing the whole file to be processed as a unit by a single host.
+ As you copy Parquet files into HDFS or between HDFS filesystems, use <code class="ph codeph">hdfs dfs -pb</code> to preserve the original
+ block size.
+ </p>
+
+ <p class="p">
+ If there are only one or a few data blocks in your Parquet table, or in a partition that is the only one
+ accessed by a query, then you might experience a slowdown for a different reason: not enough data to take
+ advantage of Impala's parallel distributed queries. Each data block is processed by a single core on one of
+ the DataNodes. In a 100-node cluster of 16-core machines, you could potentially process thousands of data
+ files simultaneously. You want to find a sweet spot between <span class="q">"many tiny files"</span> and <span class="q">"single giant
+ file"</span> that balances bulk I/O and parallel processing. You can set the <code class="ph codeph">PARQUET_FILE_SIZE</code>
+ query option before doing an <code class="ph codeph">INSERT ... SELECT</code> statement to reduce the size of each
+ generated Parquet file. <span class="ph">(Specify the file size as an absolute number of bytes, or in Impala
+ 2.0 and later, in units ending with <code class="ph codeph">m</code> for megabytes or <code class="ph codeph">g</code> for
+ gigabytes.)</span> Run benchmarks with different file sizes to find the right balance point for your
+ particular data volume.
+ </p>
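+ <p class="p">
+ For example, in <span class="keyword cmdname">impala-shell</span>, you might experiment with a smaller
+ file size before an <code class="ph codeph">INSERT ... SELECT</code> (the table names are placeholders):
+ </p>
+<pre class="pre codeblock"><code>SET PARQUET_FILE_SIZE=128m;
+INSERT INTO parquet_table SELECT * FROM text_table;
+</code></pre>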
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_stats"><h2 class="title sectiontitle">Gather statistics for all tables used in performance-critical or high-volume join queries.</h2>
+
+
+
+ <p class="p">
+ Gather the statistics with the <code class="ph codeph">COMPUTE STATS</code> statement. See
+ <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a> for details.
+ </p>
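+ <p class="p">
+ For example, with a hypothetical table name:
+ </p>
+<pre class="pre codeblock"><code>COMPUTE STATS sales;
+SHOW TABLE STATS sales;
+SHOW COLUMN STATS sales;
+</code></pre>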
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_network"><h2 class="title sectiontitle">Minimize the overhead of transmitting results back to the client.</h2>
+
+
+
+ <p class="p">
+ Use techniques such as:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Aggregation. If you need to know how many rows match a condition, the total of the values
+ in some column for the matching rows, the lowest or highest matching value, and so on, call aggregate functions such as
+ <code class="ph codeph">COUNT()</code>, <code class="ph codeph">SUM()</code>, and <code class="ph codeph">MAX()</code> in the query rather than
+ sending the result set to an application and doing those computations there. Remember that the size of an
+ unaggregated result set could be huge, requiring substantial time to transmit across the network.
+ </li>
+
+ <li class="li">
+ Filtering. Use all applicable tests in the <code class="ph codeph">WHERE</code> clause of a query to eliminate rows
+ that are not relevant, rather than producing a big result set and filtering it using application logic.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">LIMIT</code> clause. If you only need to see a few sample values from a result set, or the top
+ or bottom values from a query using <code class="ph codeph">ORDER BY</code>, include the <code class="ph codeph">LIMIT</code> clause
+ to reduce the size of the result set rather than asking for the full result set and then throwing most of
+ the rows away.
+ </li>
+
+ <li class="li">
+ Avoid overhead from pretty-printing the result set and displaying it on the screen. When you retrieve the
+ results through <span class="keyword cmdname">impala-shell</span>, use <span class="keyword cmdname">impala-shell</span> options such as
+ <code class="ph codeph">-B</code> and <code class="ph codeph">--output_delimiter</code> to produce results without special
+ formatting, and redirect output to a file rather than printing to the screen. Consider using
+ <code class="ph codeph">INSERT ... SELECT</code> to write the results directly to new files in HDFS. See
+ <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a> for details about the
+ <span class="keyword cmdname">impala-shell</span> command-line options.
+ </li>
+ </ul>
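+ <p class="p">
+ For example, rather than fetching an entire hypothetical <code class="ph codeph">web_logs</code> table
+ and summarizing it in the application, push the aggregation, filtering, and limiting into the query:
+ </p>
+<pre class="pre codeblock"><code>SELECT url, COUNT(*) AS hits
+  FROM web_logs
+  WHERE status_code = 404
+  GROUP BY url
+  ORDER BY hits DESC
+  LIMIT 10;
+</code></pre>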
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_explain"><h2 class="title sectiontitle">Verify that your queries are planned in an efficient logical manner.</h2>
+
+
+
+ <p class="p">
+ Examine the <code class="ph codeph">EXPLAIN</code> plan for a query before actually running it. See
+ <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> and <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for
+ details.
+ </p>
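+ <p class="p">
+ For example, prefix any query with the <code class="ph codeph">EXPLAIN</code> keyword to see its plan
+ without running it (the table and column names are placeholders):
+ </p>
+<pre class="pre codeblock"><code>EXPLAIN SELECT c.name, SUM(o.total)
+  FROM customers c JOIN orders o ON c.id = o.customer_id
+  GROUP BY c.name;
+</code></pre>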
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_profile"><h2 class="title sectiontitle">Verify performance characteristics of queries.</h2>
+
+
+
+ <p class="p">
+ Verify that the low-level aspects of I/O, memory usage, network bandwidth, CPU utilization, and so on are
+ within expected ranges by examining the query profile for a query after running it. See
+ <a class="xref" href="impala_explain_plan.html#perf_profile">Using the Query Profile for Performance Tuning</a> for details.
+ </p>
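+ <p class="p">
+ For example, in <span class="keyword cmdname">impala-shell</span> you can issue the
+ <code class="ph codeph">PROFILE</code> command immediately after running a query (against a hypothetical
+ table here) to see its detailed runtime report:
+ </p>
+<pre class="pre codeblock"><code>SELECT COUNT(*) FROM sales;
+PROFILE;
+</code></pre>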
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_os"><h2 class="title sectiontitle">Use appropriate operating system settings.</h2>
+
+
+
+ <p class="p">
+ See <span class="xref">the documentation for your Apache Hadoop distribution</span> for recommendations about operating system
+ settings that you can change to influence Impala performance. In particular, you might find
+ that changing the <code class="ph codeph">vm.swappiness</code> Linux kernel setting to a non-zero value improves
+ overall performance.
+ </p>
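+ <p class="p">
+ For example, as <code class="ph codeph">root</code> you might lower swappiness as follows; check your
+ distribution's guidance for the recommended value:
+ </p>
+<pre class="pre codeblock"><code># sysctl -w vm.swappiness=1
+# echo 'vm.swappiness=1' &gt;&gt; /etc/sysctl.conf
+</code></pre>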
+ </section>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
[44/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
Posted by mi...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_breakpad.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_breakpad.html b/docs/build3x/html/topics/impala_breakpad.html
new file mode 100644
index 0000000..eb59388
--- /dev/null
+++ b/docs/build3x/html/topics/impala_breakpad.html
@@ -0,0 +1,239 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_troubleshooting.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="breakpad"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Breakpad Minidumps for Impala (Impala 2.6 or higher only)</title></head><body id="breakpad"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Breakpad Minidumps for Impala (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <a class="xref" href="https://chromium.googlesource.com/breakpad/breakpad/" target="_blank">breakpad</a>
+ project is an open-source framework for crash reporting.
+ In <span class="keyword">Impala 2.6</span> and higher, Impala can use <code class="ph codeph">breakpad</code> to record stack information and
+ register values when any of the Impala-related daemons crash due to an error such as <code class="ph codeph">SIGSEGV</code>
+ or unhandled exceptions.
+ The dump files are much smaller than traditional core dump files. The dump mechanism itself uses very little
+ memory, which improves reliability if the crash occurs while the system is low on memory.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ Because of the internal mechanisms involving Impala memory allocation and Linux
+ signalling for out-of-memory (OOM) errors, if an Impala-related daemon experiences a
+ crash due to an OOM condition, it does <em class="ph i">not</em> generate a minidump for that error.
+ </div>
+
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_troubleshooting.html">Troubleshooting Impala</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="breakpad__breakpad_minidump_enable">
+ <h2 class="title topictitle2" id="ariaid-title2">Enabling or Disabling Minidump Generation</h2>
+ <div class="body conbody">
+ <p class="p">
+ By default, a minidump file is generated when an Impala-related daemon
+ crashes.
+ </p>
+
+ <div class="p">
+ To turn off generation of the minidump files, use one of the following
+ options:
+
+ <ul class="ul">
+ <li class="li">
+ Set the <code class="ph codeph">--enable_minidumps</code> configuration setting
+ to <code class="ph codeph">false</code>. Restart the corresponding services or
+ daemons.
+ </li>
+
+ <li class="li">
+ Set the <code class="ph codeph">--minidump_path</code> configuration setting to
+ an empty string. Restart the corresponding services or daemons.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.7</span> and higher,
+ you can send a <code class="ph codeph">SIGUSR1</code> signal to any Impala-related daemon to write a
+ Breakpad minidump. For advanced troubleshooting, you can now produce a minidump
+ without triggering a crash.
+ </p>
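+ <p class="p">
+ For example, assuming a single <span class="keyword cmdname">impalad</span> process on the host,
+ a command like the following triggers an on-demand minidump:
+ </p>
+<pre class="pre codeblock"><code># kill -s SIGUSR1 $(pgrep -x impalad)
+</code></pre>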
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="breakpad__breakpad_minidump_location">
+ <h2 class="title topictitle2" id="ariaid-title3">Specifying the Location for Minidump Files</h2>
+ <div class="body conbody">
+ <div class="p">
+ By default, all minidump files are written to the following location
+ on the host where a crash occurs:
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Clusters not managed by cluster management software:
+ <span class="ph filepath"><var class="keyword varname">impala_log_dir</var>/<var class="keyword varname">daemon_name</var>/minidumps/<var class="keyword varname">daemon_name</var></span>
+ </p>
+ </li>
+ </ul>
+ The minidump files for <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">catalogd</span>,
+ and <span class="keyword cmdname">statestored</span> are each written to a separate directory.
+ </div>
+ <p class="p">
+ To specify a different location, set the
+
+ <span class="ph uicontrol">minidump_path</span>
+ configuration setting of one or more Impala-related daemons, and restart the corresponding services or daemons.
+ </p>
+ <p class="p">
+ If you specify a relative path for this setting, the value is interpreted relative to
+ the default <span class="ph uicontrol">minidump_path</span> directory.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="breakpad__breakpad_minidump_number">
+ <h2 class="title topictitle2" id="ariaid-title4">Controlling the Number of Minidump Files</h2>
+ <div class="body conbody">
+ <p class="p">
+ Like any files used for logging or troubleshooting, consider limiting the number of
+ minidump files, or removing unneeded ones, depending on the amount of free storage
+ space on the hosts in the cluster.
+ </p>
+ <p class="p">
+ Because the minidump files are only used for problem resolution, you can remove any such files that
+ are not needed to debug current issues.
+ </p>
+ <p class="p">
+ To control how many minidump files Impala keeps around at any one time,
+ set the <span class="ph uicontrol">max_minidumps</span> configuration setting
+ for one or more Impala-related daemons, and restart the corresponding services or daemons.
+ The default for this setting is 9. A zero or negative value is interpreted as
+ <span class="q">"unlimited"</span>.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="breakpad__breakpad_minidump_logging">
+ <h2 class="title topictitle2" id="ariaid-title5">Detecting Crash Events</h2>
+ <div class="body conbody">
+
+ <p class="p">
+ You can see in the Impala log files when crash events occur that generate
+ minidump files. Because each restart begins a new log file, the <span class="q">"crashed"</span> message
+ is always at or near the bottom of the log file. There might be another later message
+ if core dumps are also enabled.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="breakpad__breakpad_demo">
+ <h2 class="title topictitle2" id="ariaid-title6">Demonstration of Breakpad Feature</h2>
+ <div class="body conbody">
+ <p class="p">
+ The following example uses the command <span class="keyword cmdname">kill -11</span> to
+ simulate a <code class="ph codeph">SIGSEGV</code> crash for an <span class="keyword cmdname">impalad</span>
+ process on a single DataNode, then examines the relevant log files and minidump file.
+ </p>
+
+ <p class="p">
+ First, as root on a worker node, kill the <span class="keyword cmdname">impalad</span> process with a
+ <code class="ph codeph">SIGSEGV</code> error. The original process ID was 23114.
+ </p>
+
+<pre class="pre codeblock"><code>
+# ps ax | grep impalad
+23114 ? Sl 0:18 /opt/local/parcels/&lt;parcel_version&gt;/lib/impala/sbin/impalad --flagfile=/var/run/impala/process/114-impala-IMPALAD/impala-conf/impalad_flags
+31259 pts/0 S+ 0:00 grep impalad
+#
+# kill -11 23114
+#
+# ps ax | grep impalad
+31374 ? Rl 0:04 /opt/local/parcels/&lt;parcel_version&gt;/lib/impala/sbin/impalad --flagfile=/var/run/impala/process/114-impala-IMPALAD/impala-conf/impalad_flags
+31475 pts/0 S+ 0:00 grep impalad
+
+</code></pre>
+
+ <p class="p">
+ We locate the log directory underneath <span class="ph filepath">/var/log</span>.
+ There is a <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code>
+ log file for the 23114 process ID. The minidump message is written to the
+ <code class="ph codeph">.INFO</code> file and the <code class="ph codeph">.ERROR</code> file, but not the
+ <code class="ph codeph">.WARNING</code> file. In this case, a large core file was also produced.
+ </p>
+<pre class="pre codeblock"><code>
+# cd /var/log/impalad
+# ls -la | grep 23114
+-rw------- 1 impala impala 3539079168 Jun 23 15:20 core.23114
+-rw-r--r-- 1 impala impala 99057 Jun 23 15:20 hs_err_pid23114.log
+-rw-r--r-- 1 impala impala 351 Jun 23 15:20 impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
+-rw-r--r-- 1 impala impala 29101 Jun 23 15:20 impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
+-rw-r--r-- 1 impala impala 228 Jun 23 14:03 impalad.worker_node_123.impala.log.WARNING.20160623-140343.23114
+
+</code></pre>
+      <p class="p">
+        The <code class="ph codeph">.INFO</code> log includes the location of the minidump file, followed by
+        a report of a core dump. With the breakpad minidump feature enabled, we could now
+        disable core dumps entirely, or retain fewer of them.
+      </p>
+<pre class="pre codeblock"><code>
+# cat impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
+...
+Wrote minidump to /var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+#
+# A fatal error has been detected by the Java Runtime Environment:
+#
+# SIGSEGV (0xb) at pc=0x00000030c0e0b68a, pid=23114, tid=139869541455968
+#
+# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
+# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
+# Problematic frame:
+# C [libpthread.so.0+0xb68a] pthread_cond_wait+0xca
+#
+# Core dump written. Default location: /var/log/impalad/core or core.23114
+#
+# An error report file with more information is saved as:
+# /var/log/impalad/hs_err_pid23114.log
+#
+# If you would like to submit a bug report, please visit:
+# http://bugreport.sun.com/bugreport/crash.jsp
+# The crash happened outside the Java Virtual Machine in native code.
+# See problematic frame for where to report the bug.
+...
+
+# cat impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
+
+Log file created at: 2016/06/23 14:03:43
+Running on machine: worker_node_123
+Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
+E0623 14:03:43.911002 23114 logging.cc:118] stderr will be logged to this file.
+Wrote minidump to /var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+
+</code></pre>
+
+ <p class="p">
+ The resulting minidump file is much smaller than the corresponding core file,
+ making it much easier to supply diagnostic information to <span class="keyword">the appropriate support channel</span>.
+ </p>
+
+<pre class="pre codeblock"><code>
+# pwd
+/var/log/impalad
+# cd ../impala-minidumps/impalad
+# ls
+0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+# du -kh *
+2.4M 0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+
+</code></pre>
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_buffer_pool_limit.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_buffer_pool_limit.html b/docs/build3x/html/topics/impala_buffer_pool_limit.html
new file mode 100644
index 0000000..e9406b7
--- /dev/null
+++ b/docs/build3x/html/topics/impala_buffer_pool_limit.html
@@ -0,0 +1,71 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="buffer_pool_limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BUFFER_POOL_LIMIT Query Option</title></head><body id="buffer_pool_limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">BUFFER_POOL_LIMIT Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Defines a limit on the amount of memory that a query can allocate from the
+ internal buffer pool. The value for this limit applies to the memory on each host,
+ not the aggregate memory across the cluster. Typically not changed by users, except
+ during diagnosis of out-of-memory errors during queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ </p>
+  <p class="p">
+        The default setting for this option is whichever is lower: 80% of the
+        <code class="ph codeph">MEM_LIMIT</code> setting, or the <code class="ph codeph">MEM_LIMIT</code>
+        setting minus 100 MB.
+  </p>
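+  <p class="p">
+        For example (hypothetical values, shown here only to illustrate the calculation), with a
+        <code class="ph codeph">MEM_LIMIT</code> of 1024 MB the default works out as follows:
+  </p>
+<pre class="pre codeblock"><code>
+-- Hypothetical worked example, not output from an actual cluster:
+--   MEM_LIMIT                  = 1024 MB
+--   80% of MEM_LIMIT           =  819 MB   (1024 * 0.8, rounded down)
+--   MEM_LIMIT minus 100 MB     =  924 MB
+-- Default BUFFER_POOL_LIMIT    =  819 MB   (the lower of the two values)
+</code></pre>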
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.10.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ If queries encounter out-of-memory errors, consider decreasing the
+ <code class="ph codeph">BUFFER_POOL_LIMIT</code> setting to less than 80% of the
+        <code class="ph codeph">MEM_LIMIT</code> setting.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Set an absolute value.
+set buffer_pool_limit=8GB;
+
+-- Set a relative value based on the MEM_LIMIT setting.
+set buffer_pool_limit=80%;
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_max_row_size.html">MAX_ROW_SIZE Query Option</a>,
+ <a class="xref" href="impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_char.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_char.html b/docs/build3x/html/topics/impala_char.html
new file mode 100644
index 0000000..6441539
--- /dev/null
+++ b/docs/build3x/html/topics/impala_char.html
@@ -0,0 +1,305 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="char"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CHAR Data Type (Impala 2.0 or higher only)</title></head><body id="char"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">CHAR Data Type (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+      A fixed-length character type, padded with trailing spaces if necessary to achieve the specified length. If
+      a value is longer than the specified length, Impala truncates the excess trailing characters.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> CHAR(<var class="keyword varname">length</var>)</code></pre>
+
+ <p class="p">
+ The maximum length you can specify is 255.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Semantics of trailing spaces:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ When you store a <code class="ph codeph">CHAR</code> value shorter than the specified length in a table, queries return
+ the value padded with trailing spaces if necessary; the resulting value has the same length as specified in
+ the column definition.
+ </li>
+
+ <li class="li">
+ If you store a <code class="ph codeph">CHAR</code> value containing trailing spaces in a table, those trailing spaces are
+ not stored in the data file. When the value is retrieved by a query, the result could have a different
+ number of trailing spaces. That is, the value includes however many spaces are needed to pad it to the
+ specified length of the column.
+ </li>
+
+ <li class="li">
+ If you compare two <code class="ph codeph">CHAR</code> values that differ only in the number of trailing spaces, those
+ values are considered identical.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> This type can be used for partition key columns. Because of the efficiency advantage
+ of numeric values over character-based values, if the partition key is a string representation of a number,
+ prefer to use an integer type with sufficient range (<code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>, and so
+ on) where practical.
+ </p>
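+      <p class="p">
+        For example, the following hypothetical sketch contrasts a string-of-a-number partition key
+        with the preferred integer equivalent:
+      </p>
+<pre class="pre codeblock"><code>
+-- Hypothetical sketch. Instead of partitioning on a string form of the year:
+--   create table events (id bigint, details string) partitioned by (year_code char(4));
+-- prefer an integer partition key with sufficient range:
+create table events (id bigint, details string) partitioned by (event_year int);
+</code></pre>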
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type cannot be used with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ This type can be read from and written to Parquet files.
+ </li>
+
+ <li class="li">
+        This type does not depend on a particular version of the Parquet file format.
+ </li>
+
+ <li class="li">
+ Parquet files generated by Impala and containing this type can be freely interchanged with other components
+ such as Hive and MapReduce.
+ </li>
+
+ <li class="li">
+ Any trailing spaces, whether implicitly or explicitly specified, are not written to the Parquet data files.
+ </li>
+
+ <li class="li">
+ Parquet data files might contain values that are longer than allowed by the
+ <code class="ph codeph">CHAR(<var class="keyword varname">n</var>)</code> length limit. Impala ignores any extra trailing characters when
+ it processes those values during a query.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong>
+ </p>
+
+ <p class="p">
+ Text data files might contain values that are longer than allowed for a particular
+ <code class="ph codeph">CHAR(<var class="keyword varname">n</var>)</code> column. Any extra trailing characters are ignored when Impala
+ processes those values during a query. Text data files can also contain values that are shorter than the
+ defined length limit, and Impala pads them with trailing spaces up to the specified length. Any text data
+ files produced by Impala <code class="ph codeph">INSERT</code> statements do not include any trailing blanks for
+ <code class="ph codeph">CHAR</code> columns.
+ </p>
+
+ <p class="p"><strong class="ph b">Avro considerations:</strong></p>
+ <p class="p">
+ The Avro specification allows string values up to 2**64 bytes in length.
+ Impala queries for Avro tables use 32-bit integers to hold string lengths.
+ In <span class="keyword">Impala 2.5</span> and higher, Impala truncates <code class="ph codeph">CHAR</code>
+ and <code class="ph codeph">VARCHAR</code> values in Avro tables to (2**31)-1 bytes.
+ If a query encounters a <code class="ph codeph">STRING</code> value longer than (2**31)-1
+ bytes in an Avro table, the query fails. In earlier releases,
+ encountering such long values in an Avro table could cause a crash.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <p class="p">
+ This type is available using <span class="keyword">Impala 2.0</span> or higher.
+ </p>
+
+ <p class="p">
+ Some other database systems make the length specification optional. For Impala, the length is required.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as a byte array with the same size as the length
+ specification. Values that are shorter than the specified length are padded on the right with trailing
+ spaces.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">UDF considerations:</strong> This type cannot be used for the argument or return type of a user-defined
+ function (UDF) or user-defined aggregate function (UDA).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ These examples show how trailing spaces are not considered significant when comparing or processing
+ <code class="ph codeph">CHAR</code> values. <code class="ph codeph">CAST()</code> truncates any longer string to fit within the defined
+ length. If a <code class="ph codeph">CHAR</code> value is shorter than the specified length, it is padded on the right with
+ spaces until it matches the specified length. Therefore, <code class="ph codeph">LENGTH()</code> represents the length
+ including any trailing spaces, and <code class="ph codeph">CONCAT()</code> also treats the column value as if it has
+ trailing spaces.
+ </p>
+
+<pre class="pre codeblock"><code>select cast('x' as char(4)) = cast('x ' as char(4)) as "unpadded equal to padded";
++--------------------------+
+| unpadded equal to padded |
++--------------------------+
+| true |
++--------------------------+
+
+create table char_length(c char(3));
+insert into char_length values (cast('1' as char(3))), (cast('12' as char(3))), (cast('123' as char(3))), (cast('123456' as char(3)));
+select concat("[",c,"]") as c, length(c) from char_length;
++-------+-----------+
+| c | length(c) |
++-------+-----------+
+| [1 ] | 3 |
+| [12 ] | 3 |
+| [123] | 3 |
+| [123] | 3 |
++-------+-----------+
+</code></pre>
+
+ <p class="p">
+ This example shows a case where data values are known to have a specific length, where <code class="ph codeph">CHAR</code>
+ is a logical data type to use.
+
+ </p>
+
+<pre class="pre codeblock"><code>create table addresses
+ (id bigint,
+ street_name string,
+ state_abbreviation char(2),
+ country_abbreviation char(2));
+</code></pre>
+
+ <p class="p">
+ The following example shows how values written by Impala do not physically include the trailing spaces. It
+ creates a table using text format, with <code class="ph codeph">CHAR</code> values much shorter than the declared length,
+ and then prints the resulting data file to show that the delimited values are not separated by spaces. The
+ same behavior applies to binary-format Parquet data files.
+ </p>
+
+<pre class="pre codeblock"><code>create table char_in_text (a char(20), b char(30), c char(40))
+ row format delimited fields terminated by ',';
+
+insert into char_in_text values (cast('foo' as char(20)), cast('bar' as char(30)), cast('baz' as char(40))), (cast('hello' as char(20)), cast('goodbye' as char(30)), cast('aloha' as char(40)));
+
+-- Running this Linux command inside impala-shell using the ! shortcut.
+!hdfs dfs -cat 'hdfs://127.0.0.1:8020/user/hive/warehouse/impala_doc_testing.db/char_in_text/*.*';
+foo,bar,baz
+hello,goodbye,aloha
+</code></pre>
+
+ <p class="p">
+ The following example further illustrates the treatment of spaces. It replaces the contents of the previous
+ table with some values including leading spaces, trailing spaces, or both. Any leading spaces are preserved
+ within the data file, but trailing spaces are discarded. Then when the values are retrieved by a query, the
+ leading spaces are retrieved verbatim while any necessary trailing spaces are supplied by Impala.
+ </p>
+
+<pre class="pre codeblock"><code>insert overwrite char_in_text values (cast('trailing ' as char(20)), cast(' leading and trailing ' as char(30)), cast(' leading' as char(40)));
+!hdfs dfs -cat 'hdfs://127.0.0.1:8020/user/hive/warehouse/impala_doc_testing.db/char_in_text/*.*';
+trailing, leading and trailing, leading
+
+select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']') as c from char_in_text;
++------------------------+----------------------------------+--------------------------------------------+
+| a | b | c |
++------------------------+----------------------------------+--------------------------------------------+
+| [trailing ] | [ leading and trailing ] | [ leading ] |
++------------------------+----------------------------------+--------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+ <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+ </p>
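+      <p class="p">
+        For example, a hypothetical Kudu table would declare such a column as
+        <code class="ph codeph">STRING</code> rather than <code class="ph codeph">CHAR</code>:
+      </p>
+<pre class="pre codeblock"><code>
+-- Hypothetical sketch: use STRING in place of CHAR for a Kudu table.
+create table kudu_addresses
+  (id bigint primary key,
+   state_abbreviation string)
+  partition by hash (id) partitions 3
+  stored as kudu;
+</code></pre>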
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ Because the blank-padding behavior requires allocating the maximum length for each value in memory, for
+ scalability reasons avoid declaring <code class="ph codeph">CHAR</code> columns that are much longer than typical values in
+ that column.
+ </p>
+
+ <p class="p">
+ All data in <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> columns must be in a character encoding that
+ is compatible with UTF-8. If you have binary data from another database system (that is, a BLOB type), use
+ a <code class="ph codeph">STRING</code> column to hold it.
+ </p>
+
+ <p class="p">
+ When an expression compares a <code class="ph codeph">CHAR</code> with a <code class="ph codeph">STRING</code> or
+ <code class="ph codeph">VARCHAR</code>, the <code class="ph codeph">CHAR</code> value is implicitly converted to <code class="ph codeph">STRING</code>
+ first, with trailing spaces preserved.
+ </p>
+
+<pre class="pre codeblock"><code>select cast("foo " as char(5)) = 'foo' as "char equal to string";
++----------------------+
+| char equal to string |
++----------------------+
+| false |
++----------------------+
+</code></pre>
+
+ <p class="p">
+ This behavior differs from other popular database systems. To get the expected result of
+ <code class="ph codeph">TRUE</code>, cast the expressions on both sides to <code class="ph codeph">CHAR</code> values of the appropriate
+ length:
+ </p>
+
+<pre class="pre codeblock"><code>select cast("foo " as char(5)) = cast('foo' as char(3)) as "char equal to string";
++----------------------+
+| char equal to string |
++----------------------+
+| true |
++----------------------+
+</code></pre>
+
+ <p class="p">
+ This behavior is subject to change in future releases.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_string.html#string">STRING Data Type</a>, <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_literals.html#string_literals">String Literals</a>,
+ <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_comments.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_comments.html b/docs/build3x/html/topics/impala_comments.html
new file mode 100644
index 0000000..62bd6ee
--- /dev/null
+++ b/docs/build3x/html/topics/impala_comments.html
@@ -0,0 +1,46 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="comments"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Comments</title></head><body id="comments"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Comments</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports the familiar styles of SQL comments:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ All text from a <code class="ph codeph">--</code> sequence to the end of the line is considered a comment and ignored.
+ This type of comment can occur on a single line by itself, or after all or part of a statement.
+ </li>
+
+ <li class="li">
+ All text from a <code class="ph codeph">/*</code> sequence to the next <code class="ph codeph">*/</code> sequence is considered a
+ comment and ignored. This type of comment can stretch over multiple lines. This type of comment can occur
+ on one or more lines by itself, in the middle of a statement, or before or after a statement.
+ </li>
+ </ul>
+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>-- This line is a comment about a table.
+create table ...;
+
+/*
+This is a multi-line comment about a query.
+*/
+select ...;
+
+select * from t /* This is an embedded comment about a query. */ where ...;
+
+select * from t -- This is a trailing comment within a multi-line command.
+where ...;
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_aggregate_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_aggregate_functions.html b/docs/build3x/html/topics/impala_aggregate_functions.html
new file mode 100644
index 0000000..9175be2
--- /dev/null
+++ b/docs/build3x/html/topics/impala_aggregate_functions.html
@@ -0,0 +1,34 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_appx_median.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_avg.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_count.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_group_concat.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_min.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ndv.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_stddev.html"><meta name="DC.Relation" scheme="URI" conte
nt="../topics/impala_sum.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_variance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="aggregate_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Aggregate Functions</title></head><body id="aggregate_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Aggregate Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Aggregate functions are a special category with different rules. These functions calculate a return value
+ across all the items in a result set, so they require a <code class="ph codeph">FROM</code> clause in the query:
+ </p>
+
+<pre class="pre codeblock"><code>select count(product_id) from product_catalog;
+select max(height), avg(height) from census_data where age > 20;
+</code></pre>
+
+ <p class="p">
+ Aggregate functions also ignore <code class="ph codeph">NULL</code> values rather than returning a <code class="ph codeph">NULL</code>
+ result. For example, if some rows have <code class="ph codeph">NULL</code> for a particular column, those rows are
+ ignored when computing the <code class="ph codeph">AVG()</code> for that column. Likewise, specifying
+ <code class="ph codeph">COUNT(<var class="keyword varname">col_name</var>)</code> in a query counts only those rows where
+ <var class="keyword varname">col_name</var> contains a non-<code class="ph codeph">NULL</code> value.
+ </p>
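+      <p class="p">
+        For example, with a hypothetical table <code class="ph codeph">t1</code> whose column
+        <code class="ph codeph">c</code> contains the values 1, <code class="ph codeph">NULL</code>, and 3:
+      </p>
+<pre class="pre codeblock"><code>
+-- Hypothetical sketch: c contains 1, NULL, 3.
+select count(*), count(c), sum(c), avg(c) from t1;
+-- count(*) = 3  (counts every row)
+-- count(c) = 2  (the NULL row is ignored)
+-- sum(c)   = 4  (1 + 3)
+-- avg(c)   = 2  (4 / 2, not 4 / 3)
+</code></pre>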
+
+ <p class="p">
+
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_appx_median.html">APPX_MEDIAN Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_avg.html">AVG Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_count.html">COUNT Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_group_concat.html">GROUP_CONCAT Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max.html">MAX Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_min.html">MIN Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_ndv.html">NDV Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_stddev.html">STDDEV, STDDEV_SAMP, STDDEV_POP Functions</a></strong><br></li><li cl
ass="link ulchildlink"><strong><a href="../topics/impala_sum.html">SUM Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_variance.html">VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_aliases.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_aliases.html b/docs/build3x/html/topics/impala_aliases.html
new file mode 100644
index 0000000..95f4da8
--- /dev/null
+++ b/docs/build3x/html/topics/impala_aliases.html
@@ -0,0 +1,148 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="aliases"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Aliases</title></head><body id="aliases"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Aliases</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ When you write the names of tables, columns, or column expressions in a query, you can assign an alias at the
+ same time. Then you can specify the alias rather than the original name when making other references to the
+      table or column in the same statement. You typically choose aliases that are shorter or easier
+      to remember (or both) than the original names. The aliases are printed in the query header,
+      making them useful for self-documenting output.
+ </p>
+
+ <p class="p">
+ To set up an alias, add the <code class="ph codeph">AS <var class="keyword varname">alias</var></code> clause immediately after any table,
+ column, or expression name in the <code class="ph codeph">SELECT</code> list or <code class="ph codeph">FROM</code> list of a query. The
+ <code class="ph codeph">AS</code> keyword is optional; you can also specify the alias immediately after the original name.
+ </p>
+
+<pre class="pre codeblock"><code>-- Make the column headers of the result set easier to understand.
+SELECT c1 AS name, c2 AS address, c3 AS phone FROM table_with_terse_columns;
+SELECT SUM(ss_xyz_dollars_net) AS total_sales FROM table_with_cryptic_columns;
+-- The alias can be a quoted string for extra readability.
+SELECT c1 AS "Employee ID", c2 AS "Date of hire" FROM t1;
+-- The AS keyword is optional.
+SELECT c1 "Employee ID", c2 "Date of hire" FROM t1;
+
+-- The table aliases assigned in the FROM clause can be used both earlier
+-- in the query (the SELECT list) and later (the WHERE clause).
+SELECT one.name, two.address, three.phone
+ FROM census one, building_directory two, phonebook three
+WHERE one.id = two.id and two.id = three.id;
+
+-- The aliases c1 and c2 let the query handle columns with the same names from 2 joined tables.
+-- The aliases t1 and t2 let the query abbreviate references to long or cryptically named tables.
+SELECT t1.column_n AS c1, t2.column_n AS c2 FROM long_name_table AS t1, very_long_name_table2 AS t2
+ WHERE c1 = c2;
+SELECT t1.column_n c1, t2.column_n c2 FROM table1 t1, table2 t2
+ WHERE c1 = c2;
+</code></pre>
+
+ <p class="p">
+ From Impala 3.0, the alias substitution logic has changed.
+ </p>
+ <div class="p">
+ You can specify column aliases with or without the <code class="ph codeph">AS</code> keyword, and with no quotation
+ marks, single quotation marks, or double quotation marks. Some kind of quotation marks are required if the
+ column alias contains any spaces or other problematic characters. The alias text is displayed in the
+ <span class="keyword cmdname">impala-shell</span> output as all-lowercase. For example:
+<pre class="pre codeblock"><code>[localhost:21000] > select c1 First_Column from t;
+[localhost:21000] > select c1 as First_Column from t;
++--------------+
+| first_column |
++--------------+
+...
+
+[localhost:21000] > select c1 'First Column' from t;
+[localhost:21000] > select c1 as 'First Column' from t;
++--------------+
+| first column |
++--------------+
+...
+
+[localhost:21000] > select c1 "First Column" from t;
+[localhost:21000] > select c1 as "First Column" from t;
++--------------+
+| first column |
++--------------+
+...</code></pre>
+ From Impala 3.0, the alias substitution logic in the <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">HAVING</code>,
+ and <code class="ph codeph">ORDER BY</code> clauses has become more consistent with standard SQL behavior, as follows.
+ Aliases are now only legal at the top level, and not in subexpressions. The following statements are
+ allowed:
+<pre class="pre codeblock"><code>
+ SELECT int_col / 2 AS x
+ FROM t
+ GROUP BY x;
+
+ SELECT int_col / 2 AS x
+ FROM t
+ ORDER BY x;
+
+ SELECT NOT bool_col AS nb
+ FROM t
+ GROUP BY nb
+ HAVING nb;
+</code></pre>
+ And the following statements are NOT allowed:
+<pre class="pre codeblock"><code>
+ SELECT int_col / 2 AS x
+ FROM t
+ GROUP BY x / 2;
+
+ SELECT int_col / 2 AS x
+ FROM t
+ ORDER BY -x;
+
+ SELECT int_col / 2 AS x
+ FROM t
+ GROUP BY x
+ HAVING x > 3;
+</code></pre>
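+ To apply a condition to such an expression under the new rules, repeat the
+ expression itself rather than the alias. A sketch of the standard-SQL rewrite
+ for the last statement above:
+<pre class="pre codeblock"><code>
+ SELECT int_col / 2 AS x
+ FROM t
+ GROUP BY x
+ HAVING int_col / 2 > 3;
+</code></pre>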
+ </div>
+
+ <p class="p">
+ To use an alias name that matches one of the Impala reserved keywords (listed in
+ <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>), surround the identifier with either single or
+ double quotation marks, or <code class="ph codeph">``</code> characters (backticks).
+ </p>
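+
+  <p class="p">
+   For example, each of the following hypothetical queries uses the reserved word
+   <code class="ph codeph">select</code> as a column alias by quoting it:
+  </p>
+
+<pre class="pre codeblock"><code>-- Quoting lets a reserved word such as SELECT serve as an alias.
+SELECT c1 AS `select` FROM t;
+SELECT c1 AS 'select' FROM t;
+SELECT c1 AS "select" FROM t;
+</code></pre>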
+
+ <p class="p">
+ <span class="ph"> Aliases follow the same rules as identifiers when it comes to case
+ insensitivity. Aliases can be longer than identifiers (up to the maximum length of a Java string) and can
+ include additional characters such as spaces and dashes when they are quoted using backtick characters.
+ </span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+      Queries involving the complex types (<code class="ph codeph">ARRAY</code>,
+      <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>) typically make
+      extensive use of table aliases. These queries involve join clauses
+      where the complex type column is treated as a joined table.
+      To construct two-part or three-part qualified names for the
+      complex column elements in the <code class="ph codeph">FROM</code> list,
+      it is sometimes syntactically required to define a table
+      alias for the complex column where it is referenced in the join clause.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details and examples.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Alternatives:</strong>
+ </p>
+
+ <p class="p">
+ Another way to define different names for the same tables or columns is to create views. See
+ <a class="xref" href="../shared/../topics/impala_views.html#views">Overview of Impala Views</a> for details.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_allow_unsupported_formats.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_allow_unsupported_formats.html b/docs/build3x/html/topics/impala_allow_unsupported_formats.html
new file mode 100644
index 0000000..6481bf3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_allow_unsupported_formats.html
@@ -0,0 +1,24 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="allow_unsupported_formats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ALLOW_UNSUPPORTED_FORMATS Query Option</title></head><body id="allow_unsupported_formats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ALLOW_UNSUPPORTED_FORMATS Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ An obsolete query option from early work on support for file formats. Do not use. Might be removed in the
+ future.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_alter_table.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_alter_table.html b/docs/build3x/html/topics/impala_alter_table.html
new file mode 100644
index 0000000..628b779
--- /dev/null
+++ b/docs/build3x/html/topics/impala_alter_table.html
@@ -0,0 +1,1117 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="alter_table"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ALTER TABLE Statement</title></head><body id="alter_table"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ALTER TABLE Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">ALTER TABLE</code> statement changes the structure or properties of an existing Impala table.
+ </p>
+ <p class="p">
+ In Impala, this is primarily a logical operation that updates the table metadata in the metastore database that Impala
+      shares with Hive. Most <code class="ph codeph">ALTER TABLE</code> operations do not actually rewrite, move, or otherwise change the actual data
+ files. (The <code class="ph codeph">RENAME TO</code> clause is the one exception; it can cause HDFS files to be moved to different paths.)
+ When you do an <code class="ph codeph">ALTER TABLE</code> operation, you typically need to perform corresponding physical filesystem operations,
+ such as rewriting the data files to include extra fields, or converting them to a different file format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE [<var class="keyword varname">old_db_name</var>.]<var class="keyword varname">old_table_name</var> RENAME TO [<var class="keyword varname">new_db_name</var>.]<var class="keyword varname">new_table_name</var>
+
+ALTER TABLE <var class="keyword varname">name</var> ADD COLUMNS (<var class="keyword varname">col_spec</var>[, <var class="keyword varname">col_spec</var> ...])
+ALTER TABLE <var class="keyword varname">name</var> DROP [COLUMN] <var class="keyword varname">column_name</var>
+ALTER TABLE <var class="keyword varname">name</var> CHANGE <var class="keyword varname">column_name</var> <var class="keyword varname">new_name</var> <var class="keyword varname">new_type</var>
+
+ALTER TABLE <var class="keyword varname">name</var> REPLACE COLUMNS (<var class="keyword varname">col_spec</var>[, <var class="keyword varname">col_spec</var> ...])
+
+<span class="ph">-- Kudu tables only.
+ALTER TABLE <var class="keyword varname">name</var> ALTER [COLUMN] <var class="keyword varname">column_name</var>
+ { SET <var class="keyword varname">kudu_storage_attr</var> <var class="keyword varname">attr_value</var>
+ | DROP DEFAULT }
+
+kudu_storage_attr ::= { DEFAULT | BLOCK_SIZE | ENCODING | COMPRESSION }</span>
+
+<span class="ph">-- Non-Kudu tables only.
+ALTER TABLE <var class="keyword varname">name</var> ALTER [COLUMN] <var class="keyword varname">column_name</var>
+ SET COMMENT '<var class="keyword varname">comment_text</var>'</span>
+
+ALTER TABLE <var class="keyword varname">name</var> ADD [IF NOT EXISTS] PARTITION (<var class="keyword varname">partition_spec</var>)
+ <span class="ph">[<var class="keyword varname">location_spec</var>]</span>
+ <span class="ph">[<var class="keyword varname">cache_spec</var>]</span>
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> ADD [IF NOT EXISTS] RANGE PARTITION <var class="keyword varname">kudu_partition_spec</var></span>
+
+ALTER TABLE <var class="keyword varname">name</var> DROP [IF EXISTS] PARTITION (<var class="keyword varname">partition_spec</var>)
+ <span class="ph">[PURGE]</span>
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> DROP [IF EXISTS] RANGE PARTITION <var class="keyword varname">kudu_partition_spec</var></span>
+
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> RECOVER PARTITIONS</span>
+
+ALTER TABLE <var class="keyword varname">name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)]
+ SET { FILEFORMAT <var class="keyword varname">file_format</var>
+ | LOCATION '<var class="keyword varname">hdfs_path_of_directory</var>'
+ | TBLPROPERTIES (<var class="keyword varname">table_properties</var>)
+ | SERDEPROPERTIES (<var class="keyword varname">serde_properties</var>) }
+
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> SET COLUMN STATS <var class="keyword varname">colname</var>
+  ('<var class="keyword varname">statsKey</var>'='<var class="keyword varname">val</var>', ...)
+
+statsKey ::= numDVs | numNulls | avgSize | maxSize</span>
+
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)] SET { CACHED IN '<var class="keyword varname">pool_name</var>' <span class="ph">[WITH REPLICATION = <var class="keyword varname">integer</var>]</span> | UNCACHED }</span>
+
+<var class="keyword varname">new_name</var> ::= [<var class="keyword varname">new_database</var>.]<var class="keyword varname">new_table_name</var>
+
+<var class="keyword varname">col_spec</var> ::= <var class="keyword varname">col_name</var> <var class="keyword varname">type_name</var> <span class="ph">[<var class="keyword varname">kudu_attributes</var>]</span>
+
+<span class="ph"><var class="keyword varname">kudu_attributes</var> ::= { [NOT] NULL | ENCODING <var class="keyword varname">codec</var> | COMPRESSION <var class="keyword varname">algorithm</var> |
+ DEFAULT <var class="keyword varname">constant</var> | BLOCK_SIZE <var class="keyword varname">number</var> }</span>
+
+<var class="keyword varname">partition_spec</var> ::= <var class="keyword varname">simple_partition_spec</var> | <span class="ph"><var class="keyword varname">complex_partition_spec</var></span>
+
+<var class="keyword varname">simple_partition_spec</var> ::= <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var>
+
+<span class="ph"><var class="keyword varname">complex_partition_spec</var> ::= <var class="keyword varname">comparison_expression_on_partition_col</var></span>
+
+<span class="ph"><var class="keyword varname">kudu_partition_spec</var> ::= <var class="keyword varname">constant</var> <var class="keyword varname">range_operator</var> VALUES <var class="keyword varname">range_operator</var> <var class="keyword varname">constant</var> | VALUE = <var class="keyword varname">constant</var></span>
+
+<span class="ph">cache_spec ::= CACHED IN '<var class="keyword varname">pool_name</var>' [WITH REPLICATION = <var class="keyword varname">integer</var>] | UNCACHED</span>
+
+<span class="ph">location_spec ::= LOCATION '<var class="keyword varname">hdfs_path_of_directory</var>'</span>
+
+<var class="keyword varname">table_properties</var> ::= '<var class="keyword varname">name</var>'='<var class="keyword varname">value</var>'[, '<var class="keyword varname">name</var>'='<var class="keyword varname">value</var>' ...]
+
+<var class="keyword varname">serde_properties</var> ::= '<var class="keyword varname">name</var>'='<var class="keyword varname">value</var>'[, '<var class="keyword varname">name</var>'='<var class="keyword varname">value</var>' ...]
+
+<var class="keyword varname">file_format</var> ::= { PARQUET | TEXTFILE | RCFILE | SEQUENCEFILE | AVRO }
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">ALTER TABLE</code> statement can
+ change the metadata for tables containing complex types (<code class="ph codeph">ARRAY</code>,
+ <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>).
+ For example, you can use an <code class="ph codeph">ADD COLUMNS</code>, <code class="ph codeph">DROP COLUMN</code>, or <code class="ph codeph">CHANGE</code>
+ clause to modify the table layout for complex type columns.
+ Although Impala queries only work for complex type columns in Parquet tables, the complex type support in the
+ <code class="ph codeph">ALTER TABLE</code> statement applies to all file formats.
+ For example, you can use Impala to update metadata for a staging table in a non-Parquet file format where the
+ data is populated by Hive. Or you can use <code class="ph codeph">ALTER TABLE SET FILEFORMAT</code> to change the format
+ of an existing table to Parquet so that Impala can query it. Remember that changing the file format for a table does
+ not convert the data files within the table; you must prepare any Parquet data files containing complex types
+ outside Impala, and bring them into the table using <code class="ph codeph">LOAD DATA</code> or updating the table's
+ <code class="ph codeph">LOCATION</code> property.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types.
+ </p>
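+
+    <p class="p">
+      For example, the following statement (a hypothetical sketch; the table and column
+      names are illustrative) adds an <code class="ph codeph">ARRAY</code> column to an
+      existing table:
+    </p>
+
+<pre class="pre codeblock"><code>alter table staging_table add columns (tags array&lt;string&gt;);
+</code></pre>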
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Whenever you specify partitions in an <code class="ph codeph">ALTER TABLE</code> statement, through the <code class="ph codeph">PARTITION
+ (<var class="keyword varname">partition_spec</var>)</code> clause, you must include all the partitioning columns in the
+ specification.
+ </p>
+
+ <p class="p">
+ Most of the <code class="ph codeph">ALTER TABLE</code> operations work the same for internal tables (managed by Impala) as
+ for external tables (with data files located in arbitrary locations). The exception is renaming a table; for
+ an external table, the underlying data directory is not renamed or moved.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Dropping or altering multiple partitions:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.8</span> and higher,
+ the expression for the partition clause with a <code class="ph codeph">DROP</code> or <code class="ph codeph">SET</code>
+ operation can include comparison operators such as <code class="ph codeph"><</code>, <code class="ph codeph">IN</code>,
+ or <code class="ph codeph">BETWEEN</code>, and Boolean operators such as <code class="ph codeph">AND</code>
+ and <code class="ph codeph">OR</code>.
+ </p>
+
+ <p class="p">
+ For example, you might drop a group of partitions corresponding to a particular date
+ range after the data <span class="q">"ages out"</span>:
+ </p>
+
+<pre class="pre codeblock"><code>
+alter table historical_data drop partition (year < 1995);
+alter table historical_data drop partition (year = 1996 and month between 1 and 6);
+
+</code></pre>
+
+ <p class="p">
+      For tables with multiple partition key columns, you can specify multiple
+ conditions separated by commas, and the operation only applies to the partitions
+ that match all the conditions (similar to using an <code class="ph codeph">AND</code> clause):
+ </p>
+
+<pre class="pre codeblock"><code>
+alter table historical_data drop partition (year < 1995, last_name like 'A%');
+
+</code></pre>
+
+ <p class="p">
+ This technique can also be used to change the file format of groups of partitions,
+ as part of an ETL pipeline that periodically consolidates and rewrites the underlying
+ data files in a different file format:
+ </p>
+
+<pre class="pre codeblock"><code>
+alter table fast_growing_data partition (year = 2016, month in (10,11,12)) set fileformat parquet;
+
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The extended syntax involving comparison operators and multiple partitions
+ applies to the <code class="ph codeph">SET FILEFORMAT</code>, <code class="ph codeph">SET TBLPROPERTIES</code>,
+ <code class="ph codeph">SET SERDEPROPERTIES</code>, and <code class="ph codeph">SET [UN]CACHED</code> clauses.
+ You can also use this syntax with the <code class="ph codeph">PARTITION</code> clause
+ in the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement, and with the
+ <code class="ph codeph">PARTITION</code> clause of the <code class="ph codeph">SHOW FILES</code> statement.
+ Some forms of <code class="ph codeph">ALTER TABLE</code> still only apply to one partition
+ at a time: the <code class="ph codeph">SET LOCATION</code> and <code class="ph codeph">ADD PARTITION</code>
+ clauses. The <code class="ph codeph">PARTITION</code> clauses in the <code class="ph codeph">LOAD DATA</code>
+ and <code class="ph codeph">INSERT</code> statements also only apply to one partition at a time.
+ </p>
+ <p class="p">
+ A DDL statement that applies to multiple partitions is considered successful
+ (resulting in no changes) even if no partitions match the conditions.
+ The results are the same as if the <code class="ph codeph">IF EXISTS</code> clause was specified.
+ </p>
+ <p class="p">
+ The performance and scalability of this technique is similar to
+ issuing a sequence of single-partition <code class="ph codeph">ALTER TABLE</code>
+ statements in quick succession. To minimize bottlenecks due to
+ communication with the metastore database, or causing other
+ DDL operations on the same table to wait, test the effects of
+ performing <code class="ph codeph">ALTER TABLE</code> statements that affect
+ large numbers of partitions.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+
+ <p class="p">
+ You can specify an <code class="ph codeph">s3a://</code> prefix on the <code class="ph codeph">LOCATION</code> attribute of a table or partition
+ to make Impala query data from the Amazon S3 filesystem. In <span class="keyword">Impala 2.6</span> and higher, Impala automatically
+ handles creating or removing the associated folders when you issue <code class="ph codeph">ALTER TABLE</code> statements
+ with the <code class="ph codeph">ADD PARTITION</code> or <code class="ph codeph">DROP PARTITION</code> clauses.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+ <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+ <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+ as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+ Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS caching (CACHED IN clause):</strong>
+ </p>
+
+ <p class="p">
+ If you specify the <code class="ph codeph">CACHED IN</code> clause, any existing or future data files in the table
+ directory or the partition subdirectories are designated to be loaded into memory with the HDFS caching
+ mechanism. See <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a> for details about using the HDFS
+ caching feature.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.2</span> and higher, the optional <code class="ph codeph">WITH REPLICATION</code> clause
+ for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> lets you specify
+ a <dfn class="term">replication factor</dfn>, the number of hosts on which to cache the same data blocks.
+      When Impala processes a cached data block and the cache replication factor is greater than 1, Impala randomly
+ selects a host that has a cached copy of that data block. This optimization avoids excessive CPU
+ usage on a single host when the same cached data block is processed multiple times.
+ Where practical, specify a value greater than or equal to the HDFS block replication factor.
+ </p>
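+
+    <p class="p">
+      For example, assuming an HDFS cache pool named <code class="ph codeph">pool1</code>
+      exists (an illustrative name), the following statement caches one partition
+      with a replication factor of 3, matching a typical HDFS block replication factor:
+    </p>
+
+<pre class="pre codeblock"><code>alter table historical_data partition (year = 1996, month = 1)
+  set cached in 'pool1' with replication = 3;
+</code></pre>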
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
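+
+    <p class="p">
+      For example, a session might enable the option before issuing the DDL statement
+      (the table and column names are illustrative only):
+    </p>
+
+<pre class="pre codeblock"><code>set SYNC_DDL=1;
+alter table t1 add columns (c2 int);
+</code></pre>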
+
+ <p class="p">
+ The following sections show examples of the use cases for various <code class="ph codeph">ALTER TABLE</code> clauses.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To rename a table (RENAME TO clause):</strong>
+ </p>
+
+
+
+ <p class="p">
+ The <code class="ph codeph">RENAME TO</code> clause lets you change the name of an existing table, and optionally which
+ database it is located in.
+ </p>
+
+ <p class="p">
+ For internal tables, this operation physically renames the directory within HDFS that contains the data files;
+ the original directory name no longer exists. By qualifying the table names with database names, you can use
+ this technique to move an internal table (and its associated data directory) from one database to another.
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>create database d1;
+create database d2;
+create database d3;
+use d1;
+create table mobile (x int);
+use d2;
+-- Move table from another database to the current one.
+alter table d1.mobile rename to mobile;
+use d1;
+-- Move table from one database to another.
+alter table d2.mobile rename to d3.mobile;</code></pre>
+
+ <p class="p">
+      For external tables, renaming the table changes only the table name within the metastore database; the
+      data directory in its original location is not renamed or moved.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To change the physical location where Impala looks for data files associated with a table or
+ partition:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)] SET LOCATION '<var class="keyword varname">hdfs_path_of_directory</var>';</code></pre>
+
+ <p class="p">
+ The path you specify is the full HDFS path where the data files reside, or will be created. Impala does not
+ create any additional subdirectory named after the table. Impala does not move any data files to this new
+ location or change any data files that might already exist in that directory.
+ </p>
+
+ <p class="p">
+ To set the location for a single partition, include the <code class="ph codeph">PARTITION</code> clause. Specify all the
+ same partitioning columns for the table, with a constant value for each, to precisely identify the single
+ partition affected by the statement:
+ </p>
+
+<pre class="pre codeblock"><code>create table p1 (s string) partitioned by (month int, day int);
+-- Each ADD PARTITION clause creates a subdirectory in HDFS.
+alter table p1 add partition (month=1, day=1);
+alter table p1 add partition (month=1, day=2);
+alter table p1 add partition (month=2, day=1);
+alter table p1 add partition (month=2, day=2);
+-- Redirect queries, INSERT, and LOAD DATA for one partition
+-- to a specific different directory.
+alter table p1 partition (month=1, day=1) set location '/usr/external_data/new_years_day';
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If you are creating a partition for the first time and specifying its location, for maximum efficiency, use
+ a single <code class="ph codeph">ALTER TABLE</code> statement including both the <code class="ph codeph">ADD PARTITION</code> and
+ <code class="ph codeph">LOCATION</code> clauses, rather than separate statements with <code class="ph codeph">ADD PARTITION</code> and
+ <code class="ph codeph">SET LOCATION</code> clauses.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">To automatically detect new partition directories added through Hive or HDFS operations:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">RECOVER PARTITIONS</code> clause scans
+ a partitioned table to detect if any new partition directories were added outside of Impala,
+ such as by Hive <code class="ph codeph">ALTER TABLE</code> statements or by <span class="keyword cmdname">hdfs dfs</span>
+ or <span class="keyword cmdname">hadoop fs</span> commands. The <code class="ph codeph">RECOVER PARTITIONS</code> clause
+ automatically recognizes any data files present in these new directories, the same as
+ the <code class="ph codeph">REFRESH</code> statement does.
+ </p>
+
+ <p class="p">
+ For example, here is a sequence of examples showing how you might create a partitioned table in Impala,
+ create new partitions through Hive, copy data files into the new partitions with the <span class="keyword cmdname">hdfs</span>
+ command, and have Impala recognize the new partitions and new data:
+ </p>
+
+ <p class="p">
+ In Impala, create the table, and a single partition for demonstration purposes:
+ </p>
+
+<pre class="pre codeblock"><code>
+
+create database recover_partitions;
+use recover_partitions;
+create table t1 (s string) partitioned by (yy int, mm int);
+insert into t1 partition (yy = 2016, mm = 1) values ('Partition exists');
+show files in t1;
++---------------------------------------------------------------------+------+--------------+
+| Path | Size | Partition |
++---------------------------------------------------------------------+------+--------------+
+| /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=1/data.txt | 17B | yy=2016/mm=1 |
++---------------------------------------------------------------------+------+--------------+
+quit;
+
+</code></pre>
+
+ <p class="p">
+ In Hive, create some new partitions. In a real use case, you might create the
+ partitions and populate them with data as the final stages of an ETL pipeline.
+ </p>
+
+<pre class="pre codeblock"><code>
+
+hive> use recover_partitions;
+OK
+hive> alter table t1 add partition (yy = 2016, mm = 2);
+OK
+hive> alter table t1 add partition (yy = 2016, mm = 3);
+OK
+hive> quit;
+
+</code></pre>
+
+ <p class="p">
+      For demonstration purposes, copy data (a single row) into these
+      new partitions using HDFS commands:
+ </p>
+
+<pre class="pre codeblock"><code>
+
+$ hdfs dfs -ls /user/hive/warehouse/recover_partitions.db/t1/yy=2016/
+Found 3 items
+drwxr-xr-x - impala hive 0 2016-05-09 16:06 /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=1
+drwxr-xr-x - jrussell hive 0 2016-05-09 16:14 /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=2
+drwxr-xr-x - jrussell hive 0 2016-05-09 16:13 /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=3
+
+$ hdfs dfs -cp /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=1/data.txt \
+ /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=2/data.txt
+$ hdfs dfs -cp /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=1/data.txt \
+ /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=3/data.txt
+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+
+hive> select * from t1;
+OK
+Partition exists 2016 1
+Partition exists 2016 2
+Partition exists 2016 3
+hive> quit;
+
+</code></pre>
+
+ <p class="p">
+ In Impala, initially the partitions and data are not visible.
+ Running <code class="ph codeph">ALTER TABLE</code> with the <code class="ph codeph">RECOVER PARTITIONS</code>
+ clause scans the table data directory to find any new partition directories, and
+ the data files inside them:
+ </p>
+
+<pre class="pre codeblock"><code>
+
+select * from t1;
++------------------+------+----+
+| s | yy | mm |
++------------------+------+----+
+| Partition exists | 2016 | 1 |
++------------------+------+----+
+
+alter table t1 recover partitions;
+select * from t1;
++------------------+------+----+
+| s | yy | mm |
++------------------+------+----+
+| Partition exists | 2016 | 1 |
+| Partition exists | 2016 | 3 |
+| Partition exists | 2016 | 2 |
++------------------+------+----+
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">To change the key-value pairs of the TBLPROPERTIES and SERDEPROPERTIES fields:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>'[, ...]);
+ALTER TABLE <var class="keyword varname">table_name</var> SET SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>'[, ...]);</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">TBLPROPERTIES</code> clause is primarily a way to associate arbitrary user-specified data items
+ with a particular table.
+ </p>
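+
+    <p class="p">
+      For example, the following statement attaches two arbitrary user-defined
+      key-value pairs to a table (the property names shown are purely illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>alter table analysis_data set tblproperties ('source'='etl_nightly', 'contact'='dept42');
+</code></pre>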
+
+ <p class="p">
+ The <code class="ph codeph">SERDEPROPERTIES</code> clause sets up metadata defining how tables are read or written, needed
+ in some cases by Hive but not used extensively by Impala. You would use this clause primarily to change the
+ delimiter in an existing text table or partition, by setting the <code class="ph codeph">'serialization.format'</code> and
+ <code class="ph codeph">'field.delim'</code> property values to the new delimiter character:
+ </p>
+
+<pre class="pre codeblock"><code>-- This table begins life as pipe-separated text format.
+create table change_to_csv (s1 string, s2 string) row format delimited fields terminated by '|';
+-- Then we change it to a CSV table.
+alter table change_to_csv set SERDEPROPERTIES ('serialization.format'=',', 'field.delim'=',');
+insert overwrite change_to_csv values ('stop','go'), ('yes','no');
+!hdfs dfs -cat 'hdfs://<var class="keyword varname">hostname</var>:8020/<var class="keyword varname">data_directory</var>/<var class="keyword varname">dbname</var>.db/change_to_csv/<var class="keyword varname">data_file</var>';
+stop,go
+yes,no</code></pre>
+
+ <p class="p">
+ Use the <code class="ph codeph">DESCRIBE FORMATTED</code> statement to see the current values of these properties for an
+ existing table. See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for more details about these clauses.
+ See <a class="xref" href="impala_perf_stats.html#perf_table_stats_manual">impala_perf_stats.html#perf_table_stats_manual</a> for an example of using table properties to
+ fine-tune the performance-related table statistics.
+ </p>
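+  <p class="p">
+    For example, you could check the current property values of the
+    <code class="ph codeph">change_to_csv</code> table from the earlier example.
+    (This is a sketch; the exact layout of the output varies by version.)
+  </p>
+
+<pre class="pre codeblock"><code>-- The Table Parameters section of the output shows TBLPROPERTIES values,
+-- and the Storage Desc Params section shows SERDEPROPERTIES values
+-- such as 'serialization.format' and 'field.delim'.
+describe formatted change_to_csv;</code></pre>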
+
+ <p class="p">
+ <strong class="ph b">To manually set or update table or column statistics:</strong>
+ </p>
+
+ <p class="p">
+ Although for most tables the <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ statement is all you need to keep table and column statistics up to date for a table,
+ sometimes for a very large table or one that is updated frequently, the length of time to recompute
+ all the statistics might make it impractical to run those statements as often as needed.
+ As a workaround, you can use the <code class="ph codeph">ALTER TABLE</code> statement to set table statistics
+ at the level of the entire table or a single partition, or column statistics at the level of
+ the entire table.
+ </p>
+
+ <div class="p">
+ You can set the <code class="ph codeph">numrows</code> value for table statistics by changing the
+ <code class="ph codeph">TBLPROPERTIES</code> setting for a table or partition.
+ For example:
+<pre class="pre codeblock"><code>create table analysis_data stored as parquet as select * from raw_data;
+Inserted 1000000000 rows in 181.98s
+compute stats analysis_data;
+insert into analysis_data select * from smaller_table_we_forgot_before;
+Inserted 1000000 rows in 15.32s
+-- Now there are 1001000000 rows. We can update this single data point in the stats.
+alter table analysis_data set tblproperties('numRows'='1001000000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+<pre class="pre codeblock"><code>-- If the table originally contained 1 million rows, and we add another partition with 30 thousand rows,
+-- change the numRows property for the partition and the overall table.
+alter table partitioned_data partition(year=2009, month=4) set tblproperties ('numRows'='30000', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+ See <a class="xref" href="impala_perf_stats.html#perf_table_stats_manual">impala_perf_stats.html#perf_table_stats_manual</a> for details.
+ </div>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, you can use the <code class="ph codeph">SET COLUMN STATS</code> clause
+ to set a specific stats value for a particular column.
+ </p>
+
+ <div class="p">
+ You specify a case-insensitive symbolic name for the kind of statistics:
+ <code class="ph codeph">numDVs</code>, <code class="ph codeph">numNulls</code>, <code class="ph codeph">avgSize</code>, <code class="ph codeph">maxSize</code>.
+ The key names and values are both quoted. This operation applies to an entire table,
+ not a specific partition. For example:
+<pre class="pre codeblock"><code>
+create table t1 (x int, s string);
+insert into t1 values (1, 'one'), (2, 'two'), (2, 'deux');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x      | INT    | -1               | -1     | 4        | 4        |
+| s      | STRING | -1               | -1     | -1       | -1       |
++--------+--------+------------------+--------+----------+----------+
+alter table t1 set column stats x ('numDVs'='2','numNulls'='0');
+alter table t1 set column stats s ('numdvs'='3','maxsize'='4');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x      | INT    | 2                | 0      | 4        | 4        |
+| s      | STRING | 3                | -1     | 4        | -1       |
++--------+--------+------------------+--------+----------+----------+
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">To reorganize columns for a table:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> ADD COLUMNS (<var class="keyword varname">column_defs</var>);
+ALTER TABLE <var class="keyword varname">table_name</var> REPLACE COLUMNS (<var class="keyword varname">column_defs</var>);
+ALTER TABLE <var class="keyword varname">table_name</var> CHANGE <var class="keyword varname">column_name</var> <var class="keyword varname">new_name</var> <var class="keyword varname">new_type</var>;
+ALTER TABLE <var class="keyword varname">table_name</var> DROP <var class="keyword varname">column_name</var>;</code></pre>
+
+ <p class="p">
+      Each <var class="keyword varname">column_def</var> in <var class="keyword varname">column_defs</var> takes the same form as in the <code class="ph codeph">CREATE TABLE</code> statement: the column
+ name, then its data type, then an optional comment. You can add multiple columns at a time. The parentheses
+ are required whether you add a single column or multiple columns. When you replace columns, all the original
+ column definitions are discarded. You might use this technique if you receive a new set of data files with
+ different data types or columns in a different order. (The data files are retained, so if the new columns are
+ incompatible with the old ones, use <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA OVERWRITE</code>
+ to replace all the data before issuing any further queries.)
+ </p>
+
+ <p class="p">
+ For example, here is how you might add columns to an existing table.
+ The first <code class="ph codeph">ALTER TABLE</code> adds two new columns, and the second
+ <code class="ph codeph">ALTER TABLE</code> adds one new column.
+ A single Impala query reads both the old and new data files, containing different numbers of columns.
+ For any columns not present in a particular data file, all the column values are
+ considered to be <code class="ph codeph">NULL</code>.
+ </p>
+
+<pre class="pre codeblock"><code>
+create table t1 (x int);
+insert into t1 values (1), (2);
+
+alter table t1 add columns (s string, t timestamp);
+insert into t1 values (3, 'three', now());
+
+alter table t1 add columns (b boolean);
+insert into t1 values (4, 'four', now(), true);
+
+select * from t1 order by x;
++---+-------+-------------------------------+------+
+| x | s     | t                             | b    |
++---+-------+-------------------------------+------+
+| 1 | NULL  | NULL                          | NULL |
+| 2 | NULL  | NULL                          | NULL |
+| 3 | three | 2016-05-11 11:19:45.054457000 | NULL |
+| 4 | four  | 2016-05-11 11:20:20.260733000 | true |
++---+-------+-------------------------------+------+
+</code></pre>
+
+ <p class="p">
+ You might use the <code class="ph codeph">CHANGE</code> clause to rename a single column, or to treat an existing column as
+ a different type than before, such as to switch between treating a column as <code class="ph codeph">STRING</code> and
+ <code class="ph codeph">TIMESTAMP</code>, or between <code class="ph codeph">INT</code> and <code class="ph codeph">BIGINT</code>. You can only drop a
+ single column at a time; to drop multiple columns, issue multiple <code class="ph codeph">ALTER TABLE</code> statements, or
+ define the new set of columns with a single <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> statement.
+ </p>
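+  <p class="p">
+    For example, the following sketch renames a column while keeping the same type.
+    (The table and column names here are hypothetical.)
+  </p>
+
+<pre class="pre codeblock"><code>-- The CHANGE clause requires both the new name and the type,
+-- even when only the name is being changed.
+alter table t2 change c1 id int;</code></pre>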
+
+ <p class="p">
+ The following examples show some safe operations to drop or change columns. Dropping the final column
+      in a table lets Impala ignore the data without causing any disruption to existing data files. Changing the type
+ of a column works if existing data values can be safely converted to the new type. The type conversion
+ rules depend on the file format of the underlying table. For example, in a text table, the same value
+ can be interpreted as a <code class="ph codeph">STRING</code> or a numeric value, while in a binary format such as
+ Parquet, the rules are stricter and type conversions only work between certain sizes of integers.
+ </p>
+
+<pre class="pre codeblock"><code>
+create table optional_columns (x int, y int, z int, a1 int, a2 int);
+insert into optional_columns values (1,2,3,0,0), (2,3,4,100,100);
+
+-- When the last column in the table is dropped, Impala ignores the
+-- values that are no longer needed. (Dropping A1 but leaving A2
+-- would cause problems, as we will see in a subsequent example.)
+alter table optional_columns drop column a2;
+alter table optional_columns drop column a1;
+
+select * from optional_columns;
++---+---+---+
+| x | y | z |
++---+---+---+
+| 1 | 2 | 3 |
+| 2 | 3 | 4 |
++---+---+---+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+create table int_to_string (s string, x int);
+insert into int_to_string values ('one', 1), ('two', 2);
+
+-- What was an INT column will now be interpreted as STRING.
+-- This technique works for text tables but not other file formats.
+-- The second X represents the new name of the column, which we keep the same.
+alter table int_to_string change x x string;
+
+-- Once the type is changed, we can insert non-integer values into the X column
+-- and treat that column as a string, for example by uppercasing or concatenating.
+insert into int_to_string values ('three', 'trois');
+select s, upper(x) from int_to_string;
++-------+----------+
+| s     | upper(x) |
++-------+----------+
+| one   | 1        |
+| two   | 2        |
+| three | TROIS    |
++-------+----------+
+</code></pre>
+
+ <p class="p">
+ Remember that Impala does not actually do any conversion for the underlying data files as a result of
+ <code class="ph codeph">ALTER TABLE</code> statements. If you use <code class="ph codeph">ALTER TABLE</code> to create a table
+ layout that does not agree with the contents of the underlying files, you must replace the files
+ yourself, such as using <code class="ph codeph">LOAD DATA</code> to load a new set of data files, or
+ <code class="ph codeph">INSERT OVERWRITE</code> to copy from another table and replace the original data.
+ </p>
+
+ <p class="p">
+ The following example shows what happens if you delete the middle column from a Parquet table containing three columns.
+ The underlying data files still contain three columns of data. Because the columns are interpreted based on their positions in
+ the data file instead of the specific column names, a <code class="ph codeph">SELECT *</code> query now reads the first and second
+ columns from the data file, potentially leading to unexpected results or conversion errors.
+ For this reason, if you expect to someday drop a column, declare it as the last column in the table, where its data
+ can be ignored by queries after the column is dropped. Or, re-run your ETL process and create new data files
+ if you drop or change the type of a column in a way that causes problems with existing data files.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Parquet table showing how dropping a column can produce unexpected results.
+create table p1 (s1 string, s2 string, s3 string) stored as parquet;
+
+insert into p1 values ('one', 'un', 'uno'), ('two', 'deux', 'dos'),
+ ('three', 'trois', 'tres');
+select * from p1;
++-------+-------+------+
+| s1    | s2    | s3   |
++-------+-------+------+
+| one   | un    | uno  |
+| two   | deux  | dos  |
+| three | trois | tres |
++-------+-------+------+
+
+alter table p1 drop column s2;
+-- The S3 column contains unexpected results.
+-- Because S2 and S3 have compatible types, the query reads
+-- values from the dropped S2, because the existing data files
+-- still contain those values as the second column.
+select * from p1;
++-------+-------+
+| s1    | s3    |
++-------+-------+
+| one   | un    |
+| two   | deux  |
+| three | trois |
++-------+-------+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+-- Parquet table showing how dropping a column can produce conversion errors.
+create table p2 (s1 string, x int, s3 string) stored as parquet;
+
+insert into p2 values ('one', 1, 'uno'), ('two', 2, 'dos'), ('three', 3, 'tres');
+select * from p2;
++-------+---+------+
+| s1    | x | s3   |
++-------+---+------+
+| one   | 1 | uno  |
+| two   | 2 | dos  |
+| three | 3 | tres |
++-------+---+------+
+
+alter table p2 drop column x;
+select * from p2;
+WARNINGS:
+File '<var class="keyword varname">hdfs_filename</var>' has an incompatible Parquet schema for column 'add_columns.p2.s3'.
+Column type: STRING, Parquet schema:
+optional int32 x [i:1 d:1 r:0]
+
+File '<var class="keyword varname">hdfs_filename</var>' has an incompatible Parquet schema for column 'add_columns.p2.s3'.
+Column type: STRING, Parquet schema:
+optional int32 x [i:1 d:1 r:0]
+</code></pre>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, if an Avro table is created without column definitions in the
+ <code class="ph codeph">CREATE TABLE</code> statement, and columns are later
+ added through <code class="ph codeph">ALTER TABLE</code>, the resulting
+ table is now queryable. Missing values from the newly added
+ columns now default to <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To change the file format that Impala expects data to be in, for a table or partition:</strong>
+ </p>
+
+ <p class="p">
+ Use an <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT</code> clause. You can include an optional <code class="ph codeph">PARTITION
+ (<var class="keyword varname">col1</var>=<var class="keyword varname">val1</var>, <var class="keyword varname">col2</var>=<var class="keyword varname">val2</var>,
+    ...)</code> clause so that the file format is changed for a specific partition rather than the entire table.
+ </p>
+
+ <p class="p">
+ Because this operation only changes the table metadata, you must do any conversion of existing data using
+ regular Hadoop techniques outside of Impala. Any new data created by the Impala <code class="ph codeph">INSERT</code>
+ statement will be in the new format. You cannot specify the delimiter for Text files; the data files must be
+ comma-delimited.
+
+ </p>
+
+ <p class="p">
+ To set the file format for a single partition, include the <code class="ph codeph">PARTITION</code> clause. Specify all the
+ same partitioning columns for the table, with a constant value for each, to precisely identify the single
+ partition affected by the statement:
+ </p>
+
+<pre class="pre codeblock"><code>create table p1 (s string) partitioned by (month int, day int);
+-- Each ADD PARTITION clause creates a subdirectory in HDFS.
+alter table p1 add partition (month=1, day=1);
+alter table p1 add partition (month=1, day=2);
+alter table p1 add partition (month=2, day=1);
+alter table p1 add partition (month=2, day=2);
+-- Queries and INSERT statements will read and write files
+-- in this format for this specific partition.
+alter table p1 partition (month=2, day=2) set fileformat parquet;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">To add or drop partitions for a table</strong>, the table must already be partitioned (that is, created with a
+ <code class="ph codeph">PARTITIONED BY</code> clause). The partition is a physical directory in HDFS, with a name that
+ encodes a particular column value (the <strong class="ph b">partition key</strong>). The Impala <code class="ph codeph">INSERT</code> statement
+ already creates the partition if necessary, so the <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code> is
+ primarily useful for importing data by moving or copying existing data files into the HDFS directory
+ corresponding to a partition. (You can use the <code class="ph codeph">LOAD DATA</code> statement to move files into the
+ partition directory, or <code class="ph codeph">ALTER TABLE ... PARTITION (...) SET LOCATION</code> to point a partition at
+      a directory that already contains data files.)
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">DROP PARTITION</code> clause is used to remove the HDFS directory and associated data files for
+ a particular set of partition key values; for example, if you always analyze the last 3 months worth of data,
+ at the beginning of each month you might drop the oldest partition that is no longer needed. Removing
+ partitions reduces the amount of metadata associated with the table and the complexity of calculating the
+ optimal query plan, which can simplify and speed up queries on partitioned tables, particularly join queries.
+ Here is an example showing the <code class="ph codeph">ADD PARTITION</code> and <code class="ph codeph">DROP PARTITION</code> clauses.
+ </p>
+
+ <p class="p">
+ To avoid errors while adding or dropping partitions whose existence is not certain,
+ add the optional <code class="ph codeph">IF [NOT] EXISTS</code> clause between the <code class="ph codeph">ADD</code> or
+ <code class="ph codeph">DROP</code> keyword and the <code class="ph codeph">PARTITION</code> keyword. That is, the entire
+ clause becomes <code class="ph codeph">ADD IF NOT EXISTS PARTITION</code> or <code class="ph codeph">DROP IF EXISTS PARTITION</code>.
+ The following example shows how partitions can be created automatically through <code class="ph codeph">INSERT</code>
+ statements, or manually through <code class="ph codeph">ALTER TABLE</code> statements. The <code class="ph codeph">IF [NOT] EXISTS</code>
+ clauses let the <code class="ph codeph">ALTER TABLE</code> statements succeed even if a new requested partition already
+ exists, or a partition to be dropped does not exist.
+ </p>
+
+<p class="p">
+Inserting 2 year values creates 2 partitions:
+</p>
+
+<pre class="pre codeblock"><code>
+create table partition_t (s string) partitioned by (y int);
+insert into partition_t (s,y) values ('two thousand',2000), ('nineteen ninety',1990);
+show partitions partition_t;
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| y     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| 1990  | -1    | 1      | 16B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2000  | -1    | 1      | 13B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| Total | -1    | 2      | 29B  | 0B           |                   |        |                   |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+</code></pre>
+
+<p class="p">
+Without the <code class="ph codeph">IF NOT EXISTS</code> clause, an attempt to add a new partition might fail:
+</p>
+
+<pre class="pre codeblock"><code>
+alter table partition_t add partition (y=2000);
+ERROR: AnalysisException: Partition spec already exists: (y=2000).
+</code></pre>
+
+<p class="p">
+The <code class="ph codeph">IF NOT EXISTS</code> clause makes the statement succeed whether or not there was already a
+partition with the specified key value:
+</p>
+
+<pre class="pre codeblock"><code>
+alter table partition_t add if not exists partition (y=2000);
+alter table partition_t add if not exists partition (y=2010);
+show partitions partition_t;
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| y     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| 1990  | -1    | 1      | 16B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2000  | -1    | 1      | 13B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2010  | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| Total | -1    | 2      | 29B  | 0B           |                   |        |                   |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+</code></pre>
+
+<p class="p">
+Likewise, the <code class="ph codeph">IF EXISTS</code> clause lets <code class="ph codeph">DROP PARTITION</code> succeed whether or not the partition is already
+in the table:
+</p>
+
+<pre class="pre codeblock"><code>
+alter table partition_t drop if exists partition (y=2000);
+alter table partition_t drop if exists partition (y=1950);
+show partitions partition_t;
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| y     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| 1990  | -1    | 1      | 16B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2010  | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| Total | -1    | 1      | 16B  | 0B           |                   |        |                   |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+</code></pre>
+
+ <p class="p"> The optional <code class="ph codeph">PURGE</code> keyword, available in
+ <span class="keyword">Impala 2.3</span> and higher, is used with the <code class="ph codeph">DROP
+ PARTITION</code> clause to remove associated HDFS data files
+      immediately rather than going through the HDFS trashcan mechanism. Use
+      this keyword when dropping a partition if it is crucial to remove the data
+      as quickly as possible to free up space, or if there is a problem with the
+      trashcan, such as the trashcan not being configured or being in a
+      different HDFS encryption zone than the data files. </p>
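+  <p class="p">
+    For example, the following sketch (using a hypothetical partitioned table)
+    removes a partition and its data files immediately, bypassing the trashcan:
+  </p>
+
+<pre class="pre codeblock"><code>-- Requires Impala 2.3 or higher. The data files are deleted outright
+-- rather than being moved to the HDFS trashcan.
+alter table logs drop if exists partition (year=2003, month=1) purge;</code></pre>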
+
+
+
+<pre class="pre codeblock"><code>-- Create an empty table and define the partitioning scheme.
+create table part_t (x int) partitioned by (month int);
+-- Create an empty partition into which you could copy data files from some other source.
+alter table part_t add partition (month=1);
+-- After changing the underlying data, issue a REFRESH statement to make the data visible in Impala.
+refresh part_t;
+-- Later, do the same for the next month.
+alter table part_t add partition (month=2);
+
+-- Now you no longer need the older data.
+alter table part_t drop partition (month=1);
+-- If the table was partitioned by month and year, you would issue a statement like:
+-- alter table part_t drop partition (year=2003,month=1);
+-- which would require 12 ALTER TABLE statements to remove a year's worth of data.
+
+-- If the data files for subsequent months were in a different file format,
+-- you could set a different file format for the new partition after adding it.
+alter table part_t add partition (month=3);
+alter table part_t partition (month=3) set fileformat parquet;
+</code></pre>
+
+ <p class="p">
+ The value specified for a partition key can be an arbitrary constant expression, without any references to
+ columns. For example:
+ </p>
+
+<pre class="pre codeblock"><code>alter table time_data add partition (month=concat('Decem','ber'));
+alter table sales_data add partition (zipcode = cast(9021 * 10 as string));</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ An alternative way to reorganize a table and its associated data files is to use <code class="ph codeph">CREATE
+ TABLE</code> to create a variation of the original table, then use <code class="ph codeph">INSERT</code> to copy the
+ transformed or reordered data to the new table. The advantage of <code class="ph codeph">ALTER TABLE</code> is that it
+ avoids making a duplicate copy of the data files, allowing you to reorganize huge volumes of data in a
+ space-efficient way using familiar Hadoop techniques.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">To switch a table between internal and external:</strong>
+ </p>
+
+ <div class="p">
+ You can switch a table from internal to external, or from external to internal, by using the <code class="ph codeph">ALTER
+ TABLE</code> statement:
+<pre class="pre codeblock"><code>
+-- Switch a table from internal to external.
+ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='TRUE');
+
+-- Switch a table from external to internal.
+ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='FALSE');
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ Most <code class="ph codeph">ALTER TABLE</code> clauses do not actually
+ read or write any HDFS files, and so do not depend on
+ specific HDFS permissions. For example, the <code class="ph codeph">SET FILEFORMAT</code>
+      clause does not actually check the file format of existing data files or
+ convert them to the new format, and the <code class="ph codeph">SET LOCATION</code> clause
+ does not require any special permissions on the new location.
+ (Any permission-related failures would come later, when you
+ actually query or insert into the table.)
+ </p>
+
+
+ <p class="p">
+ In general, <code class="ph codeph">ALTER TABLE</code> clauses that do touch
+ HDFS files and directories require the same HDFS permissions
+ as corresponding <code class="ph codeph">CREATE</code>, <code class="ph codeph">INSERT</code>,
+ or <code class="ph codeph">SELECT</code> statements.
+ The permissions allow
+ the user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, to read or write
+ files or directories, or (in the case of the execute bit) descend into a directory.
+ The <code class="ph codeph">RENAME TO</code> clause requires read, write, and execute permission in the
+ source and destination database directories and in the table data directory,
+ and read and write permission for the data files within the table.
+ The <code class="ph codeph">ADD PARTITION</code> and <code class="ph codeph">DROP PARTITION</code> clauses
+ require write and execute permissions for the associated partition directory.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <div class="p">
+ Because of the extra constraints and features of Kudu tables, such as the <code class="ph codeph">NOT NULL</code>
+ and <code class="ph codeph">DEFAULT</code> attributes for columns, <code class="ph codeph">ALTER TABLE</code> has specific
+ requirements related to Kudu tables:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In an <code class="ph codeph">ADD COLUMNS</code> operation, you can specify the <code class="ph codeph">NULL</code>,
+ <code class="ph codeph">NOT NULL</code>, and <code class="ph codeph">DEFAULT <var class="keyword varname">default_value</var></code>
+ column attributes.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, you can also specify the <code class="ph codeph">ENCODING</code>,
+ <code class="ph codeph">COMPRESSION</code>, and <code class="ph codeph">BLOCK_SIZE</code> attributes when adding a column.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ If you add a column with a <code class="ph codeph">NOT NULL</code> attribute, it must also have a
+ <code class="ph codeph">DEFAULT</code> attribute, so the default value can be assigned to that
+ column for all existing rows.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DROP COLUMN</code> clause works the same for a Kudu table as for other
+ kinds of tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Although you can change the name of a column with the <code class="ph codeph">CHANGE</code> clause,
+ you cannot change the type of a column in a Kudu table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ You cannot change the nullability of existing columns in a Kudu table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ In <span class="keyword">Impala 2.10</span>, you can change the default value, encoding,
+ compression, or block size of existing columns in a Kudu table by using the
+ <code class="ph codeph">SET</code> clause.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ You cannot use the <code class="ph codeph">REPLACE COLUMNS</code> clause with a Kudu table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">RENAME TO</code> clause for a Kudu table only affects the name stored in the
+ metastore database that Impala uses to refer to the table. To change which underlying Kudu
+ table is associated with an Impala table name, you must change the <code class="ph codeph">TBLPROPERTIES</code>
+          property of the table: <code class="ph codeph">SET TBLPROPERTIES('kudu.table_name'='<var class="keyword varname">kudu_tbl_name</var>')</code>.
+ Doing so causes Kudu to change the name of the underlying Kudu table.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ The following are some examples of using the <code class="ph codeph">ADD COLUMNS</code> clause for a Kudu table:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE t1 ( x INT, PRIMARY KEY (x) )
+ PARTITION BY HASH (x) PARTITIONS 16
+  STORED AS KUDU;
+
+ALTER TABLE t1 ADD COLUMNS (y STRING ENCODING prefix_encoding);
+ALTER TABLE t1 ADD COLUMNS (z INT DEFAULT 10);
+ALTER TABLE t1 ADD COLUMNS (a STRING NOT NULL DEFAULT '', t TIMESTAMP COMPRESSION default_compression);
+</code></pre>
+
+ <p class="p">
+ The following are some examples of modifying column defaults and storage attributes for a Kudu table:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table kt (x bigint primary key, s string default 'yes', t timestamp)
+ stored as kudu;
+
+-- You can change the default value for a column, which affects any rows
+-- inserted after this change is made.
+alter table kt alter column s set default 'no';
+
+-- You can remove the default value for a column, which affects any rows
+-- inserted after this change is made. If the column is nullable, any
+-- future inserts default to NULL for this column. If the column is marked
+-- NOT NULL, any future inserts must specify a value for the column.
+alter table kt alter column s drop default;
+
+insert into kt values (1, 'foo', now());
+-- Because of the DROP DEFAULT above, omitting S from the insert
+-- gives it a value of NULL.
+insert into kt (x, t) values (2, now());
+
+select * from kt;
++---+------+-------------------------------+
+| x | s    | t                             |
++---+------+-------------------------------+
+| 2 | NULL | 2017-10-02 00:03:40.652156000 |
+| 1 | foo  | 2017-10-02 00:03:04.346185000 |
++---+------+-------------------------------+
+
+-- Other storage-related attributes can also be changed for columns.
+-- These changes take effect for any newly inserted rows, or rows
+-- rearranged due to compaction after deletes or updates.
+alter table kt alter column s set encoding prefix_encoding;
+-- The COLUMN keyword is optional in the syntax.
+alter table kt alter x set block_size 2048;
+alter table kt alter column t set compression zlib;
+
+desc kt;
++------+-----------+---------+-------------+----------+---------------+-----------------+---------------------+------------+
+| name | type      | comment | primary_key | nullable | default_value | encoding        | compression         | block_size |
++------+-----------+---------+-------------+----------+---------------+-----------------+---------------------+------------+
+| x    | bigint    |         | true        | false    |               | AUTO_ENCODING   | DEFAULT_COMPRESSION | 2048       |
+| s    | string    |         | false       | true     |               | PREFIX_ENCODING | DEFAULT_COMPRESSION | 0          |
+| t    | timestamp |         | false       | true     |               | AUTO_ENCODING   | ZLIB                | 0          |
++------+-----------+---------+-------------+----------+---------------+-----------------+---------------------+------------+
+</code></pre>
+
+ <p class="p">
+ Kudu tables all use an underlying partitioning mechanism. The partition syntax is different than for non-Kudu
+ tables. You can use the <code class="ph codeph">ALTER TABLE</code> statement to add and drop <dfn class="term">range partitions</dfn>
+ from a Kudu table. Any new range must not overlap with any existing ranges. Dropping a range removes all the associated
+ rows from the table. See <a class="xref" href="impala_kudu.html#kudu_partitioning">Partitioning for Kudu Tables</a> for details.
+ </p>
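+
+    <p class="p">
+      For example, this sketch (assuming a Kudu table <code class="ph codeph">kt</code> that is range-partitioned
+      on its <code class="ph codeph">BIGINT</code> primary key column <code class="ph codeph">x</code>) adds a range covering
+      the key values 100-199, then drops that range again, discarding any rows within it:
+    </p>
+
+<pre class="pre codeblock"><code>-- Add a new range; it must not overlap any existing range.
+alter table kt add range partition 100 &lt;= values &lt; 200;
+-- Dropping the range also removes all rows whose keys fall within it.
+alter table kt drop range partition 100 &lt;= values &lt; 200;</code></pre>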
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>,
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>, <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>,
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>, <a class="xref" href="impala_tables.html#internal_tables">Internal Tables</a>,
+ <a class="xref" href="impala_tables.html#external_tables">External Tables</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_alter_view.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_alter_view.html b/docs/build3x/html/topics/impala_alter_view.html
new file mode 100644
index 0000000..2d96fa1
--- /dev/null
+++ b/docs/build3x/html/topics/impala_alter_view.html
@@ -0,0 +1,139 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="alter_view"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ALTER VIEW Statement</title></head><body id="alter_view"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ALTER VIEW Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Changes the characteristics of a view. The syntax has two forms:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">AS</code> clause associates the view with a different query.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">RENAME TO</code> clause changes the name of the view, moves the view to
+ a different database, or both.
+ </li>
+ </ul>
+
+ <p class="p">
+ Because a view is purely a logical construct (an alias for a query) with no physical data behind it,
+ <code class="ph codeph">ALTER VIEW</code> only involves changes to metadata in the metastore database, not any data files
+ in HDFS.
+ </p>
+
+
+
+
+
+
+
+
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ALTER VIEW [<var class="keyword varname">database_name</var>.]<var class="keyword varname">view_name</var> AS <var class="keyword varname">select_statement</var>
+ALTER VIEW [<var class="keyword varname">database_name</var>.]<var class="keyword varname">view_name</var> RENAME TO [<var class="keyword varname">database_name</var>.]<var class="keyword varname">view_name</var></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+ <p class="p">
+ If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+ identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+ other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>create table t1 (x int, y int, s string);
+create table t2 like t1;
+create view v1 as select * from t1;
+alter view v1 as select * from t2;
+alter view v1 as select x, upper(s) s from t2;</code></pre>
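+
+  <p class="p">
+    The <code class="ph codeph">RENAME TO</code> clause can rename a view, move it to another database, or both,
+    as in this sketch (the database name <code class="ph codeph">reporting_db</code> is hypothetical):
+  </p>
+
+<pre class="pre codeblock"><code>-- Rename the view within the current database.
+alter view v1 rename to v2;
+-- Move the view to another database, renaming it at the same time.
+alter view v2 rename to reporting_db.v3;</code></pre>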
+
+
+
+ <div class="p">
+ To see the definition of a view, issue a <code class="ph codeph">DESCRIBE FORMATTED</code> statement, which shows the
+ query from the original <code class="ph codeph">CREATE VIEW</code> statement:
+<pre class="pre codeblock"><code>[localhost:21000] > create view v1 as select * from t1;
+[localhost:21000] > describe formatted v1;
+Query finished, fetching results ...
++------------------------------+------------------------------+------------+
+| name | type | comment |
++------------------------------+------------------------------+------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | None |
+| y | int | None |
+| s | string | None |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | views | NULL |
+| Owner: | doc_demo | NULL |
+| CreateTime: | Mon Jul 08 15:56:27 EDT 2013 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+<strong class="ph b">| Table Type: | VIRTUAL_VIEW | NULL |</strong>
+| Table Parameters: | NULL | NULL |
+| | transient_lastDdlTime | 1373313387 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | null | NULL |
+| InputFormat: | null | NULL |
+| OutputFormat: | null | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
+| | NULL | NULL |
+| # View Information | NULL | NULL |
+<strong class="ph b">| View Original Text: | SELECT * FROM t1 | NULL |
+| View Expanded Text: | SELECT * FROM t1 | NULL |</strong>
++------------------------------+------------------------------+------------+
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_views.html#views">Overview of Impala Views</a>, <a class="xref" href="impala_create_view.html#create_view">CREATE VIEW Statement</a>,
+ <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_datetime_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_datetime_functions.html b/docs/build3x/html/topics/impala_datetime_functions.html
new file mode 100644
index 0000000..61ae72a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_datetime_functions.html
@@ -0,0 +1,3105 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="datetime_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Date and Time Functions</title></head><body id="datetime_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Date and Time Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The underlying Impala data type for date and time data is
+ <code class="ph codeph"><a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP</a></code>, which has both a date and a
+ time portion. Functions that extract a single field, such as <code class="ph codeph">hour()</code> or
+ <code class="ph codeph">minute()</code>, typically return an integer value. Functions that format the date portion, such as
+ <code class="ph codeph">date_add()</code> or <code class="ph codeph">to_date()</code>, typically return a string value.
+ </p>
+
+ <p class="p">
+ You can also adjust a <code class="ph codeph">TIMESTAMP</code> value by adding or subtracting an <code class="ph codeph">INTERVAL</code>
+ expression. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details. <code class="ph codeph">INTERVAL</code>
+ expressions are also allowed as the second argument for the <code class="ph codeph">date_add()</code> and
+ <code class="ph codeph">date_sub()</code> functions, rather than integers.
+ </p>
+
+ <p class="p">
+ Some of these functions are affected by the setting of the
+ <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions</code> startup flag for the
+ <span class="keyword cmdname">impalad</span> daemon. This setting is off by default, meaning that
+ functions such as <code class="ph codeph">from_unixtime()</code> and <code class="ph codeph">unix_timestamp()</code>
+ consider the input values to always represent the UTC time zone.
+ This setting also applies when you <code class="ph codeph">CAST()</code> a <code class="ph codeph">BIGINT</code>
+ value to <code class="ph codeph">TIMESTAMP</code>, or a <code class="ph codeph">TIMESTAMP</code>
+ value to <code class="ph codeph">BIGINT</code>.
+ When this setting is enabled, these functions and operations convert to and from
+ values representing the local time zone.
+ See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details about how
+ Impala handles time zone considerations for the <code class="ph codeph">TIMESTAMP</code> data type.
+ </p>
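+
+    <p class="p">
+      For example, with the flag left at its default (off), <code class="ph codeph">from_unixtime()</code>
+      interprets its argument as seconds past the Unix epoch in UTC, and
+      <code class="ph codeph">unix_timestamp()</code> performs the reverse conversion:
+    </p>
+
+<pre class="pre codeblock"><code>select from_unixtime(0) as epoch_start,
+  unix_timestamp('1970-01-01 00:00:00') as round_trip;
++---------------------+------------+
+| epoch_start         | round_trip |
++---------------------+------------+
+| 1970-01-01 00:00:00 | 0          |
++---------------------+------------+</code></pre>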
+
+ <p class="p">
+ <strong class="ph b">Function reference:</strong>
+ </p>
+
+ <p class="p">
+      Impala supports the following date and time functions:
+ </p>
+
+
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="datetime_functions__add_months">
+ <code class="ph codeph">add_months(timestamp date, int months)</code>, <code class="ph codeph">add_months(timestamp date, bigint
+ months)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of months.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Same as <code class="ph codeph"><a class="xref" href="#datetime_functions__months_add">months_add()</a></code>.
+ Available in Impala 1.4 and higher. For
+ compatibility when porting code with vendor extensions.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples demonstrate adding months to construct the same
+ day of the month in a different month; how if the current day of the month
+ does not exist in the target month, the last day of that month is substituted;
+ and how a negative argument produces a return value from a previous month.
+ </p>
+<pre class="pre codeblock"><code>
+select now(), add_months(now(), 2);
++-------------------------------+-------------------------------+
+| now() | add_months(now(), 2) |
++-------------------------------+-------------------------------+
+| 2016-05-31 10:47:00.429109000 | 2016-07-31 10:47:00.429109000 |
++-------------------------------+-------------------------------+
+
+select now(), add_months(now(), 1);
++-------------------------------+-------------------------------+
+| now() | add_months(now(), 1) |
++-------------------------------+-------------------------------+
+| 2016-05-31 10:47:14.540226000 | 2016-06-30 10:47:14.540226000 |
++-------------------------------+-------------------------------+
+
+select now(), add_months(now(), -1);
++-------------------------------+-------------------------------+
+| now() | add_months(now(), -1) |
++-------------------------------+-------------------------------+
+| 2016-05-31 10:47:31.732298000 | 2016-04-30 10:47:31.732298000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__adddate">
+ <code class="ph codeph">adddate(timestamp startdate, int days)</code>, <code class="ph codeph">adddate(timestamp startdate, bigint
+      days)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Adds a specified number of days to a <code class="ph codeph">TIMESTAMP</code> value. Similar to
+ <code class="ph codeph">date_add()</code>, but starts with an actual <code class="ph codeph">TIMESTAMP</code> value instead of a
+ string that is converted to a <code class="ph codeph">TIMESTAMP</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how to add a number of days to a <code class="ph codeph">TIMESTAMP</code>.
+ The number of days can also be negative, which gives the same effect as the <code class="ph codeph">subdate()</code> function.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, adddate(now(), 30) as now_plus_30;
++-------------------------------+-------------------------------+
+| right_now | now_plus_30 |
++-------------------------------+-------------------------------+
+| 2016-05-20 10:23:08.640111000 | 2016-06-19 10:23:08.640111000 |
++-------------------------------+-------------------------------+
+
+select now() as right_now, adddate(now(), -15) as now_minus_15;
++-------------------------------+-------------------------------+
+| right_now | now_minus_15 |
++-------------------------------+-------------------------------+
+| 2016-05-20 10:23:38.214064000 | 2016-05-05 10:23:38.214064000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__current_timestamp">
+ <code class="ph codeph">current_timestamp()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Alias for the <code class="ph codeph">now()</code> function.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now(), current_timestamp();
++-------------------------------+-------------------------------+
+| now() | current_timestamp() |
++-------------------------------+-------------------------------+
+| 2016-05-19 16:10:14.237849000 | 2016-05-19 16:10:14.237849000 |
++-------------------------------+-------------------------------+
+
+select current_timestamp() as right_now,
+ current_timestamp() + interval 3 hours as in_three_hours;
++-------------------------------+-------------------------------+
+| right_now | in_three_hours |
++-------------------------------+-------------------------------+
+| 2016-05-19 16:13:20.017117000 | 2016-05-19 19:13:20.017117000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__date_add">
+ <code class="ph codeph">date_add(timestamp startdate, int days)</code>, <code class="ph codeph">date_add(timestamp startdate,
+ <var class="keyword varname">interval_expression</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Adds a specified number of days to a <code class="ph codeph">TIMESTAMP</code> value.
+
+ With an <code class="ph codeph">INTERVAL</code>
+ expression as the second argument, you can calculate a delta value using other units such as weeks,
+ years, hours, seconds, and so on; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+        The following example shows the simplest usage of adding a specified number of days
+ to a <code class="ph codeph">TIMESTAMP</code> value:
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, date_add(now(), 7) as next_week;
++-------------------------------+-------------------------------+
+| right_now | next_week |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:03:48.687055000 | 2016-05-27 11:03:48.687055000 |
++-------------------------------+-------------------------------+
+</code></pre>
+
+ <p class="p">
+ The following examples show the shorthand notation of an <code class="ph codeph">INTERVAL</code>
+ expression, instead of specifying the precise number of days.
+ The <code class="ph codeph">INTERVAL</code> notation also lets you work with units smaller than
+ a single day.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, date_add(now(), interval 3 weeks) as in_3_weeks;
++-------------------------------+-------------------------------+
+| right_now | in_3_weeks |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:05:39.173331000 | 2016-06-10 11:05:39.173331000 |
++-------------------------------+-------------------------------+
+
+select now() as right_now, date_add(now(), interval 6 hours) as in_6_hours;
++-------------------------------+-------------------------------+
+| right_now | in_6_hours |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:13:51.492536000 | 2016-05-20 17:13:51.492536000 |
++-------------------------------+-------------------------------+
+</code></pre>
+
+ <p class="p">
+ Like all date/time functions that deal with months, <code class="ph codeph">date_add()</code>
+ handles nonexistent dates past the end of a month by setting the date to the
+ last day of the month. The following example shows how the nonexistent date
+ April 31st is normalized to April 30th:
+ </p>
+<pre class="pre codeblock"><code>
+select date_add(cast('2016-01-31' as timestamp), interval 3 months) as 'april_31st';
++---------------------+
+| april_31st |
++---------------------+
+| 2016-04-30 00:00:00 |
++---------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__date_part">
+ <code class="ph codeph">date_part(string, timestamp)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Similar to
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions__extract"><code class="ph codeph">EXTRACT()</code></a>,
+ with the argument order reversed. Supports the same date and time units as <code class="ph codeph">EXTRACT()</code>.
+ For compatibility with SQL code containing vendor extensions.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select date_part('year',now()) as current_year;
++--------------+
+| current_year |
++--------------+
+| 2016 |
++--------------+
+
+select date_part('hour',now()) as hour_of_day;
++-------------+
+| hour_of_day |
++-------------+
+| 11 |
++-------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__date_sub">
+ <code class="ph codeph">date_sub(timestamp startdate, int days)</code>, <code class="ph codeph">date_sub(timestamp startdate,
+ <var class="keyword varname">interval_expression</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Subtracts a specified number of days from a <code class="ph codeph">TIMESTAMP</code> value.
+
+ With an
+ <code class="ph codeph">INTERVAL</code> expression as the second argument, you can calculate a delta value using other
+ units such as weeks, years, hours, seconds, and so on; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>
+ for details.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+        The following example shows the simplest usage of subtracting a specified number of days
+ from a <code class="ph codeph">TIMESTAMP</code> value:
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, date_sub(now(), 7) as last_week;
++-------------------------------+-------------------------------+
+| right_now | last_week |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:21:30.491011000 | 2016-05-13 11:21:30.491011000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ <p class="p">
+ The following examples show the shorthand notation of an <code class="ph codeph">INTERVAL</code>
+ expression, instead of specifying the precise number of days.
+ The <code class="ph codeph">INTERVAL</code> notation also lets you work with units smaller than
+ a single day.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, date_sub(now(), interval 3 weeks) as 3_weeks_ago;
++-------------------------------+-------------------------------+
+| right_now | 3_weeks_ago |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:23:05.176953000 | 2016-04-29 11:23:05.176953000 |
++-------------------------------+-------------------------------+
+
+select now() as right_now, date_sub(now(), interval 6 hours) as 6_hours_ago;
++-------------------------------+-------------------------------+
+| right_now | 6_hours_ago |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:23:35.439631000 | 2016-05-20 05:23:35.439631000 |
++-------------------------------+-------------------------------+
+</code></pre>
+
+ <p class="p">
+        Like all date/time functions that deal with months, <code class="ph codeph">date_sub()</code>
+ handles nonexistent dates past the end of a month by setting the date to the
+ last day of the month. The following example shows how the nonexistent date
+ April 31st is normalized to April 30th:
+ </p>
+<pre class="pre codeblock"><code>
+select date_sub(cast('2016-05-31' as timestamp), interval 1 months) as 'april_31st';
++---------------------+
+| april_31st |
++---------------------+
+| 2016-04-30 00:00:00 |
++---------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__date_trunc">
+ <code class="ph codeph">date_trunc(string unit, timestamp)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Truncates a <code class="ph codeph">TIMESTAMP</code> value to the specified precision.
+ <p class="p">
+ <strong class="ph b">Unit argument:</strong> The <code class="ph codeph">unit</code> argument value for truncating
+ <code class="ph codeph">TIMESTAMP</code> values is not case-sensitive. This argument string
+ can be one of:
+ </p>
+ <ul class="ul">
+ <li class="li">microseconds</li>
+ <li class="li">milliseconds</li>
+ <li class="li">second</li>
+ <li class="li">minute</li>
+ <li class="li">hour</li>
+ <li class="li">day</li>
+ <li class="li">week</li>
+ <li class="li">month</li>
+ <li class="li">year</li>
+ <li class="li">decade</li>
+ <li class="li">century</li>
+ <li class="li">millennium</li>
+ </ul>
+ <p class="p">
+ For example, calling <code class="ph codeph">date_trunc('hour',ts)</code> truncates
+ <code class="ph codeph">ts</code> to the beginning of the corresponding hour, with
+ all minutes, seconds, milliseconds, and so on set to zero. Calling
+ <code class="ph codeph">date_trunc('milliseconds',ts)</code> truncates
+ <code class="ph codeph">ts</code> to the beginning of the corresponding millisecond,
+ with all microseconds and nanoseconds set to zero.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The sub-second units are specified in plural form. All units representing
+ one second or more are specified in singular form.
+ </div>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.11.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Although this function is similar to calling <code class="ph codeph">TRUNC()</code>
+ with a <code class="ph codeph">TIMESTAMP</code> argument, the order of arguments
+ and the recognized units are different between <code class="ph codeph">TRUNC()</code>
+ and <code class="ph codeph">DATE_TRUNC()</code>. Therefore, these functions are not
+ interchangeable.
+ </p>
+ <p class="p">
+ This function is typically used in <code class="ph codeph">GROUP BY</code>
+ queries to aggregate results from the same hour, day, week, month, quarter, and so on.
+ You can also use this function in an <code class="ph codeph">INSERT ... SELECT</code> into a
+ partitioned table to divide <code class="ph codeph">TIMESTAMP</code> values into the correct partition.
+ </p>
+ <p class="p">
+ Because the return value is a <code class="ph codeph">TIMESTAMP</code>, if you cast the result of
+ <code class="ph codeph">DATE_TRUNC()</code> to <code class="ph codeph">STRING</code>, you will often see zeroed-out portions such as
+ <code class="ph codeph">00:00:00</code> in the time field. If you only need the individual units such as hour, day,
+ month, or year, use the <code class="ph codeph">EXTRACT()</code> function instead. If you need the individual units
+        from a truncated <code class="ph codeph">TIMESTAMP</code> value, run the <code class="ph codeph">TRUNC()</code> function on the
+ original value, then run <code class="ph codeph">EXTRACT()</code> on the result.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how to call <code class="ph codeph">DATE_TRUNC()</code> with different unit values:
+ </p>
+<pre class="pre codeblock"><code>
+select now(), date_trunc('second', now());
++-------------------------------+-----------------------------------+
+| now() | date_trunc('second', now()) |
++-------------------------------+-----------------------------------+
+| 2017-12-05 13:58:04.565403000 | 2017-12-05 13:58:04 |
++-------------------------------+-----------------------------------+
+
+select now(), date_trunc('hour', now());
++-------------------------------+---------------------------+
+| now() | date_trunc('hour', now()) |
++-------------------------------+---------------------------+
+| 2017-12-05 13:59:01.884459000 | 2017-12-05 13:00:00 |
++-------------------------------+---------------------------+
+
+select now(), date_trunc('millennium', now());
++-------------------------------+---------------------------------+
+| now() | date_trunc('millennium', now()) |
++-------------------------------+---------------------------------+
+| 2017-12-05 14:00:30.296812000 | 2000-01-01 00:00:00 |
++-------------------------------+---------------------------------+
+</code></pre>
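+
+      <p class="p">
+        As a sketch of the <code class="ph codeph">GROUP BY</code> usage described above (the table
+        <code class="ph codeph">events</code> and its <code class="ph codeph">event_ts</code> column are hypothetical),
+        the following query counts rows per calendar day, ignoring the time-of-day portion:
+      </p>
+
+<pre class="pre codeblock"><code>select date_trunc('day', event_ts) as event_day,
+  count(*) as events_per_day
+from events
+group by date_trunc('day', event_ts)
+order by event_day;</code></pre>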
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__datediff">
+ <code class="ph codeph">datediff(timestamp enddate, timestamp startdate)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the number of days between two <code class="ph codeph">TIMESTAMP</code> values.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ If the first argument represents a later date than the second argument,
+ the return value is positive. If both arguments represent the same date,
+ the return value is zero. The time portions of the <code class="ph codeph">TIMESTAMP</code>
+        values are irrelevant. For example, 11:59 PM on one day and 12:01 AM on the next
+ day represent a <code class="ph codeph">datediff()</code> of -1 because the date/time values
+ represent different days, even though the <code class="ph codeph">TIMESTAMP</code> values differ by only 2 minutes.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example shows how comparing a <span class="q">"late"</span> value with
+ an <span class="q">"earlier"</span> value produces a positive number. In this case,
+ the result is (365 * 5) + 1, because one of the intervening years is
+ a leap year.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, datediff(now() + interval 5 years, now()) as in_5_years;
++-------------------------------+------------+
+| right_now | in_5_years |
++-------------------------------+------------+
+| 2016-05-20 13:43:55.873826000 | 1826 |
++-------------------------------+------------+
+</code></pre>
+ <p class="p">
+        The following examples show how the return value represents the number of days
+        between the associated dates, regardless of the time portion of each <code class="ph codeph">TIMESTAMP</code>.
+        For example, different times on the same day produce a <code class="ph codeph">datediff()</code> of 0,
+        regardless of which one is earlier or later. But if the arguments represent different dates,
+        <code class="ph codeph">datediff()</code> returns a non-zero integer value, regardless of the time portions
+        of the dates.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, datediff(now(), now() + interval 4 hours) as in_4_hours;
++-------------------------------+------------+
+| right_now | in_4_hours |
++-------------------------------+------------+
+| 2016-05-20 13:42:05.302747000 | 0 |
++-------------------------------+------------+
+
+select now() as right_now, datediff(now(), now() - interval 4 hours) as 4_hours_ago;
++-------------------------------+-------------+
+| right_now | 4_hours_ago |
++-------------------------------+-------------+
+| 2016-05-20 13:42:21.134958000 | 0 |
++-------------------------------+-------------+
+
+select now() as right_now, datediff(now(), now() + interval 12 hours) as in_12_hours;
++-------------------------------+-------------+
+| right_now | in_12_hours |
++-------------------------------+-------------+
+| 2016-05-20 13:42:44.765873000 | -1 |
++-------------------------------+-------------+
+
+select now() as right_now, datediff(now(), now() - interval 18 hours) as 18_hours_ago;
++-------------------------------+--------------+
+| right_now | 18_hours_ago |
++-------------------------------+--------------+
+| 2016-05-20 13:54:38.829827000 | 1 |
++-------------------------------+--------------+
+</code></pre>
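+ <p class="p">
+ The day-boundary behavior above can also be sketched outside Impala. The Python
+ <code class="ph codeph">datediff</code> helper below is an illustrative stand-in for the semantics
+ described here, not Impala's implementation:
+ </p>

```python
from datetime import datetime

def datediff(enddate: datetime, startdate: datetime) -> int:
    # Compare only the date portions, ignoring the time of day,
    # mirroring the day-boundary semantics described above.
    return (enddate.date() - startdate.date()).days

# Different times on the same day: 0.
print(datediff(datetime(2016, 5, 20, 13, 0), datetime(2016, 5, 20, 1, 0)))   # 0
# Different dates, even if only minutes apart: non-zero.
print(datediff(datetime(2016, 5, 20, 23, 59), datetime(2016, 5, 21, 0, 1)))  # -1
```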
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__day">
+ <code class="ph codeph">day(timestamp date), <span class="ph" id="datetime_functions__dayofmonth">dayofmonth(timestamp date)</span></code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the day field from the date portion of a <code class="ph codeph">TIMESTAMP</code>.
+ The value represents the day of the month, so it is in the range 1-31, or lower for
+ months with fewer than 31 days.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how the day value corresponds to the day
+ of the month, resetting back to 1 at the start of each month.
+ </p>
+<pre class="pre codeblock"><code>
+select now(), day(now());
++-------------------------------+------------+
+| now() | day(now()) |
++-------------------------------+------------+
+| 2016-05-20 15:01:51.042185000 | 20 |
++-------------------------------+------------+
+
+select now() + interval 11 days, day(now() + interval 11 days);
++-------------------------------+-------------------------------+
+| now() + interval 11 days | day(now() + interval 11 days) |
++-------------------------------+-------------------------------+
+| 2016-05-31 15:05:56.843139000 | 31 |
++-------------------------------+-------------------------------+
+
+select now() + interval 12 days, day(now() + interval 12 days);
++-------------------------------+-------------------------------+
+| now() + interval 12 days | day(now() + interval 12 days) |
++-------------------------------+-------------------------------+
+| 2016-06-01 15:06:05.074236000 | 1 |
++-------------------------------+-------------------------------+
+</code></pre>
+ <p class="p">
+ The following examples show how the day value is <code class="ph codeph">NULL</code>
+ for nonexistent dates or incorrectly formatted date strings.
+ </p>
+<pre class="pre codeblock"><code>
+-- 2016 is a leap year, so it has a Feb. 29.
+select day('2016-02-29');
++-------------------+
+| day('2016-02-29') |
++-------------------+
+| 29 |
++-------------------+
+
+-- 2015 is not a leap year, so Feb. 29 is nonexistent.
+select day('2015-02-29');
++-------------------+
+| day('2015-02-29') |
++-------------------+
+| NULL |
++-------------------+
+
+-- A string that does not match the expected YYYY-MM-DD format
+-- produces an invalid TIMESTAMP, causing day() to return NULL.
+select day('2016-02-028');
++--------------------+
+| day('2016-02-028') |
++--------------------+
+| NULL |
++--------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__dayname">
+ <code class="ph codeph">dayname(timestamp date)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the day field from a <code class="ph codeph">TIMESTAMP</code> value, converted to the string
+ corresponding to that day name. The range of return values is <code class="ph codeph">'Sunday'</code> to
+ <code class="ph codeph">'Saturday'</code>. Used in report-generating queries, as an alternative to calling
+ <code class="ph codeph">dayofweek()</code> and turning that numeric return value into a string using a
+ <code class="ph codeph">CASE</code> expression.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the day name associated with
+ <code class="ph codeph">TIMESTAMP</code> values representing different days.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ dayofweek(now()) as todays_day_of_week,
+ dayname(now()) as todays_day_name;
++-------------------------------+--------------------+-----------------+
+| right_now | todays_day_of_week | todays_day_name |
++-------------------------------+--------------------+-----------------+
+| 2016-05-31 10:57:03.953670000 | 3 | Tuesday |
++-------------------------------+--------------------+-----------------+
+
+select now() + interval 1 day as tomorrow,
+ dayname(now() + interval 1 day) as tomorrows_day_name;
++-------------------------------+--------------------+
+| tomorrow | tomorrows_day_name |
++-------------------------------+--------------------+
+| 2016-06-01 10:58:53.945761000 | Wednesday |
++-------------------------------+--------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__dayofweek">
+ <code class="ph codeph">dayofweek(timestamp date)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the day field from the date portion of a <code class="ph codeph">TIMESTAMP</code>, corresponding to the day of
+ the week. The range of return values is 1 (Sunday) to 7 (Saturday).
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ dayofweek(now()) as todays_day_of_week,
+ dayname(now()) as todays_day_name;
++-------------------------------+--------------------+-----------------+
+| right_now | todays_day_of_week | todays_day_name |
++-------------------------------+--------------------+-----------------+
+| 2016-05-31 10:57:03.953670000 | 3 | Tuesday |
++-------------------------------+--------------------+-----------------+
+</code></pre>
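+ <p class="p">
+ The 1 (Sunday) through 7 (Saturday) numbering differs from some other systems.
+ As a cross-check, the hypothetical Python helper below (not part of Impala)
+ reproduces it from the standard library's Monday-based numbering:
+ </p>

```python
from datetime import date

def impala_dayofweek(d: date) -> int:
    # Impala numbers days 1 (Sunday) through 7 (Saturday), while
    # Python's isoweekday() numbers them 1 (Monday) through 7 (Sunday).
    return d.isoweekday() % 7 + 1

print(impala_dayofweek(date(2016, 5, 31)))  # 3, a Tuesday, matching the query above
```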
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__dayofyear">
+ <code class="ph codeph">dayofyear(timestamp date)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the day field from a <code class="ph codeph">TIMESTAMP</code> value, corresponding to the day
+ of the year. The range of return values is 1 (January 1) to 366 (December 31 of a leap year).
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show return values from the
+ <code class="ph codeph">dayofyear()</code> function. The same date
+ in different years returns a different day number
+ for all dates after February 28,
+ because 2016 is a leap year while 2015 is not.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ dayofyear(now()) as today_day_of_year;
++-------------------------------+-------------------+
+| right_now | today_day_of_year |
++-------------------------------+-------------------+
+| 2016-05-31 11:05:48.314932000 | 152 |
++-------------------------------+-------------------+
+
+select now() - interval 1 year as last_year,
+ dayofyear(now() - interval 1 year) as year_ago_day_of_year;
++-------------------------------+----------------------+
+| last_year | year_ago_day_of_year |
++-------------------------------+----------------------+
+| 2015-05-31 11:07:03.733689000 | 151 |
++-------------------------------+----------------------+
+</code></pre>
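+ <p class="p">
+ The leap-year offset shown above can be double-checked outside Impala;
+ this Python sketch uses the standard library's day-of-year field:
+ </p>

```python
from datetime import date

# Day-of-year for the same calendar date differs across a leap-year
# boundary once the date is past February 28.
print(date(2016, 5, 31).timetuple().tm_yday)  # 152 (2016 is a leap year)
print(date(2015, 5, 31).timetuple().tm_yday)  # 151
```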
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__days_add">
+ <code class="ph codeph">days_add(timestamp startdate, int days)</code>, <code class="ph codeph">days_add(timestamp startdate, bigint
+ days)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Adds a specified number of days to a <code class="ph codeph">TIMESTAMP</code> value. Similar to
+ <code class="ph codeph">date_add()</code>, but starts with an actual <code class="ph codeph">TIMESTAMP</code> value instead of a
+ string that is converted to a <code class="ph codeph">TIMESTAMP</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, days_add(now(), 31) as 31_days_later;
++-------------------------------+-------------------------------+
+| right_now | 31_days_later |
++-------------------------------+-------------------------------+
+| 2016-05-31 11:12:32.216764000 | 2016-07-01 11:12:32.216764000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__days_sub">
+ <code class="ph codeph">days_sub(timestamp startdate, int days)</code>, <code class="ph codeph">days_sub(timestamp startdate, bigint
+ days)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Subtracts a specified number of days from a <code class="ph codeph">TIMESTAMP</code> value. Similar to
+ <code class="ph codeph">date_sub()</code>, but starts with an actual <code class="ph codeph">TIMESTAMP</code> value instead of a
+ string that is converted to a <code class="ph codeph">TIMESTAMP</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, days_sub(now(), 31) as 31_days_ago;
++-------------------------------+-------------------------------+
+| right_now | 31_days_ago |
++-------------------------------+-------------------------------+
+| 2016-05-31 11:13:42.163905000 | 2016-04-30 11:13:42.163905000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__extract">
+ <code class="ph codeph">extract(timestamp, string unit)</code>, <code class="ph codeph">extract(unit FROM timestamp)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns one of the numeric date or time fields from a
+ <code class="ph codeph">TIMESTAMP</code> value.
+ <p class="p">
+ <strong class="ph b">Unit argument:</strong> The <code class="ph codeph">unit</code> string can be one of
+ <code class="ph codeph">epoch</code>, <code class="ph codeph">year</code>,
+ <code class="ph codeph">quarter</code>, <code class="ph codeph">month</code>,
+ <code class="ph codeph">day</code>, <code class="ph codeph">hour</code>,
+ <code class="ph codeph">minute</code>, <code class="ph codeph">second</code>, or
+ <code class="ph codeph">millisecond</code>. This argument value is
+ case-insensitive.
+ </p>
+ <div class="p"> In Impala 2.0 and higher, you can use special syntax
+ rather than a regular function call, for compatibility with code
+ that uses the SQL-99 format with the <code class="ph codeph">FROM</code> keyword.
+ With this style, the unit names are identifiers rather than
+ <code class="ph codeph">STRING</code> literals. For example, the following calls
+ are both equivalent:
+ <pre class="pre codeblock"><code>extract(year from now());
+extract(now(), "year");
+</code></pre>
+ </div>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p"> Typically used in <code class="ph codeph">GROUP BY</code> queries to arrange
+ results by hour, day, month, and so on. You can also use this
+ function in an <code class="ph codeph">INSERT ... SELECT</code> into a partitioned
+ table to split up <code class="ph codeph">TIMESTAMP</code> values into individual
+ parts, if the partitioned table has separate partition key columns
+ representing year, month, day, and so on. If you need to divide by
+ more complex units of time, such as by week or by quarter, use the
+ <code class="ph codeph">TRUNC()</code> function instead. </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong>
+ <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <pre class="pre codeblock"><code>
+select now() as right_now,
+ extract(year from now()) as this_year,
+ extract(month from now()) as this_month;
++-------------------------------+-----------+------------+
+| right_now | this_year | this_month |
++-------------------------------+-----------+------------+
+| 2016-05-31 11:18:43.310328000 | 2016 | 5 |
++-------------------------------+-----------+------------+
+
+select now() as right_now,
+ extract(day from now()) as this_day,
+ extract(hour from now()) as this_hour;
++-------------------------------+----------+-----------+
+| right_now | this_day | this_hour |
++-------------------------------+----------+-----------+
+| 2016-05-31 11:19:24.025303000 | 31 | 11 |
++-------------------------------+----------+-----------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__from_timestamp">
+ <code class="ph codeph">from_timestamp(datetime timestamp, pattern string)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Converts a <code class="ph codeph">TIMESTAMP</code> value into a
+ string representing the same value, formatted according to the specified pattern.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">from_timestamp()</code> function provides a flexible way to convert <code class="ph codeph">TIMESTAMP</code>
+ values into arbitrary string formats for reporting purposes.
+ </p>
+ <p class="p">
+ Because Impala implicitly converts string values into <code class="ph codeph">TIMESTAMP</code>, you can
+ pass date/time values represented as strings (in the standard <code class="ph codeph">yyyy-MM-dd HH:mm:ss.SSS</code> format)
+ to this function. The result is a string using different separator characters, order of fields, spelled-out month
+ names, or other variation of the date/time string representation.
+ </p>
+ <p class="p">
+ The allowed tokens for the pattern string are the same as for the <code class="ph codeph">from_unixtime()</code> function.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show different ways to format a <code class="ph codeph">TIMESTAMP</code>
+ value as a string:
+ </p>
+<pre class="pre codeblock"><code>
+-- Reformat arbitrary TIMESTAMP value.
+select from_timestamp(now(), 'yyyy/MM/dd');
++-------------------------------------+
+| from_timestamp(now(), 'yyyy/mm/dd') |
++-------------------------------------+
+| 2017/10/01 |
++-------------------------------------+
+
+-- Reformat string literal representing date/time.
+select from_timestamp('1984-09-25', 'yyyy/MM/dd');
++--------------------------------------------+
+| from_timestamp('1984-09-25', 'yyyy/mm/dd') |
++--------------------------------------------+
+| 1984/09/25 |
++--------------------------------------------+
+
+-- Alternative format for reporting purposes.
+select from_timestamp('1984-09-25 16:45:30.125', 'MMM dd, yyyy HH:mm:ss.SSS');
++------------------------------------------------------------------------+
+| from_timestamp('1984-09-25 16:45:30.125', 'mmm dd, yyyy hh:mm:ss.sss') |
++------------------------------------------------------------------------+
+| Sep 25, 1984 16:45:30.125 |
++------------------------------------------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__from_unixtime">
+ <code class="ph codeph">from_unixtime(bigint unixtime[, string format])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Converts the specified number of seconds past the Unix epoch (January 1, 1970 UTC)
+ into a date/time string in the local time zone.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ In Impala 2.2.0 and higher, built-in functions that accept or return integers representing <code class="ph codeph">TIMESTAMP</code> values
+ use the <code class="ph codeph">BIGINT</code> type for parameters and return values, rather than <code class="ph codeph">INT</code>.
+ This change lets the date and time functions avoid an overflow error that would otherwise occur
+ on January 19th, 2038 (known as the
+ <a class="xref" href="http://en.wikipedia.org/wiki/Year_2038_problem" target="_blank"><span class="q">"Year 2038 problem"</span> or <span class="q">"Y2K38 problem"</span></a>).
+ This change affects the <code class="ph codeph">from_unixtime()</code> and <code class="ph codeph">unix_timestamp()</code> functions.
+ You might need to change application code that interacts with these functions, change the types of
+ columns that store the return values, or add <code class="ph codeph">CAST()</code> calls to SQL statements that
+ call these functions.
+ </p>
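+ <p class="p">
+ The rollover date mentioned above follows from the largest value a signed
+ 32-bit integer can hold; a quick Python check:
+ </p>

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest epoch value a signed 32-bit integer can hold
rollover = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(rollover)  # 2038-01-19 03:14:07+00:00, the "Year 2038" rollover point
```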
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The format string accepts the variations allowed for the <code class="ph codeph">TIMESTAMP</code>
+ data type: date plus time, date by itself, time by itself, and optional fractional seconds for the
+ time. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details.
+ </p>
+ <p class="p">
+ Currently, the format string is case-sensitive, especially to distinguish <code class="ph codeph">m</code> for
+ minutes and <code class="ph codeph">M</code> for months. In Impala 1.3 and later, you can switch the order of
+ elements, use alternative separator characters, and use a different number of placeholders for each
+ unit. Adding more instances of <code class="ph codeph">y</code>, <code class="ph codeph">d</code>, <code class="ph codeph">H</code>, and so on
+ produces output strings zero-padded to the requested number of characters. The exception is
+ <code class="ph codeph">M</code> for months, where <code class="ph codeph">M</code> produces a non-padded value such as
+ <code class="ph codeph">3</code>, <code class="ph codeph">MM</code> produces a zero-padded value such as <code class="ph codeph">03</code>,
+ <code class="ph codeph">MMM</code> produces an abbreviated month name such as <code class="ph codeph">Mar</code>, and sequences of
+ 4 or more <code class="ph codeph">M</code> are not allowed. A date string including all fields could be
+ <code class="ph codeph">"yyyy-MM-dd HH:mm:ss.SSSSSS"</code>, <code class="ph codeph">"dd/MM/yyyy HH:mm:ss.SSSSSS"</code>,
+ <code class="ph codeph">"MMM dd, yyyy HH.mm.ss (SSSSSS)"</code> or other combinations of placeholders and separator
+ characters.
+ </p>
+ <p class="p">
+ The way this function deals with time zones when converting to or from <code class="ph codeph">TIMESTAMP</code>
+ values is affected by the <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions</code> startup flag for the
+ <span class="keyword cmdname">impalad</span> daemon. See <a class="xref" href="../shared/../topics/impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details about
+ how Impala handles time zone considerations for the <code class="ph codeph">TIMESTAMP</code> data type.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The more flexible format strings allowed with the built-in functions do not change the rules about
+ using <code class="ph codeph">CAST()</code> to convert from a string to a <code class="ph codeph">TIMESTAMP</code> value. Strings
+ being converted through <code class="ph codeph">CAST()</code> must still have the elements in the specified order and use the specified delimiter
+ characters, as described in <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>.
+ </p>
+ </div>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select from_unixtime(1392394861,"yyyy-MM-dd HH:mm:ss.SSSS");
++-------------------------------------------------------+
+| from_unixtime(1392394861, 'yyyy-mm-dd hh:mm:ss.ssss') |
++-------------------------------------------------------+
+| 2014-02-14 16:21:01.0000 |
++-------------------------------------------------------+
+
+select from_unixtime(1392394861,"yyyy-MM-dd");
++-----------------------------------------+
+| from_unixtime(1392394861, 'yyyy-mm-dd') |
++-----------------------------------------+
+| 2014-02-14 |
++-----------------------------------------+
+
+select from_unixtime(1392394861,"HH:mm:ss.SSSS");
++--------------------------------------------+
+| from_unixtime(1392394861, 'hh:mm:ss.ssss') |
++--------------------------------------------+
+| 16:21:01.0000 |
++--------------------------------------------+
+
+select from_unixtime(1392394861,"HH:mm:ss");
++---------------------------------------+
+| from_unixtime(1392394861, 'hh:mm:ss') |
++---------------------------------------+
+| 16:21:01 |
++---------------------------------------+</code></pre>
+ <div class="p">
+ <code class="ph codeph">unix_timestamp()</code> and <code class="ph codeph">from_unixtime()</code> are often used in combination to
+ convert a <code class="ph codeph">TIMESTAMP</code> value into a particular string format. For example:
+<pre class="pre codeblock"><code>select from_unixtime(unix_timestamp(now() + interval 3 days),
+ 'yyyy/MM/dd HH:mm') as yyyy_mm_dd_hh_mm;
++------------------+
+| yyyy_mm_dd_hh_mm |
++------------------+
+| 2016/06/03 11:38 |
++------------------+
+</code></pre>
+ </div>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__from_utc_timestamp">
+ <code class="ph codeph">from_utc_timestamp(timestamp, string timezone)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Converts a specified UTC timestamp value into the appropriate value for a specified time
+ zone.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Often used to translate UTC time zone data stored in a table back to the local
+ date and time for reporting. The opposite of the <code class="ph codeph">to_utc_timestamp()</code> function.
+ </p>
+ <p class="p">
+ To determine the time zone of the server you are connected to, in <span class="keyword">Impala 2.3</span> and
+ higher you can call the <code class="ph codeph">timeofday()</code> function, which includes the time zone
+ specifier in its return value. Remember that with cloud computing, the server you interact
+ with might be in a different time zone than you are, or different sessions might connect to
+ servers in different time zones, or a cluster might include servers in more than one time zone.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ See discussion of time zones in <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>
+ for information about using this function for conversions between the local time zone and UTC.
+ </p>
+ <p class="p">
+ The following example shows how when <code class="ph codeph">TIMESTAMP</code> values representing the UTC time zone
+ are stored in a table, a query can display the equivalent local date and time for a different time zone.
+ </p>
+<pre class="pre codeblock"><code>
+with t1 as (select cast('2016-06-02 16:25:36.116143000' as timestamp) as utc_datetime)
+ select utc_datetime as 'Date/time in Greenwich UK',
+ from_utc_timestamp(utc_datetime, 'PDT')
+ as 'Equivalent in California USA'
+ from t1;
++-------------------------------+-------------------------------+
+| date/time in greenwich uk | equivalent in california usa |
++-------------------------------+-------------------------------+
+| 2016-06-02 16:25:36.116143000 | 2016-06-02 09:25:36.116143000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ <p class="p">
+ The following example shows that for a date and time when daylight saving time
+ is in effect (<code class="ph codeph">PDT</code>), the UTC time
+ is 7 hours ahead of the local California time; when daylight saving time
+ is not in effect (<code class="ph codeph">PST</code>), the UTC time is 8 hours ahead of
+ the local California time.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as local_datetime,
+ to_utc_timestamp(now(), 'PDT') as utc_datetime;
++-------------------------------+-------------------------------+
+| local_datetime | utc_datetime |
++-------------------------------+-------------------------------+
+| 2016-05-31 11:50:02.316883000 | 2016-05-31 18:50:02.316883000 |
++-------------------------------+-------------------------------+
+
+select '2016-01-05' as local_datetime,
+ to_utc_timestamp('2016-01-05', 'PST') as utc_datetime;
++----------------+---------------------+
+| local_datetime | utc_datetime |
++----------------+---------------------+
+| 2016-01-05 | 2016-01-05 08:00:00 |
++----------------+---------------------+
+</code></pre>
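+ <p class="p">
+ The UTC-to-Pacific conversion in the first query can be reproduced with
+ Python's <code class="ph codeph">zoneinfo</code> module, using the IANA zone name
+ <code class="ph codeph">America/Los_Angeles</code> rather than the <code class="ph codeph">PDT</code>
+ abbreviation:
+ </p>

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# A UTC wall-clock value converted to US Pacific time; in June,
# daylight saving time is in effect, so the offset is UTC-7.
utc_dt = datetime(2016, 6, 2, 16, 25, 36, tzinfo=timezone.utc)
local = utc_dt.astimezone(ZoneInfo("America/Los_Angeles"))
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2016-06-02 09:25:36
```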
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__hour">
+ <code class="ph codeph">hour(timestamp date)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the hour field from a <code class="ph codeph">TIMESTAMP</code> value.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, hour(now()) as current_hour;
++-------------------------------+--------------+
+| right_now | current_hour |
++-------------------------------+--------------+
+| 2016-06-01 14:14:12.472846000 | 14 |
++-------------------------------+--------------+
+
+select now() + interval 12 hours as 12_hours_from_now,
+ hour(now() + interval 12 hours) as hour_in_12_hours;
++-------------------------------+-------------------+
+| 12_hours_from_now | hour_in_12_hours |
++-------------------------------+-------------------+
+| 2016-06-02 02:15:32.454750000 | 2 |
++-------------------------------+-------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__hours_add">
+ <code class="ph codeph">hours_add(timestamp date, int hours)</code>, <code class="ph codeph">hours_add(timestamp date, bigint
+ hours)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of hours.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ hours_add(now(), 12) as in_12_hours;
++-------------------------------+-------------------------------+
+| right_now | in_12_hours |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:19:48.948107000 | 2016-06-02 02:19:48.948107000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__hours_sub">
+ <code class="ph codeph">hours_sub(timestamp date, int hours)</code>, <code class="ph codeph">hours_sub(timestamp date, bigint
+ hours)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of hours.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ hours_sub(now(), 18) as 18_hours_ago;
++-------------------------------+-------------------------------+
+| right_now | 18_hours_ago |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:23:13.868150000 | 2016-05-31 20:23:13.868150000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__int_months_between">
+ <code class="ph codeph">int_months_between(timestamp newer, timestamp older)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the number of months between the date portions of two <code class="ph codeph">TIMESTAMP</code> values,
+ as an <code class="ph codeph">INT</code> representing only the full months that passed.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Typically used in business contexts, for example to determine whether
+ a specified number of months have passed or whether some end-of-month deadline was reached.
+ </p>
+ <p class="p">
+ The method of determining the number of elapsed months includes some special handling of
+ months with different numbers of days that creates edge cases for dates between the
+ 28th and 31st days of certain months. See <code class="ph codeph">months_between()</code> for details.
+ The <code class="ph codeph">int_months_between()</code> result is essentially the <code class="ph codeph">floor()</code>
+ of the <code class="ph codeph">months_between()</code> result.
+ </p>
+ <p class="p">
+ If either value is <code class="ph codeph">NULL</code>, which could happen for example when converting a
+ nonexistent date string such as <code class="ph codeph">'2015-02-29'</code> to a <code class="ph codeph">TIMESTAMP</code>,
+ the result is also <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ If the first argument represents an earlier time than the second argument, the result is negative.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>/* Less than a full month = 0. */
+select int_months_between('2015-02-28', '2015-01-29');
++------------------------------------------------+
+| int_months_between('2015-02-28', '2015-01-29') |
++------------------------------------------------+
+| 0 |
++------------------------------------------------+
+
+/* Last day of month to last day of next month = 1. */
+select int_months_between('2015-02-28', '2015-01-31');
++------------------------------------------------+
+| int_months_between('2015-02-28', '2015-01-31') |
++------------------------------------------------+
+| 1 |
++------------------------------------------------+
+
+/* Slightly less than 2 months = 1. */
+select int_months_between('2015-03-28', '2015-01-31');
++------------------------------------------------+
+| int_months_between('2015-03-28', '2015-01-31') |
++------------------------------------------------+
+| 1 |
++------------------------------------------------+
+
+/* 2 full months (identical days of the month) = 2. */
+select int_months_between('2015-03-31', '2015-01-31');
++------------------------------------------------+
+| int_months_between('2015-03-31', '2015-01-31') |
++------------------------------------------------+
+| 2 |
++------------------------------------------------+
+
+/* Last day of month to last day of month-after-next = 2. */
+select int_months_between('2015-03-31', '2015-01-30');
++------------------------------------------------+
+| int_months_between('2015-03-31', '2015-01-30') |
++------------------------------------------------+
+| 2 |
++------------------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__last_day">
+ <code class="ph codeph">last_day(timestamp t)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a <code class="ph codeph">TIMESTAMP</code> corresponding to
+ the beginning of the last calendar day in the same month as the
+ <code class="ph codeph">TIMESTAMP</code> argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ If the input argument does not represent a valid Impala <code class="ph codeph">TIMESTAMP</code>
+ including both date and time portions, the function returns <code class="ph codeph">NULL</code>.
+ For example, if the input argument is a string that cannot be implicitly cast to
+ <code class="ph codeph">TIMESTAMP</code>, does not include a date portion, or is out of the
+ allowed range for Impala <code class="ph codeph">TIMESTAMP</code> values, the function returns
+ <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example shows how to examine the current date, and dates around the
+ end of the month, as <code class="ph codeph">TIMESTAMP</code> values with any time portion removed:
+ </p>
+<pre class="pre codeblock"><code>
+select
+ now() as right_now
+ , trunc(now(),'dd') as today
+ , last_day(now()) as last_day_of_month
+ , last_day(now()) + interval 1 day as first_of_next_month;
++-------------------------------+---------------------+---------------------+---------------------+
+| right_now | today | last_day_of_month | first_of_next_month |
++-------------------------------+---------------------+---------------------+---------------------+
+| 2017-08-15 15:07:58.823812000 | 2017-08-15 00:00:00 | 2017-08-31 00:00:00 | 2017-09-01 00:00:00 |
++-------------------------------+---------------------+---------------------+---------------------+
+</code></pre>
+ <p class="p">
+ The following example shows how to examine the current date and dates around the
+ end of the month as integers representing the day of the month:
+ </p>
+<pre class="pre codeblock"><code>
+select
+ now() as right_now
+ , dayofmonth(now()) as day
+ , extract(day from now()) as also_day
+ , dayofmonth(last_day(now())) as last_day
+ , extract(day from last_day(now())) as also_last_day;
++-------------------------------+-----+----------+----------+---------------+
+| right_now | day | also_day | last_day | also_last_day |
++-------------------------------+-----+----------+----------+---------------+
+| 2017-08-15 15:07:59.417755000 | 15 | 15 | 31 | 31 |
++-------------------------------+-----+----------+----------+---------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__microseconds_add">
+ <code class="ph codeph">microseconds_add(timestamp date, int microseconds)</code>, <code class="ph codeph">microseconds_add(timestamp
+ date, bigint microseconds)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of microseconds.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ microseconds_add(now(), 500000) as half_a_second_from_now;
++-------------------------------+-------------------------------+
+| right_now | half_a_second_from_now |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:25:11.455051000 | 2016-06-01 14:25:11.955051000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__microseconds_sub">
+ <code class="ph codeph">microseconds_sub(timestamp date, int microseconds)</code>, <code class="ph codeph">microseconds_sub(timestamp
+ date, bigint microseconds)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of microseconds.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ microseconds_sub(now(), 500000) as half_a_second_ago;
++-------------------------------+-------------------------------+
+| right_now | half_a_second_ago |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:26:16.509990000 | 2016-06-01 14:26:16.009990000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__millisecond">
+ <code class="ph codeph">millisecond(timestamp)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the millisecond portion of a <code class="ph codeph">TIMESTAMP</code> value.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The millisecond value is truncated, not rounded, if the <code class="ph codeph">TIMESTAMP</code>
+ value contains more than 3 significant digits to the right of the decimal point.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+/* 252.4 milliseconds truncated to 252. */
+
+select now(), millisecond(now());
++-------------------------------+--------------------+
+| now() | millisecond(now()) |
++-------------------------------+--------------------+
+| 2016-03-14 22:30:25.252400000 | 252 |
++-------------------------------+--------------------+
+
+/* 761.767 milliseconds truncated to 761. */
+
+select now(), millisecond(now());
++-------------------------------+--------------------+
+| now() | millisecond(now()) |
++-------------------------------+--------------------+
+| 2016-03-14 22:30:58.761767000 | 761 |
++-------------------------------+--------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__milliseconds_add">
+ <code class="ph codeph">milliseconds_add(timestamp date, int milliseconds)</code>, <code class="ph codeph">milliseconds_add(timestamp
+ date, bigint milliseconds)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of milliseconds.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ milliseconds_add(now(), 1500) as 1_point_5_seconds_from_now;
++-------------------------------+-------------------------------+
+| right_now | 1_point_5_seconds_from_now |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:30:30.067366000 | 2016-06-01 14:30:31.567366000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__milliseconds_sub">
+ <code class="ph codeph">milliseconds_sub(timestamp date, int milliseconds)</code>, <code class="ph codeph">milliseconds_sub(timestamp
+ date, bigint milliseconds)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of milliseconds.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ milliseconds_sub(now(), 1500) as 1_point_5_seconds_ago;
++-------------------------------+-------------------------------+
+| right_now | 1_point_5_seconds_ago |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:30:53.467140000 | 2016-06-01 14:30:51.967140000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__minute">
+ <code class="ph codeph">minute(timestamp date)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the minute field from a <code class="ph codeph">TIMESTAMP</code> value.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, minute(now()) as current_minute;
++-------------------------------+----------------+
+| right_now | current_minute |
++-------------------------------+----------------+
+| 2016-06-01 14:34:08.051702000 | 34 |
++-------------------------------+----------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__minutes_add">
+ <code class="ph codeph">minutes_add(timestamp date, int minutes)</code>, <code class="ph codeph">minutes_add(timestamp date, bigint
+ minutes)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of minutes.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, minutes_add(now(), 90) as 90_minutes_from_now;
++-------------------------------+-------------------------------+
+| right_now | 90_minutes_from_now |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:36:04.887095000 | 2016-06-01 16:06:04.887095000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__minutes_sub">
+ <code class="ph codeph">minutes_sub(timestamp date, int minutes)</code>, <code class="ph codeph">minutes_sub(timestamp date, bigint
+ minutes)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of minutes.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, minutes_sub(now(), 90) as 90_minutes_ago;
++-------------------------------+-------------------------------+
+| right_now | 90_minutes_ago |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:36:32.643061000 | 2016-06-01 13:06:32.643061000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__month">
+ <code class="ph codeph">month(timestamp date)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the month field, represented as an integer, from the date portion of a <code class="ph codeph">TIMESTAMP</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, month(now()) as current_month;
++-------------------------------+---------------+
+| right_now | current_month |
++-------------------------------+---------------+
+| 2016-06-01 14:43:37.141542000 | 6 |
++-------------------------------+---------------+
+</code></pre>
+ </dd>
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__monthname">
+ <code class="ph codeph">monthname(timestamp date)</code>
+ </dt>
+ <dd class="dd">
+ <strong class="ph b">Purpose:</strong> Returns the month field from a
+ <code class="ph codeph">TIMESTAMP</code> value, converted to the string
+ corresponding to that month name.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
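+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        The following hypothetical example sketches typical usage; the exact
+        string returned (for example, its capitalization) may vary by Impala version:
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, monthname(now()) as current_month_name;
+</code></pre>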
+ </dd>
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__months_add">
+ <code class="ph codeph">months_add(timestamp date, int months)</code>, <code class="ph codeph">months_add(timestamp date, bigint
+ months)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of months.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example shows the effects of adding some number of
+ months to a <code class="ph codeph">TIMESTAMP</code> value, using both the
+ <code class="ph codeph">months_add()</code> function and its <code class="ph codeph">add_months()</code>
+ alias. These examples use <code class="ph codeph">trunc()</code> to strip off the time portion
+ and leave just the date.
+ </p>
+<pre class="pre codeblock"><code>
+with t1 as (select trunc(now(), 'dd') as today)
+ select today, months_add(today,1) as next_month from t1;
++---------------------+---------------------+
+| today | next_month |
++---------------------+---------------------+
+| 2016-05-19 00:00:00 | 2016-06-19 00:00:00 |
++---------------------+---------------------+
+
+with t1 as (select trunc(now(), 'dd') as today)
+ select today, add_months(today,1) as next_month from t1;
++---------------------+---------------------+
+| today | next_month |
++---------------------+---------------------+
+| 2016-05-19 00:00:00 | 2016-06-19 00:00:00 |
++---------------------+---------------------+
+</code></pre>
+ <p class="p">
+        The following examples show that when <code class="ph codeph">months_add()</code>
+        would otherwise produce a nonexistent date, because months have
+        different numbers of days, the function returns a <code class="ph codeph">TIMESTAMP</code>
+        for the last day of the relevant month. For example, adding one month
+ to January 31 produces a date of February 29th in the year 2016 (a leap year),
+ and February 28th in the year 2015 (a non-leap year).
+ </p>
+<pre class="pre codeblock"><code>
+with t1 as (select cast('2016-01-31' as timestamp) as jan_31)
+ select jan_31, months_add(jan_31,1) as feb_31 from t1;
++---------------------+---------------------+
+| jan_31 | feb_31 |
++---------------------+---------------------+
+| 2016-01-31 00:00:00 | 2016-02-29 00:00:00 |
++---------------------+---------------------+
+
+with t1 as (select cast('2015-01-31' as timestamp) as jan_31)
+ select jan_31, months_add(jan_31,1) as feb_31 from t1;
++---------------------+---------------------+
+| jan_31 | feb_31 |
++---------------------+---------------------+
+| 2015-01-31 00:00:00 | 2015-02-28 00:00:00 |
++---------------------+---------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__months_between">
+ <code class="ph codeph">months_between(timestamp newer, timestamp older)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the number of months between the date portions of two <code class="ph codeph">TIMESTAMP</code> values.
+ Can include a fractional part representing extra days in addition to the full months
+ between the dates. The fractional component is computed by dividing the difference in days by 31 (regardless of the month).
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Typically used in business contexts, for example to determine whether
+ a specified number of months have passed or whether some end-of-month deadline was reached.
+ </p>
+ <p class="p">
+ If the only consideration is the number of full months and any fractional value is
+ not significant, use <code class="ph codeph">int_months_between()</code> instead.
+ </p>
+ <p class="p">
+ The method of determining the number of elapsed months includes some special handling of
+ months with different numbers of days that creates edge cases for dates between the
+ 28th and 31st days of certain months.
+ </p>
+ <p class="p">
+ If either value is <code class="ph codeph">NULL</code>, which could happen for example when converting a
+ nonexistent date string such as <code class="ph codeph">'2015-02-29'</code> to a <code class="ph codeph">TIMESTAMP</code>,
+ the result is also <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ If the first argument represents an earlier time than the second argument, the result is negative.
+ </p>
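+      <p class="p">
+        As an illustration of the divide-by-31 rule stated above (not the internal
+        implementation), the fractional results in the examples below can be
+        reproduced by hand. For instance, from <code class="ph codeph">2015-01-29</code>
+        to <code class="ph codeph">2015-02-28</code> is one day short of a full month,
+        so the expected value is 1 - 1/31, matching the value
+        <code class="ph codeph">0.967741935483871</code> shown in the examples:
+      </p>
+<pre class="pre codeblock"><code>
+/* Expected fraction for months_between('2015-02-28', '2015-01-29'):
+   one full month minus one day, divided by 31. */
+select 1 - 1/31 as expected_fraction;
+</code></pre>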
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how dates that are on the same day of the month
+ are considered to be exactly N months apart, even if the months have different
+ numbers of days.
+ </p>
+<pre class="pre codeblock"><code>select months_between('2015-02-28', '2015-01-28');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-01-28') |
++--------------------------------------------+
+| 1 |
++--------------------------------------------+
+
+select months_between(now(), now() + interval 1 month);
++-------------------------------------------------+
+| months_between(now(), now() + interval 1 month) |
++-------------------------------------------------+
+| -1 |
++-------------------------------------------------+
+
+select months_between(now() + interval 1 year, now());
++------------------------------------------------+
+| months_between(now() + interval 1 year, now()) |
++------------------------------------------------+
+| 12 |
++------------------------------------------------+
+</code></pre>
+ <p class="p">
+ The following examples show how dates that are on the last day of the month
+ are considered to be exactly N months apart, even if the months have different
+ numbers of days. For example, from January 28th to February 28th is exactly one
+ month because the day of the month is identical; January 31st to February 28th
+ is exactly one month because in both cases it is the last day of the month;
+ but January 29th or 30th to February 28th is considered a fractional month.
+ </p>
+<pre class="pre codeblock"><code>select months_between('2015-02-28', '2015-01-31');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-01-31') |
++--------------------------------------------+
+| 1 |
++--------------------------------------------+
+
+select months_between('2015-02-28', '2015-01-29');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-01-29') |
++--------------------------------------------+
+| 0.967741935483871 |
++--------------------------------------------+
+
+select months_between('2015-02-28', '2015-01-30');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-01-30') |
++--------------------------------------------+
+| 0.935483870967742 |
++--------------------------------------------+
+</code></pre>
+ <p class="p">
+ The following examples show how dates that are not a precise number
+ of months apart result in a fractional return value.
+ </p>
+<pre class="pre codeblock"><code>select months_between('2015-03-01', '2015-01-28');
++--------------------------------------------+
+| months_between('2015-03-01', '2015-01-28') |
++--------------------------------------------+
+| 1.129032258064516 |
++--------------------------------------------+
+
+select months_between('2015-03-01', '2015-02-28');
++--------------------------------------------+
+| months_between('2015-03-01', '2015-02-28') |
++--------------------------------------------+
+| 0.1290322580645161 |
++--------------------------------------------+
+
+select months_between('2015-06-02', '2015-05-29');
++--------------------------------------------+
+| months_between('2015-06-02', '2015-05-29') |
++--------------------------------------------+
+| 0.1290322580645161 |
++--------------------------------------------+
+
+select months_between('2015-03-01', '2015-01-25');
++--------------------------------------------+
+| months_between('2015-03-01', '2015-01-25') |
++--------------------------------------------+
+| 1.225806451612903 |
++--------------------------------------------+
+
+select months_between('2015-03-01', '2015-02-25');
++--------------------------------------------+
+| months_between('2015-03-01', '2015-02-25') |
++--------------------------------------------+
+| 0.2258064516129032 |
++--------------------------------------------+
+
+select months_between('2015-02-28', '2015-02-01');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-02-01') |
++--------------------------------------------+
+| 0.8709677419354839 |
++--------------------------------------------+
+
+select months_between('2015-03-28', '2015-03-01');
++--------------------------------------------+
+| months_between('2015-03-28', '2015-03-01') |
++--------------------------------------------+
+| 0.8709677419354839 |
++--------------------------------------------+
+</code></pre>
+ <p class="p">
+ The following examples show how the time portion of the <code class="ph codeph">TIMESTAMP</code>
+        values is irrelevant for calculating the month interval. Even the fractional part
+ of the result only depends on the number of full days between the argument values,
+ regardless of the time portion.
+ </p>
+<pre class="pre codeblock"><code>select months_between('2015-05-28 23:00:00', '2015-04-28 11:45:00');
++--------------------------------------------------------------+
+| months_between('2015-05-28 23:00:00', '2015-04-28 11:45:00') |
++--------------------------------------------------------------+
+| 1 |
++--------------------------------------------------------------+
+
+select months_between('2015-03-28', '2015-03-01');
++--------------------------------------------+
+| months_between('2015-03-28', '2015-03-01') |
++--------------------------------------------+
+| 0.8709677419354839 |
<TRUNCATED>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_analytic_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_analytic_functions.html b/docs/build3x/html/topics/impala_analytic_functions.html
new file mode 100644
index 0000000..607633d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_analytic_functions.html
@@ -0,0 +1,1785 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="analytic_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Analytic Functions</title></head><body id="analytic_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Analytic Functions</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ Analytic functions (also known as window functions) are a special category of built-in functions. Like
+ aggregate functions, they examine the contents of multiple input rows to compute each output value. However,
+ rather than being limited to one result value per <code class="ph codeph">GROUP BY</code> group, they operate on
+ <dfn class="term">windows</dfn> where the input rows are ordered and grouped using flexible conditions expressed through
+ an <code class="ph codeph">OVER()</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+
+
+ <p class="p">
+ Some functions, such as <code class="ph codeph">LAG()</code> and <code class="ph codeph">RANK()</code>, can only be used in this analytic
+      context. Some aggregate functions do double duty: when you call aggregate functions such as
+ <code class="ph codeph">MAX()</code>, <code class="ph codeph">SUM()</code>, <code class="ph codeph">AVG()</code>, and so on with an
+ <code class="ph codeph">OVER()</code> clause, they produce an output value for each row, based on computations across other
+ rows in the window.
+ </p>
+
+ <p class="p">
+ Although analytic functions often compute the same value you would see from an aggregate function in a
+ <code class="ph codeph">GROUP BY</code> query, the analytic functions produce a value for each row in the result set rather
+ than a single value for each group. This flexibility lets you include additional columns in the
+ <code class="ph codeph">SELECT</code> list, offering more opportunities for organizing and filtering the result set.
+ </p>
+
+ <p class="p">
+ Analytic function calls are only allowed in the <code class="ph codeph">SELECT</code> list and in the outermost
+ <code class="ph codeph">ORDER BY</code> clause of the query. During query processing, analytic functions are evaluated
+      after other query stages such as joins, <code class="ph codeph">WHERE</code>, and <code class="ph codeph">GROUP BY</code>.
+ </p>
+
+
+
+
+
+
+
+
+
+ <p class="p">
+ The rows that are part of each partition are analyzed by computations across an ordered or unordered set of
+ rows. For example, <code class="ph codeph">COUNT()</code> and <code class="ph codeph">SUM()</code> might be applied to all the rows in
+ the partition, in which case the order of analysis does not matter. The <code class="ph codeph">ORDER BY</code> clause
+      might be used inside the <code class="ph codeph">OVER()</code> clause to define the ordering that applies to functions
+ such as <code class="ph codeph">LAG()</code> and <code class="ph codeph">FIRST_VALUE()</code>.
+ </p>
+
+
+
+
+
+ <p class="p">
+ Analytic functions are frequently used in fields such as finance and science to provide trend, outlier, and
+ bucketed analysis for large data sets. You might also see the term <span class="q">"window functions"</span> in database
+ literature, referring to the sequence of rows (the <span class="q">"window"</span>) that the function call applies to,
+ particularly when the <code class="ph codeph">OVER</code> clause includes a <code class="ph codeph">ROWS</code> or <code class="ph codeph">RANGE</code>
+ keyword.
+ </p>
+
+ <p class="p">
+ The following sections describe the analytic query clauses and the pure analytic functions provided by
+ Impala. For usage information about aggregate functions in an analytic context, see
+ <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="analytic_functions__over">
+
+ <h2 class="title topictitle2" id="ariaid-title2">OVER Clause</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">OVER</code> clause is required for calls to pure analytic functions such as
+ <code class="ph codeph">LEAD()</code>, <code class="ph codeph">RANK()</code>, and <code class="ph codeph">FIRST_VALUE()</code>. When you include an
+ <code class="ph codeph">OVER</code> clause with calls to aggregate functions such as <code class="ph codeph">MAX()</code>,
+ <code class="ph codeph">COUNT()</code>, or <code class="ph codeph">SUM()</code>, they operate as analytic functions.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>function(<var class="keyword varname">args</var>) OVER([<var class="keyword varname">partition_by_clause</var>] [<var class="keyword varname">order_by_clause</var> [<var class="keyword varname">window_clause</var>]])
+
+partition_by_clause ::= PARTITION BY <var class="keyword varname">expr</var> [, <var class="keyword varname">expr</var> ...]
+order_by_clause ::= ORDER BY <var class="keyword varname">expr</var> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, <var class="keyword varname">expr</var> [ASC | DESC] [NULLS FIRST | NULLS LAST] ...]
+window_clause: See <a class="xref" href="#window_clause">Window Clause</a>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">PARTITION BY clause:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause acts much like the <code class="ph codeph">GROUP BY</code> clause in the
+ outermost block of a query. It divides the rows into groups containing identical values in one or more
+ columns. These logical groups are known as <dfn class="term">partitions</dfn>. Throughout the discussion of analytic
+ functions, <span class="q">"partitions"</span> refers to the groups produced by the <code class="ph codeph">PARTITION BY</code> clause, not
+ to partitioned tables. However, note the following limitation that applies specifically to analytic function
+ calls involving partitioned tables.
+ </p>
+
+ <p class="p">
+ In queries involving both analytic functions and partitioned tables, partition pruning only occurs for columns named in the <code class="ph codeph">PARTITION BY</code>
+ clause of the analytic function call. For example, if an analytic function query has a clause such as <code class="ph codeph">WHERE year=2016</code>,
+ the way to make the query prune all other <code class="ph codeph">YEAR</code> partitions is to include <code class="ph codeph">PARTITION BY year</code> in the analytic function call;
+ for example, <code class="ph codeph">OVER (PARTITION BY year,<var class="keyword varname">other_columns</var> <var class="keyword varname">other_analytic_clauses</var>)</code>.
+
+ </p>
+
+ <p class="p">
+ The sequence of results from an analytic function <span class="q">"resets"</span> for each new partition in the result set.
+        That is, the set of preceding or following rows considered by the analytic function always comes from a
+ single partition. Any <code class="ph codeph">MAX()</code>, <code class="ph codeph">SUM()</code>, <code class="ph codeph">ROW_NUMBER()</code>, and so
+ on apply to each partition independently. Omit the <code class="ph codeph">PARTITION BY</code> clause to apply the
+ analytic operation to all the rows in the table.
+ </p>
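The partition-then-apply behavior described above can be sketched in plain Python (a minimal illustration over made-up year/amount rows, not Impala's implementation):

```python
from itertools import groupby

# Hypothetical rows: (year, amount). An analytic SUM(amount) OVER
# (PARTITION BY year) computes one independent total per year and
# annotates every row with its own partition's total.
rows = [(2015, 10), (2015, 20), (2016, 5), (2016, 15), (2016, 30)]

def analytic_sum(rows, key):
    """Mimic SUM(amount) OVER (PARTITION BY key(row)): each row keeps
    its original columns plus the total for its partition."""
    out = []
    for _, group in groupby(sorted(rows, key=key), key=key):
        part = list(group)
        total = sum(amount for _, amount in part)
        out.extend((year, amount, total) for year, amount in part)
    return out

result = analytic_sum(rows, key=lambda r: r[0])
# Every 2015 row carries the 2015 total (30); every 2016 row carries 50.
```

Omitting the `PARTITION BY` clause corresponds to treating the whole table as one partition, so every row would carry the grand total.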
+
+ <p class="p">
+ <strong class="ph b">ORDER BY clause:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause works much like the <code class="ph codeph">ORDER BY</code> clause in the outermost
+ block of a query. It defines the order in which rows are evaluated for the entire input set, or for each
+ group produced by a <code class="ph codeph">PARTITION BY</code> clause. You can order by one or multiple expressions, and
+ for each expression optionally choose ascending or descending order and whether nulls come first or last in
+ the sort order. Because this <code class="ph codeph">ORDER BY</code> clause only defines the order in which rows are
+ evaluated, if you want the results to be output in a specific order, also include an <code class="ph codeph">ORDER
+ BY</code> clause in the outer block of the query.
+ </p>
+
+ <p class="p">
+ When the <code class="ph codeph">ORDER BY</code> clause is omitted, the analytic function applies to all items in the
+ group produced by the <code class="ph codeph">PARTITION BY</code> clause. When the <code class="ph codeph">ORDER BY</code> clause is
+ included, the analysis can apply to all or a subset of the items in the group, depending on the optional
+ window clause.
+ </p>
+
+ <p class="p">
+ The order in which the rows are analyzed is only defined for those columns specified in <code class="ph codeph">ORDER
+ BY</code> clauses.
+ </p>
+
+ <p class="p">
+ One difference between the analytic and outer uses of the <code class="ph codeph">ORDER BY</code> clause: inside the
+ <code class="ph codeph">OVER</code> clause, <code class="ph codeph">ORDER BY 1</code> or other integer value is interpreted as a
+ constant sort value (effectively a no-op) rather than referring to column 1.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Window clause:</strong>
+ </p>
+
+ <p class="p">
+ The window clause is only allowed in combination with an <code class="ph codeph">ORDER BY</code> clause. If the
+ <code class="ph codeph">ORDER BY</code> clause is specified but the window clause is not, the default window is
+ <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>. See
+ <a class="xref" href="impala_analytic_functions.html#window_clause">Window Clause</a> for full details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because HBase tables are optimized for single-row lookups rather than full scans, analytic functions using
+ the <code class="ph codeph">OVER()</code> clause are not recommended for HBase tables. Although such queries work, their
+ performance is lower than on comparable tables using HDFS data files.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong>
+ </p>
+
+ <p class="p">
+ Analytic functions are very efficient for Parquet tables. The data that is examined during evaluation of
+ the <code class="ph codeph">OVER()</code> clause comes from a specified set of columns, and the values for each column
+ are arranged sequentially within each data file.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong>
+ </p>
+
+ <p class="p">
+ Analytic functions are convenient to use with text tables for exploratory business intelligence. When the
+ volume of data is substantial, prefer to use Parquet tables for performance-critical analytic queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how to synthesize a numeric sequence corresponding to all the rows in a table.
+ The new table has the same columns as the old one, plus an additional column <code class="ph codeph">ID</code> containing
+ the integers 1, 2, 3, and so on, corresponding to the order of a <code class="ph codeph">TIMESTAMP</code> column in the
+ original table.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>CREATE TABLE events_with_id AS
+ SELECT
+ row_number() OVER (ORDER BY date_and_time) AS id,
+ c1, c2, c3, c4
+ FROM events;
+</code></pre>
+
+ <p class="p">
+ The following example shows how to determine the number of rows containing each value for a column. Unlike
+ a corresponding <code class="ph codeph">GROUP BY</code> query, this one can analyze a single column and still return all
+ values (not just the distinct ones) from the other columns.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>SELECT x, y, z,
+ count(x) OVER (PARTITION BY x) AS how_many_x
+FROM t1;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ You cannot directly combine the <code class="ph codeph">DISTINCT</code> operator with analytic function calls. You can
+ put the analytic function call in a <code class="ph codeph">WITH</code> clause or an inline view, and apply the
+ <code class="ph codeph">DISTINCT</code> operator to its result set.
+ </p>
+
+<pre class="pre codeblock"><code>WITH t1 AS (SELECT x, sum(x) OVER (PARTITION BY x) AS total FROM t1)
+ SELECT DISTINCT x, total FROM t1;
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="analytic_functions__window_clause">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Window Clause</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Certain analytic functions accept an optional <dfn class="term">window clause</dfn>, which makes the function analyze
+ only certain rows <span class="q">"around"</span> the current row rather than all rows in the partition. For example, you can
+ get a moving average by specifying some number of preceding and following rows, or a running count or
+ running total by specifying all rows up to the current position. This clause can result in different
+ analytic results for rows within the same partition.
+ </p>
+
+ <p class="p">
+ The window clause is supported with the <code class="ph codeph">AVG()</code>, <code class="ph codeph">COUNT()</code>,
+ <code class="ph codeph">FIRST_VALUE()</code>, <code class="ph codeph">LAST_VALUE()</code>, and <code class="ph codeph">SUM()</code> functions.
+
+ For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause is only allowed if the start bound is
+ <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ROWS BETWEEN [ { <var class="keyword varname">m</var> | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | <var class="keyword varname">n</var> } FOLLOWING] ]
+RANGE BETWEEN [ {<var class="keyword varname">m</var> | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | <var class="keyword varname">n</var> } FOLLOWING] ]</code></pre>
+
+ <p class="p">
+ <code class="ph codeph">ROWS BETWEEN</code> defines the size of the window in terms of the indexes of the rows in the
+ result set. The size of the window is predictable based on the clauses and the position within the result set.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">RANGE BETWEEN</code> does not currently support numeric arguments to define a variable-size
+ sliding window.
+
+ </p>
+
+
+
+ <p class="p">
+ Currently, Impala supports only some combinations of arguments to the <code class="ph codeph">RANGE</code> clause:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code> (the default when <code class="ph codeph">ORDER
+ BY</code> is specified and the window clause is omitted)
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</code>
+ </li>
+ </ul>
+
+ <p class="p">
+ When <code class="ph codeph">RANGE</code> is used, <code class="ph codeph">CURRENT ROW</code> includes not just the current row but all
+ rows that are tied with the current row based on the <code class="ph codeph">ORDER BY</code> expressions.
+ </p>
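The difference between positional `ROWS` windows and tie-aware `RANGE` windows can be sketched in plain Python (an illustration over made-up values, not Impala's implementation):

```python
# values are already in ORDER BY order; the two 2s are tied on the
# ORDER BY expression.
values = [1, 2, 2, 3]

# ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: strictly positional,
# so each row's running SUM() covers exactly the rows before it plus itself.
rows_running = [sum(values[:i + 1]) for i in range(len(values))]

# RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: CURRENT ROW also
# pulls in every row tied with the current row, so both 2s see the same
# window and produce the same running total.
range_running = [sum(v for v in values if v <= current) for current in values]
```

With `ROWS` the running totals are 1, 3, 5, 8; with `RANGE` the tied rows both produce 5, giving 1, 5, 5, 8.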
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show financial data for a fictional stock symbol <code class="ph codeph">JDR</code>. The closing
+ price moves up and down each day.
+ </p>
+
+<pre class="pre codeblock"><code>create table stock_ticker (stock_symbol string, closing_price decimal(8,2), closing_date timestamp);
+...load some data...
+select * from stock_ticker order by stock_symbol, closing_date
++--------------+---------------+---------------------+
+| stock_symbol | closing_price | closing_date |
++--------------+---------------+---------------------+
+| JDR | 12.86 | 2014-10-02 00:00:00 |
+| JDR | 12.89 | 2014-10-03 00:00:00 |
+| JDR | 12.94 | 2014-10-04 00:00:00 |
+| JDR | 12.55 | 2014-10-05 00:00:00 |
+| JDR | 14.03 | 2014-10-06 00:00:00 |
+| JDR | 14.75 | 2014-10-07 00:00:00 |
+| JDR | 13.98 | 2014-10-08 00:00:00 |
++--------------+---------------+---------------------+
+</code></pre>
+
+ <p class="p">
+ The queries use analytic functions with window clauses to compute moving averages of the closing price. For
+ example, <code class="ph codeph">ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING</code> produces an average of the value from a
+ 3-day span, producing a different value for each row. The first row, which has no preceding row, only gets
+ averaged with the row following it. If the table contained more than one stock symbol, the
+ <code class="ph codeph">PARTITION BY</code> clause would limit the window for the moving average to only consider the
+ prices for a single stock.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ avg(closing_price) over (partition by stock_symbol order by closing_date
+ rows between 1 preceding and 1 following) as moving_average
+ from stock_ticker;
++--------------+---------------------+---------------+----------------+
+| stock_symbol | closing_date | closing_price | moving_average |
++--------------+---------------------+---------------+----------------+
+| JDR | 2014-10-02 00:00:00 | 12.86 | 12.87 |
+| JDR | 2014-10-03 00:00:00 | 12.89 | 12.89 |
+| JDR | 2014-10-04 00:00:00 | 12.94 | 12.79 |
+| JDR | 2014-10-05 00:00:00 | 12.55 | 13.17 |
+| JDR | 2014-10-06 00:00:00 | 14.03 | 13.77 |
+| JDR | 2014-10-07 00:00:00 | 14.75 | 14.25 |
+| JDR | 2014-10-08 00:00:00 | 13.98 | 14.36 |
++--------------+---------------------+---------------+----------------+
+</code></pre>
+
+ <p class="p">
+ The clause <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code> produces a cumulative moving
+ average, from the earliest data up to the value for each day.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ avg(closing_price) over (partition by stock_symbol order by closing_date
+ rows between unbounded preceding and current row) as moving_average
+ from stock_ticker;
++--------------+---------------------+---------------+----------------+
+| stock_symbol | closing_date | closing_price | moving_average |
++--------------+---------------------+---------------+----------------+
+| JDR | 2014-10-02 00:00:00 | 12.86 | 12.86 |
+| JDR | 2014-10-03 00:00:00 | 12.89 | 12.87 |
+| JDR | 2014-10-04 00:00:00 | 12.94 | 12.89 |
+| JDR | 2014-10-05 00:00:00 | 12.55 | 12.81 |
+| JDR | 2014-10-06 00:00:00 | 14.03 | 13.05 |
+| JDR | 2014-10-07 00:00:00 | 14.75 | 13.33 |
+| JDR | 2014-10-08 00:00:00 | 13.98 | 13.42 |
++--------------+---------------------+---------------+----------------+
+</code></pre>
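Both moving averages above can be reproduced in plain Python. Note that the sample output appears to truncate the `DECIMAL` averages to 2 places rather than round them; the `ROUND_DOWN` quantization below mimics that observation, which is inferred from the output rather than a documented guarantee:

```python
from decimal import Decimal, ROUND_DOWN

# The JDR closing prices from the stock_ticker example, in closing_date order.
prices = [Decimal(p) for p in
          ("12.86", "12.89", "12.94", "12.55", "14.03", "14.75", "13.98")]

def trunc2(x):
    # Truncate (not round) to 2 decimal places, matching the sample output.
    return x.quantize(Decimal("0.01"), rounding=ROUND_DOWN)

# ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: a 3-day centered window,
# clipped at the edges of the partition.
centered = [trunc2(sum(prices[max(i - 1, 0):i + 2]) /
                   len(prices[max(i - 1, 0):i + 2]))
            for i in range(len(prices))]

# ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: cumulative average.
cumulative = [trunc2(sum(prices[:i + 1]) / (i + 1))
              for i in range(len(prices))]
```

The `centered` list matches the first query's `moving_average` column and `cumulative` matches the second's.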
+
+
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="analytic_functions__avg_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title4">AVG Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_avg.html#avg">AVG Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="analytic_functions__count_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title5">COUNT Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_count.html#count">COUNT Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="analytic_functions__cume_dist">
+
+ <h2 class="title topictitle2" id="ariaid-title6">CUME_DIST Function (<span class="keyword">Impala 2.3</span> or higher only)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns the cumulative distribution of a value. The value for each row in the result set is greater than 0
+ and less than or equal to 1.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CUME_DIST()
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Within each partition of the result set, the <code class="ph codeph">CUME_DIST()</code> value represents an ascending
+ sequence that ends at 1. Each value represents the proportion of rows in the partition whose values are
+ less than or equal to the value in the current row.
+ </p>
+
+ <p class="p">
+ If the sequence of input values contains ties, the <code class="ph codeph">CUME_DIST()</code> results are identical for the
+ tied values.
+ </p>
+
+ <p class="p">
+ Impala only supports the <code class="ph codeph">CUME_DIST()</code> function in an analytic context, not as a regular
+ aggregate function.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ This example uses a table with 9 rows. The <code class="ph codeph">CUME_DIST()</code>
+ function evaluates the entire table because there is no <code class="ph codeph">PARTITION BY</code> clause,
+ with the rows ordered by the weight of the animal.
+ The sequence of values shows that 1/9 of the values are less than or equal to the lightest
+ animal (mouse), 2/9 of the values are less than or equal to the second-lightest animal,
+ and so on up to the heaviest animal (elephant), where 9/9 of the rows are less than or
+ equal to its weight.
+ </p>
+
+<pre class="pre codeblock"><code>create table animals (name string, kind string, kilos decimal(9,3));
+insert into animals values
+ ('Elephant', 'Mammal', 4000), ('Giraffe', 'Mammal', 1200), ('Mouse', 'Mammal', 0.020),
+ ('Condor', 'Bird', 15), ('Horse', 'Mammal', 500), ('Owl', 'Bird', 2.5),
+ ('Ostrich', 'Bird', 145), ('Polar bear', 'Mammal', 700), ('Housecat', 'Mammal', 5);
+
+select name, cume_dist() over (order by kilos) from animals;
++------------+-----------------------+
+| name | cume_dist() OVER(...) |
++------------+-----------------------+
+| Elephant | 1 |
+| Giraffe | 0.8888888888888888 |
+| Polar bear | 0.7777777777777778 |
+| Horse | 0.6666666666666666 |
+| Ostrich | 0.5555555555555556 |
+| Condor | 0.4444444444444444 |
+| Housecat | 0.3333333333333333 |
+| Owl | 0.2222222222222222 |
+| Mouse | 0.1111111111111111 |
++------------+-----------------------+
+</code></pre>
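The fractions in this result follow directly from the definition: for each row, `CUME_DIST()` is the number of rows in the partition with a value less than or equal to the current row's value, divided by the partition's row count. A plain-Python sketch over the animal weights above (single partition, no ties):

```python
# Weights in kilos from the animals table above.
kilos = [4000, 1200, 0.020, 15, 500, 2.5, 145, 700, 5]

def cume_dist(values):
    """Map each value to (count of values <= it) / (total count)."""
    n = len(values)
    return {v: sum(1 for w in values if w <= v) / n for v in values}

dist = cume_dist(kilos)
# The mouse (0.020 kg) gets 1/9; the elephant (4000 kg) gets 9/9 = 1.0.
```

With ties, the `<=` comparison counts all tied rows, which is why tied input values share the same `CUME_DIST()` result.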
+
+ <p class="p">
+ Using a <code class="ph codeph">PARTITION BY</code> clause produces a separate sequence for each partition
+ group, in this case one for mammals and one for birds. Because there are 3 birds and 6 mammals,
+ the sequence illustrates how 1/3 of the <span class="q">"Bird"</span> rows have a <code class="ph codeph">kilos</code> value that is less than or equal to
+ the lightest bird, 1/6 of the <span class="q">"Mammal"</span> rows have a <code class="ph codeph">kilos</code> value that is less than or equal to
+ the lightest mammal, and so on until both the heaviest bird and heaviest mammal have a <code class="ph codeph">CUME_DIST()</code>
+ value of 1.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, cume_dist() over (partition by kind order by kilos) from animals;
++------------+--------+-----------------------+
+| name | kind | cume_dist() OVER(...) |
++------------+--------+-----------------------+
+| Ostrich | Bird | 1 |
+| Condor | Bird | 0.6666666666666666 |
+| Owl | Bird | 0.3333333333333333 |
+| Elephant | Mammal | 1 |
+| Giraffe | Mammal | 0.8333333333333334 |
+| Polar bear | Mammal | 0.6666666666666666 |
+| Horse | Mammal | 0.5 |
+| Housecat | Mammal | 0.3333333333333333 |
+| Mouse | Mammal | 0.1666666666666667 |
++------------+--------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ We can reverse the ordering within each partition group by using an <code class="ph codeph">ORDER BY ... DESC</code>
+ clause within the <code class="ph codeph">OVER()</code> clause. Now the lightest (smallest value of <code class="ph codeph">kilos</code>)
+ animal of each kind has a <code class="ph codeph">CUME_DIST()</code> value of 1.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, cume_dist() over (partition by kind order by kilos desc) from animals;
++------------+--------+-----------------------+
+| name | kind | cume_dist() OVER(...) |
++------------+--------+-----------------------+
+| Owl | Bird | 1 |
+| Condor | Bird | 0.6666666666666666 |
+| Ostrich | Bird | 0.3333333333333333 |
+| Mouse | Mammal | 1 |
+| Housecat | Mammal | 0.8333333333333334 |
+| Horse | Mammal | 0.6666666666666666 |
+| Polar bear | Mammal | 0.5 |
+| Giraffe | Mammal | 0.3333333333333333 |
+| Elephant | Mammal | 0.1666666666666667 |
++------------+--------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ The following example manufactures some rows with identical values in the <code class="ph codeph">kilos</code> column,
+ to demonstrate how the results look in case of tie values. For simplicity, it only shows the <code class="ph codeph">CUME_DIST()</code>
+ sequence for the <span class="q">"Bird"</span> rows. Now with 3 rows all with a value of 15, all of those rows have the same
+ <code class="ph codeph">CUME_DIST()</code> value. 4/5 of the rows have a value for <code class="ph codeph">kilos</code> that is less than or
+ equal to 15.
+ </p>
+
+<pre class="pre codeblock"><code>insert into animals values ('California Condor', 'Bird', 15), ('Andean Condor', 'Bird', 15);
+
+select name, kind, cume_dist() over (order by kilos) from animals where kind = 'Bird';
++-------------------+------+-----------------------+
+| name | kind | cume_dist() OVER(...) |
++-------------------+------+-----------------------+
+| Ostrich | Bird | 1 |
+| Condor | Bird | 0.8 |
+| California Condor | Bird | 0.8 |
+| Andean Condor | Bird | 0.8 |
+| Owl | Bird | 0.2 |
++-------------------+------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how to use an <code class="ph codeph">ORDER BY</code> clause in the outer block
+ to order the result set in case of ties. Here, all the <span class="q">"Bird"</span> rows are together, then in descending order
+ by the result of the <code class="ph codeph">CUME_DIST()</code> function, and all tied <code class="ph codeph">CUME_DIST()</code>
+ values are ordered by the animal name.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, cume_dist() over (partition by kind order by kilos) as ordering
+ from animals
+where
+ kind = 'Bird'
+order by kind, ordering desc, name;
++-------------------+------+----------+
+| name | kind | ordering |
++-------------------+------+----------+
+| Ostrich | Bird | 1 |
+| Andean Condor | Bird | 0.8 |
+| California Condor | Bird | 0.8 |
+| Condor | Bird | 0.8 |
+| Owl | Bird | 0.2 |
++-------------------+------+----------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="analytic_functions__dense_rank">
+
+ <h2 class="title topictitle2" id="ariaid-title7">DENSE_RANK Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns an ascending sequence of integers, starting with 1. The output sequence produces duplicate integers
+ for duplicate values of the <code class="ph codeph">ORDER BY</code> expressions. After generating duplicate output values
+ for the <span class="q">"tied"</span> input values, the function continues the sequence with the next higher integer.
+ Therefore, the sequence contains duplicates but no gaps when the input contains duplicates. Starts the
+ sequence over for each group produced by the <code class="ph codeph">PARTITION BY</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DENSE_RANK() OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+ window clause is not allowed.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Often used for top-N and bottom-N queries. For example, it could produce a <span class="q">"top 10"</span> report including
+ all the items with the 10 highest values, even if several items tied for 1st place.
+ </p>
+
+ <p class="p">
+ Similar to <code class="ph codeph">ROW_NUMBER</code> and <code class="ph codeph">RANK</code>. These functions differ in how they treat
+ duplicate combinations of values.
+ </p>
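The difference among the three functions on tied values can be sketched in plain Python (an illustration over a small sorted list, not Impala's implementation):

```python
# Values already in ORDER BY order; the two 20s are tied.
values = [10, 20, 20, 30]

# ROW_NUMBER(): a strict 1..N sequence, ignoring ties.
row_number = list(range(1, len(values) + 1))          # 1, 2, 3, 4

# RANK(): tied values share a rank, and the sequence then skips ahead,
# leaving a gap after the ties.
rank = [values.index(v) + 1 for v in values]          # 1, 2, 2, 4

# DENSE_RANK(): tied values share a rank, and the sequence continues
# with the next integer, so there are duplicates but no gaps.
dense_rank = [sorted(set(values)).index(v) + 1 for v in values]  # 1, 2, 2, 3
```

The gap-free property of `DENSE_RANK()` is what makes it suitable for "top 10 distinct values" style reports.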
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example demonstrates how the <code class="ph codeph">DENSE_RANK()</code> function identifies where each
+ value <span class="q">"places"</span> in the result set, producing the same result for duplicate values, but with a strict
+ sequence from 1 to the number of groups. For example, when results are ordered by the <code class="ph codeph">X</code>
+ column, both <code class="ph codeph">1</code> values are tied for first; both <code class="ph codeph">2</code> values are tied for
+ second; and so on.
+ </p>
+
+<pre class="pre codeblock"><code>select x, dense_rank() over(order by x) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 1 | 1 | square |
+| 1 | 1 | odd |
+| 2 | 2 | even |
+| 2 | 2 | prime |
+| 3 | 3 | prime |
+| 3 | 3 | odd |
+| 4 | 4 | even |
+| 4 | 4 | square |
+| 5 | 5 | odd |
+| 5 | 5 | prime |
+| 6 | 6 | even |
+| 6 | 6 | perfect |
+| 7 | 7 | lucky |
+| 7 | 7 | lucky |
+| 7 | 7 | lucky |
+| 7 | 7 | odd |
+| 7 | 7 | prime |
+| 8 | 8 | even |
+| 9 | 9 | square |
+| 9 | 9 | odd |
+| 10 | 10 | round |
+| 10 | 10 | even |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ The following examples show how the <code class="ph codeph">DENSE_RANK()</code> function is affected by the
+ <code class="ph codeph">PARTITION BY</code> clause within the <code class="ph codeph">OVER()</code> clause.
+ </p>
+
+ <p class="p">
+ Partitioning by the <code class="ph codeph">PROPERTY</code> column groups all the even, odd, and so on values together,
+ and <code class="ph codeph">DENSE_RANK()</code> returns the place of each value within the group, producing several
+ ascending sequences.
+ </p>
+
+<pre class="pre codeblock"><code>select x, dense_rank() over(partition by property order by x) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 2 | 1 | even |
+| 4 | 2 | even |
+| 6 | 3 | even |
+| 8 | 4 | even |
+| 10 | 5 | even |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 1 | 1 | odd |
+| 3 | 2 | odd |
+| 5 | 3 | odd |
+| 7 | 4 | odd |
+| 9 | 5 | odd |
+| 6 | 1 | perfect |
+| 2 | 1 | prime |
+| 3 | 2 | prime |
+| 5 | 3 | prime |
+| 7 | 4 | prime |
+| 10 | 1 | round |
+| 1 | 1 | square |
+| 4 | 2 | square |
+| 9 | 3 | square |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ Partitioning by the <code class="ph codeph">X</code> column groups all the duplicate numbers together and returns the
+ place of each value within the group; because most values occur only 1 or 2 times,
+ <code class="ph codeph">DENSE_RANK()</code> designates most <code class="ph codeph">X</code> values as either first or second within their
+ group.
+ </p>
+
+<pre class="pre codeblock"><code>select x, dense_rank() over(partition by x order by property) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 1 | 1 | odd |
+| 1 | 2 | square |
+| 2 | 1 | even |
+| 2 | 2 | prime |
+| 3 | 1 | odd |
+| 3 | 2 | prime |
+| 4 | 1 | even |
+| 4 | 2 | square |
+| 5 | 1 | odd |
+| 5 | 2 | prime |
+| 6 | 1 | even |
+| 6 | 2 | perfect |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 2 | odd |
+| 7 | 3 | prime |
+| 8 | 1 | even |
+| 9 | 1 | odd |
+| 9 | 2 | square |
+| 10 | 1 | even |
+| 10 | 2 | round |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how <code class="ph codeph">DENSE_RANK()</code> produces a continuous sequence while still
+ allowing for ties. In this case, Croesus and Midas both have the second largest fortune, while Crassus has
+ the third largest. (In <a class="xref" href="impala_analytic_functions.html#rank">RANK Function</a>, you see a similar query with the
+ <code class="ph codeph">RANK()</code> function that shows that while Crassus has the third largest fortune, he is the
+ fourth richest person.)
+ </p>
+
+<pre class="pre codeblock"><code>select dense_rank() over (order by net_worth desc) as placement, name, net_worth from wealth order by placement, name;
++-----------+---------+---------------+
+| placement | name | net_worth |
++-----------+---------+---------------+
+| 1 | Solomon | 2000000000.00 |
+| 2 | Croesus | 1000000000.00 |
+| 2 | Midas | 1000000000.00 |
+| 3 | Crassus | 500000000.00 |
+| 4 | Scrooge | 80000000.00 |
++-----------+---------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#rank">RANK Function</a>, <a class="xref" href="impala_analytic_functions.html#row_number">ROW_NUMBER Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="analytic_functions__first_value">
+
+ <h2 class="title topictitle2" id="ariaid-title8">FIRST_VALUE Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns the expression value from the first row in the window. The return value is <code class="ph codeph">NULL</code> if
+ the input expression is <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>FIRST_VALUE(<var class="keyword varname">expr</var>) OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var> [<var class="keyword varname">window_clause</var>])</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+ window clause is optional.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If any duplicate values occur in the tuples evaluated by the <code class="ph codeph">ORDER BY</code> clause, the result
+ of this function is not deterministic. Consider adding additional <code class="ph codeph">ORDER BY</code> columns to
+ ensure consistent ordering.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows a table with a wide variety of country-appropriate greetings. For consistency,
+ we want to standardize on a single greeting for each country. The <code class="ph codeph">FIRST_VALUE()</code> function
+ helps to produce a mail merge report where every person from the same country is addressed with the same
+ greeting.
+ </p>
+
+<pre class="pre codeblock"><code>select name, country, greeting from mail_merge
++---------+---------+--------------+
+| name | country | greeting |
++---------+---------+--------------+
+| Pete | USA | Hello |
+| John | USA | Hi |
+| Boris | Germany | Guten tag |
+| Michael | Germany | Guten morgen |
+| Bjorn | Sweden | Hej |
+| Mats | Sweden | Tja |
++---------+---------+--------------+
+
+select country, name,
+ first_value(greeting)
+ over (partition by country order by name, greeting) as greeting
+ from mail_merge;
++---------+---------+-----------+
+| country | name | greeting |
++---------+---------+-----------+
+| Germany | Boris | Guten tag |
+| Germany | Michael | Guten tag |
+| Sweden | Bjorn | Hej |
+| Sweden | Mats | Hej |
+| USA | John | Hi |
+| USA | Pete | Hi |
++---------+---------+-----------+
+</code></pre>
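The standardizing logic of this query can be sketched in plain Python using the same mail_merge rows (an illustration of the semantics, not Impala's implementation):

```python
from itertools import groupby

# (name, country, greeting) rows from the mail_merge example above.
rows = [("Pete", "USA", "Hello"), ("John", "USA", "Hi"),
        ("Boris", "Germany", "Guten tag"), ("Michael", "Germany", "Guten morgen"),
        ("Bjorn", "Sweden", "Hej"), ("Mats", "Sweden", "Tja")]

# FIRST_VALUE(greeting) OVER (PARTITION BY country ORDER BY name, greeting):
# sort so partitions are contiguous and ordered, then give every row in a
# partition the greeting from that partition's first row.
ordered = sorted(rows, key=lambda r: (r[1], r[0], r[2]))  # country, name, greeting
standardized = []
for country, group in groupby(ordered, key=lambda r: r[1]):
    part = list(group)
    first_greeting = part[0][2]
    standardized.extend((country, name, first_greeting) for name, _, _ in part)
```

Reversing the sort on `name` changes which row is first in each partition, which is exactly why the second query above picks the other greeting for each country.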
+
+ <p class="p">
+ Changing the order in which the names are evaluated changes which greeting is applied to each group.
+ </p>
+
+<pre class="pre codeblock"><code>select country, name,
+ first_value(greeting)
+ over (partition by country order by name desc, greeting) as greeting
+ from mail_merge;
++---------+---------+--------------+
+| country | name | greeting |
++---------+---------+--------------+
+| Germany | Michael | Guten morgen |
+| Germany | Boris | Guten morgen |
+| Sweden | Mats | Tja |
+| Sweden | Bjorn | Tja |
+| USA | Pete | Hello |
+| USA | John | Hello |
++---------+---------+--------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#last_value">LAST_VALUE Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="analytic_functions__lag">
+
+ <h2 class="title topictitle2" id="ariaid-title9">LAG Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This function returns the value of an expression using column values from a preceding row. You specify an
+ integer offset, which designates a row position some number of rows previous to the current row. Any column
+ references in the expression argument refer to column values from that prior row. Typically, the table
+ contains a time sequence or numeric sequence column that clearly distinguishes the ordering of the rows.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LAG (<var class="keyword varname">expr</var> [, <var class="keyword varname">offset</var>] [, <var class="keyword varname">default</var>])
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+      Sometimes used as an alternative to a self-join.
+ </p>
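As a minimal illustration of the look-behind behavior described above, the following Python sketch (a hypothetical helper, not Impala SQL) applies a `LAG()`-style offset to one already-ordered partition; rows before the start of the partition yield the default (`None`, standing in for SQL `NULL`):

```python
# Illustrative sketch of LAG() semantics over a single partition whose
# rows are already in ORDER BY order. The "default" argument plays the
# same role as the optional third argument of LAG() in SQL.
def lag(values, offset=1, default=None):
    return [values[i - offset] if i >= offset else default
            for i in range(len(values))]

closing = [12.86, 12.89, 12.94]   # rows ordered by closing_date
print(lag(closing))               # [None, 12.86, 12.89]
```

In the SQL function, omitting the third argument likewise yields `NULL` for rows that have no preceding row at the given offset.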
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses the same stock data created in <a class="xref" href="#window_clause">Window Clause</a>. For each day, the
+ query prints the closing price alongside the previous day's closing price. The first row for each stock
+ symbol has no previous row, so that <code class="ph codeph">LAG()</code> value is <code class="ph codeph">NULL</code>.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ lag(closing_price,1) over (partition by stock_symbol order by closing_date) as "yesterday closing"
+ from stock_ticker
+ order by closing_date;
++--------------+---------------------+---------------+-------------------+
+| stock_symbol | closing_date | closing_price | yesterday closing |
++--------------+---------------------+---------------+-------------------+
+| JDR | 2014-09-13 00:00:00 | 12.86 | NULL |
+| JDR | 2014-09-14 00:00:00 | 12.89 | 12.86 |
+| JDR | 2014-09-15 00:00:00 | 12.94 | 12.89 |
+| JDR | 2014-09-16 00:00:00 | 12.55 | 12.94 |
+| JDR | 2014-09-17 00:00:00 | 14.03 | 12.55 |
+| JDR | 2014-09-18 00:00:00 | 14.75 | 14.03 |
+| JDR | 2014-09-19 00:00:00 | 13.98 | 14.75 |
++--------------+---------------------+---------------+-------------------+
+</code></pre>
+
+ <p class="p">
+ The following example does an arithmetic operation between the current row and a value from the previous
+ row, to produce a delta value for each day. This example also demonstrates how <code class="ph codeph">ORDER BY</code>
+ works independently in the different parts of the query. The <code class="ph codeph">ORDER BY closing_date</code> in the
+ <code class="ph codeph">OVER</code> clause makes the query analyze the rows in chronological order. Then the outer query
+ block uses <code class="ph codeph">ORDER BY closing_date DESC</code> to present the results with the most recent date
+ first.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ cast(
+ closing_price - lag(closing_price,1) over
+ (partition by stock_symbol order by closing_date)
+ as decimal(8,2)
+ )
+ as "change from yesterday"
+ from stock_ticker
+ order by closing_date desc;
++--------------+---------------------+---------------+-----------------------+
+| stock_symbol | closing_date | closing_price | change from yesterday |
++--------------+---------------------+---------------+-----------------------+
+| JDR | 2014-09-19 00:00:00 | 13.98 | -0.76 |
+| JDR | 2014-09-18 00:00:00 | 14.75 | 0.72 |
+| JDR | 2014-09-17 00:00:00 | 14.03 | 1.47 |
+| JDR | 2014-09-16 00:00:00 | 12.55 | -0.38 |
+| JDR | 2014-09-15 00:00:00 | 12.94 | 0.04 |
+| JDR | 2014-09-14 00:00:00 | 12.89 | 0.03 |
+| JDR | 2014-09-13 00:00:00 | 12.86 | NULL |
++--------------+---------------------+---------------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ This function is the converse of <a class="xref" href="impala_analytic_functions.html#lead">LEAD Function</a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="analytic_functions__last_value">
+
+ <h2 class="title topictitle2" id="ariaid-title10">LAST_VALUE Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns the expression value from the last row in the window. This same value is repeated for all result
+ rows for the group. The return value is <code class="ph codeph">NULL</code> if the input expression is
+ <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LAST_VALUE(<var class="keyword varname">expr</var>) OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var> [<var class="keyword varname">window_clause</var>])</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+ window clause is optional.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If any duplicate values occur in the tuples evaluated by the <code class="ph codeph">ORDER BY</code> clause, the result
+ of this function is not deterministic. Consider adding additional <code class="ph codeph">ORDER BY</code> columns to
+ ensure consistent ordering.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses the same <code class="ph codeph">MAIL_MERGE</code> table as in the example for
+ <a class="xref" href="impala_analytic_functions.html#first_value">FIRST_VALUE Function</a>. Because the default window when <code class="ph codeph">ORDER
+ BY</code> is used is <code class="ph codeph">BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>, the query requires the
+ <code class="ph codeph">UNBOUNDED FOLLOWING</code> to look ahead to subsequent rows and find the last value for each
+ country.
+ </p>
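The effect of the frame clause can be sketched in Python (an illustrative model, not Impala SQL): with the default frame ending at the current row, the "last" value of the frame is simply the current row's own value, which is why the query above must widen the frame with `UNBOUNDED FOLLOWING`.

```python
# Illustrative sketch: LAST_VALUE() over a partition already in ORDER BY
# order, under the two frame definitions discussed above.
def last_value(values, unbounded_following=False):
    if unbounded_following:
        # Frame covers the whole partition: every row sees the final value.
        return [values[-1]] * len(values)
    # Default frame (UNBOUNDED PRECEDING .. CURRENT ROW): each row's
    # frame ends at itself, so each row sees its own value.
    return list(values)

names = ['Boris', 'Michael']                 # one country's rows, ordered
print(last_value(names))                     # ['Boris', 'Michael']
print(last_value(names, unbounded_following=True))   # ['Michael', 'Michael']
```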
+
+<pre class="pre codeblock"><code>select country, name,
+ last_value(greeting) over (
+ partition by country order by name, greeting
+ rows between unbounded preceding and unbounded following
+ ) as greeting
+ from mail_merge
++---------+---------+--------------+
+| country | name | greeting |
++---------+---------+--------------+
+| Germany | Boris | Guten morgen |
+| Germany | Michael | Guten morgen |
+| Sweden | Bjorn | Tja |
+| Sweden | Mats | Tja |
+| USA | John | Hello |
+| USA | Pete | Hello |
++---------+---------+--------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#first_value">FIRST_VALUE Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="analytic_functions__lead">
+
+ <h2 class="title topictitle2" id="ariaid-title11">LEAD Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This function returns the value of an expression using column values from a following row. You specify an
+      integer offset, which designates a row position some number of rows after the current row. Any column
+ references in the expression argument refer to column values from that later row. Typically, the table
+ contains a time sequence or numeric sequence column that clearly distinguishes the ordering of the rows.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LEAD (<var class="keyword varname">expr</var> [, <var class="keyword varname">offset</var>] [, <var class="keyword varname">default</var>])
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+      Sometimes used as an alternative to a self-join.
+ </p>
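The look-ahead behavior can be sketched in Python (a hypothetical helper, not Impala SQL) as the mirror image of `LAG()`: rows past the end of the partition yield the default (`None`, standing in for SQL `NULL`):

```python
# Illustrative sketch of LEAD() semantics over a single partition whose
# rows are already in ORDER BY order.
def lead(values, offset=1, default=None):
    n = len(values)
    return [values[i + offset] if i + offset < n else default
            for i in range(n)]

closing = [12.86, 12.89, 12.94]   # rows ordered by closing_date
print(lead(closing))              # [12.89, 12.94, None]
```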
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses the same stock data created in <a class="xref" href="#window_clause">Window Clause</a>. The query analyzes
+ the closing price for a stock symbol, and for each day evaluates if the closing price for the following day
+ is higher or lower.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ case
+ (lead(closing_price,1)
+ over (partition by stock_symbol order by closing_date)
+ - closing_price) > 0
+ when true then "higher"
+ when false then "flat or lower"
+ end as "trending"
+from stock_ticker
+ order by closing_date;
++--------------+---------------------+---------------+---------------+
+| stock_symbol | closing_date | closing_price | trending |
++--------------+---------------------+---------------+---------------+
+| JDR | 2014-09-13 00:00:00 | 12.86 | higher |
+| JDR | 2014-09-14 00:00:00 | 12.89 | higher |
+| JDR | 2014-09-15 00:00:00 | 12.94 | flat or lower |
+| JDR | 2014-09-16 00:00:00 | 12.55 | higher |
+| JDR | 2014-09-17 00:00:00 | 14.03 | higher |
+| JDR | 2014-09-18 00:00:00 | 14.75 | flat or lower |
+| JDR | 2014-09-19 00:00:00 | 13.98 | NULL |
++--------------+---------------------+---------------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ This function is the converse of <a class="xref" href="impala_analytic_functions.html#lag">LAG Function</a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="analytic_functions__max_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title12">MAX Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_max.html#max">MAX Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="analytic_functions__min_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title13">MIN Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_min.html#min">MIN Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="analytic_functions__ntile">
+
+ <h2 class="title topictitle2" id="ariaid-title14">NTILE Function (<span class="keyword">Impala 2.3</span> or higher only)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns the <span class="q">"bucket number"</span> associated with each row, between 1 and the value of an expression. For
+ example, creating 100 buckets puts the lowest 1% of values in the first bucket, while creating 10 buckets
+ puts the lowest 10% of values in the first bucket. Each partition can have a different number of buckets.
+
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>NTILE (<var class="keyword varname">expr</var> [, <var class="keyword varname">offset</var> ...])
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The <span class="q">"ntile"</span> name is derived from the practice of dividing result sets into fourths (quartile), tenths
+ (decile), and so on. The <code class="ph codeph">NTILE()</code> function divides the result set based on an arbitrary
+ percentile value.
+ </p>
+
+ <p class="p">
+ The number of buckets must be a positive integer.
+ </p>
+
+ <p class="p">
+      The number of items in each bucket is identical or nearly so, varying by at most 1. If the number of items
+      does not divide evenly among the buckets, the remaining N items are distributed one each to the first N
+      buckets.
+ </p>
+
+ <p class="p">
+ If the number of buckets N is greater than the number of input rows in the partition, then the first N
+ buckets each contain one item, and the remaining buckets are empty.
+ </p>
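The bucket-assignment rule in the two preceding paragraphs can be sketched in Python (an illustrative model, not Impala SQL): rows, taken in order, are split into buckets whose sizes differ by at most 1, with any remainder going one row each to the first buckets.

```python
# Illustrative sketch of how NTILE(n) assigns bucket numbers to an
# ordered partition of n_rows rows.
def ntile_buckets(n_buckets, n_rows):
    base, extra = divmod(n_rows, n_buckets)
    assignments = []
    for b in range(1, n_buckets + 1):
        # The first "extra" buckets each receive one additional row.
        size = base + (1 if b <= extra else 0)
        assignments.extend([b] * size)
    return assignments

print(ntile_buckets(4, 9))   # bucket 1 gets the extra row: [1, 1, 1, 2, 2, 3, 3, 4, 4]
print(ntile_buckets(4, 2))   # more buckets than rows; trailing buckets stay empty: [1, 2]
```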
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+      The following example divides groups of animals into 4 buckets based on their weight. The
+ <code class="ph codeph">ORDER BY ... DESC</code> clause in the <code class="ph codeph">OVER()</code> clause means that the heaviest 25%
+ are in the first group, and the lightest 25% are in the fourth group. (The <code class="ph codeph">ORDER BY</code> in the
+ outermost part of the query shows how you can order the final result set independently from the order in
+ which the rows are evaluated by the <code class="ph codeph">OVER()</code> clause.) Because there are 9 rows in the group,
+ divided into 4 buckets, the first bucket receives the extra item.
+ </p>
+
+<pre class="pre codeblock"><code>create table animals (name string, kind string, kilos decimal(9,3));
+
+insert into animals values
+ ('Elephant', 'Mammal', 4000), ('Giraffe', 'Mammal', 1200), ('Mouse', 'Mammal', 0.020),
+ ('Condor', 'Bird', 15), ('Horse', 'Mammal', 500), ('Owl', 'Bird', 2.5),
+ ('Ostrich', 'Bird', 145), ('Polar bear', 'Mammal', 700), ('Housecat', 'Mammal', 5);
+
+select name, ntile(4) over (order by kilos desc) as quarter
+ from animals
+order by quarter desc;
++------------+---------+
+| name | quarter |
++------------+---------+
+| Owl | 4 |
+| Mouse | 4 |
+| Condor | 3 |
+| Housecat | 3 |
+| Horse | 2 |
+| Ostrich | 2 |
+| Elephant | 1 |
+| Giraffe | 1 |
+| Polar bear | 1 |
++------------+---------+
+</code></pre>
+
+ <p class="p">
+      The following examples show how the <code class="ph codeph">PARTITION BY</code> clause works for the
+ <code class="ph codeph">NTILE()</code> function. Here, we divide each kind of animal (mammal or bird) into 2 buckets,
+ the heavier half and the lighter half.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, ntile(2) over (partition by kind order by kilos desc) as half
+ from animals
+order by kind;
++------------+--------+------+
+| name | kind | half |
++------------+--------+------+
+| Ostrich | Bird | 1 |
+| Condor | Bird | 1 |
+| Owl | Bird | 2 |
+| Elephant | Mammal | 1 |
+| Giraffe | Mammal | 1 |
+| Polar bear | Mammal | 1 |
+| Horse | Mammal | 2 |
+| Housecat | Mammal | 2 |
+| Mouse | Mammal | 2 |
++------------+--------+------+
+</code></pre>
+
+ <p class="p">
+ Again, the result set can be ordered independently
+ from the analytic evaluation. This next example lists all the animals heaviest to lightest,
+ showing that elephant and giraffe are in the <span class="q">"top half"</span> of mammals by weight, while
+ housecat and mouse are in the <span class="q">"bottom half"</span>.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, ntile(2) over (partition by kind order by kilos desc) as half
+ from animals
+order by kilos desc;
++------------+--------+------+
+| name | kind | half |
++------------+--------+------+
+| Elephant | Mammal | 1 |
+| Giraffe | Mammal | 1 |
+| Polar bear | Mammal | 1 |
+| Horse | Mammal | 2 |
+| Ostrich | Bird | 1 |
+| Condor | Bird | 1 |
+| Housecat | Mammal | 2 |
+| Owl | Bird | 2 |
+| Mouse | Mammal | 2 |
++------------+--------+------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="analytic_functions__percent_rank">
+
+ <h2 class="title topictitle2" id="ariaid-title15">PERCENT_RANK Function (<span class="keyword">Impala 2.3</span> or higher only)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>PERCENT_RANK (<var class="keyword varname">expr</var>)
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)
+</code></pre>
+
+ <p class="p">
+ Calculates the rank, expressed as a percentage, of each row within a group of rows.
+ If <code class="ph codeph">rank</code> is the value for that same row from the <code class="ph codeph">RANK()</code> function (from 1 to the total number of rows in the partition group),
+      then the <code class="ph codeph">PERCENT_RANK()</code> value is calculated as <code class="ph codeph">(<var class="keyword varname">rank</var> - 1) / (<var class="keyword varname">rows_in_group</var> - 1)</code>.
+ If there is only a single item in the partition group, its <code class="ph codeph">PERCENT_RANK()</code> value is 0.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+      This function is similar to the <code class="ph codeph">RANK()</code> and <code class="ph codeph">CUME_DIST()</code> functions: it returns an ascending sequence representing the position of each
+ row within the rows of the same partition group. The actual numeric sequence is calculated differently,
+ and the handling of duplicate (tied) values is different.
+ </p>
+
+ <p class="p">
+ The return values range from 0 to 1 inclusive.
+ The first row in each partition group always has the value 0.
+ A <code class="ph codeph">NULL</code> value is considered the lowest possible value.
+ In the case of duplicate input values, all the corresponding rows in the result set
+ have an identical value: the lowest <code class="ph codeph">PERCENT_RANK()</code> value of those
+ tied rows. (In contrast to <code class="ph codeph">CUME_DIST()</code>, where all tied rows have
+ the highest <code class="ph codeph">CUME_DIST()</code> value.)
+ </p>
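The formula and tie-handling rules above can be sketched in Python (an illustrative model, not Impala SQL): tied rows share the lowest rank, and a single-row group yields 0. (`NULL` ordering is not modeled here.)

```python
# Illustrative sketch of PERCENT_RANK() = (rank - 1) / (rows_in_group - 1)
# over one partition group, where rank counts 1 plus the number of
# strictly smaller values (so ties share the lowest rank).
def percent_rank(values):
    n = len(values)
    ranks = [1 + sum(1 for w in values if w < v) for v in values]
    return [0.0 if n == 1 else (r - 1) / (n - 1) for r in ranks]

# Hypothetical weights producing the same pattern as the "Bird" rows:
# one lightest row, three tied rows, one heaviest row.
print(percent_rank([2.5, 15, 15, 15, 145]))   # [0.0, 0.25, 0.25, 0.25, 1.0]
print(percent_rank([70]))                     # single-row group: [0.0]
```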
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses the same <code class="ph codeph">ANIMALS</code> table as the examples for <code class="ph codeph">CUME_DIST()</code>
+ and <code class="ph codeph">NTILE()</code>, with a few additional rows to illustrate the results where some values are
+ <code class="ph codeph">NULL</code> or there is only a single row in a partition group.
+ </p>
+
+<pre class="pre codeblock"><code>insert into animals values ('Komodo dragon', 'Reptile', 70);
+insert into animals values ('Unicorn', 'Mythical', NULL);
+insert into animals values ('Fire-breathing dragon', 'Mythical', NULL);
+</code></pre>
+
+ <p class="p">
+ As with <code class="ph codeph">CUME_DIST()</code>, there is an ascending sequence for each kind of animal.
+ For example, the <span class="q">"Birds"</span> and <span class="q">"Mammals"</span> rows each have a <code class="ph codeph">PERCENT_RANK()</code> sequence
+ that ranges from 0 to 1.
+ The <span class="q">"Reptile"</span> row has a <code class="ph codeph">PERCENT_RANK()</code> of 0 because that partition group contains only a single item.
+ Both <span class="q">"Mythical"</span> animals have a <code class="ph codeph">PERCENT_RANK()</code> of 0 because
+ a <code class="ph codeph">NULL</code> is considered the lowest value within its partition group.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, percent_rank() over (partition by kind order by kilos) from animals;
++-----------------------+----------+--------------------------+
+| name | kind | percent_rank() OVER(...) |
++-----------------------+----------+--------------------------+
+| Mouse | Mammal | 0 |
+| Housecat | Mammal | 0.2 |
+| Horse | Mammal | 0.4 |
+| Polar bear | Mammal | 0.6 |
+| Giraffe | Mammal | 0.8 |
+| Elephant | Mammal | 1 |
+| Komodo dragon | Reptile | 0 |
+| Owl | Bird | 0 |
+| California Condor | Bird | 0.25 |
+| Andean Condor | Bird | 0.25 |
+| Condor | Bird | 0.25 |
+| Ostrich | Bird | 1 |
+| Fire-breathing dragon | Mythical | 0 |
+| Unicorn | Mythical | 0 |
++-----------------------+----------+--------------------------+
+</code></pre>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title16" id="analytic_functions__rank">
+
+ <h2 class="title topictitle2" id="ariaid-title16">RANK Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns an ascending sequence of integers, starting with 1. The output sequence produces duplicate integers
+ for duplicate values of the <code class="ph codeph">ORDER BY</code> expressions. After generating duplicate output values
+ for the <span class="q">"tied"</span> input values, the function increments the sequence by the number of tied values.
+ Therefore, the sequence contains both duplicates and gaps when the input contains duplicates. Starts the
+      sequence over for each group produced by the <code class="ph codeph">PARTITION BY</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>RANK() OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+
+
+ <p class="p">
+ Often used for top-N and bottom-N queries. For example, it could produce a <span class="q">"top 10"</span> report including
+ several items that were tied for 10th place.
+ </p>
+
+ <p class="p">
+ Similar to <code class="ph codeph">ROW_NUMBER</code> and <code class="ph codeph">DENSE_RANK</code>. These functions differ in how they
+ treat duplicate combinations of values.
+ </p>
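The difference in tie handling can be sketched in Python (an illustrative model, not Impala SQL), contrasting how the three functions number tied values in an ordered partition:

```python
# Illustrative sketches of the three numbering schemes over values
# already in ORDER BY order.
def rank(values):
    # Ties share a rank; the next rank skips by the number of ties.
    return [1 + sum(1 for w in values if w < v) for v in values]

def dense_rank(values):
    # Ties share a rank; no gaps afterward.
    return [1 + len({w for w in values if w < v}) for v in values]

def row_number(values):
    # Every row gets a distinct number, even for ties.
    return list(range(1, len(values) + 1))

x = [1, 1, 2, 2, 3]
print(rank(x))          # [1, 1, 3, 3, 5] -- duplicates, then a gap
print(dense_rank(x))    # [1, 1, 2, 2, 3] -- duplicates, no gaps
print(row_number(x))    # [1, 2, 3, 4, 5] -- no duplicates, no gaps
```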
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example demonstrates how the <code class="ph codeph">RANK()</code> function identifies where each value
+ <span class="q">"places"</span> in the result set, producing the same result for duplicate values, and skipping values in the
+ sequence to account for the number of duplicates. For example, when results are ordered by the
+ <code class="ph codeph">X</code> column, both <code class="ph codeph">1</code> values are tied for first; both <code class="ph codeph">2</code>
+ values are tied for third; and so on.
+ </p>
+
+<pre class="pre codeblock"><code>select x, rank() over(order by x) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 1 | 1 | square |
+| 1 | 1 | odd |
+| 2 | 3 | even |
+| 2 | 3 | prime |
+| 3 | 5 | prime |
+| 3 | 5 | odd |
+| 4 | 7 | even |
+| 4 | 7 | square |
+| 5 | 9 | odd |
+| 5 | 9 | prime |
+| 6 | 11 | even |
+| 6 | 11 | perfect |
+| 7 | 13 | lucky |
+| 7 | 13 | lucky |
+| 7 | 13 | lucky |
+| 7 | 13 | odd |
+| 7 | 13 | prime |
+| 8 | 18 | even |
+| 9 | 19 | square |
+| 9 | 19 | odd |
+| 10 | 21 | round |
+| 10 | 21 | even |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+      The following examples show how the <code class="ph codeph">RANK()</code> function is affected by the
+      <code class="ph codeph">PARTITION BY</code> clause within the <code class="ph codeph">OVER()</code> clause.
+ </p>
+
+ <p class="p">
+ Partitioning by the <code class="ph codeph">PROPERTY</code> column groups all the even, odd, and so on values together,
+ and <code class="ph codeph">RANK()</code> returns the place of each value within the group, producing several ascending
+ sequences.
+ </p>
+
+<pre class="pre codeblock"><code>select x, rank() over(partition by property order by x) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 2 | 1 | even |
+| 4 | 2 | even |
+| 6 | 3 | even |
+| 8 | 4 | even |
+| 10 | 5 | even |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 1 | 1 | odd |
+| 3 | 2 | odd |
+| 5 | 3 | odd |
+| 7 | 4 | odd |
+| 9 | 5 | odd |
+| 6 | 1 | perfect |
+| 2 | 1 | prime |
+| 3 | 2 | prime |
+| 5 | 3 | prime |
+| 7 | 4 | prime |
+| 10 | 1 | round |
+| 1 | 1 | square |
+| 4 | 2 | square |
+| 9 | 3 | square |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ Partitioning by the <code class="ph codeph">X</code> column groups all the duplicate numbers together and returns the
+      place of each value within the group; because each value occurs only once or twice,
+ <code class="ph codeph">RANK()</code> designates each <code class="ph codeph">X</code> value as either first or second within its
+ group.
+ </p>
+
+<pre class="pre codeblock"><code>select x, rank() over(partition by x order by property) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 1 | 1 | odd |
+| 1 | 2 | square |
+| 2 | 1 | even |
+| 2 | 2 | prime |
+| 3 | 1 | odd |
+| 3 | 2 | prime |
+| 4 | 1 | even |
+| 4 | 2 | square |
+| 5 | 1 | odd |
+| 5 | 2 | prime |
+| 6 | 1 | even |
+| 6 | 2 | perfect |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 4 | odd |
+| 7 | 5 | prime |
+| 8 | 1 | even |
+| 9 | 1 | odd |
+| 9 | 2 | square |
+| 10 | 1 | even |
+| 10 | 2 | round |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how a magazine might prepare a list of history's wealthiest people. Croesus and
+ Midas are tied for second, then Crassus is fourth.
+ </p>
+
+<pre class="pre codeblock"><code>select rank() over (order by net_worth desc) as rank, name, net_worth from wealth order by rank, name;
++------+---------+---------------+
+| rank | name | net_worth |
++------+---------+---------------+
+| 1 | Solomon | 2000000000.00 |
+| 2 | Croesus | 1000000000.00 |
+| 2 | Midas | 1000000000.00 |
+| 4 | Crassus | 500000000.00 |
+| 5 | Scrooge | 80000000.00 |
++------+---------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#dense_rank">DENSE_RANK Function</a>,
+ <a class="xref" href="impala_analytic_functions.html#row_number">ROW_NUMBER Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title17" id="analytic_functions__row_number">
+
+ <h2 class="title topictitle2" id="ariaid-title17">ROW_NUMBER Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns an ascending sequence of integers, starting with 1. Starts the sequence over for each group
+      produced by the <code class="ph codeph">PARTITION BY</code> clause. The output sequence includes different values for
+ duplicate input values. Therefore, the sequence never contains any duplicates or gaps, regardless of
+ duplicate input values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ROW_NUMBER() OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Often used for top-N and bottom-N queries where the input values are known to be unique, or precisely N
+ rows are needed regardless of duplicate values.
+ </p>
+
+ <p class="p">
+ Because its result value is different for each row in the result set (when used without a <code class="ph codeph">PARTITION
+ BY</code> clause), <code class="ph codeph">ROW_NUMBER()</code> can be used to synthesize unique numeric ID values, for
+ example for result sets involving unique values or tuples.
+ </p>
+
+ <p class="p">
+ Similar to <code class="ph codeph">RANK</code> and <code class="ph codeph">DENSE_RANK</code>. These functions differ in how they treat
+ duplicate combinations of values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example demonstrates how <code class="ph codeph">ROW_NUMBER()</code> produces a continuous numeric
+ sequence, even though some values of <code class="ph codeph">X</code> are repeated.
+ </p>
+
+<pre class="pre codeblock"><code>select x, row_number() over(order by x, property) as row_number, property from int_t;
++----+------------+----------+
+| x | row_number | property |
++----+------------+----------+
+| 1 | 1 | odd |
+| 1 | 2 | square |
+| 2 | 3 | even |
+| 2 | 4 | prime |
+| 3 | 5 | odd |
+| 3 | 6 | prime |
+| 4 | 7 | even |
+| 4 | 8 | square |
+| 5 | 9 | odd |
+| 5 | 10 | prime |
+| 6 | 11 | even |
+| 6 | 12 | perfect |
+| 7 | 13 | lucky |
+| 7 | 14 | lucky |
+| 7 | 15 | lucky |
+| 7 | 16 | odd |
+| 7 | 17 | prime |
+| 8 | 18 | even |
+| 9 | 19 | odd |
+| 9 | 20 | square |
+| 10 | 21 | even |
+| 10 | 22 | round |
++----+------------+----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how a financial institution might assign customer IDs to some of history's
+ wealthiest figures. Although two of the people have identical net worth figures, unique IDs are required
+ for this purpose. <code class="ph codeph">ROW_NUMBER()</code> produces a sequence of five different values for the five
+ input rows.
+ </p>
+
+<pre class="pre codeblock"><code>select row_number() over (order by net_worth desc) as account_id, name, net_worth
+ from wealth order by account_id, name;
++------------+---------+---------------+
+| account_id | name | net_worth |
++------------+---------+---------------+
+| 1 | Solomon | 2000000000.00 |
+| 2 | Croesus | 1000000000.00 |
+| 3 | Midas | 1000000000.00 |
+| 4 | Crassus | 500000000.00 |
+| 5 | Scrooge | 80000000.00 |
++------------+---------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#rank">RANK Function</a>, <a class="xref" href="impala_analytic_functions.html#dense_rank">DENSE_RANK Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title18" id="analytic_functions__sum_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title18">SUM Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_sum.html#sum">SUM Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_appx_count_distinct.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_appx_count_distinct.html b/docs/build3x/html/topics/impala_appx_count_distinct.html
new file mode 100644
index 0000000..c42c2ca
--- /dev/null
+++ b/docs/build3x/html/topics/impala_appx_count_distinct.html
@@ -0,0 +1,82 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="appx_count_distinct"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</title></head><body id="appx_count_distinct"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">APPX_COUNT_DISTINCT Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Allows multiple <code class="ph codeph">COUNT(DISTINCT)</code> operations within a single query, by internally rewriting
+ each <code class="ph codeph">COUNT(DISTINCT)</code> to use the <code class="ph codeph">NDV()</code> function. The resulting count is
+ approximate rather than precise.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
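The value-parsing rule above (1/0 or true/false, with anything else treated as false) can be sketched in Python. This is an illustrative model of the documented rule, not Impala's actual implementation:

```python
def parse_boolean_option(value):
    """Model of the documented rule for this Boolean query option:
    "1" and "true" mean true, "0" and "false" mean false, and any
    other value is interpreted as false."""
    return str(value).strip().lower() in ("1", "true")

print(parse_boolean_option("true"))   # True
print(parse_boolean_option("0"))      # False
print(parse_boolean_option("maybe"))  # False (unrecognized values fall back to false)
```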
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show how the <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option lets you work around the
+ restriction that a query can evaluate <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">col_name</var>)</code> for only a
+ single column. By default, you can count the distinct values of one column or another, but not both in the
+ same query:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select count(distinct x) from int_t;
++-------------------+
+| count(distinct x) |
++-------------------+
+| 10                |
++-------------------+
+[localhost:21000] > select count(distinct property) from int_t;
++--------------------------+
+| count(distinct property) |
++--------------------------+
+| 7                        |
++--------------------------+
+[localhost:21000] > select count(distinct x), count(distinct property) from int_t;
+ERROR: AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters
+as count(DISTINCT x); deviating function: count(DISTINCT property)
+</code></pre>
+
+ <p class="p">
+ When you enable the <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option, the query with multiple
+ <code class="ph codeph">COUNT(DISTINCT)</code> expressions succeeds. The reason this behavior requires a query option is that each
+ <code class="ph codeph">COUNT(DISTINCT)</code> is rewritten internally to use the <code class="ph codeph">NDV()</code> function instead,
+ which provides an approximate result rather than a precise count.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set APPX_COUNT_DISTINCT=true;
+[localhost:21000] > select count(distinct x), count(distinct property) from int_t;
++-------------------+--------------------------+
+| count(distinct x) | count(distinct property) |
++-------------------+--------------------------+
+| 10                | 7                        |
++-------------------+--------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_count.html#count">COUNT Function</a>,
+ <a class="xref" href="impala_distinct.html#distinct">DISTINCT Operator</a>,
+ <a class="xref" href="impala_ndv.html#ndv">NDV Function</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_kerberos.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_kerberos.html b/docs/build3x/html/topics/impala_kerberos.html
new file mode 100644
index 0000000..582c7da
--- /dev/null
+++ b/docs/build3x/html/topics/impala_kerberos.html
@@ -0,0 +1,342 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="kerberos"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Enabling Kerberos Authentication for Impala</title></head><body id="kerberos"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Enabling Kerberos Authentication for Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala supports an enterprise-grade authentication system called Kerberos. Kerberos provides strong security benefits including
+ capabilities that render intercepted authentication packets unusable by an attacker. It virtually eliminates the threat of
+ impersonation by never sending a user's credentials in cleartext over the network. For more information on Kerberos, visit
+ the <a class="xref" href="https://web.mit.edu/kerberos/" target="_blank">MIT Kerberos website</a>.
+ </p>
+
+ <p class="p">
+ The rest of this topic assumes you have a working <a class="xref" href="https://web.mit.edu/kerberos/krb5-latest/doc/admin/install_kdc.html" target="_blank">Kerberos Key Distribution Center (KDC)</a>
+ set up. To enable Kerberos, you first create a Kerberos principal for each host running
+ <span class="keyword cmdname">impalad</span> or <span class="keyword cmdname">statestored</span>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files
+ owned by the same user (typically <code class="ph codeph">impala</code>). To implement user-level access to different
+ databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>.
+ </div>
+
+ <p class="p">
+ An alternative form of authentication you can use is LDAP, described in <a class="xref" href="impala_ldap.html#ldap">Enabling LDAP Authentication for Impala</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="kerberos__kerberos_prereqs">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Requirements for Using Impala with Kerberos</h2>
+
+
+ <div class="body conbody">
+
+ <div class="p">
+ On version 5 of Red Hat Enterprise Linux and comparable distributions, some additional setup is needed for
+ the <span class="keyword cmdname">impala-shell</span> interpreter to connect to a Kerberos-enabled Impala cluster:
+<pre class="pre codeblock"><code>sudo yum install python-devel openssl-devel python-pip
+sudo pip-python install ssl</code></pre>
+ </div>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ If you plan to use Impala in your cluster, you must configure your KDC to allow tickets to be renewed,
+ and you must configure <span class="ph filepath">krb5.conf</span> to request renewable tickets. Typically, you can do
+ this by adding the <code class="ph codeph">max_renewable_life</code> setting to your realm in
+ <span class="ph filepath">kdc.conf</span>, and by adding the <span class="ph filepath">renew_lifetime</span> parameter to the
+ <span class="ph filepath">libdefaults</span> section of <span class="ph filepath">krb5.conf</span>. For more information about
+ renewable tickets, see the
+ <a class="xref" href="http://web.mit.edu/Kerberos/krb5-1.8/" target="_blank"> Kerberos
+ documentation</a>.
+ </p>
+ <p class="p">
+ Currently, you cannot use the resource management feature on a cluster that has Kerberos
+ authentication enabled.
+ </p>
+ </div>
+
+ <p class="p">
+ Start all <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> daemons with the
+ <code class="ph codeph">--principal</code> and <code class="ph codeph">--keytab-file</code> flags set to the principal and full path
+ name of the <code class="ph codeph">keytab</code> file containing the credentials for the principal.
+ </p>
+
+ <p class="p">
+ To enable Kerberos in the Impala shell, start the <span class="keyword cmdname">impala-shell</span> command using the
+ <code class="ph codeph">-k</code> flag.
+ </p>
+
+ <p class="p">
+ To enable Impala to work with Kerberos security on your Hadoop cluster, make sure you perform the
+ installation and configuration steps in
+ <a class="xref" href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SecureMode.html#Authentication" target="_blank">Authentication in Hadoop</a>.
+ Note that when Kerberos security is enabled in Impala, a web browser that
+ supports Kerberos HTTP SPNEGO is required to access the Impala web console (for example, Firefox, Internet
+ Explorer, or Chrome).
+ </p>
+
+ <p class="p">
+ If the NameNode, Secondary NameNode, DataNode, JobTracker, TaskTrackers, ResourceManager, NodeManagers,
+ HttpFS, Oozie, Impala, or Impala statestore services are configured to use Kerberos HTTP SPNEGO
+ authentication, and two or more of these services are running on the same host, then all of the running
+ services must use the same HTTP principal and keytab file used for their HTTP endpoints.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="kerberos__kerberos_config">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Configuring Impala to Support Kerberos Security</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Enabling Kerberos authentication for Impala involves steps that can be summarized as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Creating service principals for Impala and the HTTP service. Principal names take the form:
+ <code class="ph codeph"><var class="keyword varname">serviceName</var>/<var class="keyword varname">fully.qualified.domain.name</var>@<var class="keyword varname">KERBEROS.REALM</var></code>.
+ <p class="p">
+ In Impala 2.0 and later, <code class="ph codeph">user()</code> returns the full Kerberos principal string, such as
+ <code class="ph codeph">user@example.com</code>, in a Kerberized environment.
+ </p>
+ </li>
+
+ <li class="li">
+ Creating, merging, and distributing keytab files for these principals.
+ </li>
+
+ <li class="li">
+ Editing <code class="ph codeph">/etc/default/impala</code>
+ to accommodate Kerberos authentication.
+ </li>
+ </ul>
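The principal naming convention above can be illustrated with a short Python sketch. The host and realm values below are placeholders, not values from a real cluster:

```python
def make_principal(service, fqdn, realm):
    """Build a Kerberos service principal of the form
    serviceName/fully.qualified.domain.name@KERBEROS.REALM,
    as described above. Realms are conventionally uppercase."""
    return "{0}/{1}@{2}".format(service, fqdn, realm.upper())

# Hypothetical example values:
print(make_principal("impala", "impala_host.example.com", "test.example.com"))
# impala/impala_host.example.com@TEST.EXAMPLE.COM
print(make_principal("HTTP", "impala_host.example.com", "test.example.com"))
# HTTP/impala_host.example.com@TEST.EXAMPLE.COM
```

Note that the `HTTP` service component must stay uppercase, as the later steps in this topic point out.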
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title4" id="kerberos_config__kerberos_setup">
+
+ <h3 class="title topictitle3" id="ariaid-title4">Enabling Kerberos for Impala</h3>
+
+ <div class="body conbody">
+
+
+
+ <ol class="ol">
+ <li class="li">
+ Create an Impala service principal, specifying the name of the OS user that the Impala daemons run
+ under, the fully qualified domain name of each node running <span class="keyword cmdname">impalad</span>, and the realm
+ name. For example:
+<pre class="pre codeblock"><code>$ kadmin
+kadmin: addprinc -requires_preauth -randkey impala/impala_host.example.com@TEST.EXAMPLE.COM</code></pre>
+ </li>
+
+ <li class="li">
+ Create an HTTP service principal. For example:
+<pre class="pre codeblock"><code>kadmin: addprinc -randkey HTTP/impala_host.example.com@TEST.EXAMPLE.COM</code></pre>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">HTTP</code> component of the service principal must be uppercase as shown in the
+ preceding example.
+ </div>
+ </li>
+
+ <li class="li">
+ Create <code class="ph codeph">keytab</code> files with both principals. For example:
+<pre class="pre codeblock"><code>kadmin: xst -k impala.keytab impala/impala_host.example.com
+kadmin: xst -k http.keytab HTTP/impala_host.example.com
+kadmin: quit</code></pre>
+ </li>
+
+ <li class="li">
+ Use <code class="ph codeph">ktutil</code> to read the contents of the two keytab files and then write those contents
+ to a new file. For example:
+<pre class="pre codeblock"><code>$ ktutil
+ktutil: rkt impala.keytab
+ktutil: rkt http.keytab
+ktutil: wkt impala-http.keytab
+ktutil: quit</code></pre>
+ </li>
+
+ <li class="li">
+ (Optional) Test that credentials in the merged keytab file are valid, and that the <span class="q">"renew until"</span>
+ date is in the future. For example:
+<pre class="pre codeblock"><code>$ klist -e -k -t impala-http.keytab</code></pre>
+ </li>
+
+ <li class="li">
+ Copy the <span class="ph filepath">impala-http.keytab</span> file to the Impala configuration directory. Change the
+ permissions to be only read for the file owner and change the file owner to the <code class="ph codeph">impala</code>
+ user. By default, the Impala user and group are both named <code class="ph codeph">impala</code>. For example:
+<pre class="pre codeblock"><code>$ cp impala-http.keytab /etc/impala/conf
+$ cd /etc/impala/conf
+$ chmod 400 impala-http.keytab
+$ chown impala:impala impala-http.keytab</code></pre>
+ </li>
+
+ <li class="li">
+ Add Kerberos options to the Impala defaults file, <span class="ph filepath">/etc/default/impala</span>. Add the
+ options for both the <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> daemons, using the
+ <code class="ph codeph">IMPALA_SERVER_ARGS</code> and <code class="ph codeph">IMPALA_STATE_STORE_ARGS</code> variables. For
+ example, you might add:
+
+<pre class="pre codeblock"><code>-kerberos_reinit_interval=60
+-principal=impala_1/impala_host.example.com@TEST.EXAMPLE.COM
+-keytab_file=<var class="keyword varname">/path/to/impala.keytab</var></code></pre>
+ <p class="p">
+ For more information on changing the Impala defaults specified in
+ <span class="ph filepath">/etc/default/impala</span>, see
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup
+ Options</a>.
+ </p>
+ </li>
+ </ol>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Restart <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> for these configuration changes to
+ take effect.
+ </div>
+ </div>
+ </article>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="kerberos__kerberos_proxy">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Enabling Kerberos for Impala with a Proxy Server</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ A common configuration for Impala with High Availability is to use a proxy server to submit requests to the
+ actual <span class="keyword cmdname">impalad</span> daemons on different hosts in the cluster. This configuration avoids
+ connection problems in case of machine failure, because the proxy server can route new requests through one
+ of the remaining hosts in the cluster. This configuration also helps with load balancing, because the
+ additional overhead of being the <span class="q">"coordinator node"</span> for each query is spread across multiple hosts.
+ </p>
+
+ <p class="p">
+ Although you can set up a proxy server with or without Kerberos authentication, typically users set up a
+ secure Kerberized configuration. For information about setting up a proxy server for Impala, including
+ Kerberos-specific steps, see <a class="xref" href="impala_proxy.html#proxy">Using Impala through a Proxy for High Availability</a>.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="kerberos__spnego">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Using a Web Browser to Access a URL Protected by Kerberos HTTP SPNEGO</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Your web browser must support Kerberos HTTP SPNEGO; supported browsers include Chrome, Firefox, and Internet Explorer.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To configure Firefox to access a URL protected by Kerberos HTTP SPNEGO:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Open the advanced settings Firefox configuration page by loading the <code class="ph codeph">about:config</code> page.
+ </li>
+
+ <li class="li">
+ Use the <strong class="ph b">Filter</strong> text box to find <code class="ph codeph">network.negotiate-auth.trusted-uris</code>.
+ </li>
+
+ <li class="li">
+ Double-click the <code class="ph codeph">network.negotiate-auth.trusted-uris</code> preference and enter the hostname
+ or the domain of the web server that is protected by Kerberos HTTP SPNEGO. Separate multiple domains and
+ hostnames with a comma.
+ </li>
+
+ <li class="li">
+ Click <strong class="ph b">OK</strong>.
+ </li>
+ </ol>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="kerberos__kerberos_delegation">
+ <h2 class="title topictitle2" id="ariaid-title7">Enabling Impala Delegation for Kerberos Users</h2>
+ <div class="body conbody">
+ <p class="p">
+ See <a class="xref" href="impala_delegation.html#delegation">Configuring Impala Delegation for Hue and BI Tools</a> for details about the delegation feature
+ that lets certain users submit queries using the credentials of other users.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="kerberos__ssl_jdbc_odbc">
+ <h2 class="title topictitle2" id="ariaid-title8">Using TLS/SSL with Business Intelligence Tools</h2>
+ <div class="body conbody">
+ <p class="p">
+ You can use Kerberos authentication, TLS/SSL encryption, or both to secure
+ connections from JDBC and ODBC applications to Impala.
+ See <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a> and <a class="xref" href="impala_odbc.html#impala_odbc">Configuring Impala to Work with ODBC</a>
+ for details.
+ </p>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.5</span>, the Hive JDBC driver did not support connections that use both Kerberos authentication
+ and SSL encryption. If your cluster is running an older release that has this restriction,
+ use an alternative JDBC driver that supports
+ both of these security features.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="kerberos__whitelisting_internal_apis">
+ <h2 class="title topictitle2" id="ariaid-title9">Enabling Access to Internal Impala APIs for Kerberos Users</h2>
+ <div class="body conbody">
+
+ <p class="p">
+ For applications that need direct access
+ to Impala APIs, without going through the HiveServer2 or Beeswax interfaces, you can
+ specify a list of Kerberos users who are allowed to call those APIs. By default, the
+ <code class="ph codeph">impala</code> and <code class="ph codeph">hdfs</code> users are the only ones authorized
+ for this kind of access.
+ Any users not explicitly authorized through the <code class="ph codeph">internal_principals_whitelist</code>
+ configuration setting are blocked from accessing the APIs. This setting applies to all the
+ Impala-related daemons, although currently it is primarily used for HDFS to control the
+ behavior of the catalog server.
+ </p>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="kerberos__auth_to_local">
+ <h2 class="title topictitle2" id="ariaid-title10">Mapping Kerberos Principals to Short Names for Impala</h2>
+ <div class="body conbody">
+ <div class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala recognizes the <code class="ph codeph">auth_to_local</code> setting,
+ specified through the HDFS configuration setting
+ <code class="ph codeph">hadoop.security.auth_to_local</code>.
+ This feature is disabled by default, to avoid an unexpected change in security-related behavior.
+ To enable it:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Specify <code class="ph codeph">--load_auth_to_local_rules=true</code>
+ in the <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">catalogd</span> configuration settings.
+ </p>
+ </li>
+ </ul>
+ </div>
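As a sketch, the underlying HDFS setting lives in <span class="ph filepath">core-site.xml</span>. The rules below are a hypothetical example, not taken from this document; they map principals in one example realm to their short names and fall back to Hadoop's default handling:

```xml
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@EXAMPLE.COM)s/@.*//
    RULE:[2:$1@$0](.*@EXAMPLE.COM)s/@.*//
    DEFAULT
  </value>
</property>
```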
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_known_issues.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_known_issues.html b/docs/build3x/html/topics/impala_known_issues.html
new file mode 100644
index 0000000..275753b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_known_issues.html
@@ -0,0 +1,1012 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="known_issues"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Known Issues and Workarounds in Impala</title></head><body id="known_issues"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Known Issues and Workarounds in Impala</span></h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following sections describe known issues and workarounds in Impala, as of the current
+ production release. This page summarizes the most serious or frequently encountered issues
+ in the current release, to help you make planning decisions about installing and
+ upgrading. Any workarounds are listed here. The bug links take you to the Impala issues
+ site, where you can see the diagnosis and whether a fix is in the pipeline.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The online issue tracking system for Impala contains comprehensive information and is
+ updated in real time. To verify whether an issue you are experiencing has already been
+ reported, or which release an issue is fixed in, search on the
+ <a class="xref" href="https://issues.apache.org/jira/" target="_blank">issues.apache.org
+ JIRA tracker</a>.
+ </div>
+
+ <p class="p toc inpage"></p>
+
+ <p class="p">
+ For issues fixed in various Impala releases, see
+ <a class="xref" href="impala_fixed_issues.html#fixed_issues">Fixed Issues in Apache Impala</a>.
+ </p>
+
+
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="known_issues__known_issues_startup">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Impala Known Issues: Startup</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues can prevent one or more Impala-related daemons from starting properly.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title3" id="known_issues_startup__IMPALA-4978">
+
+ <h3 class="title topictitle3" id="ariaid-title3">Impala requires FQDN from hostname command on kerberized clusters</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The method Impala uses to retrieve the host name while constructing the Kerberos
+ principal is the <code class="ph codeph">gethostname()</code> system call. This function might not
+ always return the fully qualified domain name, depending on the network configuration.
+ If the daemons cannot determine the FQDN, Impala does not start on a kerberized
+ cluster.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Test if a host is affected by checking whether the output of the
+ <span class="keyword cmdname">hostname</span> command includes the FQDN. On hosts where
+ <span class="keyword cmdname">hostname</span> returns only the short name, pass the command-line flag
+ <code class="ph codeph">--hostname=<var class="keyword varname">fully_qualified_domain_name</var></code> in the
+ startup options of all Impala-related daemons.
+ </p>
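One way to check programmatically whether a host reports a fully qualified name is with Python's standard library. This is a diagnostic sketch, not part of Impala; the dot-counting heuristic is an assumption for illustration:

```python
import socket

def looks_like_fqdn(name):
    """Heuristic used here: an FQDN contains at least one dot
    separating host and domain components (e.g. host.example.com)."""
    return "." in name.strip(".")

short_name = socket.gethostname()   # analogous to the gethostname() call Impala uses
fqdn = socket.getfqdn()             # asks the resolver for the canonical name

if not looks_like_fqdn(short_name):
    print("hostname %r is not fully qualified; consider --hostname=%s" % (short_name, fqdn))
```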
+
+ <p class="p">
+ <strong class="ph b">Apache Issue:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-4978" target="_blank">IMPALA-4978</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="known_issues_performance__ki_performance" id="known_issues__known_issues_performance">
+
+ <h2 class="title topictitle2" id="known_issues_performance__ki_performance">Impala Known Issues: Performance</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues involve the performance of operations such as queries or DDL statements.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="known_issues_performance__impala-6671">
+
+ <h3 class="title topictitle3" id="ariaid-title5">Metadata operations block read-only operations on unrelated tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Metadata operations that change the state of a table, like <code class="ph codeph">COMPUTE
+ STATS</code> or <code class="ph codeph">ALTER RECOVER PARTITIONS</code>, can delay the metadata
+ loading of unrelated, not-yet-loaded tables that is triggered by statements like
+ <code class="ph codeph">DESCRIBE</code> or <code class="ph codeph">SELECT</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-6671" target="_blank">IMPALA-6671</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="known_issues_performance__IMPALA-3316">
+
+ <h3 class="title topictitle3" id="ariaid-title6">Slow queries for Parquet tables with convert_legacy_hive_parquet_utc_timestamps=true</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The configuration setting
+ <code class="ph codeph">convert_legacy_hive_parquet_utc_timestamps=true</code> uses an underlying
+ function that can be a bottleneck on high volume, highly concurrent queries due to the
+ use of a global lock while loading time zone information. This bottleneck can cause
+ slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount of
+ slowdown depends on factors such as the number of cores and number of threads involved
+ in the query.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The slowdown only occurs when accessing <code class="ph codeph">TIMESTAMP</code> columns within
+ Parquet files that were generated by Hive, and therefore require the on-the-fly
+ timezone conversion processing.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3316" target="_blank">IMPALA-3316</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> If the <code class="ph codeph">TIMESTAMP</code> values stored in the table
+ represent dates only, with no time portion, consider storing them as strings in
+ <code class="ph codeph">yyyy-MM-dd</code> format. Impala implicitly converts such string values to
+ <code class="ph codeph">TIMESTAMP</code> in calls to date/time functions.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="known_issues_performance__ki_file_handle_cache">
+
+ <h3 class="title topictitle3" id="ariaid-title7">Interaction of File Handle Cache with HDFS Appends and Short-Circuit Reads</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If a data file used by Impala is being continuously appended or overwritten in place
+ by an HDFS mechanism, such as <span class="keyword cmdname">hdfs dfs -appendToFile</span>, interaction
+ with the file handle caching feature in <span class="keyword">Impala 2.10</span> and higher
+ could cause short-circuit reads to sometimes be disabled on some DataNodes. When a
+ mismatch is detected between the cached file handle and a data block that was
+ rewritten because of an append, short-circuit reads are turned off on the affected
+ host for a 10-minute period.
+ </p>
+
+ <p class="p">
+ The possibility of encountering such an issue is the reason why the file handle
+ caching feature is currently turned off by default. See
+ <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> for information about this feature and
+ how to enable it.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong>
+ <a class="xref" href="https://issues.apache.org/jira/browse/HDFS-12528" target="_blank">HDFS-12528</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Verify whether your ETL process is susceptible to this issue before
+ enabling the file handle caching feature. You can set the <span class="keyword cmdname">impalad</span>
+ configuration option <code class="ph codeph">unused_file_handle_timeout_sec</code> to a time period
+ that is shorter than the HDFS setting
+ <code class="ph codeph">dfs.client.read.shortcircuit.streams.cache.expiry.ms</code>. (Keep in mind
+ that the HDFS setting is in milliseconds while the Impala setting is in seconds.)
+ </p>
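Because the two settings use different units, a quick sanity check helps. This Python sketch simply encodes the workaround's seconds-versus-milliseconds comparison; the numeric values are hypothetical, not defaults taken from this document:

```python
def handle_timeout_is_safe(impala_timeout_sec, hdfs_expiry_ms):
    """Return True when the Impala file handle timeout (in seconds) is
    shorter than the HDFS short-circuit cache expiry (in milliseconds),
    as the workaround above recommends."""
    return impala_timeout_sec * 1000 < hdfs_expiry_ms

# Hypothetical values: a 6-hour Impala timeout vs a 5-minute HDFS expiry.
print(handle_timeout_is_safe(21600, 300000))  # False: the Impala timeout is too long
print(handle_timeout_is_safe(120, 300000))    # True: 120 s < 300000 ms
```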
+
+ <p class="p">
+ <strong class="ph b">Resolution:</strong> Fixed in HDFS 2.10 and higher. Use the new HDFS parameter
+ <code class="ph codeph">dfs.domain.socket.disable.interval.seconds</code> to specify the amount of
+ time that short circuit reads are disabled on encountering an error. The default value
+ is 10 minutes (<code class="ph codeph">600</code> seconds). It is recommended that you set
+ <code class="ph codeph">dfs.domain.socket.disable.interval.seconds</code> to a small value, such as
+ <code class="ph codeph">1</code> second, when using the file handle cache. Setting
+ <code class="ph codeph">dfs.domain.socket.disable.interval.seconds</code> to <code class="ph codeph">0</code> is not
+ recommended as a non-zero interval protects the system if there is a persistent
+ problem with short circuit reads.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="known_issues_drivers__ki_drivers" id="known_issues__known_issues_drivers">
+
+ <h2 class="title topictitle2" id="known_issues_drivers__ki_drivers">Impala Known Issues: JDBC and ODBC Drivers</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues affect applications that use the JDBC or ODBC APIs, such as business
+ intelligence tools or custom-written applications in languages such as Java or C++.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="known_issues_drivers__IMPALA-1792">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title9">ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)</h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            If the ODBC function <code class="ph codeph">SQLGetData</code> is called on a series of columns, the
+            calls must follow the same order as the columns. For example, if data is fetched from
+            column 2 and then column 1, the <code class="ph codeph">SQLGetData</code> call for column 1 returns
+            <code class="ph codeph">NULL</code>.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1792" target="_blank">IMPALA-1792</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Fetch columns in the same order they are defined in the table.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="known_issues_resources__ki_resources" id="known_issues__known_issues_resources">
+
+ <h2 class="title topictitle2" id="known_issues_resources__ki_resources">Impala Known Issues: Resources</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues involve memory or disk usage, including out-of-memory conditions, the
+ spill-to-disk feature, and resource management features.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="known_issues_resources__IMPALA-6028">
+
+ <h3 class="title topictitle3" id="ariaid-title11">Handling large rows during upgrade to <span class="keyword">Impala 2.10</span> or higher</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ After an upgrade to <span class="keyword">Impala 2.10</span> or higher, users who process
+ very large column values (long strings), or have increased the
+ <code class="ph codeph">--read_size</code> configuration setting from its default of 8 MB, might
+ encounter capacity errors for some queries that previously worked.
+ </p>
+
+        <p class="p">
+          <strong class="ph b">Resolution:</strong> After the upgrade, follow the instructions in
+          <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> to check whether your
+          queries are affected by these changes and to modify your configuration settings if so.
+        </p>
+
+ <p class="p">
+ <strong class="ph b">Apache Issue:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-6028" target="_blank">IMPALA-6028</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="known_issues_resources__IMPALA-5605">
+
+ <h3 class="title topictitle3" id="ariaid-title12">Configuration to prevent crashes caused by thread resource limits</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala could encounter a serious error due to resource usage under very high
+ concurrency. The error message is similar to:
+ </p>
+
+<pre class="pre codeblock"><code>
+F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
+terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-5605" target="_blank">IMPALA-5605</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> To prevent such errors, configure each host running an
+ <span class="keyword cmdname">impalad</span> daemon with the following settings:
+ </p>
+
+<pre class="pre codeblock"><code>
+echo 2000000 > /proc/sys/kernel/threads-max
+echo 2000000 > /proc/sys/kernel/pid_max
+echo 8000000 > /proc/sys/vm/max_map_count
+</code></pre>
+
+ <p class="p">
+ Add the following lines in <span class="ph filepath">/etc/security/limits.conf</span>:
+ </p>
+
+<pre class="pre codeblock"><code>
+impala soft nproc 262144
+impala hard nproc 262144
+</code></pre>
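The echo commands above take effect immediately but are lost on reboot. One way to persist them is via sysctl configuration; this sketch writes to a local example file (on a real host the target would be /etc/sysctl.conf or a file under /etc/sysctl.d/, applied with `sysctl -p` as root):

```shell
# Equivalent sysctl keys for the three /proc settings shown above.
# SYSCTL_CONF is parameterized so the snippet is safe to try without root.
SYSCTL_CONF="${SYSCTL_CONF:-./sysctl.conf.example}"
cat >> "$SYSCTL_CONF" <<'EOF'
kernel.threads-max = 2000000
kernel.pid_max = 2000000
vm.max_map_count = 8000000
EOF
```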
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="known_issues_resources__drop_table_purge_s3a">
+
+ <h3 class="title topictitle3" id="ariaid-title13"><strong class="ph b">Breakpad minidumps can be very large when the thread count is high</strong></h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            The size of the Breakpad minidump files grows linearly with the number of threads. By
+            default, each thread adds 8 KB to the minidump size. Minidump files can consume
+            significant disk space when the daemons have a high number of threads.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Add
+ <samp class="ph systemoutput">--minidump_size_limit_hint_kb=size</samp>
+ to set a soft upper limit on the size of each minidump file. If the minidump file
+ would exceed that limit, Impala reduces the amount of information for each thread from
+ 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB
+ per thread after that.) The minidump file can still grow larger than the "hinted"
+ size. For example, if you have 10,000 threads, the minidump file can be more than 20
+ MB.
+ </p>
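The growth described above is easy to estimate: full 8 KB of detail for the first 20 threads, then 2 KB per thread once the hint is exceeded. A rough lower bound (fixed per-dump overhead is ignored):

```shell
threads=10000
# 20 threads at full 8 KB each, the remainder reduced to 2 KB each.
size_kb=$(( 20 * 8 + (threads - 20) * 2 ))
echo "estimated minidump size: ${size_kb} KB (~$(( size_kb / 1024 )) MB)"
```

For 10,000 threads this estimate comes to 20120 KB, consistent with the "more than 20 MB" figure above once fixed overhead is added.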
+
+ <p class="p">
+ <strong class="ph b">Apache Issue:</strong>
+ <a class="xref" href="https://issues.cloudera.org/browse/IMPALA-3509" target="_blank">IMPALA-3509</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="known_issues_resources__IMPALA-691">
+
+ <h3 class="title topictitle3" id="ariaid-title14"><strong class="ph b">Process mem limit does not account for the JVM's memory usage</strong></h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            Some memory allocated by the JVM that Impala uses internally is not counted against
+            the memory limit for the <span class="keyword cmdname">impalad</span> daemon.
+          </p>
+
+          <p class="p">
+            <strong class="ph b">Workaround:</strong> To monitor overall memory usage, use the
+            <span class="keyword cmdname">top</span> command, or add the memory figures in the
+            Impala web UI <strong class="ph b">/memz</strong> tab to the JVM memory usage shown on
+            the <strong class="ph b">/metrics</strong> tab.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Apache Issue:</strong>
+ <a class="xref" href="https://issues.cloudera.org/browse/IMPALA-691" target="_blank">IMPALA-691</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="known_issues_correctness__ki_correctness" id="known_issues__known_issues_correctness">
+
+ <h2 class="title topictitle2" id="known_issues_correctness__ki_correctness">Impala Known Issues: Correctness</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues can cause incorrect or unexpected results from queries. They typically only
+ arise in very specific circumstances.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title16" id="known_issues_correctness__IMPALA-3094">
+
+ <h3 class="title topictitle3" id="ariaid-title16">Incorrect result due to constant evaluation in query with outer join</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ An <code class="ph codeph">OUTER JOIN</code> query could omit some expected result rows due to a
+ constant such as <code class="ph codeph">FALSE</code> in another join clause. For example:
+ </p>
+
+<pre class="pre codeblock"><code>
+explain SELECT 1 FROM alltypestiny a1
+ INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
+ RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
++---------------------------------------------------------+
+| Explain String |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
+| |
+| 00:EMPTYSET |
++---------------------------------------------------------+
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3094" target="_blank">IMPALA-3094</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title17" id="known_issues_correctness__IMPALA-3006">
+
+ <h3 class="title topictitle3" id="ariaid-title17">Impala may use incorrect bit order with BIT_PACKED encoding</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+            Parquet <code class="ph codeph">BIT_PACKED</code> encoding as implemented by Impala is LSB first.
+            The Parquet standard specifies MSB first.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3006" target="_blank">IMPALA-3006</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High, but rare in practice because BIT_PACKED is infrequently used,
+ is not written by Impala, and is deprecated in Parquet 2.0.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title18" id="known_issues_correctness__IMPALA-3082">
+
+ <h3 class="title topictitle3" id="ariaid-title18">BST between 1972 and 1995</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The calculation of start and end times for the BST (British Summer Time) time zone
+ could be incorrect between 1972 and 1995. Between 1972 and 1995, BST began and ended
+ at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
+ third) and fourth Sunday in October. For example, both function calls should return
+ 13, but actually return 12, in a query such as:
+ </p>
+
+<pre class="pre codeblock"><code>
+select
+ extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start,
+ extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end;
+</code></pre>
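The expected value of 13 for both calls can be cross-checked outside Impala against the system tzdata (GNU date assumed; between October 1968 and October 1971 the UK observed UTC+1 year-round, so with a correct zone database both hours print as 13):

```shell
# Convert the same two UTC timestamps to Europe/London local time
# and print the local hour for each.
h1=$(TZ=Europe/London date -d '1970-01-01 12:00:00 UTC' +%H)
h2=$(TZ=Europe/London date -d '1970-12-31 12:00:00 UTC' +%H)
echo "$h1 $h2"
```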
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3082" target="_blank">IMPALA-3082</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title19" id="known_issues_correctness__IMPALA-2422">
+
+ <h3 class="title topictitle3" id="ariaid-title19">% escaping does not work correctly when occurs at the end in a LIKE clause</h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            If the final character in the RHS argument of a <code class="ph codeph">LIKE</code> operator is an
+            escaped <code class="ph codeph">\%</code> character, it does not match a literal <code class="ph codeph">%</code> as the
+            final character of the LHS argument.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2422" target="_blank">IMPALA-2422</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title20" id="known_issues_correctness__IMPALA-2603">
+
+ <h3 class="title topictitle3" id="ariaid-title20">Crash: impala::Coordinator::ValidateCollectionSlots</h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            A query could encounter a serious error if it includes multiple nested levels of
+            <code class="ph codeph">INNER JOIN</code> clauses involving subqueries.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2603" target="_blank">IMPALA-2603</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="known_issues_interop__ki_interop" id="known_issues__known_issues_interop">
+
+ <h2 class="title topictitle2" id="known_issues_interop__ki_interop">Impala Known Issues: Interoperability</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues affect the ability to interchange data between Impala and other database
+ systems. They cover areas such as data types and file formats.
+ </p>
+
+ </div>
+
+
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title22" id="known_issues_interop__describe_formatted_avro">
+
+ <h3 class="title topictitle3" id="ariaid-title22">DESCRIBE FORMATTED gives error on Avro table</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This issue can occur either on old Avro tables (created prior to Hive 1.1) or when
+ changing the Avro schema file by adding or removing columns. Columns added to the
+ schema file will not show up in the output of the <code class="ph codeph">DESCRIBE FORMATTED</code>
+ command. Removing columns from the schema file will trigger a
+ <code class="ph codeph">NullPointerException</code>.
+ </p>
+
+ <p class="p">
+ As a workaround, you can use the output of <code class="ph codeph">SHOW CREATE TABLE</code> to drop
+ and recreate the table. This will populate the Hive metastore database with the
+ correct column definitions.
+ </p>
+
+ <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span>
+ <div class="p">
+ Only use this for external tables, or Impala will remove the data files. In case of
+ an internal table, set it to external first:
+<pre class="pre codeblock"><code>
+ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
+</code></pre>
+ (The part in parentheses is case sensitive.) Make sure to pick the right choice
+ between internal and external when recreating the table. See
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> for the differences between internal and
+ external tables.
+ </div>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title23" id="known_issues_interop__IMP-175">
+
+
+
+        <h3 class="title topictitle3" id="ariaid-title23">Deviation from Hive behavior: Out-of-range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)</h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            Impala behavior differs from Hive for out-of-range float and double values:
+            Impala returns the maximum allowed value of the type, while Hive returns NULL.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> None
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title24" id="known_issues_interop__flume_writeformat_text">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title24">Configuration needed for Flume to be compatible with Impala</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For compatibility with Impala, the value for the Flume HDFS Sink
+ <code class="ph codeph">hdfs.writeFormat</code> must be set to <code class="ph codeph">Text</code>, rather than
+ its default value of <code class="ph codeph">Writable</code>. The <code class="ph codeph">hdfs.writeFormat</code>
+ setting must be changed to <code class="ph codeph">Text</code> before creating data files with
+ Flume; otherwise, those files cannot be read by either Impala or Hive.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Resolution:</strong> This information has been requested to be added to the upstream
+ Flume documentation.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title25" id="known_issues_interop__IMPALA-635">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title25">Avro Scanner fails to parse some schemas</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Querying certain Avro tables could cause a crash or return no rows, even though Impala
+ could <code class="ph codeph">DESCRIBE</code> the table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-635" target="_blank">IMPALA-635</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Swap the order of the fields in the schema specification. For
+ example, <code class="ph codeph">["null", "string"]</code> instead of <code class="ph codeph">["string",
+ "null"]</code>.
+ </p>
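For example, a minimal Avro schema with the nullable union ordered as the workaround suggests (the record and field names here are illustrative only):

```shell
# Write a one-column schema with "null" listed first in the union.
cat > schema.avsc <<'EOF'
{"type": "record", "name": "example_rec", "fields": [
  {"name": "c1", "type": ["null", "string"]}
]}
EOF
# Count the lines containing the null-first union (should find one).
grep -c '"null", "string"' schema.avsc
```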
+
+ <p class="p">
+ <strong class="ph b">Resolution:</strong> Not allowing this syntax agrees with the Avro specification, so it
+ may still cause an error even when the crashing issue is resolved.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title26" id="known_issues_interop__IMPALA-1024">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title26">Impala BE cannot parse Avro schema that contains a trailing semi-colon</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If an Avro table has a schema definition with a trailing semicolon, Impala encounters
+ an error when the table is queried.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1024" target="_blank">IMPALA-1024</a>
+ </p>
+
+          <p class="p">
+            <strong class="ph b">Workaround:</strong> Remove the trailing semicolon from the Avro schema.
+          </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title27" id="known_issues_interop__IMPALA-1652">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title27">Incorrect results with basic predicate on CHAR typed column</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When comparing a <code class="ph codeph">CHAR</code> column value to a string literal, the literal
+ value is not blank-padded and so the comparison might fail when it should match.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1652" target="_blank">IMPALA-1652</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Use the <code class="ph codeph">RPAD()</code> function to blank-pad literals
+ compared with <code class="ph codeph">CHAR</code> columns to the expected length.
+ </p>
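The padding idea behind the workaround can be illustrated outside SQL (a shell analog, not Impala syntax; the CHAR(10) width and the 'foo' literal are made up for illustration):

```shell
# A CHAR(10) column effectively stores 'foo' followed by 7 trailing spaces.
char_val='foo       '
lit='foo'
# Equivalent of RPAD('foo', 10, ' '): left-justify in a 10-character field.
padded=$(printf '%-10s' "$lit")
# The blank-padded literal now compares equal to the stored CHAR value.
[ "$padded" = "$char_val" ] && echo "match"
```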
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title28" id="known_issues__known_issues_limitations">
+
+ <h2 class="title topictitle2" id="ariaid-title28">Impala Known Issues: Limitations</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues are current limitations of Impala that require evaluation as you plan how
+ to integrate Impala into your data management workflow.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title29" id="known_issues_limitations__IMPALA-4551">
+
+ <h3 class="title topictitle3" id="ariaid-title29">Set limits on size of expression trees</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Very deeply nested expressions within queries can exceed internal Impala limits,
+ leading to excessive memory usage.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-4551" target="_blank">IMPALA-4551</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Avoid queries with extremely large expression trees. Setting the
+ query option <code class="ph codeph">disable_codegen=true</code> may reduce the impact, at a cost of
+ longer query runtime.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title30" id="known_issues_limitations__IMPALA-77">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title30">Impala does not support running on clusters with federated namespaces</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala does not support running on clusters with federated namespaces. The
+ <code class="ph codeph">impalad</code> process will not start on a node running such a filesystem
+ based on the <code class="ph codeph">org.apache.hadoop.fs.viewfs.ViewFs</code> class.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-77" target="_blank">IMPALA-77</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Anticipated Resolution:</strong> Limitation
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Use standard HDFS on all Impala nodes.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title31" id="known_issues__known_issues_misc">
+
+ <h2 class="title topictitle2" id="ariaid-title31">Impala Known Issues: Miscellaneous</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues do not fall into one of the above categories or have not been categorized
+ yet.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title32" id="known_issues_misc__IMPALA-2005">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title32">A failed CTAS does not drop the table if the insert fails</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If a <code class="ph codeph">CREATE TABLE AS SELECT</code> operation successfully creates the target
+ table but an error occurs while querying the source table or copying the data, the new
+ table is left behind rather than being dropped.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2005" target="_blank">IMPALA-2005</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Drop the new table manually after a failed <code class="ph codeph">CREATE TABLE AS
+ SELECT</code>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title33" id="known_issues_misc__IMPALA-1821">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title33">Casting scenarios with invalid/inconsistent results</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Using a <code class="ph codeph">CAST()</code> function to convert large literal values to smaller
+ types, or to convert special values such as <code class="ph codeph">NaN</code> or
+ <code class="ph codeph">Inf</code>, produces values not consistent with other database systems. This
+ could lead to unexpected results from queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1821" target="_blank">IMPALA-1821</a>
+ </p>
+
+
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title34" id="known_issues_misc__IMPALA-941">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title34">Impala Parser issue when using fully qualified table names that start with a number</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ A fully qualified table name starting with a number could cause a parsing error. In a
+ name such as <code class="ph codeph">db.571_market</code>, the decimal point followed by digits is
+ interpreted as a floating-point number.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-941" target="_blank">IMPALA-941</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Surround each part of the fully qualified name with backticks
+ (<code class="ph codeph">``</code>).
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title35" id="known_issues_misc__IMPALA-532">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title35">Impala should tolerate bad locale settings</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If the <code class="ph codeph">LC_*</code> environment variables specify an unsupported locale,
+ Impala does not start.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-532" target="_blank">IMPALA-532</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Add <code class="ph codeph">LC_ALL="C"</code> to the environment settings for
+ both the Impala daemon and the Statestore daemon. See
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details about modifying
+ these environment settings.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Resolution:</strong> Fixing this issue would require an upgrade to Boost 1.47 in the
+ Impala distribution.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title36" id="known_issues_misc__IMP-1203">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title36">Log Level 3 Not Recommended for Impala</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The extensive logging produced by log level 3 can cause serious performance overhead
+ and capacity issues.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Reduce the log level to its default value of 1, that is,
+ <code class="ph codeph">GLOG_v=1</code>. See <a class="xref" href="impala_logging.html#log_levels">Setting Logging Levels</a> for
+ details about the effects of setting different logging levels.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title37" id="known_issues__known_issues_crash">
+
+ <h2 class="title topictitle2" id="ariaid-title37">Impala Known Issues: Crashes and Hangs</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues can cause Impala to quit or become unresponsive.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title38" id="known_issues_crash__impala-6841">
+
+ <h3 class="title topictitle3" id="ariaid-title38">Unable to view large catalog objects in catalogd Web UI</h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            In the <code class="ph codeph">catalogd</code> Web UI, you can list metadata objects and view their
+            details. These details are accessed via a link and printed as a string formatted using
+            Thrift's <code class="ph codeph">DebugProtocol</code>. Printing large objects (&gt; 1 GB) in the Web UI
+            can crash <code class="ph codeph">catalogd</code>.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-6841" target="_blank">IMPALA-6841</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_reserved_words.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_reserved_words.html b/docs/build3x/html/topics/impala_reserved_words.html
new file mode 100644
index 0000000..3676084
--- /dev/null
+++ b/docs/build3x/html/topics/impala_reserved_words.html
@@ -0,0 +1,3853 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="reserved_words"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Reserved Words</title></head><body id="reserved_words"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Reserved Words</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ This topic lists
+ the reserved words in Impala.
+ </p>
+ <div class="p">
+ A reserved word is one that cannot be used directly as an identifier. If
+ you need to use it as an identifier, you must quote it with backticks.
+ For example:
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">CREATE TABLE select (x INT)</code>: fails
+ </li>
+ <li class="li">
+ <code class="ph codeph">CREATE TABLE `select` (x INT)</code>: succeeds
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ Because different database systems have different sets of reserved words,
+ and the reserved words change from release to release, carefully consider
+ database, table, and column names to ensure maximum compatibility between
+ products and versions.
+ </p>
+
+    <p class="p">
+      Also consider whether your object names are the same as any Hive
+      keywords, and rename or quote any that conflict, because you might switch
+      between Impala and Hive when doing analytics and ETL. Consult the
+      <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords" target="_blank">list of Hive keywords</a>.
+    </p>
+    <p class="p">
+      To future-proof your code, avoid additional words that might become
+      reserved as Impala adds features in later releases. This kind of planning
+      can also help to avoid name conflicts if you port SQL from other systems
+      that have different sets of reserved words. The Future Keyword column in
+      the table below indicates the additional words to avoid for table,
+      column, or other object names, even though they are not currently
+      reserved by Impala.
+    </p>
+    <p class="p">
+      The following summarizes the process for deciding whether a particular
+      SQL:2016 word is reserved in Impala.
+    </p>
+ <ul class="ul">
+      <li class="li">
+        By default, Impala aims to have the same list of reserved words as
+        SQL:2016.
+      </li>
+      <li class="li">
+        At the same time, to remain compatible with earlier versions of Impala
+        and to avoid breaking existing tables and workloads, Impala built-in
+        function names, such as COUNT and AVG, are removed from the reserved
+        words list, because Impala generally does not need to reserve the names
+        of built-in functions for parsing to work.
+      </li>
+ <li class="li">
+ For those remaining SQL 2016 reserved words, if a word is likely to be
+ in-use by users of older Impala versions and if there is a low chance of
+ Impala needing to reserve that word in the future, then the word is not
+ reserved.
+ </li>
+ <li class="li">
+ Otherwise, the word is reserved in Impala.
+ </li>
+ </ul>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title2" id="reserved_words__reserved_words_current">
+<h2 class="title topictitle2" id="ariaid-title2">List of Reserved Words</h2>
+<div class="body conbody">
+
+ <div class="p">
+ <table dir="ltr" class="table frame-all" id="reserved_words_current__table_lfw_pjs_cdb"><caption></caption><colgroup><col><col><col><col><col></colgroup><tbody class="tbody">
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><strong class="ph b">Keyword</strong></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><strong class="ph b">Reserved</strong><p class="p"><strong class="ph b">in</strong></p><p class="p"><strong class="ph b">SQL:2016</strong></p></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><strong class="ph b">Reserved</strong><p class="p"><strong class="ph b">in</strong></p><p class="p"><strong class="ph b">Impala 2.12 and
+ lower</strong></p></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><strong class="ph b">Reserved</strong><p class="p"><strong class="ph b">in</strong></p><p class="p"><strong class="ph b">Impala 3.0 and
+ higher</strong></p></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><strong class="ph b">Future Keyword</strong></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">abs</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">acos</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">add</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">aggregate</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">all</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">allocate</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">alter</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">analytic</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">and</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">anti</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">any</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">api_version</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">are</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">array</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">array_agg</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">array_max_cardinality</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">as</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">asc</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">asensitive</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">asin</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">asymmetric</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">at</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">atan</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">atomic</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">authorization</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">avg</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">avro</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">backup</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">begin</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">begin_frame</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">begin_partition</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">between</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">bigint</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">binary</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">blob</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">block_size</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">boolean</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">both</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">break</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">browse</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">bulk</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">by</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cache</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cached</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">call</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">called</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cardinality</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cascade</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cascaded</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">case</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cast</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">ceil</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">ceiling</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">change</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">char</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">char_length</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">character</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">character_length</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">check</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">checkpoint</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">class</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">classifier</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">clob</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">close</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">close_fn</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">clustered</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">coalesce</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">collate</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">collect</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">column</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">columns</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">comment</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">commit</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">compression</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">compute</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">condition</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">conf</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">connect</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">constraint</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">contains</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">continue</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">convert</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">copy</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">corr</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">corresponding</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cos</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cosh</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">count</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">covar_pop</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">covar_samp</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">create</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cross</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cube</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cume_dist</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_catalog</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_date</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_default_transform_group</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_path</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_role</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_row</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_schema</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_time</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_timestamp</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_transform_group_for_type</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_user</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cursor</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cycle</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">data</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">database</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">databases</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">date</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">datetime</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">day</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">dayofweek</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">dbcc</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">deallocate</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">dec</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">decfloat</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">decimal</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">declare</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">default</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">define</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">delete</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">delimited</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">dense_rank</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">deny</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">deref</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">desc</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">describe</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">deterministic</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">disconnect</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">disk</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">distinct</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">distributed</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">div</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">double</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">drop</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">dump</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">dynamic</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">each</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">element</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">else</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">empty</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">encoding</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">end</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">end-exec</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">end_frame</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">end_partition</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">equals</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">errlvl</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">escape</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">escaped</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">every</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">except</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">exchange</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">exec</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">execute</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">exists</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">exit</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">exp</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">explain</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">extended</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+