Posted to commits@impala.apache.org by jb...@apache.org on 2017/04/12 18:25:05 UTC

[01/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Repository: incubator-impala
Updated Branches:
  refs/heads/asf-site d96cd395e -> 75c469182


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_tutorial.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_tutorial.html b/docs/build/html/topics/impala_tutorial.html
new file mode 100644
index 0000000..ca76d1e
--- /dev/null
+++ b/docs/build/html/topics/impala_tutorial.html
@@ -0,0 +1,2270 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="tutorial"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Tutorials</title></head><body id="tutorial"><main role="main"><article role="article" aria-labelledby="tutorial__tutorials">
+
+  <h1 class="title topictitle1" id="tutorial__tutorials">Impala Tutorials</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      This section includes tutorial scenarios that demonstrate how to begin using Impala once the software is
+      installed. It focuses on techniques for loading data, because once you have some data in tables and can query
+      that data, you can quickly progress to more advanced Impala features.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        Where practical, the tutorials take you from <span class="q">"ground zero"</span> to having the desired Impala tables and
+        data. In some cases, you might need to download additional files from outside sources, set up additional
+        software components, modify commands or scripts to fit your own configuration, or substitute your own
+        sample data.
+      </p>
+    </div>
+
+    <p class="p">
+      Before trying these tutorial lessons, install Impala using one of these procedures:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        If you already have some <span class="keyword">Apache Hadoop</span> environment set up and just need to add Impala to it,
+        follow the installation process described in <a class="xref" href="impala_install.html#install">Installing Impala</a>. Make sure to also install the Hive
+        metastore service if you do not already have Hive configured.
+      </li>
+
+    </ul>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="tutorial__tut_beginner">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Tutorials for Getting Started</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These tutorials demonstrate the basics of using Impala. They are intended for first-time users, and for
+        trying out Impala on any new cluster to make sure the major components are working correctly.
+      </p>
+
+      <p class="p toc inpage"></p>
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title3" id="tut_beginner__tutorial_explore">
+
+      <h3 class="title topictitle3" id="ariaid-title3">Explore a New Impala Instance</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          This tutorial demonstrates techniques for finding your way around the tables and databases of an
+          unfamiliar (possibly empty) Impala instance.
+        </p>
+
+        <p class="p">
+          When you connect to an Impala instance for the first time, you use the <code class="ph codeph">SHOW DATABASES</code>
+          and <code class="ph codeph">SHOW TABLES</code> statements to view the most common types of objects. Also, call the
+          <code class="ph codeph">version()</code> function to confirm which version of Impala you are running; the version
+          number is important when consulting documentation and dealing with support issues.
+        </p>
+
+        <p class="p">
+          A completely empty Impala instance contains no tables, but still has two databases:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            <code class="ph codeph">default</code>, where new tables are created when you do not specify any other database.
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">_impala_builtins</code>, a system database used to hold all the built-in functions.
+          </li>
+        </ul>
+
+        <p class="p">
+          The following example shows how to see the available databases, and the tables in each. If the list of
+          databases or tables is long, you can use wildcard notation to locate specific databases or tables based
+          on their names.
+        </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost --quiet
+Starting Impala Shell without Kerberos authentication
+Welcome to the Impala shell. Press TAB twice to see a list of available commands.
+...
+<span class="ph">(Shell
+      build version: Impala Shell v2.8.x (<var class="keyword varname">hash</var>) built on
+      <var class="keyword varname">date</var>)</span>
+[localhost:21000] &gt; select version();
++-------------------------------------------
+| version()
++-------------------------------------------
+| impalad version ...
+| Built on ...
++-------------------------------------------
+[localhost:21000] &gt; show databases;
++--------------------------+
+| name                     |
++--------------------------+
+| _impala_builtins         |
+| ctas                     |
+| d1                       |
+| d2                       |
+| d3                       |
+| default                  |
+| explain_plans            |
+| external_table           |
+| file_formats             |
+| tpc                      |
++--------------------------+
+[localhost:21000] &gt; select current_database();
++--------------------+
+| current_database() |
++--------------------+
+| default            |
++--------------------+
+[localhost:21000] &gt; show tables;
++-------+
+| name  |
++-------+
+| ex_t  |
+| t1    |
++-------+
+[localhost:21000] &gt; show tables in d3;
+
+[localhost:21000] &gt; show tables in tpc;
++------------------------+
+| name                   |
++------------------------+
+| city                   |
+| customer               |
+| customer_address       |
+| customer_demographics  |
+| household_demographics |
+| item                   |
+| promotion              |
+| store                  |
+| store2                 |
+| store_sales            |
+| ticket_view            |
+| time_dim               |
+| tpc_tables             |
++------------------------+
+[localhost:21000] &gt; show tables in tpc like 'customer*';
++-----------------------+
+| name                  |
++-----------------------+
+| customer              |
+| customer_address      |
+| customer_demographics |
++-----------------------+
+</code></pre>
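As a rough illustration of the pattern semantics used above (a Python sketch, not Impala's own matching code): in `SHOW ... LIKE` patterns, `*` matches any sequence of characters and `|` separates alternative patterns, with case-insensitive matching assumed here.

```python
from fnmatch import fnmatchcase

def show_like(names, pattern):
    # '*' matches any sequence of characters; '|' separates alternatives.
    # This is a plain-Python approximation of SHOW ... LIKE filtering.
    alternatives = pattern.lower().split("|")
    return sorted(n for n in names
                  if any(fnmatchcase(n.lower(), alt) for alt in alternatives))

tables = ["customer", "customer_address", "customer_demographics",
          "household_demographics", "item", "store_sales", "ticket_view"]
print(show_like(tables, "customer*"))
# ['customer', 'customer_address', 'customer_demographics']
```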
+
+        <p class="p">
+          Once you know what tables and databases are available, you descend into a database with the
+          <code class="ph codeph">USE</code> statement. To understand the structure of each table, you use the
+          <code class="ph codeph">DESCRIBE</code> command. Once inside a database, you can issue statements such as
+          <code class="ph codeph">INSERT</code> and <code class="ph codeph">SELECT</code> that operate on particular tables.
+        </p>
+
+        <p class="p">
+          The following example explores a database named <code class="ph codeph">TPC</code> whose name we learned in the
+          previous example. It shows how to filter the table names within a database based on a search string,
+          examine the columns of a table, and run queries to examine the characteristics of the table data. For
+          example, for an unfamiliar table you might want to know the number of rows, the number of different
+          values for a column, and other properties such as whether the column contains any <code class="ph codeph">NULL</code>
+          values. When sampling the actual data values from a table, use a <code class="ph codeph">LIMIT</code> clause to avoid
+          excessive output if the table contains more rows or distinct values than you expect.
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; use tpc;
+[localhost:21000] &gt; show tables like '*view*';
++-------------+
+| name        |
++-------------+
+| ticket_view |
++-------------+
+[localhost:21000] &gt; describe city;
++-------------+--------+---------+
+| name        | type   | comment |
++-------------+--------+---------+
+| id          | int    |         |
+| name        | string |         |
+| countrycode | string |         |
+| district    | string |         |
+| population  | int    |         |
++-------------+--------+---------+
+[localhost:21000] &gt; select count(*) from city;
++----------+
+| count(*) |
++----------+
+| 0        |
++----------+
+[localhost:21000] &gt; desc customer;
++------------------------+--------+---------+
+| name                   | type   | comment |
++------------------------+--------+---------+
+| c_customer_sk          | int    |         |
+| c_customer_id          | string |         |
+| c_current_cdemo_sk     | int    |         |
+| c_current_hdemo_sk     | int    |         |
+| c_current_addr_sk      | int    |         |
+| c_first_shipto_date_sk | int    |         |
+| c_first_sales_date_sk  | int    |         |
+| c_salutation           | string |         |
+| c_first_name           | string |         |
+| c_last_name            | string |         |
+| c_preferred_cust_flag  | string |         |
+| c_birth_day            | int    |         |
+| c_birth_month          | int    |         |
+| c_birth_year           | int    |         |
+| c_birth_country        | string |         |
+| c_login                | string |         |
+| c_email_address        | string |         |
+| c_last_review_date     | string |         |
++------------------------+--------+---------+
+[localhost:21000] &gt; select count(*) from customer;
++----------+
+| count(*) |
++----------+
+| 100000   |
++----------+
+[localhost:21000] &gt; select count(distinct c_birth_month) from customer;
++-------------------------------+
+| count(distinct c_birth_month) |
++-------------------------------+
+| 12                            |
++-------------------------------+
+[localhost:21000] &gt; select count(*) from customer where c_email_address is null;
++----------+
+| count(*) |
++----------+
+| 0        |
++----------+
+[localhost:21000] &gt; select distinct c_salutation from customer limit 10;
++--------------+
+| c_salutation |
++--------------+
+| Mr.          |
+| Ms.          |
+| Dr.          |
+|              |
+| Miss         |
+| Sir          |
+| Mrs.         |
++--------------+
+</code></pre>
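The same profiling questions, answered over a tiny in-memory sample in plain Python purely for illustration (the field names mirror the customer table; the rows themselves are made up, and Impala of course computes these in parallel over HDFS data):

```python
# Made-up sample rows standing in for the customer table.
customers = [
    {"c_salutation": "Mr.", "c_birth_month": 1,  "c_email_address": "a@example.com"},
    {"c_salutation": "Dr.", "c_birth_month": 4,  "c_email_address": "b@example.com"},
    {"c_salutation": "Mr.", "c_birth_month": 12, "c_email_address": None},
]

row_count = len(customers)                                          # count(*)
distinct_months = len({c["c_birth_month"] for c in customers})      # count(distinct c_birth_month)
null_emails = sum(c["c_email_address"] is None for c in customers)  # where ... is null
salutations = sorted({c["c_salutation"] for c in customers})[:10]   # select distinct ... limit 10

print(row_count, distinct_months, null_emails, salutations)
# 3 3 1 ['Dr.', 'Mr.']
```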
+
+        <p class="p">
+          When you graduate from read-only exploration, you use statements such as <code class="ph codeph">CREATE DATABASE</code>
+          and <code class="ph codeph">CREATE TABLE</code> to set up your own database objects.
+        </p>
+
+        <p class="p">
+          The following example demonstrates creating a new database holding a new table. Although the last example
+          ended inside the <code class="ph codeph">TPC</code> database, the new <code class="ph codeph">EXPERIMENTS</code> database is not
+          nested inside <code class="ph codeph">TPC</code>; all databases are arranged in a single top-level list.
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create database experiments;
+[localhost:21000] &gt; show databases;
++--------------------------+
+| name                     |
++--------------------------+
+| _impala_builtins         |
+| ctas                     |
+| d1                       |
+| d2                       |
+| d3                       |
+| default                  |
+| experiments              |
+| explain_plans            |
+| external_table           |
+| file_formats             |
+| tpc                      |
++--------------------------+
+[localhost:21000] &gt; show databases like 'exp*';
++---------------+
+| name          |
++---------------+
+| experiments   |
+| explain_plans |
++---------------+
+</code></pre>
+
+        <p class="p">
+          The following example creates a new table, <code class="ph codeph">T1</code>. To illustrate a common mistake, it creates this table inside
+          the wrong database, the <code class="ph codeph">TPC</code> database where the previous example ended. The <code class="ph codeph">ALTER
+          TABLE</code> statement lets you move the table to the intended database, <code class="ph codeph">EXPERIMENTS</code>, as part of a rename operation.
+          The <code class="ph codeph">USE</code> statement is always needed to switch to a new database, and the
+          <code class="ph codeph">current_database()</code> function confirms which database the session is in, to avoid these
+          kinds of mistakes.
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table t1 (x int);
+
+[localhost:21000] &gt; show tables;
++------------------------+
+| name                   |
++------------------------+
+| city                   |
+| customer               |
+| customer_address       |
+| customer_demographics  |
+| household_demographics |
+| item                   |
+| promotion              |
+| store                  |
+| store2                 |
+| store_sales            |
+| t1                     |
+| ticket_view            |
+| time_dim               |
+| tpc_tables             |
++------------------------+
+[localhost:21000] &gt; select current_database();
++--------------------+
+| current_database() |
++--------------------+
+| tpc                |
++--------------------+
+[localhost:21000] &gt; alter table t1 rename to experiments.t1;
+[localhost:21000] &gt; use experiments;
+[localhost:21000] &gt; show tables;
++------+
+| name |
++------+
+| t1   |
++------+
+[localhost:21000] &gt; select current_database();
++--------------------+
+| current_database() |
++--------------------+
+| experiments        |
++--------------------+
+</code></pre>
+
+        <p class="p">
+          For your initial experiments with tables, you can use ones with just a few columns and a few rows, and
+          text-format data files.
+        </p>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          As you graduate to more realistic scenarios, you will use more elaborate tables with many columns,
+          features such as partitioning, and file formats such as Parquet. When dealing with realistic data
+          volumes, you will bring in data using <code class="ph codeph">LOAD DATA</code> or <code class="ph codeph">INSERT ... SELECT</code>
+          statements to operate on millions or billions of rows at once.
+        </div>
+
+        <p class="p">
+          The following example sets up a couple of simple tables with a few rows, and performs queries involving
+          sorting, aggregate functions, and joins.
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; insert into t1 values (1), (3), (2), (4);
+[localhost:21000] &gt; select x from t1 order by x desc;
++---+
+| x |
++---+
+| 4 |
+| 3 |
+| 2 |
+| 1 |
++---+
+[localhost:21000] &gt; select min(x), max(x), sum(x), avg(x) from t1;
++--------+--------+--------+--------+
+| min(x) | max(x) | sum(x) | avg(x) |
++--------+--------+--------+--------+
+| 1      | 4      | 10     | 2.5    |
++--------+--------+--------+--------+
+
+[localhost:21000] &gt; create table t2 (id int, word string);
+[localhost:21000] &gt; insert into t2 values (1, "one"), (3, "three"), (5, 'five');
+[localhost:21000] &gt; select word from t1 join t2 on (t1.x = t2.id);
++-------+
+| word  |
++-------+
+| one   |
+| three |
++-------+
+</code></pre>
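Conceptually, the join in the last query matches each value of `t1.x` against `t2.id` and emits the corresponding `word`. A toy Python sketch of that computation (not Impala's executor, which distributes and hash-joins the data):

```python
t1_x = [1, 3, 2, 4]                       # values of t1.x
t2 = {1: "one", 3: "three", 5: "five"}    # t2 as a mapping of id -> word

# Probe each t1.x against the t2 keys; unmatched rows (2 and 4) drop out.
words = [t2[x] for x in t1_x if x in t2]
print(words)  # ['one', 'three']
```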
+
+        <p class="p">
+          After completing this tutorial, you should now know:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            How to tell which version of Impala is running on your system.
+          </li>
+
+          <li class="li">
+            How to find the names of databases in an Impala instance, either displaying the full list or
+            searching for specific names.
+          </li>
+
+          <li class="li">
+            How to find the names of tables in an Impala database, either displaying the full list or
+            searching for specific names.
+          </li>
+
+          <li class="li">
+            How to switch between databases and check which database you are currently in.
+          </li>
+
+          <li class="li">
+            How to learn the column names and types of a table.
+          </li>
+
+          <li class="li">
+            How to create databases and tables, insert small amounts of test data, and run simple queries.
+          </li>
+        </ul>
+      </div>
+    </article>
+
+    
+
+    
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title4" id="tut_beginner__tutorial_csv_setup">
+
+      <h3 class="title topictitle3" id="ariaid-title4">Load CSV Data from Local Files</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          This scenario illustrates how to create some very small tables, suitable for first-time users to
+          experiment with Impala SQL features. <code class="ph codeph">TAB1</code> and <code class="ph codeph">TAB2</code> are loaded with data
+          from files in HDFS. A subset of data is copied from <code class="ph codeph">TAB1</code> into <code class="ph codeph">TAB3</code>.
+        </p>
+
+        <p class="p">
+          Populate HDFS with the data you want to query. To begin this process, create one or more new
+          subdirectories underneath your user directory in HDFS. The data for each table resides in a separate
+          subdirectory. Substitute your own username for <code class="ph codeph">username</code> where appropriate. This example
+          uses the <code class="ph codeph">-p</code> option with the <code class="ph codeph">mkdir</code> operation to create any necessary
+          parent directories if they do not already exist.
+        </p>
+
+<pre class="pre codeblock"><code>$ whoami
+username
+$ hdfs dfs -ls /user
+Found 3 items
+drwxr-xr-x   - username username            0 2013-04-22 18:54 /user/username
+drwxrwx---   - mapred   mapred              0 2013-03-15 20:11 /user/history
+drwxr-xr-x   - hue      supergroup          0 2013-03-15 20:10 /user/hive
+
+$ hdfs dfs -mkdir -p /user/username/sample_data/tab1 /user/username/sample_data/tab2</code></pre>
+
+        <p class="p">
+          Here is some sample data, for two tables named <code class="ph codeph">TAB1</code> and <code class="ph codeph">TAB2</code>.
+        </p>
+
+        <p class="p">
+          Copy the following content to <code class="ph codeph">.csv</code> files in your local filesystem:
+        </p>
+
+        <p class="p">
+          <span class="ph filepath">tab1.csv</span>:
+        </p>
+
+<pre class="pre codeblock"><code>1,true,123.123,2012-10-24 08:55:00
+2,false,1243.5,2012-10-25 13:40:00
+3,false,24453.325,2008-08-22 09:33:21.123
+4,false,243423.325,2007-05-12 22:32:21.33454
+5,true,243.325,1953-04-22 09:11:33
+</code></pre>
+
+        <p class="p">
+          <span class="ph filepath">tab2.csv</span>:
+        </p>
+
+<pre class="pre codeblock"><code>1,true,12789.123
+2,false,1243.5
+3,false,24453.325
+4,false,2423.3254
+5,true,243.325
+60,false,243565423.325
+70,true,243.325
+80,false,243423.325
+90,true,243.325
+</code></pre>
+
+        <p class="p">
+          Put each <code class="ph codeph">.csv</code> file into a separate HDFS directory using commands like the following,
+          which use paths available in the Impala Demo VM:
+        </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -put tab1.csv /user/username/sample_data/tab1
+$ hdfs dfs -ls /user/username/sample_data/tab1
+Found 1 items
+-rw-r--r--   1 username username        192 2013-04-02 20:08 /user/username/sample_data/tab1/tab1.csv
+
+
+$ hdfs dfs -put tab2.csv /user/username/sample_data/tab2
+$ hdfs dfs -ls /user/username/sample_data/tab2
+Found 1 items
+-rw-r--r--   1 username username        158 2013-04-02 20:09 /user/username/sample_data/tab2/tab2.csv
+</code></pre>
+
+        <p class="p">
+          The name of each data file is not significant. In fact, when Impala examines the contents of the data
+          directory for the first time, it considers all files in the directory to make up the data of the table,
+          regardless of how many files there are or what the files are named.
+        </p>
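That directory-level behavior can be mimicked loosely in Python (a simplified sketch only; Impala's scanners are far more involved): every file in the directory contributes rows, regardless of how the files are named.

```python
import pathlib
import tempfile

def read_table_dir(path):
    # Treat every file in the directory as table data, whatever its name,
    # splitting each line on commas like a text-format Impala table.
    rows = []
    for f in sorted(pathlib.Path(path).iterdir()):
        if f.is_file():
            rows += [line.split(",") for line in f.read_text().splitlines()]
    return rows

with tempfile.TemporaryDirectory() as d:
    pathlib.Path(d, "tab1.csv").write_text("1,true\n2,false\n")
    pathlib.Path(d, "arbitrary_name").write_text("3,true\n")
    print(read_table_dir(d))
    # [['3', 'true'], ['1', 'true'], ['2', 'false']]
```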
+
+        <p class="p">
+          To understand what paths are available within your own HDFS filesystem and what the permissions are for
+          the various directories and files, issue <code class="ph codeph">hdfs dfs -ls /</code> and work your way down the tree
+          doing <code class="ph codeph">-ls</code> operations for the various directories.
+        </p>
+
+        <p class="p">
+          Use the <code class="ph codeph">impala-shell</code> command to create tables, either interactively or through a SQL
+          script.
+        </p>
+
+        <p class="p">
+          The following example shows creating three tables. For each table, the example shows creating columns
+          with various attributes such as Boolean or integer types. The example also includes commands that provide
+          information about how the data is formatted, such as rows terminating with commas, which makes sense in
+          the case of importing data from a <code class="ph codeph">.csv</code> file. Where we already have <code class="ph codeph">.csv</code>
+          files containing data in the HDFS directory tree, we specify the location of the directory containing the
+          appropriate <code class="ph codeph">.csv</code> file. Impala considers all the data from all the files in that
+          directory to represent the data for the table.
+        </p>
+
+<pre class="pre codeblock"><code>DROP TABLE IF EXISTS tab1;
+-- The EXTERNAL clause means the data is located outside the central location
+-- for Impala data files and is preserved when the associated Impala table is dropped.
+-- We expect the data to already exist in the directory specified by the LOCATION clause.
+CREATE EXTERNAL TABLE tab1
+(
+   id INT,
+   col_1 BOOLEAN,
+   col_2 DOUBLE,
+   col_3 TIMESTAMP
+)
+ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+LOCATION '/user/username/sample_data/tab1';
+
+DROP TABLE IF EXISTS tab2;
+-- TAB2 is an external table, similar to TAB1.
+CREATE EXTERNAL TABLE tab2
+(
+   id INT,
+   col_1 BOOLEAN,
+   col_2 DOUBLE
+)
+ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+LOCATION '/user/username/sample_data/tab2';
+
+DROP TABLE IF EXISTS tab3;
+-- Leaving out the EXTERNAL clause means the data will be managed
+-- in the central Impala data directory tree. Rather than reading
+-- existing data files when the table is created, we load the
+-- data after creating the table.
+CREATE TABLE tab3
+(
+   id INT,
+   col_1 BOOLEAN,
+   col_2 DOUBLE,
+   month INT,
+   day INT
+)
+ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
+</code></pre>
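To see how one line of <span class="ph filepath">tab1.csv</span> maps onto the TAB1 column types (INT, BOOLEAN, DOUBLE, TIMESTAMP), here is a minimal Python sketch; the parsing rules are simplified assumptions for illustration, not Impala's actual text-scanning code:

```python
from datetime import datetime

def parse_tab1_row(line):
    # Split one comma-delimited line into (INT, BOOLEAN, DOUBLE, TIMESTAMP).
    id_, col_1, col_2, col_3 = line.split(",")
    # Timestamps in the sample data may or may not carry fractional seconds.
    ts_fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in col_3 else "%Y-%m-%d %H:%M:%S"
    return (int(id_), col_1 == "true", float(col_2),
            datetime.strptime(col_3, ts_fmt))

print(parse_tab1_row("2,false,1243.5,2012-10-25 13:40:00"))
# (2, False, 1243.5, datetime.datetime(2012, 10, 25, 13, 40))
```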
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          Getting through these <code class="ph codeph">CREATE TABLE</code> statements successfully is an important validation
+          step to confirm everything is configured correctly with the Hive metastore and HDFS permissions. If you
+          receive any errors during the <code class="ph codeph">CREATE TABLE</code> statements:
+          <ul class="ul">
+            <li class="li">
+              Make sure you followed the installation instructions closely, in
+              <a class="xref" href="impala_install.html#install">Installing Impala</a>.
+            </li>
+
+            <li class="li">
+              Make sure the <code class="ph codeph">hive.metastore.warehouse.dir</code> property points to a directory that
+              Impala can write to. The ownership should be <code class="ph codeph">hive:hive</code>, and the
+              <code class="ph codeph">impala</code> user should also be a member of the <code class="ph codeph">hive</code> group.
+            </li>
+          </ul>
+        </div>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="tut_beginner__tutorial_create_table">
+
+      <h3 class="title topictitle3" id="ariaid-title5">Point an Impala Table at Existing Data Files</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          A convenient way to set up data for Impala to access is to use an external table, where the data already
+          exists in a set of HDFS files and you just point the Impala table at the directory containing those
+          files. For example, you might run a <code class="ph codeph">*.sql</code> file in <code class="ph codeph">impala-shell</code> with
+          contents similar to the following, to create an Impala table that accesses an existing data file used by
+          Hive.
+        </p>
+
+        <p class="p">
+          The following examples set up two tables, referencing the paths and sample data from the TPC-DS kit for Impala.
+          For historical reasons, the data physically resides in an HDFS directory tree under
+          <span class="ph filepath">/user/hive</span>, although this particular data is entirely managed by Impala rather than
+          Hive. When we create an external table, we specify the directory containing one or more data files, and
+          Impala queries the combined content of all the files inside that directory. Here is how we examine the
+          directories and files within the HDFS filesystem:
+        </p>
+
+<pre class="pre codeblock"><code>$ cd ~/username/datasets
+$ ./tpcds-setup.sh
+... Downloads and unzips the kit, builds the data and loads it into HDFS ...
+$ hdfs dfs -ls /user/hive/tpcds/customer
+Found 1 items
+-rw-r--r--   1 username supergroup   13209372 2013-03-22 18:09 /user/hive/tpcds/customer/customer.dat
+$ hdfs dfs -cat /user/hive/tpcds/customer/customer.dat | more
+1|AAAAAAAABAAAAAAA|980124|7135|32946|2452238|2452208|Mr.|Javier|Lewis|Y|9|12|1936|CHILE||Javie
+r.Lewis@VFAxlnZEvOx.org|2452508|
+2|AAAAAAAACAAAAAAA|819667|1461|31655|2452318|2452288|Dr.|Amy|Moses|Y|9|4|1966|TOGO||Amy.Moses@
+Ovk9KjHH.com|2452318|
+3|AAAAAAAADAAAAAAA|1473522|6247|48572|2449130|2449100|Miss|Latisha|Hamilton|N|18|9|1979|NIUE||
+Latisha.Hamilton@V.com|2452313|
+4|AAAAAAAAEAAAAAAA|1703214|3986|39558|2450030|2450000|Dr.|Michael|White|N|7|6|1983|MEXICO||Mic
+hael.White@i.org|2452361|
+5|AAAAAAAAFAAAAAAA|953372|4470|36368|2449438|2449408|Sir|Robert|Moran|N|8|5|1956|FIJI||Robert.
+Moran@Hh.edu|2452469|
+...
+</code></pre>
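When eyeballing a raw dump like this, one quick sanity check (shown here as an illustrative Python sketch) is that each <code class="ph codeph">|</code>-delimited line carries the 18 columns the customer table expects:

```python
# First line of customer.dat, as shown in the dump above.
line = ("1|AAAAAAAABAAAAAAA|980124|7135|32946|2452238|2452208|Mr.|Javier|Lewis"
        "|Y|9|12|1936|CHILE||Javier.Lewis@VFAxlnZEvOx.org|2452508|")
fields = line.split("|")
# The trailing '|' produces one empty element after the last column;
# empty fields in the middle (here, c_login) stay as empty strings.
print(len(fields) - 1)  # 18
```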
+
+        <p class="p">
+          Here is a SQL script to set up Impala tables pointing to some of these data files in HDFS.
+          (The script in the VM sets up tables like this through Hive; ignore those tables
+          for purposes of this demonstration.)
+          Save the following as <span class="ph filepath">customer_setup.sql</span>:
+        </p>
+
+<pre class="pre codeblock"><code>--
+-- store_sales fact table and surrounding dimension tables only
+--
+create database tpcds;
+use tpcds;
+
+drop table if exists customer;
+create external table customer
+(
+    c_customer_sk             int,
+    c_customer_id             string,
+    c_current_cdemo_sk        int,
+    c_current_hdemo_sk        int,
+    c_current_addr_sk         int,
+    c_first_shipto_date_sk    int,
+    c_first_sales_date_sk     int,
+    c_salutation              string,
+    c_first_name              string,
+    c_last_name               string,
+    c_preferred_cust_flag     string,
+    c_birth_day               int,
+    c_birth_month             int,
+    c_birth_year              int,
+    c_birth_country           string,
+    c_login                   string,
+    c_email_address           string,
+    c_last_review_date        string
+)
+row format delimited fields terminated by '|'
+location '/user/hive/tpcds/customer';
+
+drop table if exists customer_address;
+create external table customer_address
+(
+    ca_address_sk             int,
+    ca_address_id             string,
+    ca_street_number          string,
+    ca_street_name            string,
+    ca_street_type            string,
+    ca_suite_number           string,
+    ca_city                   string,
+    ca_county                 string,
+    ca_state                  string,
+    ca_zip                    string,
+    ca_country                string,
+    ca_gmt_offset             float,
+    ca_location_type          string
+)
+row format delimited fields terminated by '|'
+location '/user/hive/tpcds/customer_address';
+</code></pre>
+
+        <div class="p">
+          We would run this script with a command such as:
+<pre class="pre codeblock"><code>impala-shell -i localhost -f customer_setup.sql</code></pre>
+        </div>
+
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="tut_beginner__tutorial_describe_impala">
+
+      <h3 class="title topictitle3" id="ariaid-title6">Describe the Impala Table</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Now that you have updated the database metadata that Impala caches, you can confirm that the expected
+          tables are accessible by Impala and examine the attributes of one of the tables. We created these tables
+          in the database named <code class="ph codeph">default</code>. If the tables were in a database other than the default,
+          we would issue a command <code class="ph codeph">use <var class="keyword varname">db_name</var> </code> to switch to that database
+          before examining or querying its tables. We could also qualify the name of a table by prepending the
+          database name, for example <code class="ph codeph">default.customer</code> and <code class="ph codeph">default.customer_address</code>.
+        </p>
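+
+        <p class="p">
+          As a sketch of that workflow (using a hypothetical database named <code class="ph codeph">tpcds</code>),
+          either statement style reaches the same table:
+        </p>
+
+<pre class="pre codeblock"><code>-- Switch to the database, then use unqualified table names.
+use tpcds;
+select count(*) from customer;
+
+-- Or stay in the current database and qualify the table name.
+select count(*) from tpcds.customer;</code></pre>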
+
+<pre class="pre codeblock"><code>[impala-host:21000] &gt; show databases
+Query finished, fetching results ...
+default
+Returned 1 row(s) in 0.00s
+[impala-host:21000] &gt; show tables
+Query finished, fetching results ...
+customer
+customer_address
+Returned 2 row(s) in 0.00s
+[impala-host:21000] &gt; describe customer_address
++------------------+--------+---------+
+| name             | type   | comment |
++------------------+--------+---------+
+| ca_address_sk    | int    |         |
+| ca_address_id    | string |         |
+| ca_street_number | string |         |
+| ca_street_name   | string |         |
+| ca_street_type   | string |         |
+| ca_suite_number  | string |         |
+| ca_city          | string |         |
+| ca_county        | string |         |
+| ca_state         | string |         |
+| ca_zip           | string |         |
+| ca_country       | string |         |
+| ca_gmt_offset    | float  |         |
+| ca_location_type | string |         |
++------------------+--------+---------+
+Returned 13 row(s) in 0.01s
+</code></pre>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="tut_beginner__tutorial_query_impala">
+
+      <h3 class="title topictitle3" id="ariaid-title7">Query the Impala Table</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          You can query data contained in the tables. Impala coordinates the query execution across a single node
+          or multiple nodes depending on your configuration, without the overhead of running MapReduce jobs to
+          perform the intermediate processing.
+        </p>
+
+        <p class="p">
+          There are a variety of ways to execute queries on Impala:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            Using the <code class="ph codeph">impala-shell</code> command in interactive mode:
+<pre class="pre codeblock"><code>$ impala-shell -i impala-host
+Connected to localhost:21000
+[impala-host:21000] &gt; select count(*) from customer_address;
+50000
+Returned 1 row(s) in 0.37s
+</code></pre>
+          </li>
+
+          <li class="li">
+            Passing a set of commands contained in a file:
+<pre class="pre codeblock"><code>$ impala-shell -i impala-host -f myquery.sql
+Connected to localhost:21000
+50000
+Returned 1 row(s) in 0.19s</code></pre>
+          </li>
+
+          <li class="li">
+            Passing a single command to the <code class="ph codeph">impala-shell</code> command. The query is executed, the
+            results are returned, and the shell exits. Make sure to quote the command, preferably with single
+            quotation marks to avoid shell expansion of characters such as <code class="ph codeph">*</code>.
+<pre class="pre codeblock"><code>$ impala-shell -i impala-host -q 'select count(*) from customer_address'
+Connected to localhost:21000
+50000
+Returned 1 row(s) in 0.29s</code></pre>
+          </li>
+        </ul>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="tut_beginner__tutorial_etl">
+
+      <h3 class="title topictitle3" id="ariaid-title8">Data Loading and Querying Examples</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          This section describes how to create some sample tables and load data into them. These tables can then be
+          queried using the Impala shell.
+        </p>
+      </div>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title9" id="tutorial_etl__tutorial_loading">
+
+        <h4 class="title topictitle4" id="ariaid-title9">Loading Data</h4>
+
+        <div class="body conbody">
+
+          <p class="p">
+            Loading data involves:
+          </p>
+
+          <ul class="ul">
+            <li class="li">
+              Establishing a data set. The example below uses <code class="ph codeph">.csv</code> files.
+            </li>
+
+            <li class="li">
+              Creating the tables into which to load the data.
+            </li>
+
+            <li class="li">
+              Loading the data into the tables you created.
+            </li>
+          </ul>
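+
+          <p class="p">
+            A minimal sketch of these steps, assuming a comma-delimited data file already uploaded to the
+            hypothetical HDFS path <code class="ph codeph">/user/impala/sample_data</code>:
+          </p>
+
+<pre class="pre codeblock"><code>-- Create a table whose layout matches the CSV data.
+create table sample_table (id int, val string)
+  row format delimited fields terminated by ',';
+
+-- Move the uploaded files under the table's HDFS directory.
+load data inpath '/user/impala/sample_data' into table sample_table;</code></pre>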
+
+
+
+
+
+
+        </div>
+      </article>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title10" id="tutorial_etl__tutorial_queries">
+
+        <h4 class="title topictitle4" id="ariaid-title10">Sample Queries</h4>
+
+        <div class="body conbody">
+
+          <p class="p">
+            To run these sample queries, create a SQL query file <code class="ph codeph">query.sql</code>, copy and paste each
+            query into the query file, and then run the query file using the shell. For example, to run
+            <code class="ph codeph">query.sql</code> on <code class="ph codeph">impala-host</code>, you might use the command:
+          </p>
+
+<pre class="pre codeblock"><code>impala-shell -i impala-host -f query.sql</code></pre>
+
+          <p class="p">
+            The examples and results below assume you have loaded the sample data into the tables as described
+            above.
+          </p>
+
+          <div class="example"><h5 class="title sectiontitle">Example: Examining Contents of Tables</h5>
+
+            
+
+            <p class="p">
+              Let's start by verifying that the tables do contain the data we expect. Because Impala often deals
+              with tables containing millions or billions of rows, when examining tables of unknown size, include
+              the <code class="ph codeph">LIMIT</code> clause to avoid huge amounts of unnecessary output, as in the final query.
+              (If your interactive query starts displaying an unexpected volume of data, press
+              <code class="ph codeph">Ctrl-C</code> in <code class="ph codeph">impala-shell</code> to cancel the query.)
+            </p>
+
+<pre class="pre codeblock"><code>SELECT * FROM tab1;
+SELECT * FROM tab2;
+SELECT * FROM tab2 LIMIT 5;</code></pre>
+
+            <p class="p">
+              Results:
+            </p>
+
+<pre class="pre codeblock"><code>+----+-------+------------+-------------------------------+
+| id | col_1 | col_2      | col_3                         |
++----+-------+------------+-------------------------------+
+| 1  | true  | 123.123    | 2012-10-24 08:55:00           |
+| 2  | false | 1243.5     | 2012-10-25 13:40:00           |
+| 3  | false | 24453.325  | 2008-08-22 09:33:21.123000000 |
+| 4  | false | 243423.325 | 2007-05-12 22:32:21.334540000 |
+| 5  | true  | 243.325    | 1953-04-22 09:11:33           |
++----+-------+------------+-------------------------------+
+
++----+-------+---------------+
+| id | col_1 | col_2         |
++----+-------+---------------+
+| 1  | true  | 12789.123     |
+| 2  | false | 1243.5        |
+| 3  | false | 24453.325     |
+| 4  | false | 2423.3254     |
+| 5  | true  | 243.325       |
+| 60 | false | 243565423.325 |
+| 70 | true  | 243.325       |
+| 80 | false | 243423.325    |
+| 90 | true  | 243.325       |
++----+-------+---------------+
+
++----+-------+-----------+
+| id | col_1 | col_2     |
++----+-------+-----------+
+| 1  | true  | 12789.123 |
+| 2  | false | 1243.5    |
+| 3  | false | 24453.325 |
+| 4  | false | 2423.3254 |
+| 5  | true  | 243.325   |
++----+-------+-----------+</code></pre>
+
+          </div>
+
+          <div class="example"><h5 class="title sectiontitle">Example: Aggregate and Join</h5>
+
+            
+
+<pre class="pre codeblock"><code>SELECT tab1.col_1, MAX(tab2.col_2), MIN(tab2.col_2)
+FROM tab2 JOIN tab1 USING (id)
+GROUP BY col_1 ORDER BY 1 LIMIT 5;</code></pre>
+
+            <p class="p">
+              Results:
+            </p>
+
+<pre class="pre codeblock"><code>+-------+-----------------+-----------------+
+| col_1 | max(tab2.col_2) | min(tab2.col_2) |
++-------+-----------------+-----------------+
+| false | 24453.325       | 1243.5          |
+| true  | 12789.123       | 243.325         |
++-------+-----------------+-----------------+</code></pre>
+
+          </div>
+
+          <div class="example"><h5 class="title sectiontitle">Example: Subquery, Aggregate and Joins</h5>
+
+            
+
+<pre class="pre codeblock"><code>SELECT tab2.*
+FROM tab2,
+(SELECT tab1.col_1, MAX(tab2.col_2) AS max_col2
+ FROM tab2, tab1
+ WHERE tab1.id = tab2.id
+ GROUP BY col_1) subquery1
+WHERE subquery1.max_col2 = tab2.col_2;</code></pre>
+
+            <p class="p">
+              Results:
+            </p>
+
+<pre class="pre codeblock"><code>+----+-------+-----------+
+| id | col_1 | col_2     |
++----+-------+-----------+
+| 1  | true  | 12789.123 |
+| 3  | false | 24453.325 |
++----+-------+-----------+</code></pre>
+
+          </div>
+
+          <div class="example"><h5 class="title sectiontitle">Example: INSERT Query</h5>
+
+            
+
+<pre class="pre codeblock"><code>INSERT OVERWRITE TABLE tab3
+SELECT id, col_1, col_2, MONTH(col_3), DAYOFMONTH(col_3)
+FROM tab1 WHERE YEAR(col_3) = 2012;</code></pre>
+
+            <p class="p">
+              Query <code class="ph codeph">TAB3</code> to check the result:
+            </p>
+
+<pre class="pre codeblock"><code>SELECT * FROM tab3;
+</code></pre>
+
+            <p class="p">
+              Results:
+            </p>
+
+<pre class="pre codeblock"><code>+----+-------+---------+-------+-----+
+| id | col_1 | col_2   | month | day |
++----+-------+---------+-------+-----+
+| 1  | true  | 123.123 | 10    | 24  |
+| 2  | false | 1243.5  | 10    | 25  |
++----+-------+---------+-------+-----+</code></pre>
+
+          </div>
+        </div>
+      </article>
+    </article>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="tutorial__tut_advanced">
+
+    <h2 class="title topictitle2" id="ariaid-title11">Advanced Tutorials</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These tutorials walk you through advanced scenarios or specialized features.
+      </p>
+
+      <p class="p toc inpage"></p>
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="tut_advanced__tut_external_partition_data">
+
+      <h3 class="title topictitle3" id="ariaid-title12">Attaching an External Partitioned Table to an HDFS Directory Structure</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          This tutorial shows how you might set up a directory tree in HDFS, put data files into the lowest-level
+          subdirectories, and then use an Impala external table to query the data files from their original
+          locations.
+        </p>
+
+        <p class="p">
+          The tutorial uses a table with web log data, with separate subdirectories for the year, month, day, and
+          host. For simplicity, we use a tiny amount of CSV data, loading the same data into each partition.
+        </p>
+
+        <p class="p">
+          First, we make an Impala partitioned table for CSV data, and look at the underlying HDFS directory
+          structure to understand the directory structure to re-create elsewhere in HDFS. The columns
+          <code class="ph codeph">field1</code>, <code class="ph codeph">field2</code>, and <code class="ph codeph">field3</code> correspond to the contents
+          of the CSV data files. The <code class="ph codeph">year</code>, <code class="ph codeph">month</code>, <code class="ph codeph">day</code>, and
+          <code class="ph codeph">host</code> columns are all represented as subdirectories within the table structure, and are
+          not part of the CSV files. We use <code class="ph codeph">STRING</code> for each of these columns so that we can
+          produce consistent subdirectory names, with leading zeros for a consistent length.
+        </p>
+
+<pre class="pre codeblock"><code>create database external_partitions;
+use external_partitions;
+create table logs (field1 string, field2 string, field3 string)
+  partitioned by (year string, month string , day string, host string)
+  row format delimited fields terminated by ',';
+insert into logs partition (year="2013", month="07", day="28", host="host1") values ("foo","foo","foo");
+insert into logs partition (year="2013", month="07", day="28", host="host2") values ("foo","foo","foo");
+insert into logs partition (year="2013", month="07", day="29", host="host1") values ("foo","foo","foo");
+insert into logs partition (year="2013", month="07", day="29", host="host2") values ("foo","foo","foo");
+insert into logs partition (year="2013", month="08", day="01", host="host1") values ("foo","foo","foo");
+</code></pre>
+
+        <p class="p">
+          Back in the Linux shell, we examine the HDFS directory structure. (Your Impala data directory might be in
+          a different location; for historical reasons, it is sometimes under the HDFS path
+          <span class="ph filepath">/user/hive/warehouse</span>.) We use the <code class="ph codeph">hdfs dfs -ls</code> command to examine
+          the nested subdirectories corresponding to each partitioning column, with separate subdirectories at each
+          level (with <code class="ph codeph">=</code> in their names) representing the different values for each partitioning
+          column. When we get to the lowest level of subdirectory, we use the <code class="ph codeph">hdfs dfs -cat</code>
+          command to examine the data file and see CSV-formatted data produced by the <code class="ph codeph">INSERT</code>
+          statement in Impala.
+        </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -ls /user/impala/warehouse/external_partitions.db
+Found 1 items
+drwxrwxrwt   - impala hive          0 2013-08-07 12:24 /user/impala/warehouse/external_partitions.db/logs
+$ hdfs dfs -ls /user/impala/warehouse/external_partitions.db/logs
+Found 1 items
+drwxr-xr-x   - impala hive          0 2013-08-07 12:24 /user/impala/warehouse/external_partitions.db/logs/year=2013
+$ hdfs dfs -ls /user/impala/warehouse/external_partitions.db/logs/year=2013
+Found 2 items
+drwxr-xr-x   - impala hive          0 2013-08-07 12:23 /user/impala/warehouse/external_partitions.db/logs/year=2013/month=07
+drwxr-xr-x   - impala hive          0 2013-08-07 12:24 /user/impala/warehouse/external_partitions.db/logs/year=2013/month=08
+$ hdfs dfs -ls /user/impala/warehouse/external_partitions.db/logs/year=2013/month=07
+Found 2 items
+drwxr-xr-x   - impala hive          0 2013-08-07 12:22 /user/impala/warehouse/external_partitions.db/logs/year=2013/month=07/day=28
+drwxr-xr-x   - impala hive          0 2013-08-07 12:23 /user/impala/warehouse/external_partitions.db/logs/year=2013/month=07/day=29
+$ hdfs dfs -ls /user/impala/warehouse/external_partitions.db/logs/year=2013/month=07/day=28
+Found 2 items
+drwxr-xr-x   - impala hive          0 2013-08-07 12:21 /user/impala/warehouse/external_partitions.db/logs/year=2013/month=07/day=28/host=host1
+drwxr-xr-x   - impala hive          0 2013-08-07 12:22 /user/impala/warehouse/external_partitions.db/logs/year=2013/month=07/day=28/host=host2
+$ hdfs dfs -ls /user/impala/warehouse/external_partitions.db/logs/year=2013/month=07/day=28/host=host1
+Found 1 items
+-rw-r--r--   3 impala hive         12 2013-08-07 12:21 /user/impala/warehouse/external_partitions.db/logs/year=2013/month=07/day=28/host=host1/3981726974111751120--8907184999369517436_822630111_data.0
+$ hdfs dfs -cat /user/impala/warehouse/external_partitions.db/logs/year=2013/month=07/day=28/\
+host=host1/3981726974111751120--8907184999369517436_822630111_data.0
+foo,foo,foo
+</code></pre>
+
+        <p class="p">
+          Still in the Linux shell, we use <code class="ph codeph">hdfs dfs -mkdir</code> to create several data directories
+          outside the HDFS directory tree that Impala controls (<span class="ph filepath">/user/impala/warehouse</span> in this
+          example, maybe different in your case). Depending on your configuration, you might need to log in as a
+          user with permission to write into this HDFS directory tree; for example, the commands shown here were
+          run while logged in as the <code class="ph codeph">hdfs</code> user.
+        </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -mkdir -p /user/impala/data/logs/year=2013/month=07/day=28/host=host1
+$ hdfs dfs -mkdir -p /user/impala/data/logs/year=2013/month=07/day=28/host=host2
+$ hdfs dfs -mkdir -p /user/impala/data/logs/year=2013/month=07/day=29/host=host1
+$ hdfs dfs -mkdir -p /user/impala/data/logs/year=2013/month=08/day=01/host=host1
+</code></pre>
+
+        <p class="p">
+          We make a tiny CSV file, with values different than in the <code class="ph codeph">INSERT</code> statements used
+          earlier, and put a copy within each subdirectory that we will use as an Impala partition.
+        </p>
+
+<pre class="pre codeblock"><code>$ cat &gt;dummy_log_data
+bar,baz,bletch
+$ hdfs dfs -mkdir -p /user/impala/data/logs/year=2013/month=08/day=01/host=host1
+$ hdfs dfs -mkdir -p /user/impala/data/logs/year=2013/month=07/day=28/host=host1
+$ hdfs dfs -mkdir -p /user/impala/data/logs/year=2013/month=07/day=28/host=host2
+$ hdfs dfs -mkdir -p /user/impala/data/logs/year=2013/month=07/day=29/host=host1
+$ hdfs dfs -put dummy_log_data /user/impala/data/logs/year=2013/month=07/day=28/host=host1
+$ hdfs dfs -put dummy_log_data /user/impala/data/logs/year=2013/month=07/day=28/host=host2
+$ hdfs dfs -put dummy_log_data /user/impala/data/logs/year=2013/month=07/day=29/host=host1
+$ hdfs dfs -put dummy_log_data /user/impala/data/logs/year=2013/month=08/day=01/host=host1
+</code></pre>
+
+        <p class="p">
+          Back in the <span class="keyword cmdname">impala-shell</span> interpreter, we move the original Impala-managed table aside,
+          and create a new <em class="ph i">external</em> table with a <code class="ph codeph">LOCATION</code> clause pointing to the directory
+          under which we have set up all the partition subdirectories and data files.
+        </p>
+
+<pre class="pre codeblock"><code>use external_partitions;
+alter table logs rename to logs_original;
+create external table logs (field1 string, field2 string, field3 string)
+  partitioned by (year string, month string, day string, host string)
+  row format delimited fields terminated by ','
+  location '/user/impala/data/logs';
+</code></pre>
+
+        <p class="p">
+          Because partition subdirectories and data files come and go during the data lifecycle, you must identify
+          each of the partitions through an <code class="ph codeph">ALTER TABLE</code> statement before Impala recognizes the
+          data files they contain.
+        </p>
+
+<pre class="pre codeblock"><code>alter table logs add partition (year="2013",month="07",day="28",host="host1");
+alter table logs add partition (year="2013",month="07",day="28",host="host2");
+alter table logs add partition (year="2013",month="07",day="29",host="host1");
+alter table logs add partition (year="2013",month="08",day="01",host="host1");
+</code></pre>
+
+        <p class="p">
+          We issue a <code class="ph codeph">REFRESH</code> statement for the table, always a safe practice when data files have
+          been manually added, removed, or changed. Then the data is ready to be queried. The <code class="ph codeph">SELECT
+          *</code> statement illustrates that the data from our trivial CSV file was recognized in each of the
+          partitions where we copied it. Although in this case there are only a few rows, we include a
+          <code class="ph codeph">LIMIT</code> clause on this test query just in case there is more data than we expect.
+        </p>
+
+<pre class="pre codeblock"><code>refresh logs;
+select * from logs limit 100;
++--------+--------+--------+------+-------+-----+-------+
+| field1 | field2 | field3 | year | month | day | host  |
++--------+--------+--------+------+-------+-----+-------+
+| bar    | baz    | bletch | 2013 | 07    | 28  | host1 |
+| bar    | baz    | bletch | 2013 | 08    | 01  | host1 |
+| bar    | baz    | bletch | 2013 | 07    | 29  | host1 |
+| bar    | baz    | bletch | 2013 | 07    | 28  | host2 |
++--------+--------+--------+------+-------+-----+-------+
+</code></pre>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="tut_advanced__tutorial_impala_hive">
+
+      <h3 class="title topictitle3" id="ariaid-title13">Switching Back and Forth Between Impala and Hive</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Sometimes, you might find it convenient to switch to the Hive shell to perform some data loading or
+          transformation operation, particularly on file formats such as RCFile, SequenceFile, and Avro that Impala
+          currently can query but not write to.
+        </p>
+
+        <p class="p">
+          Whenever you create, drop, or alter a table or other kind of object through Hive, the next time you
+          switch back to the <span class="keyword cmdname">impala-shell</span> interpreter, issue a one-time <code class="ph codeph">INVALIDATE
+          METADATA</code> statement so that Impala recognizes the new or changed object.
+        </p>
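+
+        <p class="p">
+          For example (a sketch, using a hypothetical table name <code class="ph codeph">t2</code>):
+        </p>
+
+<pre class="pre codeblock"><code>-- In the Hive shell: create a new table.
+CREATE TABLE t2 (x INT);
+
+-- Back in impala-shell: make the new table visible to Impala.
+INVALIDATE METADATA t2;</code></pre>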
+
+        <p class="p">
+          Whenever you load, insert, or change data in an existing table through Hive (or even through manual HDFS
+          operations such as the <span class="keyword cmdname">hdfs</span> command), the next time you switch back to the
+          <span class="keyword cmdname">impala-shell</span> interpreter, issue a one-time <code class="ph codeph">REFRESH
+          <var class="keyword varname">table_name</var></code> statement so that Impala recognizes the new or changed data.
+        </p>
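+
+        <p class="p">
+          For example (a sketch, using a hypothetical table name <code class="ph codeph">t1</code> and a
+          hypothetical HDFS data path):
+        </p>
+
+<pre class="pre codeblock"><code>-- In the Hive shell: load new data files into an existing table.
+LOAD DATA INPATH '/tmp/new_data' INTO TABLE t1;
+
+-- Back in impala-shell: make the new data visible to Impala.
+REFRESH t1;</code></pre>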
+
+        <p class="p">
+          For examples showing how this process works for the <code class="ph codeph">REFRESH</code> statement, look at the
+          examples of creating RCFile and SequenceFile tables in Impala, loading data through Hive, and then
+          querying the data through Impala. See <a class="xref" href="impala_rcfile.html#rcfile">Using the RCFile File Format with Impala Tables</a> and
+          <a class="xref" href="impala_seqfile.html#seqfile">Using the SequenceFile File Format with Impala Tables</a> for those examples.
+        </p>
+
+        <p class="p">
+          For examples showing how this process works for the <code class="ph codeph">INVALIDATE METADATA</code> statement, look
+          at the example of creating and loading an Avro table in Hive, and then querying the data through Impala.
+          See <a class="xref" href="impala_avro.html#avro">Using the Avro File Format with Impala Tables</a> for that example.
+        </p>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          <p class="p">
+            Originally, Impala did not support UDFs, but this feature is available starting in Impala
+            1.2. Some <code class="ph codeph">INSERT ... SELECT</code> transformations that you originally did through Hive can
+            now be done through Impala. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details.
+          </p>
+
+          <p class="p">
+            Prior to Impala 1.2, the <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements
+            needed to be issued on each Impala node to which you connected and issued queries. In Impala 1.2 and
+            higher, when you issue either of those statements on any Impala node, the results are broadcast to all
+            the Impala nodes in the cluster, making it truly a one-step operation after each round of DDL or ETL
+            operations in Hive.
+          </p>
+        </div>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="tut_advanced__tut_cross_join">
+
+      <h3 class="title topictitle3" id="ariaid-title14">Cross Joins and Cartesian Products with the CROSS JOIN Operator</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Originally, Impala restricted join queries so that they had to include at least one equality comparison
+          between the columns of the tables on each side of the join operator. With the huge tables typically
+          processed by Impala, any miscoded query that produced a full Cartesian product as a result set could
+          consume a huge amount of cluster resources.
+        </p>
+
+        <p class="p">
+          In Impala 1.2.2 and higher, this restriction is lifted when you use the <code class="ph codeph">CROSS JOIN</code>
+          operator in the query. You still cannot remove all <code class="ph codeph">WHERE</code> clauses from a query like
+          <code class="ph codeph">SELECT * FROM t1 JOIN t2</code> to produce all combinations of rows from both tables. But you
+          can use the <code class="ph codeph">CROSS JOIN</code> operator to explicitly request such a Cartesian product.
+          Typically, this operation is applicable for smaller tables, where the result set still fits within the
+          memory of a single Impala node.
+        </p>
+
+        <p class="p">
+          The following example sets up data for use in a series of comic books where characters battle each other.
+          At first, we use an equijoin query, which only allows characters from the same time period and the same
+          planet to meet.
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table heroes (name string, era string, planet string);
+[localhost:21000] &gt; create table villains (name string, era string, planet string);
+[localhost:21000] &gt; insert into heroes values
+                  &gt; ('Tesla','20th century','Earth'),
+                  &gt; ('Pythagoras','Antiquity','Earth'),
+                  &gt; ('Zopzar','Far Future','Mars');
+Inserted 3 rows in 2.28s
+[localhost:21000] &gt; insert into villains values
+                  &gt; ('Caligula','Antiquity','Earth'),
+                  &gt; ('John Dillinger','20th century','Earth'),
+                  &gt; ('Xibulor','Far Future','Venus');
+Inserted 3 rows in 1.93s
+[localhost:21000] &gt; select concat(heroes.name,' vs. ',villains.name) as battle
+                  &gt; from heroes join villains
+                  &gt; where heroes.era = villains.era and heroes.planet = villains.planet;
++--------------------------+
+| battle                   |
++--------------------------+
+| Tesla vs. John Dillinger |
+| Pythagoras vs. Caligula  |
++--------------------------+
+Returned 2 row(s) in 0.47s</code></pre>
+
+        <p class="p">
+          Readers demanded more action, so we added elements of time travel and space travel so that any hero could
+          face any villain. Prior to Impala 1.2.2, this type of query was impossible because all joins had to
+          reference matching values between the two tables:
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; -- Cartesian product not possible in Impala 1.1.
+                  &gt; select concat(heroes.name,' vs. ',villains.name) as battle from heroes join villains;
+ERROR: NotImplementedException: Join between 'heroes' and 'villains' requires at least one conjunctive equality predicate between the two tables</code></pre>
+
+        <p class="p">
+          With Impala 1.2.2, we rewrite the query slightly to use <code class="ph codeph">CROSS JOIN</code> rather than
+          <code class="ph codeph">JOIN</code>, and now the result set includes all combinations:
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; -- Cartesian product available in Impala 1.2.2 with the CROSS JOIN syntax.
+                  &gt; select concat(heroes.name,' vs. ',villains.name) as battle from heroes cross join villains;
++-------------------------------+
+| battle                        |
++-------------------------------+
+| Tesla vs. Caligula            |
+| Tesla vs. John Dillinger      |
+| Tesla vs. Xibulor             |
+| Pythagoras vs. Caligula       |
+| Pythagoras vs. John Dillinger |
+| Pythagoras vs. Xibulor        |
+| Zopzar vs. Caligula           |
+| Zopzar vs. John Dillinger     |
+| Zopzar vs. Xibulor            |
++-------------------------------+
+Returned 9 row(s) in 0.33s</code></pre>
+
+        <p class="p">
+          The full combination of rows from both tables is known as the Cartesian product. This type of result set
+          is often used for creating grid data structures. You can also filter the result set by including
+          <code class="ph codeph">WHERE</code> clauses that do not explicitly compare columns between the two tables. The
+          following example shows how you might produce a list of combinations of year and quarter for use in a
+          chart, and then a shorter list with only selected quarters.
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table x_axis (x int);
+[localhost:21000] &gt; create table y_axis (y int);
+[localhost:21000] &gt; insert into x_axis values (1),(2),(3),(4);
+Inserted 4 rows in 2.14s
+[localhost:21000] &gt; insert into y_axis values (2010),(2011),(2012),(2013),(2014);
+Inserted 5 rows in 1.32s
+[localhost:21000] &gt; select y as year, x as quarter from x_axis cross join y_axis;
++------+---------+
+| year | quarter |
++------+---------+
+| 2010 | 1       |
+| 2011 | 1       |
+| 2012 | 1       |
+| 2013 | 1       |
+| 2014 | 1       |
+| 2010 | 2       |
+| 2011 | 2       |
+| 2012 | 2       |
+| 2013 | 2       |
+| 2014 | 2       |
+| 2010 | 3       |
+| 2011 | 3       |
+| 2012 | 3       |
+| 2013 | 3       |
+| 2014 | 3       |
+| 2010 | 4       |
+| 2011 | 4       |
+| 2012 | 4       |
+| 2013 | 4       |
+| 2014 | 4       |
++------+---------+
+Returned 20 row(s) in 0.38s
+[localhost:21000] &gt; select y as year, x as quarter from x_axis cross join y_axis where x in (1,3);
++------+---------+
+| year | quarter |
++------+---------+
+| 2010 | 1       |
+| 2011 | 1       |
+| 2012 | 1       |
+| 2013 | 1       |
+| 2014 | 1       |
+| 2010 | 3       |
+| 2011 | 3       |
+| 2012 | 3       |
+| 2013 | 3       |
+| 2014 | 3       |
++------+---------+
+Returned 10 row(s) in 0.39s</code></pre>
+      </div>
+    </article>
+
+    
+  </article>
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="tutorial__tut_parquet_schemaless">
+
+    <h2 class="title topictitle2" id="ariaid-title15">Dealing with Parquet Files with Unknown Schema</h2>
+    
+
+    <div class="body conbody">
+
+      <p class="p">
+        As data pipelines start to include components such as NoSQL stores or loosely specified schemas, you might encounter
+        situations where you have data files (particularly in Parquet format) where you do not know the precise table definition.
+        This tutorial shows how you can build an Impala table around data that comes from non-Impala or even non-SQL sources,
+        where you do not have control of the table layout and might not be familiar with the characteristics of the data.
+      </p>
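+
+      <p class="p">
+        The central technique is that Impala can derive a table definition from the metadata embedded in a
+        Parquet data file itself. A minimal sketch (the file and path names are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>-- Infer column names and types from an existing Parquet data file.
+create external table sample_parquet
+  like parquet '/user/impala/staging/sample.parq'
+  stored as parquet
+  location '/user/impala/staging';</code></pre>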
+
+<p class="p">
+The data used in this tutorial represents airline on-time arrival statistics, from October 1987 through April 2008.
+See the details on the <a class="xref" href="http://stat-computing.org/dataexpo/2009/" target="_blank">2009 ASA Data Expo web site</a>.
+You can also see the <a class="xref" href="http://stat-computing.org/dataexpo/2009/the-data.html" target="_blank">explanations of the columns</a>;
+for purposes of this exercise, wait until after following the tutorial before examining the schema, to better simulate
+a real-life situation where you cannot rely on assumptions and assertions about the ranges and representations of
+data values.
+</p>
+
+<p class="p">
+We will download Parquet files containing this data from the Ibis blog.
+First, we download and unpack the data files.
+There are 8 files totalling 1.4 GB, each less than 256 MB.
+</p>
+
+<pre class="pre codeblock"><code>$ wget -O airlines_parquet.tar.gz https://www.dropbox.com/s/ol9x51tqp6cv4yc/airlines_parquet.tar.gz?dl=0
+...
+Length: 1245204740 (1.2G) [application/octet-stream]
+Saving to: “airlines_parquet.tar.gz”
+
+2015-08-12 17:14:24 (23.6 MB/s) - “airlines_parquet.tar.gz” saved [1245204740/1245204740]
+
+$ tar xvzf airlines_parquet.tar.gz
+airlines_parquet/
+airlines_parquet/93459d994898a9ba-77674173b331fa9a_2073981944_data.0.parq
+airlines_parquet/93459d994898a9ba-77674173b331fa99_1555718317_data.1.parq
+airlines_parquet/93459d994898a9ba-77674173b331fa99_1555718317_data.0.parq
+airlines_parquet/93459d994898a9ba-77674173b331fa96_2118228804_data.0.parq
+airlines_parquet/93459d994898a9ba-77674173b331fa97_574780876_data.0.parq
+airlines_parquet/93459d994898a9ba-77674173b331fa96_2118228804_data.1.parq
+airlines_parquet/93459d994898a9ba-77674173b331fa98_1194408366_data.0.parq
+airlines_parquet/93459d994898a9ba-77674173b331fa9b_1413430552_data.0.parq
+$ cd airlines_parquet/
+$ du -kch *.parq
+253M  93459d994898a9ba-77674173b331fa96_2118228804_data.0.parq
+ 65M  93459d994898a9ba-77674173b331fa96_2118228804_data.1.parq
+156M  93459d994898a9ba-77674173b331fa97_574780876_data.0.parq
+240M  93459d994898a9ba-77674173b331fa98_1194408366_data.0.parq
+253M  93459d994898a9ba-77674173b331fa99_1555718317_data.0.parq
+ 16M  93459d994898a9ba-77674173b331fa99_1555718317_data.1.parq
+177M  93459d994898a9ba-77674173b331fa9a_2073981944_data.0.parq
+213M  93459d994898a9ba-77674173b331fa9b_1413430552_data.0.parq
+1.4G  total
+</code></pre>
+
+<p class="p">
+Next, we put the Parquet data files in HDFS, all together in a single directory,
+with permissions on the directory and the files so that the <code class="ph codeph">impala</code>
+user will be able to read them.
+</p>
+
+<div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+After unpacking, we saw that the largest Parquet file was 253 MB.
+When copying Parquet files into HDFS for Impala to use,
+for maximum query performance, make sure that each file resides in a single HDFS data block.
+Therefore, we pick a size larger than any single file and specify that as the block size, using the argument
+<code class="ph codeph">-Ddfs.block.size=256m</code> on the <code class="ph codeph">hdfs dfs -put</code> command.
+</div>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -mkdir -p hdfs://demo_host.example.com:8020/user/impala/staging/airlines
+$ hdfs dfs -Ddfs.block.size=256m -put *.parq /user/impala/staging/airlines
+$ hdfs dfs -ls /user/impala/staging
+Found 1 items
+drwxrwxrwx   - hdfs supergroup          0 2015-08-12 13:52 /user/impala/staging/airlines
+$ hdfs dfs -ls hdfs://demo_host.example.com:8020/user/impala/staging/airlines
+Found 8 items
+-rw-r--r--   3 jrussell supergroup  265107489 2015-08-12 17:18 /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa96_2118228804_data.0.parq
+-rw-r--r--   3 jrussell supergroup   67544715 2015-08-12 17:18 /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa96_2118228804_data.1.parq
+-rw-r--r--   3 jrussell supergroup  162556490 2015-08-12 17:18 /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa97_574780876_data.0.parq
+-rw-r--r--   3 jrussell supergroup  251603518 2015-08-12 17:18 /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa98_1194408366_data.0.parq
+-rw-r--r--   3 jrussell supergroup  265186603 2015-08-12 17:18 /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa99_1555718317_data.0.parq
+-rw-r--r--   3 jrussell supergroup   16663754 2015-08-12 17:18 /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa99_1555718317_data.1.parq
+-rw-r--r--   3 jrussell supergroup  185511677 2015-08-12 17:18 /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa9a_2073981944_data.0.parq
+-rw-r--r--   3 jrussell supergroup  222794621 2015-08-12 17:18 /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa9b_1413430552_data.0.parq
+</code></pre>
+
+<p class="p">
+With the files in an accessible location in HDFS, we create a database table that uses the data in those files.
+The <code class="ph codeph">CREATE EXTERNAL</code> syntax and the <code class="ph codeph">LOCATION</code> attribute point Impala at the appropriate HDFS directory.
+The <code class="ph codeph">LIKE PARQUET '<var class="keyword varname">path_to_any_parquet_file</var>'</code> clause means we skip the list of column names and types;
+Impala automatically gets the column names and data types straight from the data files.
+(Currently, this technique only works for Parquet files.)
+We ignore the warning about lack of <code class="ph codeph">READ_WRITE</code> access to the files in HDFS;
+the <code class="ph codeph">impala</code> user can read the files, which will be sufficient for us to experiment with
+queries and perform some copy and transform operations into other tables.
+</p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost
+Starting Impala Shell without Kerberos authentication
+
+Connected to localhost:21000
+<span class="ph">Server version: impalad version 2.8.x (build
+      x.y.z)</span>
+Welcome to the Impala shell. Press TAB twice to see a list of available commands.
+...
+<span class="ph">(Shell
+      build version: Impala Shell v2.8.x (<var class="keyword varname">hash</var>) built on
+      <var class="keyword varname">date</var>)</span>
+[localhost:21000] &gt; create database airline_data;
+[localhost:21000] &gt; use airline_data;
+[localhost:21000] &gt; create external table airlines_external
+                  &gt; like parquet 'hdfs://demo_host.example.com:8020/user/impala/staging/airlines/93459d994898a9ba-77674173b331fa96_2118228804_data.0.parq'
+                  &gt; stored as parquet location 'hdfs://demo_host.example.com:8020/user/impala/staging/airlines';
+WARNINGS: Impala does not have READ_WRITE access to path 'hdfs://demo_host.example.com:8020/user/impala/staging'
+</code></pre>
+
+<p class="p">
+With the table created, we examine its physical and logical characteristics to confirm that the data is really
+there and in a format and shape that we can work with.
+The <code class="ph codeph">SHOW TABLE STATS</code> statement gives a very high-level summary of the table,
+showing how many files and how much total data it contains.
+Also, it confirms that the table is expecting all the associated data files to be in Parquet format.
+(The ability to work with all kinds of HDFS data files in different formats means that it is
+possible to have a mismatch between the format of the data files, and the format
+that the table expects the data files to be in.)
+The <code class="ph codeph">SHOW FILES</code> statement confirms that the data in the table has the expected number,
+names, and sizes of the original Parquet files.
+The <code class="ph codeph">DESCRIBE</code> statement (or its abbreviation <code class="ph codeph">DESC</code>) confirms the names and types
+of the columns that Impala automatically created after reading that metadata from the Parquet file.
+The <code class="ph codeph">DESCRIBE FORMATTED</code> statement prints out some extra detail along with the column definitions;
+the pieces we care about for this exercise are the containing database for the table,
+the location of the associated data files in HDFS, the fact that it's an external table so Impala will not
+delete the HDFS files when we finish the experiments and drop the table, and the fact that the
+table is set up to work exclusively with files in the Parquet format.
+</p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; show table stats airlines_external;
++-------+--------+--------+--------------+-------------------+---------+-------------------+
+| #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format  | Incremental stats |
++-------+--------+--------+--------------+-------------------+---------+-------------------+
+| -1    | 8      | 1.34GB | NOT CACHED   | NOT CACHED        | PARQUET | false             |
++-------+--------+--------+--------------+-------------------+---------+-------------------+
+[localhost:21000] &gt; show files in airlines_external;
++----------------------------------------------------------------------------------------+----------+-----------+
+| path                                                                                   | size     | partition |
++----------------------------------------------------------------------------------------+----------+-----------+
+| /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa96_2118228804_data.0.parq | 252.83MB |           |
+| /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa96_2118228804_data.1.parq | 64.42MB  |           |
+| /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa97_574780876_data.0.parq  | 155.03MB |           |
+| /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa98_1194408366_data.0.parq | 239.95MB |           |
+| /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa99_1555718317_data.0.parq | 252.90MB |           |
+| /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa99_1555718317_data.1.parq | 15.89MB  |           |
+| /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa9a_2073981944_data.0.parq | 176.92MB |           |
+| /user/impala/staging/airlines/93459d994898a9ba-77674173b331fa9b_1413430552_data.0.parq | 212.47MB |           |
++----------------------------------------------------------------------------------------+----------+-----------+
+[localhost:21000] &gt; describe airlines_external;
++---------------------+--------+---------------------------------------------------+
+| name                | type   | comment                                           |
++---------------------+--------+---------------------------------------------------+
+| year                | int    | inferred from: optional int32 year                |
+| month               | int    | inferred from: optional int32 month               |
+| day                 | int    | inferred from: optional int32 day                 |
+| dayofweek           | int    | inferred from: optional int32 dayofweek           |
+| dep_time            | int    | inferred from: optional int32 dep_time            |
+| crs_dep_time        | int    | inferred from: optional int32 crs_dep_time        |
+| arr_time            | int    | inferred from: optional int32 arr_time            |
+| crs_arr_time        | int    | inferred from: optional int32 crs_arr_time        |
+| carrier             | string | inferred from: optional binary carrier            |
+| flight_num          | int    | inferred from: optional int32 flight_num          |
+| tail_num            | int    | inferred from: optional int32 tail_num            |
+| actual_elapsed_time | int    | inferred from: optional int32 actual_elapsed_time |
+| crs_elapsed_time    | int    | inferred from: optional int32 crs_elapsed_time    |
+| airtime             | int    | inferred from: optional int32 airtime             |
+| arrdelay            | int    | inferred from: optional int32 arrdelay            |
+| depdelay            | int    | inferred from: optional int32 depdelay            |
+| origin              | string | inferred from: optional binary origin             |
+| dest                | string | inferred from: optional binary dest               |
+| distance            | int    | inferred from: optional int32 distance            |
+| taxi_in             | int    | inferred from: optional int32 taxi_in             |
+| taxi_out            | int    | inferred from: optional int32 taxi_out            |
+| cancelled           | int    | inferred from: optional int32 cancelled           |
+| cancellation_code   | string | inferred from: optional binary cancellation_code  |
+| diverted            | int    | inferred from: optional int32 diverted            |
+| carrier_delay       | int    | inferred from: optional int32 carrier_delay       |
+| weather_delay       | int    | inferred from: optional int32 weather_delay       |
+| nas_delay           | int    | inferred from: optional int32 nas_delay           |
+| security_delay      | int    | inferred from: optional int32 security_delay      |
+| late_aircraft_delay | int    | inferred from: optional int32 late_aircraft_delay |
++---------------------+--------+---------------------------------------------------+
+[localhost:21000] &gt; desc formatted airlines_external;
++------------------------------+-------------------------------
+| name                         | type
++------------------------------+-------------------------------
+...
+| # Detailed Table Information | NULL
+| Database:                    | airline_data
+| Owner:                       | jrussell
+...
+| Location:                    | /user/impala/staging/airlines
+| Table Type:                  | EXTERNAL_TABLE
+...
+| # Storage Information        | NULL
+| SerDe Library:               | parquet.hive.serde.ParquetHiveSerDe
+| InputFormat:                 | parquet.hive.DeprecatedParquetInputFormat
+| OutputFormat:                | parquet.hive.DeprecatedParquetOutputFormat
+...
+</code></pre>
+
+<p class="p">
+Now that we are confident that the connections are solid between the Impala table and the
+underlying Parquet files, we run some initial queries to understand the characteristics
+of the data: the overall number of rows, and the ranges and how many
+different values are in certain columns.
+For convenience in understanding the magnitude of the <code class="ph codeph">COUNT(*)</code>
+result, we run another query dividing the number of rows by 1 million, demonstrating that there are 123 million rows in the table.
+</p>
+
+
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select count(*) from airlines_external;
++-----------+
+| count(*)  |
++-----------+
+| 123534969 |
++-----------+
+Fetched 1 row(s) in 1.32s
+[localhost:21000] &gt; select count(*) / 1e6 as 'millions of rows' from airlines_external;
++------------------+
+| millions of rows |
++------------------+
+| 123.534969       |
++------------------+
+Fetched 1 row(s) in 1.24s
+</code></pre>
+
+<p class="p"> The <code class="ph codeph">NDV()</code> function stands for <span class="q">"number of distinct
+          values"</span>, which for performance reasons is an estimate when there
+        are lots of different values in the column, but is precise when the
+        cardinality is less than 16 K. Use <code class="ph codeph">NDV()</code> calls for this
+        kind of exploration rather than <code class="ph codeph">COUNT(DISTINCT
+            <var class="keyword varname">colname</var>)</code>, because Impala can evaluate
+        multiple <code class="ph codeph">NDV()</code> functions in a single query, but only a
+        single instance of <code class="ph codeph">COUNT DISTINCT</code>. Here we see that
+        there are modest numbers of different airlines, flight numbers, and
+        origin and destination airports. Two things jump out from this query:
+        the number of <code class="ph codeph">tail_num</code> values is much smaller than we
+        might have expected, and there are more destination airports than origin
+        airports. Let's dig further. What we find is that most
+          <code class="ph codeph">tail_num</code> values are <code class="ph codeph">NULL</code>. It looks
+        like this was an experimental column that wasn't filled in accurately.
+        We make a mental note that if we use this data as a starting point,
+        we'll ignore this column. We also find that certain airports are
+        represented in the <code class="ph codeph">ORIGIN</code> column but not the
+          <code class="ph codeph">DEST</code> column; now we know that we cannot rely on the
+        assumption that those sets of airport codes are identical. </p>
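The estimate-versus-exact behavior described above can be illustrated with a toy distinct-counting sketch. This is purely a sketch of the general technique (a k-minimum-values estimator with made-up parameters), not Impala's actual `NDV()` implementation; production engines typically use HyperLogLog-style sketches:

```python
import hashlib

def ndv_estimate(values, k=256):
    """Toy k-minimum-values (KMV) sketch for distinct counting.

    Below k distinct values the answer is exact; above it, the k-th
    smallest hash value yields a statistical estimate, mirroring the
    exact-vs-estimate behavior of NDV()-style functions.
    """
    seen = set()
    for v in values:
        digest = hashlib.sha256(str(v).encode()).digest()
        seen.add(int.from_bytes(digest[:8], "big") / 2.0**64)  # hash to [0, 1)
    mins = sorted(seen)[:k]
    if len(mins) < k:
        return len(mins)            # low cardinality: exact count
    return int(k / mins[-1] - 1)    # k-th smallest hash ~= k / (n + 1)

print(ndv_estimate(range(100)))     # exactly 100: below the k threshold
est = ndv_estimate(range(50_000))
print(est)                          # an estimate in the vicinity of 50,000
```

The key design point is constant memory: the sketch keeps only k hash values no matter how many rows flow through, which is why a single query can afford many `NDV()` calls at once.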
+
+<div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+A slight digression for some performance tuning. Notice how the first
+<code class="ph codeph">SELECT DISTINCT DEST</code> query takes almost 40 seconds.
+We expect all queries on such a small data set, less than 2 GB, to
+take a few seconds at most. The reason is that the expression
+<code class="ph codeph">NOT IN (SELECT origin FROM airlines_external)</code>
+produces an intermediate result set of 123 million rows, then
+runs 123 million comparisons on each data node against the tiny set of destination airports.
+The way the <code class="ph codeph">NOT IN</code> operator works internally means that
+this intermediate result set with 123 million rows might be transmitted
+across the network to each data node in the cluster.
+Applying another <code class="ph codeph">DISTINCT</code> inside the <code class="ph codeph">NOT IN</code>
+subquery means that the intermediate result set is only 340 items,
+resulting in much less network traffic and fewer comparison operations.
+The more efficient query with the added <code class="ph codeph">DISTINCT</code> is approximately 7 times as fast.
+</div>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select ndv(carrier), ndv(flight_num), ndv(tail_num),
+                  &gt;   ndv(origin), ndv(dest) from airlines_external;
++--------------+-----------------+---------------+-------------+-----------+
+| ndv(carrier) | ndv(flight_num) | ndv(tail_num) | ndv(origin) | ndv(dest) |
++--------------+-----------------+---------------+-------------+-----------+
+| 29           | 9086            | 3             | 340         | 347       |
++--------------+-----------------+---------------+-------------+-----------+
+[localhost:21000] &gt; select tail_num, count(*) as howmany from airlines_external
+                  &gt;   group by tail_num;
++----------+-----------+
+| tail_num | howmany   |
++----------+-----------+
+| 715      | 1         |
+| 0        | 406405    |
+| 112      | 6562      |
+| NULL     | 123122001 |
++----------+-----------+
+Fetched 4 row(s) in 5.18s
+[localhost:21000] &gt; select distinct dest from airlines_external
+                  &gt;   where dest not in (select origin from airlines_external);
++------+
+| dest |
++------+
+| LBF  |
+| CBM  |
+| RCA  |
+| SKA  |
+| LAR  |
++------+
+Fetched 5 row(s) in 39.64s
+[localhost:21000] &gt; select distinct dest from airlines_external
+                  &gt;   where dest not in (select distinct origin from airlines_external);
++------+
+| dest |
++------+
+| LBF  |
+| RCA  |
+| CBM  |
+| SKA  |
+| LAR  |
++------+
+Fetched 5 row(s) in 5.59s
+[localhost:21000] &gt; select distinct origin from airlines_external
+                  &gt;   where origin not in (select distinct dest from airlines_external);
+Fetched 0 row(s) in 5.37s
+</code></pre>
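To see why adding `DISTINCT` to the subquery helps, here is a toy Python model of the anti-join, using hypothetical airport data rather than the real tables; it is a sketch of the row-count effect, not of Impala's execution engine. Shrinking the subquery's output cuts the comparison work while leaving the answer unchanged:

```python
# Toy model of the NOT IN rewrite: the anti-join checks each outer
# value against every row the subquery produces, so deduplicating the
# subquery output with DISTINCT reduces the comparisons dramatically.

def dest_not_in_origin(dest_rows, origin_rows, dedup_subquery):
    # With dedup_subquery=True, model "NOT IN (SELECT DISTINCT origin ...)".
    subquery = set(origin_rows) if dedup_subquery else list(origin_rows)
    comparisons = 0
    result = set()
    for d in set(dest_rows):           # outer "SELECT DISTINCT dest"
        found = False
        for o in subquery:             # NOT IN scans the subquery rows
            comparisons += 1
            if d == o:
                found = True
                break
        if not found:
            result.add(d)
    return sorted(result), comparisons

# 1,000 origin rows drawn from only 3 distinct airports (made-up data).
origins = ["SFO", "JFK", "ORD"] * 333 + ["SFO"]
dests = ["SFO", "JFK", "ORD", "LBF", "RCA"]

naive = dest_not_in_origin(dests, origins, dedup_subquery=False)
dedup = dest_not_in_origin(dests, origins, dedup_subquery=True)
print(naive[0], dedup[0])   # same answer both ways: ['LBF', 'RCA']
print(naive[1], dedup[1])   # far fewer comparisons with DISTINCT
```

The same principle scales to the real data set: 340 distinct origins versus 123 million origin rows is the difference between the 5-second and 40-second timings above.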
+
+<p class="p"> Next, we try doing a simple calculation, with results broken down by year.
+        This reveals that some years have no data in the
+          <code class="ph codeph">AIRTIME</code> column. That means we might be able to use
+        that column in queries involving certain date ranges, but we cannot
+        count on it to always be reliable. The question of whether a column
+        contains any <code class="ph codeph">NULL</code> values, and if so what is their
+        number, proportion, and distribution, comes up again and again when
+        doing initial exploration of a data set. </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select year, sum(airtime) from airlines_external
+                  &gt;   group by year order by year desc;
++------+--------------+
+| year | sum(airtime) |
++------+--------------+
+| 2008 | 713050445    |
+| 2007 | 748015545    |
+| 2006 | 720372850    |
+| 2005 | 708204026    |
+| 2004 | 714276973    |
+| 2003 | 665706940    |
+| 2002 | 549761849    |
+| 2001 | 590867745    |
+| 2000 | 583537683    |
+| 1999 | 561219227    |
+| 1998 | 538050663    |
+| 1997 | 536991229    |
+| 1996 | 519440044    |
+| 1995 | 513364265    |
+| 1994 | NULL         |
+| 1993 | NULL         |
+| 1992 | NULL         |
+| 1991 | NULL         |
+| 1990 | NULL         |
+| 1989 | NULL         |
+| 1988 | NULL         |
+| 1987 | NULL         |
++------+--------------+
+</code></pre>
+
+<p class="p">
+With the notion of <code class="ph codeph">NULL</code> values in mind, let's come back to the <code class="ph codeph">TAIL_NUM</code>
+column that we discovered had a lot of <code class="ph codeph">NULL</code>s.
+Let's quantify the <code class="ph codeph">NULL</code> and non-<code class="ph codeph">NULL</code> values in that column for better understanding.
+First, we just count the overall number of rows versus the non-<code class="ph codeph">NULL</code> values in that column.
+That initial result gives the appearance of relatively few non-<code class="ph codeph">NULL</code> values, but we can break
+it down more clearly in a single query.
+Once we have the <code class="ph codeph">COUNT(*)</code> and the <code class="ph codeph">COUNT(<var class="keyword varname">colname</var>)</code> numbers,
+we can encode that initial query in a <code class="ph codeph">WITH</code> clause, then run a follow-on query that performs
+multiple arithmetic operations on those values.
+Seeing that only one-third of one percent of all rows have non-<code class="ph codeph">NULL</code> values for the
+<code class="ph codeph">TAIL_NUM</code> column clearly illustrates that the column is not of much use.
+</p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select count(*) as 'rows', count(tail_num) as 'non-null tail numbers'
+                  &gt;   from airlines_external;
++-----------+-----------------------+
+| rows      | non-null tail numbers |
++-----------+-----------------------+
+| 123534969 | 412968                |
++-----------+-----------------------+
+Fetched 1 row(s) in 1.51s
+[localhost:21000] &gt; with t1 as
+                  &gt;   (select count(*) as 'rows', count(tail_num) as 'nonnull'
+                  &gt;   from airlines_external)
+                  &gt; select `rows`, `nonnull`, `rows` - `nonnull` as 'nulls',
+                  &gt;   (`nonnull` / `rows`) * 100 as 'percentage non-null'
+                  &gt; from t1;
++-----------+---------+-----------+---------------------+
+| rows      | nonnull | nulls     | percentage non-null |
++-----------+---------+-----------+---------------------+
+| 123534969 | 412968  | 123122001 | 0.3342923897119365  |
++-----------+---------+-----------+---------------------+
+</code></pre>
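As a quick sanity check, the arithmetic from the `WITH` query can be reproduced outside Impala, plugging in the two counts from the sample output above:

```python
# Reproduce the null-profiling arithmetic from the WITH query,
# using the counts returned by Impala in the sample output.
total_rows = 123_534_969          # COUNT(*)
non_null_tail_nums = 412_968      # COUNT(tail_num) skips NULLs

nulls = total_rows - non_null_tail_nums
pct_non_null = non_null_tail_nums / total_rows * 100

print(nulls)                      # 123122001
print(round(pct_non_null, 4))     # 0.3343
```

This confirms the conclusion in the text: only about a third of one percent of rows carry a non-`NULL` tail number.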
+
+<p class="p">
+By examining other columns using these techniques, we can form a mental picture of the way data is distributed
+throughout the table, and which columns are most significant for query purposes. For this tutorial, we focus mostly on
+the fields likely to hold discrete values, rather than columns such as <code class="ph codeph">ACTUAL_ELAPSED_TIME</code>
+whose names suggest they hold measurements. We would dig deeper into those columns once we had a clear picture
+of which questions were worthwhile to ask, and what kinds of trends we might look for.
+For the final piece of initial exploration, let's look at the <code class="ph codeph">YEAR</code> column.
+A simple <code class="ph codeph">GROUP BY</code> query shows that it has a well-defined range, a manageable number of
+distinct values, and relatively even distribution of rows across the different years.
+</p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select min(year), max(year), ndv(year) from airlines_external;
++-----------+-----------+-----------+
+| min(year) | max(year) | ndv(year) |
++-----------+-----------+-----------+
+| 1987      | 2008      | 22        |
++-----------+-----------+-----------+
+Fetched 1 row(s) in 2.03s
+[localhost:21000] &gt; select year, count(*) howmany from airlines_external
+                  &gt;   group by year order by year desc;
++------+---------+
+| year | howmany |
++------+---------+
+| 2008 | 7009728 |
+| 2007 | 7453215 |
+| 2006 | 7141922 |
+| 2005 | 7140596 |
+| 2004 | 7129270 |
+| 2003 | 6488540 |
+| 2002 | 5271359 |
+| 2001 | 5967780 |
+| 2000 | 5683047 |
+| 1999 | 5527884 |
+| 1998 | 5384721 |
+| 1997 | 5411843 |
+| 1996 | 5351983 |
+| 1995 | 5327435 |
+| 1994 | 5180048 |
+| 1993 | 5070501 |
+| 1992 | 5092157 |
+| 1991 | 5076925 |
+| 1990 | 5270893 |
+| 1989 | 5041200 |
+| 1988 | 5202096 |
+| 1987 | 1311826 |
++------+---------+
+Fetched 22 row(s) in 2.13s
+</code></pre>
+
+<p class="p">
+We could go quite far with the data in this initial raw format, just as we downloaded it from the web.
+If the data set proved to be useful and worth persisting in Impala for extensive queries,
+we might want to copy it to an internal table, letting Impala manage the data files and perhaps
+reorganizing a little for higher efficiency.
+In this next stage of the tutorial, we copy the original data into a partitioned table, still in Parquet format.
+Partitioning based on the <code class="ph codeph">YEAR</code> column lets us run queries with clauses such as <code class="ph codeph">WHERE year = 2001</code>
+or <code class="ph codeph">WHERE year BETWEEN 1989 AND 1999</code>, which can dramatically cut down on I/O by
+ignoring all the data from years outside the desired range.
+R

<TRUNCATED>


[28/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_isilon.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_isilon.html b/docs/build/html/topics/impala_isilon.html
new file mode 100644
index 0000000..304e8ec
--- /dev/null
+++ b/docs/build/html/topics/impala_isilon.html
@@ -0,0 +1,89 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_isilon"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala with Isilon Storage</title></head><body id="impala_isilon"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using Impala with Isilon Storage</h1>
+  
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      You can use Impala to query data files that reside on EMC Isilon storage devices, rather than in HDFS.
+      This capability allows convenient query access to a storage system where you might already be
+      managing large volumes of data. The combination of the Impala query engine and Isilon storage is
+      certified on <span class="keyword">Impala 2.2.4</span> or higher.
+    </p>
+
+    <div class="p">
+        Because the EMC Isilon storage devices use a global value for the block size
+        rather than a configurable value for each file, the <code class="ph codeph">PARQUET_FILE_SIZE</code>
+        query option has no effect when Impala inserts data into a table or partition
+        residing on Isilon storage. Use the <code class="ph codeph">isi</code> command to set the
+        default block size globally on the Isilon device. For example, to set the
+        Isilon default block size to 256 MB, the recommended size for Parquet
+        data files for Impala, issue the following command:
+<pre class="pre codeblock"><code>isi hdfs settings modify --default-block-size=256MB</code></pre>
+      </div>
+
+    <p class="p">
+      The typical use case for Impala and Isilon together is to use Isilon for the
+      default filesystem, replacing HDFS entirely. In this configuration,
+      when you create a database, table, or partition, the data always resides on
+      Isilon storage and you do not need to specify any special <code class="ph codeph">LOCATION</code>
+      attribute. If you do specify a <code class="ph codeph">LOCATION</code> attribute, its value refers
+      to a path within the Isilon filesystem.
+      For example:
+    </p>
+<pre class="pre codeblock"><code>-- If the default filesystem is Isilon, all Impala data resides there
+-- and all Impala databases and tables are located there.
+CREATE TABLE t1 (x INT, s STRING);
+
+-- You can specify LOCATION for database, table, or partition,
+-- using values from the Isilon filesystem.
+CREATE DATABASE d1 LOCATION '/some/path/on/isilon/server/d1.db';
+CREATE TABLE d1.t2 (a TINYINT, b BOOLEAN);
+</code></pre>
+
+    <p class="p">
+      Impala can write to, delete, and rename data files and database, table,
+      and partition directories on Isilon storage. Therefore, Impala statements such
+      as
+      <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP TABLE</code>,
+      <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">DROP DATABASE</code>,
+      <code class="ph codeph">ALTER TABLE</code>,
+      and
+      <code class="ph codeph">INSERT</code> work the same with Isilon storage as with HDFS.
+    </p>
+
+    <p class="p">
+      When the Impala spill-to-disk feature is activated by a query that approaches
+      the memory limit, Impala writes all the temporary data to a local (not Isilon)
+      storage device. Because the I/O bandwidth for the temporary data depends on
+      the number of local disks, and clusters using Isilon storage might not have
+      as many local disks attached, pay special attention on Isilon-enabled clusters
+      to any queries that use the spill-to-disk feature. Where practical, tune the
+      queries or allocate extra memory for Impala to avoid spilling.
+      Although you can specify an Isilon storage device as the destination for
+      the temporary data for the spill-to-disk feature, that configuration is
+      not recommended due to the need to transfer the data both ways using remote I/O.
+    </p>
+
+    <p class="p">
+      When tuning Impala queries on HDFS, you typically try to avoid any remote reads.
+      When the data resides on Isilon storage, all the I/O consists of remote reads.
+      Do not be alarmed when you see non-zero numbers for remote read measurements
+      in query profile output. The benefit of the Impala and Isilon integration is
+primarily the convenience of not having to move or copy large volumes of data to HDFS,
+      rather than raw query performance. You can increase the performance of Impala
+      I/O for Isilon systems by increasing the value for the
+      <code class="ph codeph">--num_remote_hdfs_io_threads</code> startup option for the
+      <span class="keyword cmdname">impalad</span> daemon.
+    </p>
+
+    
+  </div>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_jdbc.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_jdbc.html b/docs/build/html/topics/impala_jdbc.html
new file mode 100644
index 0000000..cdb031b
--- /dev/null
+++ b/docs/build/html/topics/impala_jdbc.html
@@ -0,0 +1,326 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_jdbc"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring Impala to Work with JDBC</title></head><body id="impala_jdbc"><main role="main"><article role="article" aria-labelledby="impala_jdbc__jdbc">
+
+  <h1 class="title topictitle1" id="impala_jdbc__jdbc">Configuring Impala to Work with JDBC</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Impala supports the standard JDBC interface, allowing access from commercial Business Intelligence tools and
+      custom software written in Java or other programming languages. The JDBC driver allows you to access Impala
+      from a Java program that you write, or a Business Intelligence or similar tool that uses JDBC to communicate
+      with various database products.
+    </p>
+
+    <p class="p">
+      Setting up a JDBC connection to Impala involves the following steps:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Verifying the communication port where the Impala daemons in your cluster are listening for incoming JDBC
+        requests.
+      </li>
+
+      <li class="li">
+        Installing the JDBC driver on every system that runs the JDBC-enabled application.
+      </li>
+
+      <li class="li">
+        Specifying a connection string for the JDBC application to access one of the servers running the
+        <span class="keyword cmdname">impalad</span> daemon, with the appropriate security settings.
+      </li>
+    </ul>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_config.html">Managing Impala</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="impala_jdbc__jdbc_port">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Configuring the JDBC Port</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The default port used by JDBC 2.0 and later (as well as ODBC 2.x) is 21050, and the Impala server
+        accepts JDBC connections through this port by default. Make sure this port is available for communication
+        with other hosts on your network, for example, that it is not blocked by firewall software. If your JDBC
+        client software connects to a different port, specify that alternative port number with the
+        <code class="ph codeph">--hs2_port</code> option when starting <code class="ph codeph">impalad</code>. See
+        <a class="xref" href="impala_processes.html#processes">Starting Impala</a> for details about Impala startup options. See
+        <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> for information about all ports used for communication between Impala
+        and clients or between Impala components.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="impala_jdbc__jdbc_driver_choice">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Choosing the JDBC Driver</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        In Impala 2.0 and later, you can use the Hive 0.13 JDBC driver.  If you are
+        already using JDBC applications with an earlier Impala release, you should update
+        your JDBC driver, because the Hive 0.12 driver that was formerly the only choice
+        is not compatible with Impala 2.0 and later.
+      </p>
+
+      <p class="p">
+        The Hive JDBC driver provides a substantial speed increase for JDBC
+        applications with Impala 2.0 and higher, for queries that return large result sets.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+      <p class="p">
+        The Impala complex types (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>)
+        are available in <span class="keyword">Impala 2.3</span> and higher.
+        To use these types with JDBC requires version 2.5.28 or higher of the JDBC Connector for Impala.
+        To use these types with ODBC requires version 2.5.30 or higher of the ODBC Connector for Impala.
+        Consider upgrading all JDBC and ODBC drivers at the same time you upgrade from <span class="keyword">Impala 2.3</span> or higher.
+      </p>
+      <p class="p">
+        Although the result sets from queries involving complex types consist of all scalar values,
+        the queries involve join notation and column references that might not be understood by
+        a particular JDBC or ODBC connector. Consider defining a view that represents the
+        flattened version of a table containing complex type columns, and pointing the JDBC
+        or ODBC application at the view.
+        See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="impala_jdbc__jdbc_setup">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Enabling Impala JDBC Support on Client Systems</h2>
+  
+
+    <div class="body conbody">
+
+      <section class="section" id="jdbc_setup__install_hive_driver"><h3 class="title sectiontitle">Using the Hive JDBC Driver</h3>
+        
+        <p class="p">
+          You install the Hive JDBC driver (<code class="ph codeph">hive-jdbc</code> package) through the Linux package manager, on
+          hosts within the cluster. The driver consists of several Java JAR files. The same driver can be used by Impala and Hive.
+        </p>
+
+        <p class="p">
+          To get the JAR files, install the Hive JDBC driver on each host in the cluster that will run
+          JDBC applications. 
+        </p>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          The latest JDBC driver, corresponding to Hive 0.13, provides substantial performance improvements for
+          Impala queries that return large result sets. Impala 2.0 and later are compatible with the Hive 0.13
+          driver. If you already have an older JDBC driver installed, and are running Impala 2.0 or higher, consider
+          upgrading to the latest Hive JDBC driver for best performance with JDBC applications.
+        </div>
+
+        <p class="p">
+          If you are using JDBC-enabled applications on hosts outside the cluster, you cannot use the same
+          installation procedure on those hosts. Install the JDBC driver on at least one cluster host using the preceding
+          procedure. Then download the JAR files to each client machine that will use JDBC with Impala:
+        </p>
+
+  <pre class="pre codeblock"><code>commons-logging-X.X.X.jar
+  hadoop-common.jar
+  hive-common-X.XX.X.jar
+  hive-jdbc-X.XX.X.jar
+  hive-metastore-X.XX.X.jar
+  hive-service-X.XX.X.jar
+  httpclient-X.X.X.jar
+  httpcore-X.X.X.jar
+  libfb303-X.X.X.jar
+  libthrift-X.X.X.jar
+  log4j-X.X.XX.jar
+  slf4j-api-X.X.X.jar
+  slf4j-logXjXX-X.X.X.jar
+  </code></pre>
+
+        <p class="p">
+          <strong class="ph b">To enable JDBC support for Impala on the system where you run the JDBC application:</strong>
+        </p>
+
+        <ol class="ol">
+          <li class="li">
+            Download the JAR files listed above to each client machine.
+            <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+              For Maven users, see
+              <a class="xref" href="https://github.com/onefoursix/Cloudera-Impala-JDBC-Example" target="_blank">this sample GitHub page</a> for an example of the
+              dependencies you could add to a <code class="ph codeph">pom</code> file instead of downloading the individual JARs.
+            </div>
+          </li>
+
+          <li class="li">
+            Store the JAR files in a location of your choosing, ideally a directory already referenced in your
+            <code class="ph codeph">CLASSPATH</code> setting. For example:
+            <ul class="ul">
+              <li class="li">
+                On Linux, you might use a location such as <code class="ph codeph">/opt/jars/</code>.
+              </li>
+
+              <li class="li">
+                On Windows, you might use a subdirectory underneath <span class="ph filepath">C:\Program Files</span>.
+              </li>
+            </ul>
+          </li>
+
+          <li class="li">
+            To successfully load the Impala JDBC driver, client programs must be able to locate the associated JAR
+            files. This often means setting the <code class="ph codeph">CLASSPATH</code> for the client process to include the
+            JARs. Consult the documentation for your JDBC client for more details on how to install new JDBC drivers,
+            but some examples of how to set <code class="ph codeph">CLASSPATH</code> variables include:
+            <ul class="ul">
+              <li class="li">
+                On Linux, if you extracted the JARs to <code class="ph codeph">/opt/jars/</code>, you might issue the following
+                command to prepend the JAR files path to an existing classpath:
+  <pre class="pre codeblock"><code>export CLASSPATH=/opt/jars/*.jar:$CLASSPATH</code></pre>
+              </li>
+
+              <li class="li">
+                On Windows, use the <strong class="ph b">System Properties</strong> control panel item to modify the <strong class="ph b">Environment
+                Variables</strong> for your system. Modify the environment variables to include the path to which you
+                extracted the files.
+                <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+                  If the existing <code class="ph codeph">CLASSPATH</code> on your client machine refers to some older version of
+                  the Hive JARs, ensure that the new JARs are the first ones listed. Either put the new JAR files
+                  earlier in the listings, or delete the other references to Hive JAR files.
+                </div>
+              </li>
+            </ul>
+          </li>
+        </ol>
+      </section>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="impala_jdbc__jdbc_connect">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Establishing JDBC Connections</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The JDBC driver class depends on which driver you select.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        If your JDBC or ODBC application connects to Impala through a load balancer such as
+        <code class="ph codeph">haproxy</code>, be cautious about reusing the connections. If the load balancer has set up
+        connection timeout values, either check the connection frequently so that it never sits idle longer than
+        the load balancer timeout value, or check the connection validity before using it and create a new one if
+        the connection has been closed.
+      </div>
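The validity check described in the note above can be sketched as a small helper. This is an illustrative pattern, not part of any Impala or Hive API; the class name and the 5-second timeout are arbitrary choices:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: checks a cached connection before each use and
// reopens it if the load balancer has silently closed it.
public class ValidatedConnection {
    private final String url;
    private Connection conn;

    public ValidatedConnection(String url) {
        this.url = url;
    }

    // Returns a usable connection, recreating it when it is closed or
    // fails the standard JDBC isValid() probe (5-second timeout here).
    public Connection get() throws SQLException {
        if (conn == null || conn.isClosed() || !conn.isValid(5)) {
            conn = DriverManager.getConnection(url);
        }
        return conn;
    }
}
```

Calling `get()` before each statement keeps idle-timeout failures from surfacing as query errors, at the cost of one round trip per check.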
+
+      <section class="section" id="jdbc_connect__class_hive_driver"><h3 class="title sectiontitle">Using the Hive JDBC Driver</h3>
+      
+
+      <p class="p">
+        For example, with the Hive JDBC driver, the class name is <code class="ph codeph">org.apache.hive.jdbc.HiveDriver</code>.
+        Once you have configured Impala to work with JDBC, you can establish connections from client applications.
+        For a cluster that does not use
+        Kerberos authentication, use a connection string of the form
+        <code class="ph codeph">jdbc:hive2://<var class="keyword varname">host</var>:<var class="keyword varname">port</var>/;auth=noSasl</code>.
+
+        For example, you might use:
+      </p>
+
+<pre class="pre codeblock"><code>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</code></pre>
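As a minimal sketch, a Java program might open such a connection as follows. The host name and query are placeholders, and the Hive JDBC JAR files listed earlier must be on the `CLASSPATH` for the driver class to load:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaJdbcExample {

    // Build the noSasl connection string shown above; host and port are placeholders.
    static String noSaslUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port + "/;auth=noSasl";
    }

    public static void main(String[] args) throws Exception {
        // Loading the driver class requires the Hive JDBC JARs on the CLASSPATH.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = noSaslUrl("myhost.example.com", 21050);
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT version()")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

For the Kerberos and LDAP cases below, only the connection string changes; the driver class and the `DriverManager` call stay the same.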
+
+      <p class="p">
+        To connect to an instance of Impala that requires Kerberos authentication, use a connection string of the
+        form
+        <code class="ph codeph">jdbc:hive2://<var class="keyword varname">host</var>:<var class="keyword varname">port</var>/;principal=<var class="keyword varname">principal_name</var></code>.
+        The principal must be the same user principal you used when starting Impala. For example, you might use:
+      </p>
+
+<pre class="pre codeblock"><code>jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM</code></pre>
+
+      <p class="p">
+        To connect to an instance of Impala that requires LDAP authentication, use a connection string of the form
+        <code class="ph codeph">jdbc:hive2://<var class="keyword varname">host</var>:<var class="keyword varname">port</var>/<var class="keyword varname">db_name</var>;user=<var class="keyword varname">ldap_userid</var>;password=<var class="keyword varname">ldap_password</var></code>.
+        For example, you might use:
+      </p>
+
+<pre class="pre codeblock"><code>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</code></pre>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          <p class="p">
+        Prior to <span class="keyword">Impala 2.5</span>, the Hive JDBC driver did not support connections that use both Kerberos authentication
+        and SSL encryption. If your cluster is running an older release that has this restriction,
+        use an alternative JDBC driver that supports
+        both of these security features.
+      </p>
+      </div>
+
+      </section>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="impala_jdbc__jdbc_odbc_notes">
+    <h2 class="title topictitle2" id="ariaid-title6">Notes about JDBC and ODBC Interaction with Impala SQL Features</h2>
+    <div class="body conbody">
+      <p class="p">
+        Most Impala SQL features work equivalently through the <span class="keyword cmdname">impala-shell</span> interpreter
+        or the JDBC and ODBC APIs. The following are some exceptions to keep in mind when switching between
+        the interactive shell and applications using the APIs:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+          <ul class="ul">
+          <li class="li">
+          <p class="p">
+            Queries involving the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>)
+            require notation that might not be available in all levels of JDBC and ODBC drivers.
+            If you have trouble querying such a table due to the driver level or
+            inability to edit the queries used by the application, you can create a view that exposes
+            a <span class="q">"flattened"</span> version of the complex columns and point the application at the view.
+            See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The complex types available in <span class="keyword">Impala 2.3</span> and higher are supported by the
+            JDBC <code class="ph codeph">getColumns()</code> API.
+            Both <code class="ph codeph">MAP</code> and <code class="ph codeph">ARRAY</code> are reported as the JDBC SQL Type <code class="ph codeph">ARRAY</code>,
+            because this is the closest matching Java SQL type. This behavior is consistent with Hive.
+            <code class="ph codeph">STRUCT</code> types are reported as the JDBC SQL Type <code class="ph codeph">STRUCT</code>.
+          </p>
+          <div class="p">
+            To be consistent with Hive's behavior, the TYPE_NAME field is populated
+            with the primitive type name for scalar types, and with the full <code class="ph codeph">toSql()</code>
+            for complex types. The resulting type names are somewhat inconsistent,
+            because nested types are printed differently than top-level types. For example,
+            the following list shows how <code class="ph codeph">toSql()</code> output for Impala types is
+            translated to <code class="ph codeph">TYPE_NAME</code> values:
+<pre class="pre codeblock"><code>DECIMAL(10,10)         becomes  DECIMAL
+CHAR(10)               becomes  CHAR
+VARCHAR(10)            becomes  VARCHAR
+ARRAY&lt;DECIMAL(10,10)&gt;  becomes  ARRAY&lt;DECIMAL(10,10)&gt;
+ARRAY&lt;CHAR(10)&gt;        becomes  ARRAY&lt;CHAR(10)&gt;
+ARRAY&lt;VARCHAR(10)&gt;     becomes  ARRAY&lt;VARCHAR(10)&gt;
+
+</code></pre>
+          </div>
+          </li>
+        </ul>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_joins.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_joins.html b/docs/build/html/topics/impala_joins.html
new file mode 100644
index 0000000..436f9f5
--- /dev/null
+++ b/docs/build/html/topics/impala_joins.html
@@ -0,0 +1,531 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="joins"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Joins in Impala SELECT Statements</title></head><body id="joins"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Joins in Impala SELECT Statements</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      A join query is a <code class="ph codeph">SELECT</code> statement that combines data from two or more tables,
+      and returns a result set containing items from some or all of those tables. It is a way to
+      cross-reference and correlate related data that is organized into multiple tables, typically
+      using identifiers that are repeated in each of the joined tables.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+        Impala supports a wide variety of <code class="ph codeph">JOIN</code> clauses. Inner joins, outer joins (left,
+        right, and full), and semi joins are supported in all Impala versions. The <code class="ph codeph">CROSS JOIN</code> operator is available in Impala 1.2.2
+        and higher. During performance tuning, you can override the reordering of join clauses that Impala does
+        internally by including the keyword <code class="ph codeph">STRAIGHT_JOIN</code> immediately after the
+        <code class="ph codeph">SELECT</code> keyword.
+      </p>
+
+<pre class="pre codeblock"><code>SELECT <var class="keyword varname">select_list</var> FROM
+  <var class="keyword varname">table_or_subquery1</var> [INNER] JOIN <var class="keyword varname">table_or_subquery2</var> |
+  <var class="keyword varname">table_or_subquery1</var> {LEFT [OUTER] | RIGHT [OUTER] | FULL [OUTER]} JOIN <var class="keyword varname">table_or_subquery2</var> |
+  <var class="keyword varname">table_or_subquery1</var> {LEFT | RIGHT} SEMI JOIN <var class="keyword varname">table_or_subquery2</var> |
+  <span class="ph"><var class="keyword varname">table_or_subquery1</var> {LEFT | RIGHT} ANTI JOIN <var class="keyword varname">table_or_subquery2</var> |</span>
+    [ ON <var class="keyword varname">col1</var> = <var class="keyword varname">col2</var> [AND <var class="keyword varname">col3</var> = <var class="keyword varname">col4</var> ...] |
+      USING (<var class="keyword varname">col1</var> [, <var class="keyword varname">col2</var> ...]) ]
+  [<var class="keyword varname">other_join_clause</var> ...]
+[ WHERE <var class="keyword varname">where_clauses</var> ]
+
+SELECT <var class="keyword varname">select_list</var> FROM
+  <var class="keyword varname">table_or_subquery1</var>, <var class="keyword varname">table_or_subquery2</var> [, <var class="keyword varname">table_or_subquery3</var> ...]
+  [<var class="keyword varname">other_join_clause</var> ...]
+WHERE
+    <var class="keyword varname">col1</var> = <var class="keyword varname">col2</var> [AND <var class="keyword varname">col3</var> = <var class="keyword varname">col4</var> ...]
+
+SELECT <var class="keyword varname">select_list</var> FROM
+  <var class="keyword varname">table_or_subquery1</var> CROSS JOIN <var class="keyword varname">table_or_subquery2</var>
+  [<var class="keyword varname">other_join_clause</var> ...]
+[ WHERE <var class="keyword varname">where_clauses</var> ]</code></pre>
+
+    <p class="p">
+      <strong class="ph b">SQL-92 and SQL-89 Joins:</strong>
+    </p>
+
+    <p class="p">
+      Queries with the explicit <code class="ph codeph">JOIN</code> keywords are known as SQL-92 style joins, referring to the
+      level of the SQL standard where they were introduced. The corresponding <code class="ph codeph">ON</code> or
+      <code class="ph codeph">USING</code> clauses clearly show which columns are used as the join keys in each case:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT t1.c1, t2.c2 FROM <strong class="ph b">t1 JOIN t2</strong>
+  <strong class="ph b">ON t1.id = t2.id and t1.type_flag = t2.type_flag</strong>
+  WHERE t1.c1 &gt; 100;
+
+SELECT t1.c1, t2.c2 FROM <strong class="ph b">t1 JOIN t2</strong>
+  <strong class="ph b">USING (id, type_flag)</strong>
+  WHERE t1.c1 &gt; 100;</code></pre>
+
+    <p class="p">
+      The <code class="ph codeph">ON</code> clause is a general way to compare columns across the two tables, even if the column
+      names are different. The <code class="ph codeph">USING</code> clause is a shorthand notation for specifying the join
+      columns, when the column names are the same in both tables. You can code equivalent <code class="ph codeph">WHERE</code>
+      clauses that compare the columns, instead of <code class="ph codeph">ON</code> or <code class="ph codeph">USING</code> clauses, but that
+      practice is not recommended because mixing the join comparisons with other filtering clauses is typically
+      less readable and harder to maintain.
+    </p>
+
+    <p class="p">
+      Queries with a comma-separated list of tables and subqueries are known as SQL-89 style joins. In these
+      queries, the equality comparisons between columns of the joined tables go in the <code class="ph codeph">WHERE</code>
+      clause alongside other kinds of comparisons. This syntax is easy to learn, but it is also easy to
+      accidentally remove a <code class="ph codeph">WHERE</code> clause needed for the join to work correctly.
+    </p>
+
+<pre class="pre codeblock"><code>SELECT t1.c1, t2.c2 FROM <strong class="ph b">t1, t2</strong>
+  WHERE
+  <strong class="ph b">t1.id = t2.id AND t1.type_flag = t2.type_flag</strong>
+  AND t1.c1 &gt; 100;</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Self-joins:</strong>
+    </p>
+
+    <p class="p">
+      Impala can do self-joins, for example to join on two different columns in the same table to represent
+      parent-child relationships or other tree-structured data. There is no explicit syntax for this; just use the
+      same table name for both the left-hand and right-hand table, and assign different table aliases to use when
+      referring to the fully qualified column names:
+    </p>
+
+<pre class="pre codeblock"><code>-- Combine fields from both parent and child rows.
+SELECT lhs.id, rhs.parent, lhs.c1, rhs.c2 FROM tree_data lhs, tree_data rhs WHERE lhs.id = rhs.parent;</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Cartesian joins:</strong>
+    </p>
+
+    <div class="p">
+      To avoid producing huge result sets by mistake, Impala does not allow Cartesian joins of the form:
+<pre class="pre codeblock"><code>SELECT ... FROM t1 JOIN t2;
+SELECT ... FROM t1, t2;</code></pre>
+      If you intend to join the tables based on common values, add <code class="ph codeph">ON</code> or <code class="ph codeph">WHERE</code>
+      clauses to compare columns across the tables. If you truly intend to do a Cartesian join, use the
+      <code class="ph codeph">CROSS JOIN</code> keyword as the join operator. The <code class="ph codeph">CROSS JOIN</code> form does not use
+      any <code class="ph codeph">ON</code> clause, because it produces a result set with all combinations of rows from the
+      left-hand and right-hand tables. The result set can still be filtered by subsequent <code class="ph codeph">WHERE</code>
+      clauses. For example:
+    </div>
+
+<pre class="pre codeblock"><code>SELECT ... FROM t1 CROSS JOIN t2;
+SELECT ... FROM t1 CROSS JOIN t2 WHERE <var class="keyword varname">tests_on_non_join_columns</var>;</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Inner and outer joins:</strong>
+    </p>
+
+    <p class="p">
+      An inner join is the most common and familiar type: rows in the result set contain the requested columns from
+      the appropriate tables, for all combinations of rows where the join columns of the tables have identical
+      values. If a column with the same name occurs in both tables, use a fully qualified name or a column alias to
+      refer to the column in the select list or other clauses. Impala performs inner joins by default for both
+      SQL-89 and SQL-92 join syntax:
+    </p>
+
+<pre class="pre codeblock"><code>-- The following 3 forms are all equivalent.
+SELECT t1.id, c1, c2 FROM t1, t2 WHERE t1.id = t2.id;
+SELECT t1.id, c1, c2 FROM t1 JOIN t2 ON t1.id = t2.id;
+SELECT t1.id, c1, c2 FROM t1 INNER JOIN t2 ON t1.id = t2.id;</code></pre>
+
+    <p class="p">
+      An outer join retrieves all rows from the left-hand table, or the right-hand table, or both; wherever there
+      is no matching data in the table on the other side of the join, the corresponding columns in the result set
+      are set to <code class="ph codeph">NULL</code>. To perform an outer join, include the <code class="ph codeph">OUTER</code> keyword in the
+      join operator, along with either <code class="ph codeph">LEFT</code>, <code class="ph codeph">RIGHT</code>, or <code class="ph codeph">FULL</code>:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id;
+SELECT * FROM t1 RIGHT OUTER JOIN t2 ON t1.id = t2.id;
+SELECT * FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;</code></pre>
+
+    <p class="p">
+      For outer joins, Impala requires SQL-92 syntax; that is, the <code class="ph codeph">JOIN</code> keyword instead of
+      comma-separated table names. Impala does not support vendor extensions such as <code class="ph codeph">(+)</code> or
+      <code class="ph codeph">*=</code> notation for doing outer joins with SQL-89 query syntax.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Equijoins and Non-Equijoins:</strong>
+    </p>
+
+    <p class="p">
+      By default, Impala requires an equality comparison between the left-hand and right-hand tables, either
+      through <code class="ph codeph">ON</code>, <code class="ph codeph">USING</code>, or <code class="ph codeph">WHERE</code> clauses. These types of
+      queries are classified broadly as equijoins. Inner, outer, full, and semi joins can all be equijoins based on
+      the presence of equality tests between columns in the left-hand and right-hand tables.
+    </p>
+
+    <p class="p">
+      In Impala 1.2.2 and higher, non-equijoin queries are also possible, with comparisons such as
+      <code class="ph codeph">!=</code> or <code class="ph codeph">&lt;</code> between the join columns. These kinds of queries require care to
+      avoid producing huge result sets that could exceed resource limits. Once you have planned a non-equijoin
+      query that produces a result set of acceptable size, you can code the query using the <code class="ph codeph">CROSS
+      JOIN</code> operator, and add the extra comparisons in the <code class="ph codeph">WHERE</code> clause:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT * FROM t1 CROSS JOIN t2 WHERE t1.total &gt; t2.maximum_price;</code></pre>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.3</span> and higher, additional non-equijoin queries are possible due to the addition
+      of nested loop joins. These queries typically involve <code class="ph codeph">SEMI JOIN</code>,
+      <code class="ph codeph">ANTI JOIN</code>, or <code class="ph codeph">FULL OUTER JOIN</code> clauses.
+      Impala sometimes also uses nested loop joins internally when evaluating <code class="ph codeph">OUTER JOIN</code>
+      queries involving complex type columns.
+      Query phases involving nested loop joins do not use the spill-to-disk mechanism if they
+      exceed the memory limit. Impala decides internally when to use each join mechanism; you cannot
+      specify any query hint to choose between the nested loop join or the original hash join algorithm.
+    </p>
+
+<pre class="pre codeblock"><code>SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.int_col &lt; t2.int_col;</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Semi-joins:</strong>
+    </p>
+
+    <p class="p">
+      Semi-joins are a relatively rarely used variation. With the left semi-join, only data from the left-hand
+      table is returned, for rows where there is matching data in the right-hand table, based on comparisons
+      between join columns in <code class="ph codeph">ON</code> or <code class="ph codeph">WHERE</code> clauses. Only one instance of each row
+      from the left-hand table is returned, regardless of how many matching rows exist in the right-hand table.
+      <span class="ph">A right semi-join (available in Impala 2.0 and higher) reverses the comparison and returns
+      data from the right-hand table.</span>
+    </p>
+
+<pre class="pre codeblock"><code>SELECT t1.c1, t1.c2 FROM t1 LEFT SEMI JOIN t2 ON t1.id = t2.id;</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Natural joins (not supported):</strong>
+    </p>
+
+    <p class="p">
+      Impala does not support the <code class="ph codeph">NATURAL JOIN</code> operator, again to avoid inconsistent or huge
+      result sets. Natural joins do away with the <code class="ph codeph">ON</code> and <code class="ph codeph">USING</code> clauses, and
+      instead automatically join on all columns with the same names in the left-hand and right-hand tables. This
+      kind of query is not recommended for rapidly evolving data structures such as are typically used in Hadoop,
+      because it can produce different query results as columns are added to or removed from tables.
+    </p>
+
+    <p class="p">
+      If you do have any queries that use <code class="ph codeph">NATURAL JOIN</code>, make sure to rewrite them with explicit
+      <code class="ph codeph">USING</code> clauses, because Impala could interpret the <code class="ph codeph">NATURAL</code> keyword as a
+      table alias:
+    </p>
+
+<pre class="pre codeblock"><code>-- 'NATURAL' is interpreted as an alias for 't1' and Impala attempts an inner join,
+-- resulting in an error because inner joins require explicit comparisons between columns.
+SELECT t1.c1, t2.c2 FROM t1 NATURAL JOIN t2;
+ERROR: NotImplementedException: Join with 't2' requires at least one conjunctive equality predicate.
+  To perform a Cartesian product between two tables, use a CROSS JOIN.
+
+-- If you expect the tables to have identically named columns with matching values,
+-- list the corresponding column names in a USING clause.
+SELECT t1.c1, t2.c2 FROM t1 JOIN t2 USING (id, type_flag, name, address);</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Anti-joins (<span class="keyword">Impala 2.0</span> and higher only):</strong>
+    </p>
+
+    <p class="p">
+      Impala supports the <code class="ph codeph">LEFT ANTI JOIN</code> and <code class="ph codeph">RIGHT ANTI JOIN</code> clauses in
+      <span class="keyword">Impala 2.0</span> and higher. The <code class="ph codeph">LEFT</code> or <code class="ph codeph">RIGHT</code>
+      keyword is required for this kind of join. A <code class="ph codeph">LEFT ANTI JOIN</code> returns those
+      values from the left-hand table that have no matching value in the right-hand table. <code class="ph codeph">RIGHT ANTI
+      JOIN</code> reverses the comparison and returns values from the right-hand table. You can express this
+      negative relationship either through the <code class="ph codeph">ANTI JOIN</code> clause or through a <code class="ph codeph">NOT
+      EXISTS</code> operator with a subquery.
+    </p>
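+
+    <p class="p">
+      As an illustration, the following two queries (against the same hypothetical
+      <code class="ph codeph">t1</code> and <code class="ph codeph">t2</code> tables) express the same negative relationship:
+    </p>
+
+<pre class="pre codeblock"><code>-- Rows from t1 with no matching id in t2, using an anti-join...
+SELECT t1.c1 FROM t1 LEFT ANTI JOIN t2 ON t1.id = t2.id;
+
+-- ...and the equivalent formulation with a NOT EXISTS subquery.
+SELECT t1.c1 FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t1.id = t2.id);</code></pre>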
+
+
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+
+
+    <p class="p">
+      When referring to a column with a complex type (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>)
+      in a query, you use join notation to <span class="q">"unpack"</span> the scalar fields of the struct, the elements of the array, or
+      the key-value pairs of the map. (The join notation is not required for aggregation operations, such as
+      <code class="ph codeph">COUNT()</code> or <code class="ph codeph">SUM()</code> for array elements.) Because Impala recognizes which complex type elements are associated with which row
+      of the result set, you use the same syntax as for a cross or cartesian join, without an explicit join condition.
+      See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types.
+    </p>
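+
+    <p class="p">
+      As a sketch, assuming a hypothetical table <code class="ph codeph">customers</code> with an
+      <code class="ph codeph">ARRAY&lt;STRING&gt;</code> column named <code class="ph codeph">interests</code>, the join
+      notation unpacks the array elements without any explicit join condition:
+    </p>
+
+<pre class="pre codeblock"><code>-- Each array element (the ITEM pseudocolumn) appears alongside
+-- the scalar columns of its own row; no ON clause is needed.
+SELECT c.name, i.item FROM customers c, c.interests i;</code></pre>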
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      You typically use join queries in situations like these:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        When related data arrives from different sources, with each data set physically residing in a separate
+        table. For example, you might have address data from business records that you cross-check against phone
+        listings or census data.
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          Impala can join tables of different file formats, including Impala-managed tables and HBase tables. For
+          example, you might keep small dimension tables in HBase, for convenience of single-row lookups and
+          updates, and for the larger fact tables use Parquet or other binary file format optimized for scan
+          operations. Then, you can issue a join query to cross-reference the fact tables with the dimension
+          tables.
+        </div>
+      </li>
+
+      <li class="li">
+        When data is normalized, a technique for reducing data duplication by dividing it across multiple tables.
+        This kind of organization is often found in data that comes from traditional relational database systems.
+        For example, instead of repeating some long string such as a customer name in multiple tables, each table
+        might contain a numeric customer ID. Queries that need to display the customer name could <span class="q">"join"</span> the
+        table that specifies which customer ID corresponds to which name.
+      </li>
+
+      <li class="li">
+        When certain columns are rarely needed for queries, so they are moved into separate tables to reduce
+        overhead for common queries. For example, a <code class="ph codeph">biography</code> field might be rarely needed in
+        queries on employee data. Putting that field in a separate table reduces the amount of I/O for common
+        queries on employee addresses or phone numbers. Queries that do need the <code class="ph codeph">biography</code> column
+        can retrieve it by performing a join with that separate table.
+      </li>
+
+      <li class="li">
+        In <span class="keyword">Impala 2.3</span> or higher, when referring to complex type columns in queries.
+        See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+      </li>
+    </ul>
+
+    <p class="p">
+      When comparing columns with the same names in <code class="ph codeph">ON</code> or <code class="ph codeph">WHERE</code> clauses, use
+      fully qualified names such as <code class="ph codeph"><var class="keyword varname">table_name</var>.<var class="keyword varname">column_name</var></code>, or
+      assign table aliases, column aliases, or both to make the code more compact and understandable:
+    </p>
+
+<pre class="pre codeblock"><code>select t1.c1 as first_id, t2.c2 as second_id from
+  t1 join t2 on first_id = second_id;
+
+select fact.custno, dimension.custno from
+  customer_data as fact join customer_address as dimension
+  using (custno);</code></pre>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        Performance for join queries is a crucial aspect for Impala, because complex join queries are
+        resource-intensive operations. An efficient join query produces much less network traffic and CPU overhead
+        than an inefficient one. For best results:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          Make sure that both <a class="xref" href="impala_perf_stats.html#perf_stats">table and column statistics</a> are
+          available for all the tables involved in a join query, and especially for the columns referenced in any
+          join conditions. Impala uses the statistics to automatically deduce an efficient join order.
+          Use <a class="xref" href="impala_show.html#show"><code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code> and
+          <code class="ph codeph">SHOW COLUMN STATS <var class="keyword varname">table_name</var></code></a> to check if statistics are
+          already present. Issue the <code class="ph codeph">COMPUTE STATS <var class="keyword varname">table_name</var></code> for a nonpartitioned table,
+          or (in Impala 2.1.0 and higher) <code class="ph codeph">COMPUTE INCREMENTAL STATS <var class="keyword varname">table_name</var></code>
+          for a partitioned table, to collect the initial statistics at both the table and column levels, and to keep the
+          statistics up to date after any substantial <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> operations.
+        </li>
+
+        <li class="li">
+          If table or column statistics are not available, join the largest table first. You can check the
+          existence of statistics with the <code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code> and
+          <code class="ph codeph">SHOW COLUMN STATS <var class="keyword varname">table_name</var></code> statements.
+        </li>
+
+        <li class="li">
+          If table or column statistics are not available, join subsequent tables according to which table has the
+          most selective filter, based on overall size and <code class="ph codeph">WHERE</code> clauses. Joining the table with
+          the most selective filter results in the fewest number of rows being returned.
+        </li>
+      </ul>
+      <p class="p">
+        For more information and examples of performance for join queries, see
+        <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>.
+      </p>
+    </div>
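+
+    <p class="p">
+      For example, the following sequence (shown with a hypothetical partitioned table named
+      <code class="ph codeph">sales</code>) checks for existing statistics and collects them:
+    </p>
+
+<pre class="pre codeblock"><code>SHOW TABLE STATS sales;
+SHOW COLUMN STATS sales;
+-- For a partitioned table in Impala 2.1.0 and higher:
+COMPUTE INCREMENTAL STATS sales;
+-- For a nonpartitioned table:
+-- COMPUTE STATS sales;</code></pre>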
+
+    <p class="p">
+      To control the result set from a join query, include the corresponding column names from both tables
+      in an <code class="ph codeph">ON</code> or <code class="ph codeph">USING</code> clause, or code equality comparisons for those
+      columns in the <code class="ph codeph">WHERE</code> clause.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select c_last_name, ca_city from customer join customer_address where c_customer_sk = ca_address_sk;
++-------------+-----------------+
+| c_last_name | ca_city         |
++-------------+-----------------+
+| Lewis       | Fairfield       |
+| Moses       | Fairview        |
+| Hamilton    | Pleasant Valley |
+| White       | Oak Ridge       |
+| Moran       | Glendale        |
+...
+| Richards    | Lakewood        |
+| Day         | Lebanon         |
+| Painter     | Oak Hill        |
+| Bentley     | Greenfield      |
+| Jones       | Stringtown      |
++-------------+-----------------+
+Returned 50000 row(s) in 9.82s</code></pre>
+
+    <p class="p">
+      One potential downside of joins is the possibility of excess resource usage in poorly constructed queries.
+      Impala imposes restrictions on join queries to guard against such issues. To minimize the chance of runaway
+      queries on large data sets, Impala requires every join query to contain at least one equality predicate
+      between the columns of the various tables. For example, if <code class="ph codeph">T1</code> contains 1000 rows and
+      <code class="ph codeph">T2</code> contains 1,000,000 rows, a query <code class="ph codeph">SELECT <var class="keyword varname">columns</var> FROM t1 JOIN
+      t2</code> could return up to 1 billion rows (1000 * 1,000,000); Impala requires that the query include a
+      clause such as <code class="ph codeph">ON t1.c1 = t2.c2</code> or <code class="ph codeph">WHERE t1.c1 = t2.c2</code>.
+    </p>
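+
+    <p class="p">
+      If you genuinely intend to produce a Cartesian product, state that intent explicitly with the
+      <code class="ph codeph">CROSS JOIN</code> operator rather than omitting the join condition:
+    </p>
+
+<pre class="pre codeblock"><code>-- Deliberate Cartesian product; no equality predicate is required.
+SELECT t1.c1, t2.c2 FROM t1 CROSS JOIN t2;</code></pre>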
+
+    <p class="p">
+      Even with equality clauses, the result set can still be large, as the previous example shows. You
+      might use a <code class="ph codeph">LIMIT</code> clause to return a subset of the results:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select c_last_name, ca_city from customer, customer_address where c_customer_sk = ca_address_sk limit 10;
++-------------+-----------------+
+| c_last_name | ca_city         |
++-------------+-----------------+
+| Lewis       | Fairfield       |
+| Moses       | Fairview        |
+| Hamilton    | Pleasant Valley |
+| White       | Oak Ridge       |
+| Moran       | Glendale        |
+| Sharp       | Lakeview        |
+| Wiles       | Farmington      |
+| Shipman     | Union           |
+| Gilbert     | New Hope        |
+| Brunson     | Martinsville    |
++-------------+-----------------+
+Returned 10 row(s) in 0.63s</code></pre>
+
+    <p class="p">
+      Or you might use additional comparison operators or aggregation functions to condense a large result set into
+      a smaller set of values:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; -- Find the names of customers who live in one particular town.
+[localhost:21000] &gt; select distinct c_last_name from customer, customer_address where
+  c_customer_sk = ca_address_sk
+  and ca_city = "Green Acres";
++---------------+
+| c_last_name   |
++---------------+
+| Hensley       |
+| Pearson       |
+| Mayer         |
+| Montgomery    |
+| Ricks         |
+...
+| Barrett       |
+| Price         |
+| Hill          |
+| Hansen        |
+| Meeks         |
++---------------+
+Returned 332 row(s) in 0.97s
+
+[localhost:21000] &gt; -- See how many different customers in this town have names starting with "A".
+[localhost:21000] &gt; select count(distinct c_last_name) from customer, customer_address where
+  c_customer_sk = ca_address_sk
+  and ca_city = "Green Acres"
+  and substr(c_last_name,1,1) = "A";
++-----------------------------+
+| count(distinct c_last_name) |
++-----------------------------+
+| 12                          |
++-----------------------------+
+Returned 1 row(s) in 1.00s</code></pre>
+
+    <p class="p">
+      Because a join query can involve reading large amounts of data from disk, sending large amounts of data
+      across the network, and loading large amounts of data into memory to do the comparisons and filtering, you
+      might do benchmarking, performance analysis, and query tuning to find the most efficient join queries for
+      your data set, hardware capacity, network configuration, and cluster workload.
+    </p>
+
+    <p class="p">
+      The two categories of joins in Impala are known as <strong class="ph b">partitioned joins</strong> and <strong class="ph b">broadcast joins</strong>. If
+      inaccurate table or column statistics, or some quirk of the data distribution, causes Impala to choose the
+      wrong mechanism for a particular join, consider using query hints as a temporary workaround. For details, see
+      <a class="xref" href="impala_hints.html#hints">Query Hints in Impala SELECT Statements</a>.
+    </p>
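+
+    <p class="p">
+      As a sketch, such hints can be written either in square brackets or (in Impala 2.0 and higher) in
+      comment form; the table names here are hypothetical:
+    </p>
+
+<pre class="pre codeblock"><code>-- Force a broadcast join between hypothetical tables large_t and small_t.
+SELECT large_t.id FROM large_t JOIN [BROADCAST] small_t ON large_t.id = small_t.id;
+
+-- Equivalent hint using the comment syntax.
+SELECT large_t.id FROM large_t JOIN /* +BROADCAST */ small_t ON large_t.id = small_t.id;</code></pre>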
+
+    <p class="p">
+      <strong class="ph b">Handling NULLs in Join Columns:</strong>
+    </p>
+
+    <p class="p">
+      By default, join key columns do not match if either one contains a <code class="ph codeph">NULL</code> value.
+      To treat such columns as equal if both contain <code class="ph codeph">NULL</code>, you can use an expression
+      such as <code class="ph codeph">A = B OR (A IS NULL AND B IS NULL)</code>.
+      In <span class="keyword">Impala 2.5</span> and higher, the <code class="ph codeph">&lt;=&gt;</code> operator (shorthand for
+      <code class="ph codeph">IS NOT DISTINCT FROM</code>) performs the same comparison in a concise and efficient form.
+      The <code class="ph codeph">&lt;=&gt;</code> operator is more efficient for comparing join keys in a <code class="ph codeph">NULL</code>-safe
+      manner, because the operator can use a hash join while the <code class="ph codeph">OR</code> expression cannot.
+    </p>
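+
+    <p class="p">
+      For example, both of the following queries (against hypothetical tables <code class="ph codeph">a</code> and
+      <code class="ph codeph">b</code> with join key <code class="ph codeph">k</code>) treat two <code class="ph codeph">NULL</code>
+      keys as equal, but only the second form allows a hash join:
+    </p>
+
+<pre class="pre codeblock"><code>-- NULL-safe comparison with an OR expression; cannot use a hash join.
+SELECT a.k, b.v FROM a JOIN b ON a.k = b.k OR (a.k IS NULL AND b.k IS NULL);
+
+-- Equivalent, more efficient form using the &lt;=&gt; operator (Impala 2.5 and higher).
+SELECT a.k, b.v FROM a JOIN b ON a.k &lt;=&gt; b.k;</code></pre>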
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <div class="p">
+      The following examples refer to these simple tables containing small sets of integers:
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table t1 (x int);
+[localhost:21000] &gt; insert into t1 values (1), (2), (3), (4), (5), (6);
+
+[localhost:21000] &gt; create table t2 (y int);
+[localhost:21000] &gt; insert into t2 values (2), (4), (6);
+
+[localhost:21000] &gt; create table t3 (z int);
+[localhost:21000] &gt; insert into t3 values (1), (3), (5);
+</code></pre>
+    </div>
+
+
+
+    <p class="p">
+      The following example demonstrates an anti-join, returning the values from <code class="ph codeph">T1</code> that do not
+      exist in <code class="ph codeph">T2</code> (in this case, the odd numbers 1, 3, and 5):
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select x from t1 left anti join t2 on (t1.x = t2.y);
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 5 |
++---+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      See these tutorials for examples of different kinds of joins:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <a class="xref" href="impala_tutorial.html#tut_cross_join">Cross Joins and Cartesian Products with the CROSS JOIN Operator</a>
+      </li>
+    </ul>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_kerberos.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_kerberos.html b/docs/build/html/topics/impala_kerberos.html
new file mode 100644
index 0000000..858ff87
--- /dev/null
+++ b/docs/build/html/topics/impala_kerberos.html
@@ -0,0 +1,342 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 
 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="kerberos"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Enabling Kerberos Authentication for Impala</title></head><body id="kerberos"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Enabling Kerberos Authentication for Impala</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Impala supports an enterprise-grade authentication system called Kerberos. Kerberos provides strong security benefits including
+      capabilities that render intercepted authentication packets unusable by an attacker. It virtually eliminates the threat of
+      impersonation by never sending a user's credentials in cleartext over the network. For more information on Kerberos, visit
+      the <a class="xref" href="https://web.mit.edu/kerberos/" target="_blank">MIT Kerberos website</a>.
+    </p>
+
+    <p class="p">
+      The rest of this topic assumes you have a working <a class="xref" href="https://web.mit.edu/kerberos/krb5-latest/doc/admin/install_kdc.html" target="_blank">Kerberos Key Distribution Center (KDC)</a>
+      set up. To enable Kerberos, you first create a Kerberos principal for each host running
+      <span class="keyword cmdname">impalad</span> or <span class="keyword cmdname">statestored</span>.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files
+      owned by the same user (typically <code class="ph codeph">impala</code>). To implement user-level access to different
+      databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in
+      <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>.
+    </div>
+
+    <p class="p">
+      An alternative form of authentication you can use is LDAP, described in <a class="xref" href="impala_ldap.html#ldap">Enabling LDAP Authentication for Impala</a>.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="kerberos__kerberos_prereqs">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Requirements for Using Impala with Kerberos</h2>
+  
+
+    <div class="body conbody">
+
+      <div class="p">
+        On version 5 of Red Hat Enterprise Linux and comparable distributions, some additional setup is needed for
+        the <span class="keyword cmdname">impala-shell</span> interpreter to connect to a Kerberos-enabled Impala cluster:
+<pre class="pre codeblock"><code>sudo yum install python-devel openssl-devel python-pip
+sudo pip-python install ssl</code></pre>
+      </div>
+
+      <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+        <p class="p">
+          If you plan to use Impala in your cluster, you must configure your KDC to allow tickets to be renewed,
+          and you must configure <span class="ph filepath">krb5.conf</span> to request renewable tickets. Typically, you can do
+          this by adding the <code class="ph codeph">max_renewable_life</code> setting to your realm in
+          <span class="ph filepath">kdc.conf</span>, and by adding the <span class="ph filepath">renew_lifetime</span> parameter to the
+          <span class="ph filepath">libdefaults</span> section of <span class="ph filepath">krb5.conf</span>. For more information about
+          renewable tickets, see the
+          <a class="xref" href="http://web.mit.edu/Kerberos/krb5-1.8/" target="_blank"> Kerberos
+          documentation</a>.
+        </p>
+        <p class="p">
+          Currently, you cannot use the resource management feature on a cluster that has Kerberos
+          authentication enabled.
+        </p>
+      </div>
+
+      <p class="p">
+        Start all <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> daemons with the
+        <code class="ph codeph">--principal</code> and <code class="ph codeph">--keytab-file</code> flags set to the principal and full path
+        name of the <code class="ph codeph">keytab</code> file containing the credentials for the principal.
+      </p>
+
+      <p class="p">
+        To enable Kerberos in the Impala shell, start the <span class="keyword cmdname">impala-shell</span> command using the
+        <code class="ph codeph">-k</code> flag.
+      </p>
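+
+      <p class="p">
+        For example, assuming a valid Kerberos ticket and a hypothetical coordinator host name:
+      </p>
+
+<pre class="pre codeblock"><code>$ kinit
+$ impala-shell -k -i impala_host.example.com</code></pre>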
+
+      <p class="p">
+        To enable Impala to work with Kerberos security on your Hadoop cluster, make sure you perform the
+        installation and configuration steps in
+        <a class="xref" href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SecureMode.html#Authentication" target="_blank">Authentication in Hadoop</a>.
+        Note that when Kerberos security is enabled in Impala, a web browser that
+        supports Kerberos HTTP SPNEGO is required to access the Impala web console (for example, Firefox, Internet
+        Explorer, or Chrome).
+      </p>
+
+      <p class="p">
+        If the NameNode, Secondary NameNode, DataNode, JobTracker, TaskTrackers, ResourceManager, NodeManagers,
+        HttpFS, Oozie, Impala, or Impala statestore services are configured to use Kerberos HTTP SPNEGO
+        authentication, and two or more of these services are running on the same host, then all of the running
+        services must use the same HTTP principal and keytab file used for their HTTP endpoints.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="kerberos__kerberos_config">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Configuring Impala to Support Kerberos Security</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Enabling Kerberos authentication for Impala involves steps that can be summarized as follows:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Creating service principals for Impala and the HTTP service. Principal names take the form:
+          <code class="ph codeph"><var class="keyword varname">serviceName</var>/<var class="keyword varname">fully.qualified.domain.name</var>@<var class="keyword varname">KERBEROS.REALM</var></code>.
+          <p class="p">
+        In Impala 2.0 and later, <code class="ph codeph">user()</code> returns the full Kerberos principal string, such as
+        <code class="ph codeph">user@example.com</code>, in a Kerberized environment.
+      </p>
+        </li>
+
+        <li class="li">
+          Creating, merging, and distributing keytab files for these principals.
+        </li>
+
+        <li class="li">
+          Editing <code class="ph codeph">/etc/default/impala</code>
+          to accommodate Kerberos authentication.
+        </li>
+      </ul>
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title4" id="kerberos_config__kerberos_setup">
+
+      <h3 class="title topictitle3" id="ariaid-title4">Enabling Kerberos for Impala</h3>
+
+      <div class="body conbody">
+
+
+
+        <ol class="ol">
+          <li class="li">
+            Create an Impala service principal, specifying the name of the OS user that the Impala daemons run
+            under, the fully qualified domain name of each node running <span class="keyword cmdname">impalad</span>, and the realm
+            name. For example:
+<pre class="pre codeblock"><code>$ kadmin
+kadmin: addprinc -requires_preauth -randkey impala/impala_host.example.com@TEST.EXAMPLE.COM</code></pre>
+          </li>
+
+          <li class="li">
+            Create an HTTP service principal. For example:
+<pre class="pre codeblock"><code>kadmin: addprinc -randkey HTTP/impala_host.example.com@TEST.EXAMPLE.COM</code></pre>
+            <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+              The <code class="ph codeph">HTTP</code> component of the service principal must be uppercase as shown in the
+              preceding example.
+            </div>
+          </li>
+
+          <li class="li">
+            Create <code class="ph codeph">keytab</code> files with both principals. For example:
+<pre class="pre codeblock"><code>kadmin: xst -k impala.keytab impala/impala_host.example.com
+kadmin: xst -k http.keytab HTTP/impala_host.example.com
+kadmin: quit</code></pre>
+          </li>
+
+          <li class="li">
+            Use <code class="ph codeph">ktutil</code> to read the contents of the two keytab files and then write those contents
+            to a new file. For example:
+<pre class="pre codeblock"><code>$ ktutil
+ktutil: rkt impala.keytab
+ktutil: rkt http.keytab
+ktutil: wkt impala-http.keytab
+ktutil: quit</code></pre>
+          </li>
+
+          <li class="li">
+            (Optional) Test that credentials in the merged keytab file are valid, and that the <span class="q">"renew until"</span>
+            date is in the future. For example:
+<pre class="pre codeblock"><code>$ klist -e -k -t impala-http.keytab</code></pre>
+          </li>
+
+          <li class="li">
+            Copy the <span class="ph filepath">impala-http.keytab</span> file to the Impala configuration directory. Change the
+            permissions to be only read for the file owner and change the file owner to the <code class="ph codeph">impala</code>
+            user. By default, the Impala user and group are both named <code class="ph codeph">impala</code>. For example:
+<pre class="pre codeblock"><code>$ cp impala-http.keytab /etc/impala/conf
+$ cd /etc/impala/conf
+$ chmod 400 impala-http.keytab
+$ chown impala:impala impala-http.keytab</code></pre>
+          </li>
+
+          <li class="li">
+            Add Kerberos options to the Impala defaults file, <span class="ph filepath">/etc/default/impala</span>. Add the
+            options for both the <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> daemons, using the
+            <code class="ph codeph">IMPALA_SERVER_ARGS</code> and <code class="ph codeph">IMPALA_STATE_STORE_ARGS</code> variables. For
+            example, you might add:
+
+<pre class="pre codeblock"><code>-kerberos_reinit_interval=60
+-principal=impala_1/impala_host.example.com@TEST.EXAMPLE.COM
+-keytab_file=<var class="keyword varname">/path/to/impala.keytab</var></code></pre>
+            <p class="p">
+              For more information on changing the Impala defaults specified in
+              <span class="ph filepath">/etc/default/impala</span>, see
+              <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup
+              Options</a>.
+            </p>
+          </li>
+        </ol>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          Restart <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> for these configuration changes to
+          take effect.
+        </div>
+      </div>
+    </article>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="kerberos__kerberos_proxy">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Enabling Kerberos for Impala with a Proxy Server</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        A common configuration for Impala with High Availability is to use a proxy server to submit requests to the
+        actual <span class="keyword cmdname">impalad</span> daemons on different hosts in the cluster. This configuration avoids
+        connection problems in case of machine failure, because the proxy server can route new requests through one
+        of the remaining hosts in the cluster. This configuration also helps with load balancing, because the
+        additional overhead of being the <span class="q">"coordinator node"</span> for each query is spread across multiple hosts.
+      </p>
+
+      <p class="p">
+        Although you can set up a proxy server with or without Kerberos authentication, typically users set up a
+        secure Kerberized configuration. For information about setting up a proxy server for Impala, including
+        Kerberos-specific steps, see <a class="xref" href="impala_proxy.html#proxy">Using Impala through a Proxy for High Availability</a>.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="kerberos__spnego">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Using a Web Browser to Access a URL Protected by Kerberos HTTP SPNEGO</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Your web browser must support Kerberos HTTP SPNEGO; examples include Chrome, Firefox, and Internet Explorer.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">To configure Firefox to access a URL protected by Kerberos HTTP SPNEGO:</strong>
+      </p>
+
+      <ol class="ol">
+        <li class="li">
+          Open the advanced settings Firefox configuration page by loading the <code class="ph codeph">about:config</code> page.
+        </li>
+
+        <li class="li">
+          Use the <strong class="ph b">Filter</strong> text box to find <code class="ph codeph">network.negotiate-auth.trusted-uris</code>.
+        </li>
+
+        <li class="li">
+          Double-click the <code class="ph codeph">network.negotiate-auth.trusted-uris</code> preference and enter the hostname
+          or the domain of the web server that is protected by Kerberos HTTP SPNEGO. Separate multiple domains and
+          hostnames with a comma.
+        </li>
+
+        <li class="li">
+          Click <strong class="ph b">OK</strong>.
+        </li>
+      </ol>
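+
+      <p class="p">
+        For example, the value of <code class="ph codeph">network.negotiate-auth.trusted-uris</code> might
+        look like the following. (The domain and host name are placeholders; substitute your own.)
+      </p>
+
+<pre class="pre codeblock"><code>.example.com,impala-web.example.com
+</code></pre>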
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="kerberos__kerberos_delegation">
+    <h2 class="title topictitle2" id="ariaid-title7">Enabling Impala Delegation for Kerberos Users</h2>
+    <div class="body conbody">
+      <p class="p">
+        See <a class="xref" href="impala_delegation.html#delegation">Configuring Impala Delegation for Hue and BI Tools</a> for details about the delegation feature
+        that lets certain users submit queries using the credentials of other users.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="kerberos__ssl_jdbc_odbc">
+    <h2 class="title topictitle2" id="ariaid-title8">Using TLS/SSL with Business Intelligence Tools</h2>
+    <div class="body conbody">
+      <p class="p">
+        You can use Kerberos authentication, TLS/SSL encryption, or both to secure
+        connections from JDBC and ODBC applications to Impala.
+        See <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a> and <a class="xref" href="impala_odbc.html#impala_odbc">Configuring Impala to Work with ODBC</a>
+        for details.
+      </p>
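+
+      <p class="p">
+        For example, a Hive JDBC connection string that combines Kerberos authentication and SSL
+        encryption might resemble the following. (The host name, realm, and truststore path are
+        placeholders; consult your driver's documentation for the exact option names it supports.)
+      </p>
+
+<pre class="pre codeblock"><code>jdbc:hive2://impala-host.example.com:21050/;principal=impala/impala-host.example.com@EXAMPLE.COM;ssl=true;sslTrustStore=/path/to/truststore.jks
+</code></pre>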
+
+      <p class="p">
+        Prior to <span class="keyword">Impala 2.5</span>, the Hive JDBC driver did not support connections that use both Kerberos authentication
+        and SSL encryption. If your cluster is running an older release that has this restriction,
+        use an alternative JDBC driver that supports
+        both of these security features.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="kerberos__whitelisting_internal_apis">
+  <h2 class="title topictitle2" id="ariaid-title9">Enabling Access to Internal Impala APIs for Kerberos Users</h2>
+    <div class="body conbody">
+    
+      <p class="p">
+        For applications that need direct access
+        to Impala APIs, without going through the HiveServer2 or Beeswax interfaces, you can
+        specify a list of Kerberos users who are allowed to call those APIs. By default, the
+        <code class="ph codeph">impala</code> and <code class="ph codeph">hdfs</code> users are the only ones authorized
+        for this kind of access.
+        Any users not explicitly authorized through the <code class="ph codeph">internal_principals_whitelist</code>
+        configuration setting are blocked from accessing the APIs. This setting applies to all the
+        Impala-related daemons, although currently it is primarily used for HDFS to control the
+        behavior of the catalog server.
+      </p>
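+
+      <p class="p">
+        For example, the startup flag might be specified as follows. (The
+        <code class="ph codeph">etl_service</code> user is a placeholder, and the value is shown as a
+        comma-separated list; verify the exact flag format for your release.)
+      </p>
+
+<pre class="pre codeblock"><code>--internal_principals_whitelist=impala,hdfs,etl_service
+</code></pre>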
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="kerberos__auth_to_local">
+    <h2 class="title topictitle2" id="ariaid-title10">Mapping Kerberos Principals to Short Names for Impala</h2>
+    <div class="body conbody">
+      <div class="p">
+      In <span class="keyword">Impala 2.6</span> and higher, Impala recognizes the <code class="ph codeph">auth_to_local</code> setting,
+      specified through the HDFS configuration setting
+      <code class="ph codeph">hadoop.security.auth_to_local</code>.
+      This feature is disabled by default, to avoid an unexpected change in security-related behavior.
+      To enable it:
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Specify <code class="ph codeph">--load_auth_to_local_rules=true</code>
+            in the <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">catalogd</span> configuration settings.
+          </p>
+        </li>
+      </ul>
+    </div>
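+
+      <p class="p">
+        As a sketch, the flag goes into the daemon startup options, while the mapping rules themselves
+        come from the HDFS <code class="ph codeph">core-site.xml</code> file. (The realm and rule below
+        are placeholders; adapt them to your Kerberos setup.)
+      </p>
+
+<pre class="pre codeblock"><code>--load_auth_to_local_rules=true
+
+&lt;!-- In core-site.xml --&gt;
+&lt;property&gt;
+  &lt;name&gt;hadoop.security.auth_to_local&lt;/name&gt;
+  &lt;value&gt;RULE:[1:$1@$0](.*@EXAMPLE\.COM)s/@.*//
+DEFAULT&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>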
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_show.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_show.html b/docs/build/html/topics/impala_show.html
new file mode 100644
index 0000000..cfe748b
--- /dev/null
+++ b/docs/build/html/topics/impala_show.html
@@ -0,0 +1,1525 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="show"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SHOW Statement</title></head><body id="show"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">SHOW Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">SHOW</code> statement is a flexible way to get information about different types of Impala
+      objects.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>SHOW DATABASES [[LIKE] '<var class="keyword varname">pattern</var>']
+SHOW SCHEMAS [[LIKE] '<var class="keyword varname">pattern</var>'] - an alias for SHOW DATABASES
+SHOW TABLES [IN <var class="keyword varname">database_name</var>] [[LIKE] '<var class="keyword varname">pattern</var>']
+<span class="ph">SHOW [AGGREGATE | ANALYTIC] FUNCTIONS [IN <var class="keyword varname">database_name</var>] [[LIKE] '<var class="keyword varname">pattern</var>']</span>
+<span class="ph">SHOW CREATE TABLE [<var class="keyword varname">database_name</var>].<var class="keyword varname">table_name</var></span>
+<span class="ph">SHOW TABLE STATS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var></span>
+<span class="ph">SHOW COLUMN STATS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var></span>
+<span class="ph">SHOW PARTITIONS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var></span>
+<span class="ph">SHOW <span class="ph">[RANGE]</span> PARTITIONS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var></span>
+SHOW FILES IN [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var> <span class="ph">[PARTITION (<var class="keyword varname">key_col_expression</var> [, <var class="keyword varname">key_col_expression</var>])</span>]
+
+<span class="ph">SHOW ROLES
+SHOW CURRENT ROLES
+SHOW ROLE GRANT GROUP <var class="keyword varname">group_name</var>
+SHOW GRANT ROLE <var class="keyword varname">role_name</var></span>
+</code></pre>
+
+
+
+
+
+
+
+    <p class="p">
+      Issue a <code class="ph codeph">SHOW <var class="keyword varname">object_type</var></code> statement to see the appropriate objects in the
+      current database, or <code class="ph codeph">SHOW <var class="keyword varname">object_type</var> IN <var class="keyword varname">database_name</var></code>
+      to see objects in a specific database.
+    </p>
+
+    <p class="p">
+      The optional <var class="keyword varname">pattern</var> argument is a quoted string literal, using Unix-style
+      <code class="ph codeph">*</code> wildcards and allowing <code class="ph codeph">|</code> for alternation. The preceding
+      <code class="ph codeph">LIKE</code> keyword is also optional. All object names are stored in lowercase, so use all
+      lowercase letters in the pattern string. For example:
+    </p>
+
+<pre class="pre codeblock"><code>show databases 'a*';
+show databases like 'a*';
+show tables in some_db like '*fact*';
+use some_db;
+show tables '*dim*|*fact*';</code></pre>
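+
+    <p class="p">
+      Outside of Impala, the pattern semantics described above (Unix-style
+      <code class="ph codeph">*</code> wildcards, <code class="ph codeph">|</code> alternation, and matching
+      against lowercase object names) can be illustrated with a small Python sketch. The
+      <code class="ph codeph">show_matches</code> helper is hypothetical; Impala itself exposes no such function:
+    </p>
+
+<pre class="pre codeblock"><code>from fnmatch import fnmatchcase
+
+def show_matches(names, pattern):
+    # Mimic SHOW ... LIKE 'pattern': split on | for alternation,
+    # lowercase both sides, and apply Unix-style wildcard matching.
+    alternatives = pattern.lower().split('|')
+    return [n for n in names
+            if any(fnmatchcase(n.lower(), alt) for alt in alternatives)]
+
+tables = ['customer_dim', 'product_dim', 'sales_fact', 'staging_raw']
+print(show_matches(tables, '*dim*|*fact*'))  # ['customer_dim', 'product_dim', 'sales_fact']
+print(show_matches(tables, 'a*'))            # [] (no names start with 'a')
+</code></pre>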
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="show__show_files">
+
+    <h2 class="title topictitle2" id="ariaid-title2">SHOW FILES Statement</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph">SHOW FILES</code> statement displays the files that constitute a specified table,
+        or a partition within a partitioned table. This syntax is available in <span class="keyword">Impala 2.2</span> and higher
+        only. The output includes the names of the files, the size of each file, and the applicable partition
+        for a partitioned table. The size includes a suffix of <code class="ph codeph">B</code> for bytes,
+        <code class="ph codeph">MB</code> for megabytes, and <code class="ph codeph">GB</code> for gigabytes.
+      </p>
+
+      <div class="p">
+        In <span class="keyword">Impala 2.8</span> and higher, you can use general
+        expressions with operators such as <code class="ph codeph">&lt;</code>, <code class="ph codeph">IN</code>,
+        <code class="ph codeph">LIKE</code>, and <code class="ph codeph">BETWEEN</code> in the <code class="ph codeph">PARTITION</code>
+        clause, instead of only equality operators. For example:
+<pre class="pre codeblock"><code>
+show files in sample_table partition (j &lt; 5);
+show files in sample_table partition (k = 3, l between 1 and 10);
+show files in sample_table partition (month like 'J%');
+
+</code></pre>
+      </div>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        This statement applies to tables and partitions stored on HDFS, or in the Amazon Simple Storage Service (S3).
+        It does not apply to views.
+        It does not apply to tables mapped onto HBase <span class="ph">or Kudu</span>,
+        because those data management systems do not use the same file-based storage layout.
+      </div>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        You can use this statement to verify the results of your ETL process: that is, that
+        the expected files are present, with the expected sizes. You can examine the file information
+        to detect conditions such as empty files, missing files, or inefficient layouts due to
+        a large number of small files. When you use <code class="ph codeph">INSERT</code> statements to copy
+        from one table to another, you can see how the file layout changes due to file format
+        conversions, compaction of small input files into large data blocks, and
+        multiple output files from parallel queries and partitioned inserts.
+      </p>
+
+      <p class="p">
+        The output from this statement does not include files that Impala considers to be hidden
+        or invisible, such as those whose names start with a dot or an underscore, or that
+        end with the suffixes <code class="ph codeph">.copying</code> or <code class="ph codeph">.tmp</code>.
+      </p>
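+
+      <p class="p">
+        The visibility rule described above can be sketched as a small predicate. (This is an
+        illustration of the rule as stated, not Impala's actual implementation, which may cover
+        additional cases.)
+      </p>
+
+<pre class="pre codeblock"><code>def is_hidden(filename):
+    # Hidden per the rule above: leading dot or underscore,
+    # or a .copying or .tmp suffix.
+    return (filename.startswith(('.', '_'))
+            or filename.endswith(('.copying', '.tmp')))
+
+files = ['data.0.parq', '_impala_insert_staging', '.hidden', 'part.copying', 'load.tmp']
+print([f for f in files if not is_hidden(f)])  # ['data.0.parq']
+</code></pre>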
+
+      <p class="p">
+        The information for partitioned tables complements the output of the <code class="ph codeph">SHOW PARTITIONS</code>
+        statement, which summarizes information about each partition. <code class="ph codeph">SHOW PARTITIONS</code>
+        produces some output for each partition, while <code class="ph codeph">SHOW FILES</code> does not
+        produce any output for empty partitions because they do not include any data files.
+      </p>
+
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+      <p class="p">
+        The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+        typically the <code class="ph codeph">impala</code> user, must have read
+        permission for all the table files, read and execute permission for all the directories that make up the table,
+        and execute permission for the database directory and all its parent directories.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example shows a <code class="ph codeph">SHOW FILES</code> statement
+        for an unpartitioned table using text format:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table unpart_text (x bigint, s string);
+[localhost:21000] &gt; insert into unpart_text (x, s) select id, name
+                  &gt; from oreilly.sample_data limit 20e6;
+[localhost:21000] &gt; show files in unpart_text;
++------------------------------------------------------------------------------+----------+-----------+
+| path                                                                         | size     | partition |
++------------------------------------------------------------------------------+----------+-----------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/35665776ef85cfaf_1012432410_data.0. | 448.31MB |           |
++------------------------------------------------------------------------------+----------+-----------+
+[localhost:21000] &gt; insert into unpart_text (x, s) select id, name from oreilly.sample_data limit 100e6;
+[localhost:21000] &gt; show files in unpart_text;
++--------------------------------------------------------------------------------------+----------+-----------+
+| path                                                                                 | size     | partition |
++--------------------------------------------------------------------------------------+----------+-----------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/35665776ef85cfaf_1012432410_data.0. | 448.31MB |           |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/ac3dba252a8952b8_1663177415_data.0. | 2.19GB   |           |
++--------------------------------------------------------------------------------------+----------+-----------+
+</code></pre>
+
+      <p class="p">
+        This example illustrates how, after issuing some <code class="ph codeph">INSERT ... VALUES</code> statements,
+        the table now contains some tiny files of just a few bytes. Such small files could cause inefficient processing of
+        parallel queries that are expecting multi-megabyte input files. The example shows how you might compact the small files by doing
+        an <code class="ph codeph">INSERT ... SELECT</code> into a different table, possibly converting the data to Parquet in the process:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; insert into unpart_text values (10,'hello'), (20, 'world');
+[localhost:21000] &gt; insert into unpart_text values (-1,'foo'), (-1000, 'bar');
+[localhost:21000] &gt; show files in unpart_text;
++--------------------------------------------------------------------------------------+----------+
+| path                                                                                 | size     |
++--------------------------------------------------------------------------------------+----------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/4f11b8bdf8b6aa92_238145083_data.0.  | 18B
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/35665776ef85cfaf_1012432410_data.0. | 448.31MB
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/ac3dba252a8952b8_1663177415_data.0. | 2.19GB
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/cfb8252452445682_1868457216_data.0. | 17B
++--------------------------------------------------------------------------------------+----------+
+[localhost:21000] &gt; create table unpart_parq stored as parquet as select * from unpart_text;
++---------------------------+
+| summary                   |
++---------------------------+
+| Inserted 120000002 row(s) |
++---------------------------+
+[localhost:21000] &gt; show files in unpart_parq;
++------------------------------------------------------------------------------------------+----------+
+| path                                                                                     | size     |
++------------------------------------------------------------------------------------------+----------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630184_549959007_data.0.parq  | 255.36MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630184_549959007_data.1.parq  | 178.52MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630185_549959007_data.0.parq  | 255.37MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630185_549959007_data.1.parq  | 57.71MB  |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630186_2141167244_data.0.parq | 255.40MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630186_2141167244_data.1.parq | 175.52MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630187_1006832086_data.0.parq | 255.40MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630187_1006832086_data.1.parq | 214.61MB |
++------------------------------------------------------------------------------------------+----------+
+</code></pre>
+
+      <p class="p">
+        The following example shows a <code class="ph codeph">SHOW FILES</code> statement for a partitioned text table
+        with data in two different partitions, and two empty partitions.
+        The partitions with no data are not represented in the <code class="ph codeph">SHOW FILES</code> output.
+      </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table part_text (x bigint, y int, s string)
+                  &gt; partitioned by (year bigint, month bigint, day bigint);
+[localhost:21000] &gt; insert overwrite part_text (x, y, s) partition (year=2014,month=1,day=1)
+                  &gt; select id, val, name from oreilly.normalized_parquet
+                  &gt; where id between 1 and 1000000;
+[localhost:21000] &gt; insert overwrite part_text (x, y, s) partition (year=2014,month=1,day=2)
+                  &gt; select id, val, name from oreilly.normalized_parquet
+                  &gt; where id between 1000001 and 2000000;
+[localhost:21000] &gt; alter table part_text add partition (year=2014,month=1,day=3);
+[localhost:21000] &gt; alter table part_text add partition (year=2014,month=1,day=4);
+[localhost:21000] &gt; show partitions part_text;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+--------+-------------------+
+| year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+-----+-------+--------+---------+--------------+-------------------+--------+-------------------+
+| 2014  | 1     | 1   | -1    | 4      | 25.16MB | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2014  | 1     | 2   | -1    | 4      | 26.22MB | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2014  | 1     | 3   | -1    | 0      | 0B      | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2014  | 1     | 4   | -1    | 0      | 0B      | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| Total |       |     | -1    | 8      | 51.38MB | 0B           |                   |        |                   |
++-------+-------+-----+-------+--------+---------+--------------+-------------------+--------+-------------------+
+[localhost:21000] &gt; show files in part_text;
++---------------------------------------------------------------------------------------------------------+--------+-------------------------+
+| path                                                                                                    | size   | partition               |
++---------------------------------------------------------------------------------------------------------+--------+-------------------------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=1/80732d9dc80689f_1418645991_data.0.  | 5.77MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=1/80732d9dc8068a0_1418645991_data.0.  | 6.25MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=1/80732d9dc8068a1_147082319_data.0.   | 7.16MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=1/80732d9dc8068a2_2111411753_data.0.  | 5.98MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=2/21a828cf494b5bbb_501271652_data.0.  | 6.42MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=2/21a828cf494b5bbc_501271652_data.0.  | 6.62MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=2/21a828cf494b5bbd_1393490200_data.0. | 6.98MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=2/21a828cf494b5bbe_1393490200_data.0. | 6.20MB | year=2014/month=1/day=2 |
++---------------------------------------------------------------------------------------------------------+--------+-------------------------+
+</code></pre>
+      <p class="p">
+        The following example shows a <code class="ph codeph">SHOW FILES</code> statement for a partitioned Parquet table.
+        The number and sizes of files are different from the equivalent partitioned text table
+        used in the previous example, because <code class="ph codeph">INSERT</code> operations for Parquet tables
+        are parallelized differently than for text tables. (Also, the amount of data is so small
+        that it can be written to Parquet without involving all the hosts in this 4-node cluster.)
+      </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table part_parq (x bigint, y int, s string) partitioned by (year bigint, month bigint, day bigint) stored as parquet;
+[localhost:21000] &gt; insert into part_parq partition (year,month,day) select x, y, s, year, month, day from partitioned_text;
+[localhost:21000] &gt; show partitions part_parq;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+
+| year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats |
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+
+| 2014  | 1     | 1   | -1    | 3      | 17.89MB | NOT CACHED   | NOT CACHED        | PARQUET | false             |
+| 2014  | 1     | 2   | -1    | 3      | 17.89MB | NOT CACHED   | NOT CACHED        | PARQUET | false             |
+| Total |       |     | -1    | 6      | 35.79MB | 0B           |                   |         |                   |
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+
+[localhost:21000] &gt; show files in part_parq;
++-----------------------------------------------------------------------------------------------+--------+-------------------------+
+| path                                                                                          | size   | partition               |
++-----------------------------------------------------------------------------------------------+--------+-------------------------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=1/1134113650_data.0.parq | 4.49MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=1/617567880_data.0.parq  | 5.14MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=1/2099499416_data.0.parq | 8.27MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=2/945567189_data.0.parq  | 8.80MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=2/2145850112_data.0.parq | 4.80MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=2/665613448_data.0.parq  | 4.29MB | year=2014/month=1/day=2 |
++-----------------------------------------------------------------------------------------------+--------+-------------------------+
+</code></pre>
+<p class="p">
+  The following example shows output from the <code class="ph codeph">SHOW FILES</code> statement
+  for a table where the data files are stored in Amazon S3:
+</p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; show files in s3_testing.sample_data_s3;
++-----------------------------------------------------------------------+---------+
+| path                                                                  | size    |
++-----------------------------------------------------------------------+---------+
+| s3a://impala-demo/sample_data/e065453cba1988a6_1733868553_data.0.parq | 24.84MB |
++-----------------------------------------------------------------------+---------+
+</code></pre>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="show__show_roles">
+
+    <h2 class="title topictitle2" id="ariaid-title3">SHOW ROLES Statement</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph">SHOW ROLES</code> statement displays roles. This syntax is available in <span class="keyword">Impala 2.0</span> and later
+        only, when you are using the Sentry authorization framework along with the Sentry service, as described in
+        <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>. It does not apply when you use the Sentry framework
+        with privileges defined in a policy file.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+
+      <p class="p">
+        When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+        names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+        output, check with the system administrator if you need to be granted a new privilege for that object. See
+        <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+        privileges for specific kinds of objects.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        Depending on the roles set up within your organization by the <code class="ph codeph">CREATE ROLE</code> statement, the
+        output might look something like this:
+      </p>
+
+<pre class="pre codeblock"><code>show roles;
++-----------+
+| role_name |
++-----------+
+| analyst   |
+| role1     |
+| sales     |
+| superuser |
+| test_role |
++-----------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="show__show_current_role">
+
+    <h2 class="title topictitle2" id="ariaid-title4">SHOW CURRENT ROLES Statement</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph">SHOW CURRENT ROLES</code> statement displays the roles assigned to the current user. This syntax
+        is available in <span class="keyword">Impala 2.0</span> and later only, when you are using the Sentry authorization framework along with
+        the Sentry service, as described in <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>. It does not
+        apply when you use the Sentry framework with privileges defined in a policy file.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+
+      <p class="p">
+        When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+        names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+        output, check with the system administrator whether you need to be granted a new privilege for that object. See
+        <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+        privileges for specific kinds of objects.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        Depending on the roles set up within your organization by the <code class="ph codeph">CREATE ROLE</code> statement, the
+        output might look something like this:
+      </p>
+
+<pre class="pre codeblock"><code>show current roles;
++-----------+
+| role_name |
++-----------+
+| role1     |
+| superuser |
++-----------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="show__show_role_grant">
+
+    <h2 class="title topictitle2" id="ariaid-title5">SHOW ROLE GRANT Statement</h2>
+  
+
+
+    <div class="body conbody">
+
+      <p class="p">
+
+        The <code class="ph codeph">SHOW ROLE GRANT</code> statement lists all the roles assigned to the specified group. This
+        statement is only allowed for Sentry administrative users and other users that are part of the specified
+        group. This syntax is available in <span class="keyword">Impala 2.0</span> and later only, when you are using the Sentry authorization
+        framework along with the Sentry service, as described in
+        <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>. It does not apply when you use the Sentry framework
+        with privileges defined in a policy file.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+
+      <p class="p">
+        When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+        names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+        output, check with the system administrator whether you need to be granted a new privilege for that object. See
+        <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+        privileges for specific kinds of objects.
+      </p>
+
+
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="show__show_grant_role">
+
+    <h2 class="title topictitle2" id="ariaid-title6">SHOW GRANT ROLE Statement</h2>
+  
+
+
+    <div class="body conbody">
+
+      <p class="p">
+
+        The <code class="ph codeph">SHOW GRANT ROLE</code> statement lists all the grants for the given role name. This statement
+        is only allowed for Sentry administrative users and other users that have been granted the specified role.
+        This syntax is available in <span class="keyword">Impala 2.0</span> and later only, when you are using the Sentry authorization framework
+        along with the Sentry service, as described in <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>. It
+        does not apply when you use the Sentry framework with privileges defined in a policy file.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+
+      <p class="p">
+        When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+        names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+        output, check with the system administrator whether you need to be granted a new privilege for that object. See
+        <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+        privileges for specific kinds of objects.
+      </p>
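+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The statement takes the form <code class="ph codeph">SHOW GRANT ROLE <var class="keyword varname">role_name</var></code>.
+        For a hypothetical role named <code class="ph codeph">analyst</code> that has been granted the
+        <code class="ph codeph">SELECT</code> privilege on a database named <code class="ph codeph">sales_db</code>,
+        the output might look something like this:
+      </p>
+
+<pre class="pre codeblock"><code>show grant role analyst;
++----------+----------+-------+--------+-----+-----------+--------------+-------------+
+| scope    | database | table | column | uri | privilege | grant_option | create_time |
++----------+----------+-------+--------+-----+-----------+--------------+-------------+
+| database | sales_db |       |        |     | select    | false        | NULL        |
++----------+----------+-------+--------+-----+-----------+--------------+-------------+
+</code></pre>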
+
+
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="show__show_databases">
+
+    <h2 class="title topictitle2" id="ariaid-title7">SHOW DATABASES</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph">SHOW DATABASES</code> statement is often the first one you issue when connecting to an
+        instance for the first time. You typically issue <code class="ph codeph">SHOW DATABASES</code> to see the names you can
+        specify in a <code class="ph codeph">USE <var class="keyword varname">db_name</var></code> statement, then after switching to a database
+        you issue <code class="ph codeph">SHOW TABLES</code> to see the names you can specify in <code class="ph codeph">SELECT</code> and
+        <code class="ph codeph">INSERT</code> statements.
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.5</span> and higher, the output includes a second column showing any associated comment
+        for each database.
+      </p>
+
+      <p class="p">
+        The output of <code class="ph codeph">SHOW DATABASES</code> includes the special <code class="ph codeph">_impala_builtins</code>
+        database, which lets you view definitions of built-in functions, as described under <code class="ph codeph">SHOW
+        FUNCTIONS</code>.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+
+      <p class="p">
+        When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+        names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+        output, check with the system administrator whether you need to be granted a new privilege for that object. See
+        <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+        privileges for specific kinds of objects.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        This example shows how you might locate a particular table on an unfamiliar system. The
+        <code class="ph codeph">DEFAULT</code> database is the one you initially connect to; a database with that name is present
+        on every system. You can issue <code class="ph codeph">SHOW TABLES IN <var class="keyword varname">db_name</var></code> without going
+        into a database, or <code class="ph codeph">SHOW TABLES</code> once you are inside a particular database.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; show databases;
++------------------+----------------------------------------------+
+| name             | comment                                      |
++------------------+----------------------------------------------+
+| _impala_builtins | System database for Impala builtin functions |
+| default          | Default Hive database                        |
+| file_formats     |                                              |
++------------------+----------------------------------------------+
+Returned 3 row(s) in 0.02s
+[localhost:21000] &gt; show tables in file_formats;
++--------------------+
+| name               |
++--------------------+
+| parquet_table      |
+| rcfile_table       |
+| sequencefile_table |
+| textfile_table     |
++--------------------+
+Returned 4 row(s) in 0.01s
+[localhost:21000] &gt; use file_formats;
+[localhost:21000] &gt; show tables like '*parq*';
++--------------------+
+| name               |
++--------------------+
+| parquet_table      |
++--------------------+
+Returned 1 row(s) in 0.01s</code></pre>
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_databases.html#databases">Overview of Impala Databases</a>, <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a>,
+        <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>, <a class="xref" href="impala_use.html#use">USE Statement</a>,
+        <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>,
+        <a class="xref" href="impala_show.html#show_functions">SHOW FUNCTIONS Statement</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="show__show_tables">
+
+    <h2 class="title topictitle2" id="ariaid-title8">SHOW TABLES Statement</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Displays the names of tables. By default, lists tables in the current database, or with the
+        <code class="ph codeph">IN</code> clause, in a specified database. By default, lists all tables, or with the
+        <code class="ph codeph">LIKE</code> clause, only those whose names match a pattern with <code class="ph codeph">*</code> wildcards.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+
+      <p class="p">
+        When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+        names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+        output, check with the system administrator whether you need to be granted a new privilege for that object. See
+        <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+        privileges for specific kinds of objects.
+      </p>
+
+      <p class="p">
+        The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+        typically the <code class="ph codeph">impala</code> user, must have read and execute
+        permissions for all directories that are part of the table.
+        (A table could span multiple HDFS directories if it is partitioned.
+        The directories could be widely scattered because a partition can reside
+        in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following examples demonstrate the <code class="ph codeph">SHOW TABLES</code> statement.
+        If the database contains no tables, the result set is empty.
+        If the database does contain tables, <code class="ph codeph">SHOW TABLES IN <var class="keyword varname">db_name</var></code>
+        lists all the table names. <code class="ph codeph">SHOW TABLES</code> with no qualifiers lists
+        all the table names in the current database.
+      </p>
+
+<pre class="pre codeblock"><code>create database empty_db;
+show tables in empty_db;
+Fetched 0 row(s) in 0.11s
+
+create database full_db;
+create table full_db.t1 (x int);
+create table full_db.t2 like full_db.t1;
+
+show tables in full_db;
++------+
+| name |
++------+
+| t1   |
+| t2   |
++------+
+
+use full_db;
+show tables;
++------+
+| name |
++------+
+| t1   |
+| t2   |
++------+
+</code></pre>
+
+      <p class="p">
+        This example demonstrates how <code class="ph codeph">SHOW TABLES LIKE '<var class="keyword varname">wildcard_pattern</var>'</code>
+        lists table names that match a pattern, or multiple alternative patterns.
+        Because you can use wildcard matches on table names, it is helpful to establish naming conventions
+        so that you can conveniently locate groups of related tables.
+      </p>
+
+<pre class="pre codeblock"><code>create table fact_tbl (x int);
+create table dim_tbl_1 (s string);
+create table dim_tbl_2 (s string);
+
+/* Asterisk is the wildcard character. Only 2 out of the 3 just-created tables are returned. */
+show tables like 'dim*';
++-----------+
+| name      |
++-----------+
+| dim_tbl_1 |
+| dim_tbl_2 |
++-----------+
+
+/* We are already in the FULL_DB database, but just to be sure we can specify the database name also. */
+show tables in full_db like 'dim*';
++-----------+
+| name      |
++-----------+
+| dim_tbl_1 |
+| dim_tbl_2 |
++-----------+
+
+/* The pipe character separates multiple wildcard patterns. */
+show tables like '*dim*|t*';
++-----------+
+| name      |
++-----------+
+| dim_tbl_1 |
+| dim_tbl_2 |
+| t1        |
+| t2        |
++-----------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+        <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>, <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>,
+        <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>, <a class="xref" href="impala_show.html#show_create_table">SHOW CREATE TABLE Statement</a>,
+        <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>,
+        <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>,
+        <a class="xref" href="impala_show.html#show_functions">SHOW FUNCTIONS Statement</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="show__show_create_table">
+
+    <h2 class="title topictitle2" id="ariaid-title9">SHOW CREATE TABLE Statement</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        As a schema changes over time, you might run a <code class="ph codeph">CREATE TABLE</code> statement followed by several
+        <code class="ph codeph">ALTER TABLE</code> statements. To capture the cumulative effect of all those statements,
+        <code class="ph codeph">SHOW CREATE TABLE</code> displays a <code class="ph codeph">CREATE TABLE</code> statement that would reproduce
+        the current structure of a table. You can use this output in scripts that set up or clone a group of
+        tables, rather than trying to reproduce the original sequence of <code class="ph codeph">CREATE TABLE</code> and
+        <code class="ph codeph">ALTER TABLE</code> statements. When creating variations on the original table, or cloning the
+        original table on a different system, you might need to edit the <code class="ph codeph">SHOW CREATE TABLE</code> output
+        to change things such as the database name, <code class="ph codeph">LOCATION</code> field, and so on that might be
+        different on the destination system.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+
+      <p class="p">
+        When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+        names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+        output, check with the system administrator whether you need to be granted a new privilege for that object. See
+        <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+        privileges for specific kinds of objects.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+      <p class="p">
+        For Kudu tables:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            The column specifications include attributes such as <code class="ph codeph">NULL</code>,
+            <code class="ph codeph">NOT NULL</code>, <code class="ph codeph">ENCODING</code>, and <code class="ph codeph">COMPRESSION</code>.
+            If you do not specify those attributes in the original <code class="ph codeph">CREATE TABLE</code> statement,
+            the <code class="ph codeph">SHOW CREATE TABLE</code> output displays the defaults that were used.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The specifications of any <code class="ph codeph">RANGE</code> clauses are not displayed in full.
+            To see the definition of the range clauses for a Kudu table, use the <code class="ph codeph">SHOW RANGE PARTITIONS</code> statement.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">TBLPROPERTIES</code> output reflects the Kudu master address
+            and the internal Kudu name associated with the Impala table.
+          </p>
+        </li>
+      </ul>
+
+<pre class="pre codeblock"><code>
+show CREATE TABLE numeric_grades_default_letter;
++------------------------------------------------------------------------------------------------+
+| result                                                                                         |
++------------------------------------------------------------------------------------------------+
+| CREATE TABLE user.numeric_grades_default_letter (                                              |
+|   score TINYINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,               |
+|   letter_grade STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION DEFAULT '-', |
+|   student STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,                  |
+|   PRIMARY KEY (score)                                                                          |
+| )                                                                                              |
+| PARTITION BY <strong class="ph b">RANGE (score) (...)</strong>                                                               |
+| STORED AS KUDU                                                                                 |
+| TBLPROPERTIES ('kudu.master_addresses'='vd0342.example.com:7051',                              |
+|   'kudu.table_name'='impala::USER.numeric_grades_default_letter')                              |
++------------------------------------------------------------------------------------------------+
+
+show range partitions numeric_grades_default_letter;
++--------------------+
+| RANGE (score)      |
++--------------------+
+| 0 &lt;= VALUES &lt; 50   |
+| 50 &lt;= VALUES &lt; 65  |
+| 65 &lt;= VALUES &lt; 80  |
+| 80 &lt;= VALUES &lt; 100 |
++--------------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example shows how various clauses from the <code class="ph codeph">CREATE TABLE</code> statement are
+        represented in the output of <code class="ph codeph">SHOW CREATE TABLE</code>.
+      </p>
+
+<pre class="pre codeblock"><code>create table show_create_table_demo (id int comment "Unique ID", y double, s string)
+  partitioned by (year smallint)
+  stored as parquet;
+
+show create table show_create_table_demo;
++----------------------------------------------------------------------------------------+
+| result                                                                                 |
++----------------------------------------------------------------------------------------+
+| CREATE TABLE scratch.show_create_table_demo (                                          |
+|   id INT COMMENT 'Unique ID',                                                          |
+|   y DOUBLE,                                                                            |
+|   s STRING                                                                             |
+| )                                                                                      |
+| PARTITIONED BY (                                                                       |
+|   year SMALLINT                                                                        |
+| )                                                                                      |
+| STORED AS PARQUET                                                                      |
+| LOCATION 'hdfs://127.0.0.1:8020/user/hive/warehouse/scratch.db/show_create_table_demo' |
+| TBLPROPERTIES ('transient_lastDdlTime'='1418152582')                                   |
++----------------------------------------------------------------------------------------+
+</code></pre>
+
+      <p class="p">
+        The following example shows how, after a sequence of <code class="ph codeph">ALTER TABLE</code> statements, the output
+        from <code class="ph codeph">SHOW CREATE TABLE</code> represents the current state of the table. This output could be
+        used to create a matching table rather than executing the original <code class="ph codeph">CREATE TABLE</code> and
+        sequence of <code class="ph codeph">ALTER TABLE</code> statements.
+      </p>
+
+<pre class="pre codeblock"><code>alter table show_create_table_demo drop column s;
+alter table show_create_table_demo set fileformat textfile;
+
+show create table show_create_table_demo;
++----------------------------------------------------------------------------------------+
+| result                                                                                 |
++----------------------------------------------------------------------------------------+
+| CREATE TABLE scratch.show_create_table_demo (                                          |
+|   id INT COMMENT 'Unique ID',                                                          |
+|   y DOUBLE                                                                             |
+| )                                                                                      |
+| PARTITIONED BY (                                                                       |
+|   year SMALLINT                                                                        |
+| )                                                                                      |
+| STORED AS TEXTFILE                                                                     |
+| LOCATION 'hdfs://127.0.0.1:8020/user/hive/warehouse/demo.db/show_create_table_demo'    |
+| TBLPROPERTIES ('transient_lastDdlTime'='1418152638')                                   |
++----------------------------------------------------------------------------------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>, <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>,
+        <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="show__show_table_stats">
+
+    <h2 class="title topictitle2" id="ariaid-title10">SHOW TABLE STATS Statement</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW COLUMN STATS</code> variants are important for
+        tuning performance and diagnosing performance issues, especially with the largest tables and the most
+        complex join queries.
+      </p>
+
+      <p class="p">
+        Any values that are not available (because the <code class="ph codeph">COMPUTE STATS</code> statement has not been run
+        yet) are displayed as <code class="ph codeph">-1</code>.
+      </p>
+
+      <p class="p">
+        <code class="ph codeph">SHOW TABLE STATS</code> provides some general information about the table, such as the number of
+        files, overall size of the data, whether some or all of the data is in the HDFS cache, and the file format,
+        that is useful whether or not you have run the <code class="ph codeph">COMPUTE STATS</code> statement. A
+        <code class="ph codeph">-1</code> in the <code class="ph codeph">#Rows</code> output column indicates that the <code class="ph codeph">COMPUTE
+        STATS</code> statement has never been run for this table. If the table is partitioned, <code class="ph codeph">SHOW TABLE
+        STATS</code> provides this information for each partition. (It produces the same output as the
+        <code class="ph codeph">SHOW PARTITIONS</code> statement in this case.)
+      </p>
+
+      <p class="p">
+        The output of <code class="ph codeph">SHOW COLUMN STATS</code> is useful primarily after the <code class="ph codeph">COMPUTE
+        STATS</code> statement has been run on the table. A <code class="ph codeph">-1</code> in the <code class="ph codeph">#Distinct
+        Values</code> output column indicates that the <code class="ph codeph">COMPUTE STATS</code> statement has never been
+        run for this table. Currently, Impala always leaves the <code class="ph codeph">#Nulls</code> column as
+        <code class="ph codeph">-1</code>, even after <code class="ph codeph">COMPUTE STATS</code> has been run.
+      </p>
+
+      <p class="p">
+        These <code class="ph codeph">SHOW</code> statements work on actual tables only, not on views.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+
+      <p class="p">
+        When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+        names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+        output, check with the system administrator whether you need to be granted a new privilege for that object. See
+        <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+        privileges for specific kinds of objects.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+
+      <p class="p">
+        Because Kudu tables do not have characteristics derived from HDFS, such
+        as number of files, file format, and HDFS cache status, the output of
+        <code class="ph codeph">SHOW TABLE STATS</code> reflects different characteristics
+        that apply to Kudu tables. If the Kudu table is created with the
+        clause <code class="ph codeph">PARTITIONS 20</code>, then the result set of
+        <code class="ph codeph">SHOW TABLE STATS</code> consists of 20 rows, each representing
+        one of the numbered partitions. For example:
+      </p>
+
+<pre class="pre codeblock"><code>
+show table stats kudu_table;
++--------+-----------+----------+-----------------------+------------+
+| # Rows | Start Key | Stop Key | Leader Replica        | # Replicas |
++--------+-----------+----------+-----------------------+------------+
+| -1     |           | 00000001 | host.example.com:7050 | 3          |
+| -1     | 00000001  | 00000002 | host.example.com:7050 | 3          |
+| -1     | 00000002  | 00000003 | host.example.com:7050 | 3          |
+| -1     | 00000003  | 00000004 | host.example.com:7050 | 3          |
+| -1     | 00000004  | 00000005 | host.example.com:7050 | 3          |
+...
+</code></pre>
+
+      <p class="p">
+        Impala does not compute the number of rows for each partition for
+        Kudu tables. Therefore, you do not need to re-run <code class="ph codeph">COMPUTE STATS</code>
+        when you see <code class="ph codeph">-1</code> in the <code class="ph codeph"># Rows</code> column of the output from
+        <code class="ph codeph">SHOW TABLE STATS</code>. That column always shows <code class="ph codeph">-1</code> for
+        all Kudu tables.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following examples show how the <code class="ph codeph">SHOW TABLE STATS</code> statement displays physical
+        information about a table and the associated data files:
+      </p>
+
+<pre class="pre codeblock"><code>show table stats store_sales;
++-------+--------+----------+--------------+--------+-------------------+
+| #Rows | #Files | Size     | Bytes Cached | Format | Incremental stats |
++-------+--------+----------+--------------+--------+-------------------+
+| -1    | 1      | 370.45MB | NOT CACHED   | TEXT   | false             |
++-------+--------+----------+--------------+--------+-------------------+
+
+show table stats customer;
++-------+--------+---------+--------------+--------+-------------------+
+| #Rows | #Files | Size    | Bytes Cached | Format | Incremental stats |
++-------+--------+---------+--------------+--------+-------------------+
+| -1    | 1      | 12.60MB | NOT CACHED   | TEXT   | false             |
++-------+--------+---------+--------------+--------+-------------------+
+</code></pre>
+
+      <p class="p">
+        The following example shows how, after a <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL
+        STATS</code> statement, the <code class="ph codeph">#Rows</code> field is now filled in. Because the
+        <code class="ph codeph">STORE_SALES</code> table in this example is not partitioned, the <code class="ph codeph">COMPUTE INCREMENTAL
+        STATS</code> statement produces regular stats rather than incremental stats, therefore the
+        <code class="ph codeph">Incremental stats</code> field remains <code class="ph codeph">false</code>.
+      </p>
+
+<pre class="pre codeblock"><code>compute stats customer;
++------------------------------------------+
+| summary                                  |
++------------------------------------------+
+| Updated 1 partition(s) and 18 column(s). |
++------------------------------------------+
+
+show table stats customer;
++--------+--------+---------+--------------+--------+-------------------+
+| #Rows  | #Files | Size    | Bytes Cached | Format | Incremental stats |
++--------+--------+---------+--------------+--------+-------------------+
+| 100000 | 1      | 12.60MB | NOT CACHED   | TEXT   | false             |
++--------+--------+---------+--------------+--------+-------------------+
+
+compute incremental stats store_sales;
++------------------------------------------+
+| summary                                  |
++------------------------------------------+
+| Updated 1 partition(s) and 23 column(s). |
++------------------------------------------+
+
+show table stats store_sales;
++---------+--------+----------+--------------+--------+-------------------+
+| #Rows   | #Files | Size     | Bytes Cached | Format | Incremental stats |
++---------+--------+----------+--------------+--------+-------------------+
+| 2880404 | 1      | 370.45MB | NOT CACHED   | TEXT   | false             |
++---------+--------+----------+--------------+--------+-------------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+      <p class="p">
+        The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+        typically the <code class="ph codeph">impala</code> user, must have read and execute
+        permissions for all directories that are part of the table.
+        (A table could span multiple different HDFS directories if it is partitioned.
+        The directories could be widely scattered because a partition can reside
+        in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+        The Impala user must also have execute
+        permission for the database directory, and any parent directories of the database directory in HDFS.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>, <a class="xref" href="impala_show.html#show_column_stats">SHOW COLUMN STATS Statement</a>
+      </p>
+
+      <p class="p">
+        See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for usage information and examples.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="show__show_column_stats">
+
+    <h2 class="title topictitle2" id="ariaid-title11">SHOW COLUMN STATS Statement</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW COLUMN STATS</code> variants are important for
+        tuning performance and diagnosing performance issues, especially with the largest tables and the most
+        complex join queries.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+
+      <p class="p">
+        When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+        names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+        output, check with the system administrator about whether you need to be granted a new privilege for that object. See
+        <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+        privileges for specific kinds of objects.
+      </p>
+
+      <p class="p">
+        The output for <code class="ph codeph">SHOW COLUMN STATS</code> includes
+        the relevant information for Kudu tables.
+        The information for column statistics that originates in the
+        underlying Kudu storage layer is also represented in the
+        metastore database that Impala uses.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following examples show the output of the <code class="ph codeph">SHOW COLUMN STATS</code> statement for some tables,
+        before the <code class="ph codeph">COMPUTE STATS</code> statement is run. Impala deduces some information, such as
+        maximum and average size for fixed-length columns, and leaves unknown values as <code class="ph codeph">-1</code>.
+      </p>
+
+<pre class="pre codeblock"><code>show column stats customer;
++------------------------+--------+------------------+--------+----------+----------+
+| Column                 | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++------------------------+--------+------------------+--------+----------+----------+
+| c_customer_sk          | INT    | -1               | -1     | 4        | 4        |
+| c_customer_id          | STRING | -1               | -1     | -1       | -1       |
+| c_current_cdemo_sk     | INT    | -1               | -1     | 4        | 4        |
+| c_current_hdemo_sk     | INT    | -1               | -1     | 4        | 4        |
+| c_current_addr_sk      | INT    | -1               | -1     | 4        | 4        |
+| c_first_shipto_date_sk | INT    | -1               | -1     | 4        | 4        |
+| c_first_sales_date_sk  | INT    | -1               | -1     | 4        | 4        |
+| c_salutation           | STRING | -1               | -1     | -1       | -1       |
+| c_first_name           | STRING | -1               | -1     | -1       | -1       |
+| c_last_name            | STRING | -1               | -1     | -1       | -1       |
+| c_preferred_cust_flag  | STRING | -1               | -1     | -1       | -1       |
+| c_birth_day            | INT    | -1               | -1     | 4        | 4        |
+| c_birth_month          | INT    | -1               | -1     | 4        | 4        |
+| c_birth_year           | INT    | -1               | -1     | 4        | 4        |
+| c_birth_country        | STRING | -1               | -1     | -1       | -1       |
+| c_login                | STRING | -1               | -1     | -1       | -1       |
+| c_email_address        | STRING | -1               | -1     | -1       | -1       |
+| c_last_review_date     | STRING | -1               | -1     | -1       | -1       |
++------------------------+--------+------------------+--------+----------+----------+
+
+show column stats store_sales;
++-----------------------+-------+------------------+--------+----------+----------+
+| Column                | Type  | #Distinct Values | #Nulls | Max Size | Avg Size |
++-----------------------+-------+------------------+--------+----------+----------+
+| ss_sold_date_sk       | INT   | -1               | -1     | 4        | 4        |
+| ss_sold_time_sk       | INT   | -1               | -1     | 4        | 4        |
+| ss_item_sk            | INT   | -1               | -1     | 4        | 4        |
+| ss_customer_sk        | INT   | -1               | -1     | 4        | 4        |
+| ss_cdemo_sk           | INT   | -1               | -1     | 4        | 4        |
+| ss_hdemo_sk           | INT   | -1               | -1     | 4        | 4        |
+| ss_addr_sk            | INT   | -1               | -1     | 4        | 4        |
+| ss_store_sk           | INT   | -1               | -1     | 4        | 4        |
+| ss_promo_sk           | INT   | -1               | -1     | 4        | 4        |
+| ss_ticket_number      | INT   | -1               | -1     | 4        | 4        |
+| ss_quantity           | INT   | -1               | -1     | 4        | 4        |
+| ss_wholesale_cost     | FLOAT | -1               | -1     | 4        | 4        |
+| ss_list_price         | FLOAT | -1               | -1     | 4        | 4        |
+| ss_sales_price        | FLOAT | -1               | -1     | 4        | 4        |
+| ss_ext_discount_amt   | FLOAT | -1               | -1     | 4        | 4        |
+| ss_ext_sales_price    | FLOAT | -1               | -1     | 4        | 4        |
+| ss_ext_wholesale_cost | FLOAT | -1               | -1     | 4        | 4        |
+| ss_ext_list_price     | FLOAT | -1               | -1     | 4        | 4        |
+| ss_ext_tax            | FLOAT | -1               | -1     | 4        | 4        |
+| ss_coupon_amt         | FLOAT | -1               | -1     | 4        | 4        |
+| ss_net_paid           | FLOAT | -1               | -1     | 4        | 4        |
+| ss_net_paid_inc_tax   | FLOAT | -1               | -1     | 4        | 4        |
+| ss_net_profit         | FLOAT | -1               | -1     | 4        | 4        |
++-----------------------+-------+------------------+--------+----------+----------+
+</code></pre>
+
+      <p class="p">
+        The following examples show the output of the <code class="ph codeph">SHOW COLUMN STATS</code> statement for some tables,
+        after the <code class="ph codeph">COMPUTE STATS</code> statement is run. Now most of the <code class="ph codeph">-1</code> values are
+        changed to reflect the actual table data. The <code class="ph codeph">#Nulls</code> column remains <code class="ph codeph">-1</code>
+        because Impala does not use the number of <code class="ph codeph">NULL</code> values to influence query planning.
+      </p>
+
+<pre class="pre codeblock"><code>compute stats customer;
++------------------------------------------+
+| summary                                  |
++------------------------------------------+
+| Updated 1 partition(s) and 18 column(s). |
++------------------------------------------+
+
+compute stats store_sales;
++------------------------------------------+
+| summary                                  |
++------------------------------------------+
+| Updated 1 partition(s) and 23 column(s). |
++------------------------------------------+
+
+show column stats customer;
++------------------------+--------+------------------+--------+----------+----------+
+| Column                 | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++------------------------+--------+------------------+--------+----------+----------+
+| c_customer_sk          | INT    | 139017           | -1     | 4        | 4        |
+| c_customer_id          | STRING | 111904           | -1     | 16       | 16       |
+| c_current_cdemo_sk     | INT    | 95837            | -1     | 4        | 4        |
+| c_current_hdemo_sk     | INT    | 8097             | -1     | 4        | 4        |
+| c_current_addr_sk      | INT    | 57334            | -1     | 4        | 4        |
+| c_first_shipto_date_sk | INT    | 4374             | -1     | 4        | 4        |
+| c_first_sales_date_sk  | INT    | 4409             | -1     | 4        | 4        |
+| c_salutation           | STRING | 7                | -1     | 4        | 3.1308   |
+| c_first_name           | STRING | 3887             | -1     | 11       | 5.6356   |
+| c_last_name            | STRING | 4739             | -1     | 13       | 5.9106   |
+| c_preferred_cust_flag  | STRING | 3                | -1     | 1        | 0.9656   |
+| c_birth_day            | INT    | 31               | -1     | 4        | 4        |
+| c_birth_month          | INT    | 12               | -1     | 4        | 4        |
+| c_birth_year           | INT    | 71               | -1     | 4        | 4        |
+| c_birth_country        | STRING | 205              | -1     | 20       | 8.4001   |
+| c_login                | STRING | 1                | -1     | 0        | 0        |
+| c_email_address        | STRING | 94492            | -1     | 46       | 26.485   |
+| c_last_review_date     | STRING | 349              | -1     | 7        | 6.7561   |
++------------------------+--------+------------------+--------+----------+----------+
+
+show column stats store_sales;
++-----------------------+-------+------------------+--------+----------+----------+
+| Column                | Type  | #Distinct Values | #Nulls | Max Size | Avg Size |
++-----------------------+-------+------------------+--------+----------+----------+
+| ss_sold_date_sk       | INT   | 4395             | -1     | 4        | 4        |
+| ss_sold_time_sk       | INT   | 63617            | -1     | 4        | 4        |
+| ss_item_sk            | INT   | 19463            | -1     | 4        | 4        |
+| ss_customer_sk        | INT   | 122720           | -1     | 4        | 4        |
+| ss_cdemo_sk           | INT   | 242982           | -1     | 4        | 4        |
+| ss_hdemo_sk           | INT   | 8097             | -1     | 4        | 4        |
+| ss_addr_sk            | INT   | 70770            | -1     | 4        | 4        |
+| ss_store_sk           | INT   | 6                | -1     | 4        | 4        |
+| ss_promo_sk           | INT   | 355              | -1     | 4        | 4        |
+| ss_ticket_number      | INT   | 304098           | -1     | 4        | 4        |
+| ss_quantity           | INT   | 105              | -1     | 4        | 4        |
+| ss_wholesale_cost     | FLOAT | 9600             | -1     | 4        | 4        |
+| ss_list_price         | FLOAT | 22191            | -1     | 4        | 4        |
+| ss_sales_price        | FLOAT | 20693            | -1     | 4        | 4        |
+| ss_ext_discount_amt   | FLOAT | 228141           | -1     | 4        | 4        |
+| ss_ext_sales_price    | FLOAT | 433550           | -1     | 4        | 4        |
+| ss_ext_wholesale_cost | FLOAT | 406291           | -1     | 4        | 4        |
+| ss_ext_list_price     | FLOAT | 574871           | -1     | 4        | 4        |
+| ss_ext_tax            | FLOAT | 91806            | -1     | 4        | 4        |
+| ss_coupon_amt         | FLOAT | 228141           | -1     | 4        | 4        |
+| ss_net_paid           | FLOAT | 493107           | -1     | 4        | 4        |
+| ss_net_paid_inc_tax   | FLOAT | 653523           | -1     | 4        | 4        |
+| ss_net_profit         | FLOAT | 611934           | -1     | 4        | 4        |
++-----------------------+-------+------------------+--------+----------+----------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+      <p class="p">
+        The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+        typically the <code class="ph codeph">impala</code> user, must have read and execute
+        permissions for all directories that are part of the table.
+        (A table could span multiple different HDFS directories if it is partitioned.
+        The directories could be widely scattered because a partition can reside
+        in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+        The Impala user must also have execute
+        permission for the database directory, and any parent directories of the database directory in HDFS.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>, <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>
+      </p>
+
+      <p class="p">
+        See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for usage information and examples.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="show__show_partitions">
+
+    <h2 class="title topictitle2" id="ariaid-title12">SHOW PARTITIONS Statement</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        <code class="ph codeph">SHOW PARTITIONS</code> displays information about each partition for a partitioned table. (The
+        output is the same as the <code class="ph codeph">SHOW TABLE STATS</code> statement, but <code class="ph codeph">SHOW PARTITIONS</code>
+        only works on a partitioned table.) Because it displays table statistics for all partitions, the output is
+        more informative if you have run the <code class="ph codeph">COMPUTE STATS</code> statement after creating all the
+        partitions. See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for details.
+        The examples below use a <code class="ph codeph">CENSUS</code> table partitioned on the
+        <code class="ph codeph">YEAR</code> column.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+
+      <p class="p">
+        When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+        names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+        output, check with the system administrator about whether you need to be granted a new privilege for that object. See
+        <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+        privileges for specific kinds of objects.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+
+      <p class="p">
+        The optional <code class="ph codeph">RANGE</code> clause only applies to Kudu tables. It displays only the partitions
+        defined by the <code class="ph codeph">RANGE</code> clause of <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code>.
+      </p>
+
+      <p class="p">
+        Although you can specify <code class="ph codeph">&lt;</code> or
+        <code class="ph codeph">&lt;=</code> comparison operators when defining
+        range partitions for Kudu tables, Kudu rewrites them if necessary
+        to represent each range as
+        <code class="ph codeph"><var class="keyword varname">low_bound</var> &lt;= VALUES &lt; <var class="keyword varname">high_bound</var></code>.
+        This rewriting might involve incrementing one of the boundary values
+        or appending a <code class="ph codeph">\0</code> for string values, so that the
+        partition covers the same range as originally specified.
+      </p>
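+      <p class="p">
+        As a hypothetical illustration (the table and column names are invented),
+        a range defined with an inclusive upper bound on an integer column is
+        reported back in the canonical form, with the upper boundary incremented
+        by one:
+      </p>
+
+<pre class="pre codeblock"><code>-- Range partition defined with an inclusive upper bound:
+create table t1 (id int primary key, s string)
+  partition by range (id) (partition 0 &lt;= values &lt;= 9)
+  stored as kudu;
+
+-- SHOW RANGE PARTITIONS reports the rewritten, equivalent range:
+--   0 &lt;= VALUES &lt; 10
+</code></pre>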
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example shows the output for a Parquet, text, or other
+        HDFS-backed table partitioned on the <code class="ph codeph">YEAR</code> column:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; show partitions census;
++-------+-------+--------+------+---------+
+| year  | #Rows | #Files | Size | Format  |
++-------+-------+--------+------+---------+
+| 2000  | -1    | 0      | 0B   | TEXT    |
+| 2004  | -1    | 0      | 0B   | TEXT    |
+| 2008  | -1    | 0      | 0B   | TEXT    |
+| 2010  | -1    | 0      | 0B   | TEXT    |
+| 2011  | 4     | 1      | 22B  | TEXT    |
+| 2012  | 4     | 1      | 22B  | TEXT    |
+| 2013  | 1     | 1      | 231B | PARQUET |
+| Total | 9     | 3      | 275B |         |
++-------+-------+--------+------+---------+
+</code></pre>
+
+      <p class="p">
+        The following example shows the output for a Kudu table
+        using the hash partitioning mechanism. The number of
+        rows in the result set corresponds to the value <var class="keyword varname">N</var> used
+        in the <code class="ph codeph">PARTITIONS <var class="keyword varname">N</var></code>
+        clause of <code class="ph codeph">CREATE TABLE</code>.
+      </p>
+
+<pre class="pre codeblock"><code>
+show partitions million_rows_hash;
+
++--------+-----------+----------+-----------------------+--
+| # Rows | Start Key | Stop Key | Leader Replica        | # Replicas
++--------+-----------+----------+-----------------------+--
+| -1     |           | 00000001 | n236.example.com:7050 | 3
+| -1     | 00000001  | 00000002 | n236.example.com:7050 | 3
+| -1     | 00000002  | 00000003 | n336.example.com:7050 | 3
+| -1     | 00000003  | 00000004 | n238.example.com:7050 | 3
+| -1     | 00000004  | 00000005 | n338.example.com:7050 | 3
+....
+| -1     | 0000002E  | 0000002F | n240.example.com:7050 | 3
+| -1     | 0000002F  | 00000030 | n336.example.com:7050 | 3
+| -1     | 00000030  | 00000031 | n240.example.com:7050 | 3
+| -1     | 00000031  |          | n334.example.com:7050 | 3
++--------+-----------+----------+-----------------------+--
+Fetched 50 row(s) in 0.05s
+
+</code></pre>
+
+      <p class="p">
+        The following example shows the output for a Kudu table
+        using the range partitioning mechanism:
+      </p>
+
+<pre class="pre codeblock"><code>
+show range partitions million_rows_range;
++-----------------------+
+| RANGE (id)            |
++-----------------------+
+| VALUES &lt; "A"          |
+| "A" &lt;= VALUES &lt; "["   |
+| "a" &lt;= VALUES &lt; "{"   |
+| "{" &lt;= VALUES &lt; "~\0" |
++-----------------------+
+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+      <p class="p">
+        The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+        typically the <code class="ph codeph">impala</code> user, must have read and execute
+        permissions for all directories that are part of the table.
+        (A table could span multiple different HDFS directories if it is partitioned.
+        The directories could be widely scattered because a partition can reside
+        in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+        The Impala user must also have execute
+        permission for the database directory, and any parent directories of the database directory in HDFS.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for usage information and examples.
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>, <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="show__show_functions">
+
+    <h2 class="title topictitle2" id="ariaid-title13">SHOW FUNCTIONS Statement</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        By default, <code class="ph codeph">SHOW FUNCTIONS</code> displays user-defined functions (UDFs) and <code class="ph codeph">SHOW
+        AGGREGATE FUNCTIONS</code> displays user-defined aggregate functions (UDAFs) associated with a particular
+        database. The output from <code class="ph codeph">SHOW FUNCTIONS</code> includes the argument signature of each function.
+        You specify this argument signature as part of the <code class="ph codeph">DROP FUNCTION</code> statement. You might have
+        several UDFs with the same name, each accepting different argument data types.
+      </p>
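+      <p class="p">
+        For example (using hypothetical function and database names), each overload
+        is dropped by repeating the argument signature shown in the
+        <code class="ph codeph">SHOW FUNCTIONS</code> output:
+      </p>
+
+<pre class="pre codeblock"><code>-- Two overloads of the same UDF name, distinguished by signature:
+show functions in udf_demo like 'my_lower*';
+
+-- Each DROP FUNCTION statement names one specific overload:
+drop function udf_demo.my_lower(string);
+drop function udf_demo.my_lower(string, int);
+</code></pre>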
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.5</span> and higher, the <code class="ph codeph">SHOW FUNCTIONS</code> output includes
+        a new column, labelled <code class="ph codeph">is persistent</code>. This property is <code class="ph codeph">true</code> for
+        Impala built-in functions, C++ UDFs, and Java UDFs created using the new <code class="ph codeph">CREATE FUNCTION</code>
+        syntax with no signature. It is <code class="ph codeph">false</code> for Java UDFs created using the old
+        <code class="ph codeph">CREATE FUNCTION</code> syntax that includes the types for the arguments and return value.
+        Any functions with <code class="ph codeph">false</code> shown for this property must be created again by the
+        <code class="ph codeph">CREATE FUNCTION</code> statement each time the Impala catalog server is restarted.
+        See <code class="ph codeph">CREATE FUNCTION</code> for information on switching to the new syntax, so that
+        Java UDFs are preserved across restarts. Java UDFs that are persisted this way are also easier
+        to share across Impala and Hive.
+      </p>
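+      <p class="p">
+        The following sketch contrasts the two syntax variants for Java UDFs.
+        The JAR path and class name are hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>-- Old syntax: argument types and return type specified. The resulting
+-- Java UDF shows 'is persistent' = false and must be recreated after
+-- each catalog server restart.
+create function my_udf(string) returns string
+  location '/user/impala/udfs/my-udfs.jar'
+  symbol='com.example.MyUdf';
+
+-- New syntax (Impala 2.5 and higher): no signature. The function
+-- persists across catalog server restarts ('is persistent' = true).
+create function my_udf
+  location '/user/impala/udfs/my-udfs.jar'
+  symbol='com.example.MyUdf';
+</code></pre>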
+
+      <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+
+      <p class="p">
+        When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+        names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+        output, check with the system administrator about whether you need to be granted a new privilege for that object. See
+        <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+        privileges for specific kinds of objects.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        To display Impala built-in functions, specify the special database name <code class="ph codeph">_impala_builtins</code>:
+      </p>
+
+<pre class="pre codeblock"><code>show functions in _impala_builtins;
++--------------+-------------------------------------------------+-------------+---------------+
+| return type  | signature                                       | binary type | is persistent |
++--------------+-------------------------------------------------+-------------+---------------+
+| BIGINT       | abs(BIGINT)                                     | BUILTIN     | true          |
+| DECIMAL(*,*) | abs(DECIMAL(*,*))                               | BUILTIN     | true          |
+| DOUBLE       | abs(DOUBLE)                                     | BUILTIN     | true          |
+| FLOAT        | abs(FLOAT)                                      | BUILTIN     | true          |
++--------------+-------------------------------------------------+-------------+---------------+
+...
+
+show functions in _impala_builtins like '*week*';
++-------------+------------------------------+-------------+---------------+
+| return type | signature                    | binary type | is persistent |
++-------------+------------------------------+-------------+---------------+
+| INT         | dayofweek(TIMESTAMP)         | BUILTIN     | true          |
+| INT         | weekofyear(TIMESTAMP)        | BUILTIN     | true          |
+| TIMESTAMP   | weeks_add(TIMESTAMP, BIGINT) | BUILTIN     | true          |
+| TIMESTAMP   | weeks_add(TIMESTAMP, INT)    | BUILTIN     | true          |
+| TIMESTAMP   | weeks_sub(TIMESTAMP, BIGINT) | BUILTIN     | true          |
+| TIMESTAMP   | weeks_sub(TIMESTAMP, INT)    | BUILTIN     | true          |
++-------------+------------------------------+-------------+---------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_functions_overview.html#functions">Overview of Impala Functions</a>, <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>,
+        <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>,
+        <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>,
+        <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>
+      </p>
+    </div>
+  </article>
+
+  
+</article></main></body></html>
\ No newline at end of file



[44/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_breakpad.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_breakpad.html b/docs/build/html/topics/impala_breakpad.html
new file mode 100644
index 0000000..7e05497
--- /dev/null
+++ b/docs/build/html/topics/impala_breakpad.html
@@ -0,0 +1,223 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_troubleshooting.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="breakpad"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Breakpad Minidumps for Impala (Impala 2.6 or higher only)</title></head><body id="breakpad"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Breakpad Minidumps for Impala (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The <a class="xref" href="https://chromium.googlesource.com/breakpad/breakpad/" target="_blank">breakpad</a>
+      project is an open-source framework for crash reporting.
+      In <span class="keyword">Impala 2.6</span> and higher, Impala can use <code class="ph codeph">breakpad</code> to record stack information and
+      register values when any of the Impala-related daemons crash due to an error such as <code class="ph codeph">SIGSEGV</code>
+      or unhandled exceptions.
+      The dump files are much smaller than traditional core dump files. The dump mechanism itself uses very little
+      memory, which improves reliability if the crash occurs while the system is low on memory.
+    </p>
+
+    <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+      Because of the internal mechanisms involving Impala memory allocation and Linux
+      signalling for out-of-memory (OOM) errors, if an Impala-related daemon experiences a
+      crash due to an OOM condition, it does <em class="ph i">not</em> generate a minidump for that error.
+    <p class="p">
+
+    </p>
+    </div>
+
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_troubleshooting.html">Troubleshooting Impala</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="breakpad__breakpad_minidump_enable">
+    <h2 class="title topictitle2" id="ariaid-title2">Enabling or Disabling Minidump Generation</h2>
+    <div class="body conbody">
+      <p class="p">
+        By default, a minidump file is generated when an Impala-related daemon crashes.
+        To turn off generation of the minidump files, change the
+        <span class="ph uicontrol">minidump_path</span> configuration setting of one or more Impala-related daemons
+        to the empty string, and restart the corresponding services or daemons.
+      </p>
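+
+      <p class="p">
+        For example, assuming the daemons read their startup options from a flag file, minidump
+        generation could be turned off with an empty setting along these lines (the flag file
+        location depends on your deployment):
+      </p>
+
+<pre class="pre codeblock"><code>
+# In the impalad, catalogd, or statestored flag file:
+--minidump_path=
+</code></pre>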
+
+      <p class="p">
+        In <span class="keyword">Impala 2.7</span> and higher,
+        you can send a <code class="ph codeph">SIGUSR1</code> signal to any Impala-related daemon to write a
+        Breakpad minidump. For advanced troubleshooting, you can now produce a minidump
+        without triggering a crash.
+      </p>
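+
+      <p class="p">
+        For example, a minidump could be requested on demand from a shell on the host, along
+        these lines (the process ID shown is illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>
+# pgrep -f impalad
+23114
+# kill -SIGUSR1 23114
+</code></pre>
+
+      <p class="p">
+        The daemon writes the minidump file and continues running; no crash or restart occurs.
+      </p>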
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="breakpad__breakpad_minidump_location">
+    <h2 class="title topictitle2" id="ariaid-title3">Specifying the Location for Minidump Files</h2>
+    <div class="body conbody">
+      <div class="p">
+        By default, all minidump files are written to the following location
+        on the host where a crash occurs:
+        
+         <ul class="ul">
+          <li class="li">
+            <p class="p">
+              Clusters not managed by cluster management software:
+              <span class="ph filepath"><var class="keyword varname">impala_log_dir</var>/<var class="keyword varname">daemon_name</var>/minidumps/<var class="keyword varname">daemon_name</var></span>
+            </p>
+          </li>
+        </ul>
+        The minidump files for <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">catalogd</span>,
+        and <span class="keyword cmdname">statestored</span> are each written to a separate directory.
+      </div>
+      <p class="p">
+        To specify a different location, set the
+        
+        <span class="ph uicontrol">minidump_path</span>
+        configuration setting of one or more Impala-related daemons, and restart the corresponding services or daemons.
+      </p>
+      <p class="p">
+        If you specify a relative path for this setting, the value is interpreted relative to
+        the default <span class="ph uicontrol">minidump_path</span> directory.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="breakpad__breakpad_minidump_number">
+    <h2 class="title topictitle2" id="ariaid-title4">Controlling the Number of Minidump Files</h2>
+    <div class="body conbody">
+      <p class="p">
+        Like any files used for logging or troubleshooting, consider limiting the number of
+        minidump files, or removing unneeded ones, depending on the amount of free storage
+        space on the hosts in the cluster.
+      </p>
+      <p class="p">
+        Because the minidump files are only used for problem resolution, you can remove any such files that
+        are not needed to debug current issues.
+      </p>
+      <p class="p">
+        To control how many minidump files Impala keeps around at any one time,
+        set the <span class="ph uicontrol">max_minidumps</span> configuration setting
+        of one or more Impala-related daemons, and restart the corresponding services or daemons.
+        The default for this setting is 9. A zero or negative value is interpreted as
+        <span class="q">"unlimited"</span>.
+      </p>
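+
+      <p class="p">
+        For example, to keep up to 30 minidump files per daemon, the flag file entry would look
+        like this sketch:
+      </p>
+
+<pre class="pre codeblock"><code>
+--max_minidumps=30
+</code></pre>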
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="breakpad__breakpad_minidump_logging">
+    <h2 class="title topictitle2" id="ariaid-title5">Detecting Crash Events</h2>
+    <div class="body conbody">
+
+      <p class="p">
+        You can see in the Impala log files when crash events occur that generate
+        minidump files. Because each restart begins a new log file, the <span class="q">"crashed"</span> message
+        is always at or near the bottom of the log file. There might be another later message
+        if core dumps are also enabled.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="breakpad__breakpad_demo">
+    <h2 class="title topictitle2" id="ariaid-title6">Demonstration of Breakpad Feature</h2>
+    <div class="body conbody">
+      <p class="p">
+        The following example uses the command <span class="keyword cmdname">kill -11</span> to
+        simulate a <code class="ph codeph">SIGSEGV</code> crash for an <span class="keyword cmdname">impalad</span>
+        process on a single DataNode, then examines the relevant log files and minidump file.
+      </p>
+
+      <p class="p">
+        First, as root on a worker node, kill the <span class="keyword cmdname">impalad</span> process with a
+        <code class="ph codeph">SIGSEGV</code> error. The original process ID was 23114.
+      </p>
+
+<pre class="pre codeblock"><code>
+# ps ax | grep impalad
+23114 ?        Sl     0:18 /opt/local/parcels/&lt;parcel_version&gt;/lib/impala/sbin/impalad --flagfile=/var/run/impala/process/114-impala-IMPALAD/impala-conf/impalad_flags
+31259 pts/0    S+     0:00 grep impalad
+#
+# kill -11 23114
+#
+# ps ax | grep impalad
+31374 ?        Rl     0:04 /opt/local/parcels/&lt;parcel_version&gt;/lib/impala/sbin/impalad --flagfile=/var/run/impala/process/114-impala-IMPALAD/impala-conf/impalad_flags
+31475 pts/0    S+     0:00 grep impalad
+
+</code></pre>
+
+      <p class="p">
+        We locate the log directory underneath <span class="ph filepath">/var/log</span>.
+        There is a <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code>
+        log file for the 23114 process ID. The minidump message is written to the
+        <code class="ph codeph">.INFO</code> file and the <code class="ph codeph">.ERROR</code> file, but not the
+        <code class="ph codeph">.WARNING</code> file. In this case, a large core file was also produced.
+      </p>
+<pre class="pre codeblock"><code>
+# cd /var/log/impalad
+# ls -la | grep 23114
+-rw-------   1 impala impala 3539079168 Jun 23 15:20 core.23114
+-rw-r--r--   1 impala impala      99057 Jun 23 15:20 hs_err_pid23114.log
+-rw-r--r--   1 impala impala        351 Jun 23 15:20 impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
+-rw-r--r--   1 impala impala      29101 Jun 23 15:20 impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
+-rw-r--r--   1 impala impala        228 Jun 23 14:03 impalad.worker_node_123.impala.log.WARNING.20160623-140343.23114
+
+</code></pre>
+      <p class="p">
+        The <code class="ph codeph">.INFO</code> log includes the location of the minidump file, followed by
+        a report of a core dump. With the breakpad minidump feature enabled, now we might
+        disable core dumps or keep fewer of them around.
+      </p>
+<pre class="pre codeblock"><code>
+# cat impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
+...
+Wrote minidump to /var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+#
+# A fatal error has been detected by the Java Runtime Environment:
+#
+#  SIGSEGV (0xb) at pc=0x00000030c0e0b68a, pid=23114, tid=139869541455968
+#
+# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
+# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
+# Problematic frame:
+# C  [libpthread.so.0+0xb68a]  pthread_cond_wait+0xca
+#
+# Core dump written. Default location: /var/log/impalad/core or core.23114
+#
+# An error report file with more information is saved as:
+# /var/log/impalad/hs_err_pid23114.log
+#
+# If you would like to submit a bug report, please visit:
+#   http://bugreport.sun.com/bugreport/crash.jsp
+# The crash happened outside the Java Virtual Machine in native code.
+# See problematic frame for where to report the bug.
+...
+
+# cat impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
+
+Log file created at: 2016/06/23 14:03:43
+Running on machine:.worker_node_123
+Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
+E0623 14:03:43.911002 23114 logging.cc:118] stderr will be logged to this file.
+Wrote minidump to /var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+
+</code></pre>
+
+      <p class="p">
+        The resulting minidump file is much smaller than the corresponding core file,
+        making it much easier to supply diagnostic information to <span class="keyword">the appropriate support channel</span>.
+      </p>
+
+<pre class="pre codeblock"><code>
+# pwd
+/var/log/impalad
+# cd ../impala-minidumps/impalad
+# ls
+0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+# du -kh *
+2.4M  0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+
+</code></pre>
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_char.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_char.html b/docs/build/html/topics/impala_char.html
new file mode 100644
index 0000000..e0b4cb9
--- /dev/null
+++ b/docs/build/html/topics/impala_char.html
@@ -0,0 +1,305 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="char"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CHAR Data Type (Impala 2.0 or higher only)</title></head><body id="char"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">CHAR Data Type (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      A fixed-length character type, padded with trailing spaces if necessary to achieve the specified length. If
+      values are longer than the specified length, Impala truncates any trailing characters.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> CHAR(<var class="keyword varname">length</var>)</code></pre>
+
+    <p class="p">
+      The maximum length you can specify is 255.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Semantics of trailing spaces:</strong>
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        When you store a <code class="ph codeph">CHAR</code> value shorter than the specified length in a table, queries return
+        the value padded with trailing spaces if necessary; the resulting value has the same length as specified in
+        the column definition.
+      </li>
+
+      <li class="li">
+        If you store a <code class="ph codeph">CHAR</code> value containing trailing spaces in a table, those trailing spaces are
+        not stored in the data file. When the value is retrieved by a query, the result could have a different
+        number of trailing spaces. That is, the value includes however many spaces are needed to pad it to the
+        specified length of the column.
+      </li>
+
+      <li class="li">
+        If you compare two <code class="ph codeph">CHAR</code> values that differ only in the number of trailing spaces, those
+        values are considered identical.
+      </li>
+    </ul>
+
+    <p class="p">
+        <strong class="ph b">Partitioning:</strong> This type can be used for partition key columns. Because of the efficiency advantage
+        of numeric values over character-based values, if the partition key is a string representation of a number,
+        prefer to use an integer type with sufficient range (<code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>, and so
+        on) where practical.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong> This data type cannot be used with HBase tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Parquet considerations:</strong>
+      </p>
+
+    <ul class="ul">
+      <li class="li">
+        This type can be read from and written to Parquet files.
+      </li>
+
+      <li class="li">
+        No particular version of the Parquet file format is required.
+      </li>
+
+      <li class="li">
+        Parquet files generated by Impala and containing this type can be freely interchanged with other components
+        such as Hive and MapReduce.
+      </li>
+
+      <li class="li">
+        Any trailing spaces, whether implicitly or explicitly specified, are not written to the Parquet data files.
+      </li>
+
+      <li class="li">
+        Parquet data files might contain values that are longer than allowed by the
+        <code class="ph codeph">CHAR(<var class="keyword varname">n</var>)</code> length limit. Impala ignores any extra trailing characters when
+        it processes those values during a query.
+      </li>
+    </ul>
+
+    <p class="p">
+        <strong class="ph b">Text table considerations:</strong>
+      </p>
+
+    <p class="p">
+      Text data files might contain values that are longer than allowed for a particular
+      <code class="ph codeph">CHAR(<var class="keyword varname">n</var>)</code> column. Any extra trailing characters are ignored when Impala
+      processes those values during a query. Text data files can also contain values that are shorter than the
+      defined length limit, and Impala pads them with trailing spaces up to the specified length. Any text data
+      files produced by Impala <code class="ph codeph">INSERT</code> statements do not include any trailing blanks for
+      <code class="ph codeph">CHAR</code> columns.
+    </p>
+
+    <p class="p"><strong class="ph b">Avro considerations:</strong></p>
+    <p class="p">
+        The Avro specification allows string values up to 2**64 bytes in length.
+        Impala queries for Avro tables use 32-bit integers to hold string lengths.
+        In <span class="keyword">Impala 2.5</span> and higher, Impala truncates <code class="ph codeph">CHAR</code>
+        and <code class="ph codeph">VARCHAR</code> values in Avro tables to (2**31)-1 bytes.
+        If a query encounters a <code class="ph codeph">STRING</code> value longer than (2**31)-1
+        bytes in an Avro table, the query fails. In earlier releases,
+        encountering such long values in an Avro table could cause a crash.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Compatibility:</strong>
+      </p>
+
+    <p class="p">
+      This type is available in <span class="keyword">Impala 2.0</span> or higher.
+    </p>
+
+    <p class="p">
+      Some other database systems make the length specification optional. For Impala, the length is required.
+    </p>
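+
+    <p class="p">
+      For example (a sketch based on the rule above; the exact error message varies by release):
+    </p>
+
+<pre class="pre codeblock"><code>create table ok_table (c char(10));   -- length specified: accepted
+create table bad_table (c char);      -- no length: rejected by Impala
+</code></pre>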
+
+
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong> Represented in memory as a byte array with the same size as the length
+        specification. Values that are shorter than the specified length are padded on the right with trailing
+        spaces.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+        fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+        statement.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">UDF considerations:</strong> This type cannot be used for the argument or return type of a user-defined
+        function (UDF) or user-defined aggregate function (UDA).
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      These examples show how trailing spaces are not considered significant when comparing or processing
+      <code class="ph codeph">CHAR</code> values. <code class="ph codeph">CAST()</code> truncates any longer string to fit within the defined
+      length. If a <code class="ph codeph">CHAR</code> value is shorter than the specified length, it is padded on the right with
+      spaces until it matches the specified length. Therefore, <code class="ph codeph">LENGTH()</code> represents the length
+      including any trailing spaces, and <code class="ph codeph">CONCAT()</code> also treats the column value as if it has
+      trailing spaces.
+    </p>
+
+<pre class="pre codeblock"><code>select cast('x' as char(4)) = cast('x   ' as char(4)) as "unpadded equal to padded";
++--------------------------+
+| unpadded equal to padded |
++--------------------------+
+| true                     |
++--------------------------+
+
+create table char_length(c char(3));
+insert into char_length values (cast('1' as char(3))), (cast('12' as char(3))), (cast('123' as char(3))), (cast('123456' as char(3)));
+select concat("[",c,"]") as c, length(c) from char_length;
++-------+-----------+
+| c     | length(c) |
++-------+-----------+
+| [1  ] | 3         |
+| [12 ] | 3         |
+| [123] | 3         |
+| [123] | 3         |
++-------+-----------+
+</code></pre>
+
+    <p class="p">
+      This example shows a case where data values are known to have a specific length, where <code class="ph codeph">CHAR</code>
+      is a logical data type to use.
+
+    </p>
+
+<pre class="pre codeblock"><code>create table addresses
+  (id bigint,
+   street_name string,
+   state_abbreviation char(2),
+   country_abbreviation char(2));
+</code></pre>
+
+    <p class="p">
+      The following example shows how values written by Impala do not physically include the trailing spaces. It
+      creates a table using text format, with <code class="ph codeph">CHAR</code> values much shorter than the declared length,
+      and then prints the resulting data file to show that the delimited values are not separated by spaces. The
+      same behavior applies to binary-format Parquet data files.
+    </p>
+
+<pre class="pre codeblock"><code>create table char_in_text (a char(20), b char(30), c char(40))
+  row format delimited fields terminated by ',';
+
+insert into char_in_text values (cast('foo' as char(20)), cast('bar' as char(30)), cast('baz' as char(40))), (cast('hello' as char(20)), cast('goodbye' as char(30)), cast('aloha' as char(40)));
+
+-- Running this Linux command inside impala-shell using the ! shortcut.
+!hdfs dfs -cat 'hdfs://127.0.0.1:8020/user/hive/warehouse/impala_doc_testing.db/char_in_text/*.*';
+foo,bar,baz
+hello,goodbye,aloha
+</code></pre>
+
+    <p class="p">
+      The following example further illustrates the treatment of spaces. It replaces the contents of the previous
+      table with some values including leading spaces, trailing spaces, or both. Any leading spaces are preserved
+      within the data file, but trailing spaces are discarded. Then when the values are retrieved by a query, the
+      leading spaces are retrieved verbatim while any necessary trailing spaces are supplied by Impala.
+    </p>
+
+<pre class="pre codeblock"><code>insert overwrite char_in_text values (cast('trailing   ' as char(20)), cast('   leading and trailing   ' as char(30)), cast('   leading' as char(40)));
+!hdfs dfs -cat 'hdfs://127.0.0.1:8020/user/hive/warehouse/impala_doc_testing.db/char_in_text/*.*';
+trailing,   leading and trailing,   leading
+
+select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']') as c from char_in_text;
++------------------------+----------------------------------+--------------------------------------------+
+| a                      | b                                | c                                          |
++------------------------+----------------------------------+--------------------------------------------+
+| [trailing            ] | [   leading and trailing       ] | [   leading                              ] |
++------------------------+----------------------------------+--------------------------------------------+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+        <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+      Because the blank-padding behavior requires allocating the maximum length for each value in memory, for
+      scalability reasons avoid declaring <code class="ph codeph">CHAR</code> columns that are much longer than typical values in
+      that column.
+    </p>
+
+    <p class="p">
+        All data in <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> columns must be in a character encoding that
+        is compatible with UTF-8. If you have binary data from another database system (that is, a BLOB type), use
+        a <code class="ph codeph">STRING</code> column to hold it.
+      </p>
+
+    <p class="p">
+      When an expression compares a <code class="ph codeph">CHAR</code> with a <code class="ph codeph">STRING</code> or
+      <code class="ph codeph">VARCHAR</code>, the <code class="ph codeph">CHAR</code> value is implicitly converted to <code class="ph codeph">STRING</code>
+      first, with trailing spaces preserved.
+    </p>
+
+<pre class="pre codeblock"><code>select cast("foo  " as char(5)) = 'foo' as "char equal to string";
++----------------------+
+| char equal to string |
++----------------------+
+| false                |
++----------------------+
+</code></pre>
+
+    <p class="p">
+      This behavior differs from other popular database systems. To get the expected result of
+      <code class="ph codeph">TRUE</code>, cast the expressions on both sides to <code class="ph codeph">CHAR</code> values of the appropriate
+      length:
+    </p>
+
+<pre class="pre codeblock"><code>select cast("foo  " as char(5)) = cast('foo' as char(3)) as "char equal to string";
++----------------------+
+| char equal to string |
++----------------------+
+| true                 |
++----------------------+
+</code></pre>
+
+    <p class="p">
+      This behavior is subject to change in future releases.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_string.html#string">STRING Data Type</a>, <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>,
+      <a class="xref" href="impala_literals.html#string_literals">String Literals</a>,
+      <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_cluster_sizing.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_cluster_sizing.html b/docs/build/html/topics/impala_cluster_sizing.html
new file mode 100644
index 0000000..d1f2a51
--- /dev/null
+++ b/docs/build/html/topics/impala_cluster_sizing.html
@@ -0,0 +1,318 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_planning.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="cluster_sizing"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Cluster Sizing Guidelines for Impala</title></head><body id="cluster_sizing"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Cluster Sizing Guidelines for Impala</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      This document provides a very rough guideline to estimate the size of a cluster needed for a specific
+      customer application. You can use this information when planning how much and what type of hardware to
+      acquire for a new cluster, or when adding Impala workloads to an existing cluster.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Before making purchase or deployment decisions, consult organizations with relevant experience
+      to verify the conclusions about hardware requirements based on your data volume and workload.
+    </div>
+
+
+
+    <p class="p">
+      Always use hosts with identical specifications and capacities for all the nodes in the cluster. Currently,
+      Impala divides the work evenly between cluster nodes, regardless of their exact hardware configuration.
+      Because work can be distributed in different ways for different queries, if some hosts are overloaded
+      compared to others in terms of CPU, memory, I/O, or network, you might experience inconsistent performance
+      and overall slowness.
+    </p>
+
+    <p class="p">
+      For analytic workloads with star/snowflake schemas, and using consistent hardware for all nodes (64 GB RAM,
+      12 x 2 TB hard drives, 2 x E5-2630L CPUs with 12 cores total, and 10 Gb Ethernet), the following table estimates the number of
+      DataNodes needed in the cluster based on data size and the number of concurrent queries, for workloads
+      similar to TPC-DS benchmark queries:
+    </p>
+
+    <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">Cluster size estimation based on the number of concurrent queries and data size with a 20 second average query response time</span></caption><colgroup><col><col><col><col><col><col></colgroup><thead class="thead">
+          <tr class="row">
+            <th class="entry nocellnorowborder" id="cluster_sizing__entry__1">
+              Data Size
+            </th>
+            <th class="entry nocellnorowborder" id="cluster_sizing__entry__2">
+              1 query
+            </th>
+            <th class="entry nocellnorowborder" id="cluster_sizing__entry__3">
+              10 queries
+            </th>
+            <th class="entry nocellnorowborder" id="cluster_sizing__entry__4">
+              100 queries
+            </th>
+            <th class="entry nocellnorowborder" id="cluster_sizing__entry__5">
+              1000 queries
+            </th>
+            <th class="entry nocellnorowborder" id="cluster_sizing__entry__6">
+              2000 queries
+            </th>
+          </tr>
+        </thead><tbody class="tbody">
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__1 ">
+              <strong class="ph b">250 GB</strong>
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__2 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__3 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__4 ">
+              5
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__5 ">
+              35
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__6 ">
+              70
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__1 ">
+              <strong class="ph b">500 GB</strong>
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__2 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__3 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__4 ">
+              10
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__5 ">
+              70
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__6 ">
+              135
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__1 ">
+              <strong class="ph b">1 TB</strong>
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__2 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__3 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__4 ">
+              15
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__5 ">
+              135
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__6 ">
+              270
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__1 ">
+              <strong class="ph b">15 TB</strong>
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__2 ">
+              2
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__3 ">
+              20
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__4 ">
+              200
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__5 ">
+              N/A
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__6 ">
+              N/A
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__1 ">
+              <strong class="ph b">30 TB</strong>
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__2 ">
+              4
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__3 ">
+              40
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__4 ">
+              400
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__5 ">
+              N/A
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__6 ">
+              N/A
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__1 ">
+              <strong class="ph b">60 TB</strong>
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__2 ">
+              8
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__3 ">
+              80
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__4 ">
+              800
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__5 ">
+              N/A
+            </td>
+            <td class="entry nocellnorowborder" headers="cluster_sizing__entry__6 ">
+              N/A
+            </td>
+          </tr>
+        </tbody></table>
+
+    <section class="section" id="cluster_sizing__sizing_factors"><h2 class="title sectiontitle">Factors Affecting Scalability</h2>
+
+      
+
+      <p class="p">
+        A typical analytic workload (TPC-DS style queries) using recommended hardware is usually CPU-bound. Each
+        node can process roughly 1.6 GB/sec. Both CPU-bound and disk-bound workloads can scale almost linearly with
+        cluster size. However, for some workloads, the scalability might be bounded by the network, or even by
+        memory.
+      </p>
+
+      <p class="p">
+        If the workload is already network-bound (on a 10 GB network), increasing the cluster size won't reduce
+        the network load; in fact, a larger cluster could increase network traffic because some queries involve
+        <span class="q">"broadcast"</span> operations to all DataNodes. Therefore, boosting the cluster size does not improve query
+        throughput in a network-constrained environment.
+      </p>
+
+      <p class="p">
+        Let's look at a memory-bound workload. A workload is memory-bound if Impala cannot run any additional
+        concurrent queries because all allocated memory has already been consumed, but neither CPU, disk, nor
+        network is saturated yet. This can happen because currently Impala uses only a single core per node to
+        process join and aggregation queries. For a node with 128 GB of RAM, if a join node takes 50 GB, the system
+        cannot run more than 2 such queries at the same time.
+      </p>
+
+      <p class="p">
+        Therefore, at most 2 cores are used. Throughput can still scale almost linearly even for a memory-bound
+        workload; the CPU is simply not saturated, and per-node throughput stays below 1.6 GB/sec. Consider
+        increasing the memory per node.
+      </p>
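+
+      <p class="p">
+        That concurrency limit can be sketched as simple arithmetic, using the figures above:
+      </p>
+
+<pre class="pre codeblock"><code>Concurrent queries per node = memory available / memory per join
+  = 128 GB / 50 GB
+  = 2 (rounded down)
+</code></pre>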
+
+      <p class="p">
+        As long as the workload is not network- or memory-bound, we can use 1.6 GB/second per node as the
+        throughput estimate.
+      </p>
+    </section>
+
+    <section class="section" id="cluster_sizing__sizing_details"><h2 class="title sectiontitle">A More Precise Approach</h2>
+
+      
+
+      <p class="p">
+        A more precise sizing estimate requires not only the queries per minute (QPM), but also the average data
+        size scanned per query (D). With a proper partitioning strategy, D is usually a fraction of the total
+        data size. The following equation can be used as a rough guide to estimate the number of nodes (N) needed,
+        where the 100 GB divisor approximates the data one node can process per minute (1.6 GB/second * 60 seconds):
+      </p>
+
+<pre class="pre codeblock"><code>Eq 1: N &gt; QPM * D / 100 GB
+</code></pre>
+
+      <p class="p">
+        Here is an example. Suppose, on average, a query scans 50 GB of data and the average response time must be
+        15 seconds or less when there are 100 concurrent queries. With 100 queries completing every 15 seconds, the
+        QPM is 100 / 15 * 60 = 400. We can estimate the number of nodes using the equation above:
+      </p>
+
+<pre class="pre codeblock"><code>N &gt; QPM * D / 100GB
+N &gt; 400 * 50GB / 100GB
+N &gt; 200
+</code></pre>
+
+      <p class="p">
+        Because this figure is a rough estimate, the corresponding number of nodes could be between 100 and 500.
+      </p>
+
+      <p class="p">
+        Depending on the complexity of the query, the processing rate might vary. If the query has more
+        joins, aggregation functions, or CPU-intensive functions such as string processing or complex UDFs, the
+        processing rate will be lower than 1.6 GB/second per node. On the other hand, if the query only scans and
+        filters numeric columns, the processing rate can be higher.
+      </p>
+    </section>
+
+    <section class="section" id="cluster_sizing__sizing_mem_estimate"><h2 class="title sectiontitle">Estimating Memory Requirements</h2>
+
+      
+      
+
+      <p class="p">
+        Impala can handle joins between multiple large tables. Make sure that statistics are collected for all the
+        joined tables, using the <code class="ph codeph"><a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE
+        STATS</a></code> statement. However, joining big tables does consume more memory. Follow the steps
+        below to calculate the minimum memory requirement.
+      </p>
+
+      <p class="p">
+        Suppose you are running the following join:
+      </p>
+
+<pre class="pre codeblock"><code>select a.*, b.col_1, b.col_2, ... b.col_n
+from a, b
+where a.key = b.key
+and b.col_1 in (1,2,4...)
+and b.col_4 in (....);
+</code></pre>
+
+      <p class="p">
+        And suppose table <code class="ph codeph">B</code> is smaller than table <code class="ph codeph">A</code> (but still a large table).
+      </p>
+
+      <p class="p">
+        The memory requirement for the query is that the right-hand table (<code class="ph codeph">B</code>), after decompression,
+        filtering (<code class="ph codeph">b.col_n in ...</code>), and projection (only using certain columns), must fit in
+        the total memory of the entire cluster.
+      </p>
+
+<pre class="pre codeblock"><code>Cluster Total Memory Requirement  = Size of the smaller table *
+  selectivity factor from the predicate *
+  projection factor * compression ratio
+</code></pre>
+
+      <p class="p">
+        In this case, assume that table <code class="ph codeph">B</code> is 100 TB in Parquet format with 200 columns. The
+        predicate on <code class="ph codeph">B</code> (<code class="ph codeph">b.col_1 in ...and b.col_4 in ...</code>) selects only 10% of
+        the rows from <code class="ph codeph">B</code>, and the projection keeps only 5 of the 200 columns. Snappy typically
+        achieves about 3x compression, so the data expands roughly 3x when decompressed in memory; we estimate a
+        3x compression factor.
+      </p>
+
+<pre class="pre codeblock"><code>Cluster Total Memory Requirement  = Size of the smaller table *
+  selectivity factor from the predicate *
+  projection factor * compression ratio
+  = 100TB * 10% * 5/200 * 3
+  = 0.75TB
+  = 750GB
+</code></pre>
+
+      <p class="p">
+        So, if you have a 10-node cluster where each node has 128 GB of RAM and you give 80% of it to Impala, you
+        have about 1 TB of usable memory for Impala, which is more than 750 GB. Therefore, your cluster can handle
+        join queries of this magnitude.
+      </p>
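+
+      <p class="p">
+        Written out in the same style as the earlier estimate, that final check is:
+      </p>
+
+<pre class="pre codeblock"><code>Usable cluster memory = 10 nodes * 128 GB/node * 80%
+  = 1024 GB
+  ~= 1 TB, which exceeds the 750 GB requirement
+</code></pre>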
+    </section>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_planning.html">Planning for Impala Deployment</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_comments.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_comments.html b/docs/build/html/topics/impala_comments.html
new file mode 100644
index 0000000..e3d711a
--- /dev/null
+++ b/docs/build/html/topics/impala_comments.html
@@ -0,0 +1,46 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="comments"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Comments</title></head><body id="comments"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Comments</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Impala supports the familiar styles of SQL comments:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        All text from a <code class="ph codeph">--</code> sequence to the end of the line is considered a comment and ignored.
+        This type of comment can occur on a single line by itself, or after all or part of a statement.
+      </li>
+
+      <li class="li">
+        All text from a <code class="ph codeph">/*</code> sequence to the next <code class="ph codeph">*/</code> sequence is considered a
+        comment and ignored. This type of comment can stretch over multiple lines, and can occur on one or more
+        lines by itself, in the middle of a statement, or before or after a statement.
+      </li>
+    </ul>
+
+    <p class="p">
+      For example:
+    </p>
+
+<pre class="pre codeblock"><code>-- This line is a comment about a table.
+create table ...;
+
+/*
+This is a multi-line comment about a query.
+*/
+select ...;
+
+select * from t /* This is an embedded comment about a query. */ where ...;
+
+select * from t -- This is a trailing comment within a multi-line command.
+where ...;
+</code></pre>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav></article></main></body></html>
\ No newline at end of file



http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_hbase.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_hbase.html b/docs/build/html/topics/impala_hbase.html
new file mode 100644
index 0000000..7ee8bad
--- /dev/null
+++ b/docs/build/html/topics/impala_hbase.html
@@ -0,0 +1,763 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_hbase"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala to Query HBase Tables</title></head><body id="impala_hbase"><main role="main"><article role="article" aria-labelledby="impala_hbase__hbase">
+
+  <h1 class="title topictitle1" id="impala_hbase__hbase">Using Impala to Query HBase Tables</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      You can use Impala to query HBase tables. This capability allows convenient access to a storage system that
+      is tuned for different kinds of workloads than the default Impala storage. The default Impala tables use data
+      files stored on HDFS, which are ideal for bulk loads and queries using full-table scans. In contrast, HBase
+      can do efficient queries for data organized for OLTP-style workloads, with lookups of individual rows or
+      ranges of values.
+    </p>
+
+    <p class="p">
+      From the perspective of an Impala user, coming from an RDBMS background, HBase is a kind of key-value store
+      where the value consists of multiple fields. The key is mapped to one column in the Impala table, and the
+      various fields of the value are mapped to the other columns in the Impala table.
+    </p>
+
+    <p class="p">
+      For background information on HBase, see <a class="xref" href="https://hbase.apache.org/book.html" target="_blank">the Apache HBase documentation</a>.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="impala_hbase__hbase_using">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Overview of Using HBase with Impala</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        When you use Impala with HBase:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          You create the tables on the Impala side using the Hive shell, because the Impala <code class="ph codeph">CREATE
+          TABLE</code> statement currently does not support custom SerDes and some other syntax needed for these
+          tables:
+          <ul class="ul">
+            <li class="li">
+              You designate it as an HBase table using the <code class="ph codeph">STORED BY
+              'org.apache.hadoop.hive.hbase.HBaseStorageHandler'</code> clause on the Hive <code class="ph codeph">CREATE
+              TABLE</code> statement.
+            </li>
+
+            <li class="li">
+              You map these specially created tables to corresponding tables that exist in HBase, with the clause
+              <code class="ph codeph">TBLPROPERTIES("hbase.table.name" = "<var class="keyword varname">table_name_in_hbase</var>")</code> on the
+              Hive <code class="ph codeph">CREATE TABLE</code> statement.
+            </li>
+
+            <li class="li">
+              See <a class="xref" href="#hbase_queries">Examples of Querying HBase Tables from Impala</a> for a full example.
+            </li>
+          </ul>
+        </li>
+
+        <li class="li">
+          You define the column corresponding to the HBase row key as a string with the <code class="ph codeph">#string</code>
+          keyword, or map it to a <code class="ph codeph">STRING</code> column.
+        </li>
+
+        <li class="li">
+          Because Impala and Hive share the same metastore database, once you create the table in Hive, you can
+          query or insert into it through Impala. (After creating a new table through Hive, issue the
+          <code class="ph codeph">INVALIDATE METADATA</code> statement in <span class="keyword cmdname">impala-shell</span> to make Impala aware of
+          the new table.)
+        </li>
+
+        <li class="li">
+          You issue queries against the Impala tables. For efficient queries, use <code class="ph codeph">WHERE</code> clauses to
+          find a single key value or a range of key values wherever practical, by testing the Impala column
+          corresponding to the HBase row key. Avoid queries that do full-table scans, which are efficient for
+          regular Impala tables but inefficient in HBase.
+        </li>
+      </ul>
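+
+      <p class="p">
+        As a minimal sketch, the two clauses above combine in a Hive <code class="ph codeph">CREATE TABLE</code> statement along
+        these lines. The table name, column names, and the column family <code class="ph codeph">cf</code> here are hypothetical,
+        as is the two-column layout:
+      </p>
+
+<pre class="pre codeblock"><code>-- Issued in the Hive shell, not impala-shell.
+create external table hbase_example (
+  cust_id string,   -- mapped to the HBase row key below
+  col_1 string
+)
+stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+with serdeproperties ("hbase.columns.mapping" = ":key,cf:col_1")
+tblproperties ("hbase.table.name" = "hbase_example");
+</code></pre>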
+
+      <p class="p">
+        To work with an HBase table from Impala, ensure that the <code class="ph codeph">impala</code> user has read/write
+        privileges for the HBase table, using the <code class="ph codeph">GRANT</code> command in the HBase shell. For details
+        about HBase security, see <a class="xref" href="https://hbase.apache.org/book.html#security" target="_blank">the Security chapter in the Apache HBase documentation</a>.
+      </p>
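+
+      <p class="p">
+        For example, in the HBase shell (the table name <code class="ph codeph">hbase_example</code> is hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>$ hbase shell
+hbase&gt; grant 'impala', 'RW', 'hbase_example'
+</code></pre>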
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="impala_hbase__hbase_config">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Configuring HBase for Use with Impala</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        HBase works out of the box with Impala. There is no mandatory configuration needed to use these two
+        components together.
+      </p>
+
+      <p class="p">
+        To avoid delays if HBase is unavailable during Impala startup or after an <code class="ph codeph">INVALIDATE
+        METADATA</code> statement, set timeout values similar to the following in
+        <span class="ph filepath">/etc/impala/conf/hbase-site.xml</span>:
+      </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;hbase.client.retries.number&lt;/name&gt;
+  &lt;value&gt;3&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;hbase.rpc.timeout&lt;/name&gt;
+  &lt;value&gt;3000&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="impala_hbase__hbase_types">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Supported Data Types for HBase Columns</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        To understand how Impala column data types are mapped to fields in HBase, you should have some background
+        knowledge about HBase first. You set up the mapping by running the <code class="ph codeph">CREATE TABLE</code> statement
+        in the Hive shell. See
+        <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration" target="_blank">the
+        Hive wiki</a> for a starting point, and <a class="xref" href="#hbase_queries">Examples of Querying HBase Tables from Impala</a> for examples.
+      </p>
+
+      <p class="p">
+        HBase works as a kind of <span class="q">"bit bucket"</span>, in the sense that HBase does not enforce any typing for the
+        key or value fields. All the type enforcement is done on the Impala side.
+      </p>
+
+      <p class="p">
+        For best performance of Impala queries against HBase tables, most queries should perform comparisons in the
+        <code class="ph codeph">WHERE</code> clause against the column that corresponds to the HBase row key. When creating the table
+        through the Hive shell, use the <code class="ph codeph">STRING</code> data type for the column that corresponds to the
+        HBase row key. Impala can translate conditional tests (through operators such as <code class="ph codeph">=</code>,
+        <code class="ph codeph">&lt;</code>, <code class="ph codeph">BETWEEN</code>, and <code class="ph codeph">IN</code>) against this column into fast
+        lookups in HBase, but this optimization (<span class="q">"predicate pushdown"</span>) only works when that column is
+        defined as <code class="ph codeph">STRING</code>.
+      </p>
+
+      <p class="p">
+        Starting in Impala 1.1, Impala also supports reading and writing to columns that are defined in the Hive
+        <code class="ph codeph">CREATE TABLE</code> statement using binary data types, represented in the Hive table definition
+        using the <code class="ph codeph">#binary</code> keyword, often abbreviated as <code class="ph codeph">#b</code>. Defining numeric
+        columns as binary can reduce the overall data volume in the HBase tables. You should still define the
+        column that corresponds to the HBase row key as a <code class="ph codeph">STRING</code>, to allow fast lookups using
+        those columns.
+      </p>
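+
+      <p class="p">
+        For example, a numeric field can be marked as binary with the <code class="ph codeph">#b</code> suffix in the Hive column
+        mapping, while the row key column remains a <code class="ph codeph">STRING</code>. The table and column names in this
+        sketch are hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>-- Issued in the Hive shell. ":key" stays a string; "cf:amount" is stored as binary.
+create external table hbase_binary_example (
+  id string,
+  amount bigint
+)
+stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+with serdeproperties ("hbase.columns.mapping" = ":key,cf:amount#b")
+tblproperties ("hbase.table.name" = "hbase_binary_example");
+</code></pre>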
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="impala_hbase__hbase_performance">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Performance Considerations for the Impala-HBase Integration</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        To understand the performance characteristics of SQL queries against data stored in HBase, you should have
+        some background knowledge about how HBase interacts with SQL-oriented systems first. See
+        <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration" target="_blank">the
+        Hive wiki</a> for a starting point; because Impala shares the same metastore database as Hive, the
+        information about mapping columns from Hive tables to HBase tables is generally applicable to Impala too.
+      </p>
+
+      <p class="p">
+        Impala uses the HBase client API via Java Native Interface (JNI) to query data stored in HBase. This
+        querying does not read HFiles directly. The extra communication overhead makes it important to choose what
+        data to store in HBase or in HDFS, and construct efficient queries that can retrieve the HBase data
+        efficiently:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Use HBase tables for queries that return a single row or a range of rows, not queries that scan the entire
+          table. (If a query has no <code class="ph codeph">WHERE</code> clause, that is a strong indicator that it is an
+          inefficient query for an HBase table.)
+        </li>
+
+        <li class="li">
+          If you have join queries that do aggregation operations on large fact tables and join the results against
+          small dimension tables, consider using Impala for the fact tables and HBase for the dimension tables.
+          (Because Impala does a full scan on the HBase table in this case, rather than doing single-row HBase
+          lookups based on the join column, only use this technique where the HBase table is small enough that
+          doing a full table scan does not cause a performance bottleneck for the query.)
+        </li>
+      </ul>
+
+      <p class="p">
+        Query predicates are applied to row keys as start and stop keys, thereby limiting the scope of a particular
+        lookup. If row keys are not mapped to string columns, ordering is typically incorrect and comparison
+        operations do not work; for example, evaluating for greater than (&gt;) or less than (&lt;) cannot be
+        completed.
+      </p>
+
+      <p class="p">
+        Predicates on non-key columns can be sent to HBase to scan as <code class="ph codeph">SingleColumnValueFilters</code>,
+        providing some performance gains. In such a case, HBase returns fewer rows than if those same predicates
+        were applied using Impala. While there is some improvement, it is not as great when start and stop rows are
+        used. This is because the number of rows that HBase must examine is not limited as it is when start and
+        stop rows are used. As long as the row key predicate only applies to a single row, HBase will locate and
+        return that row. Conversely, if a non-key predicate is used, even if it only applies to a single row, HBase
+        must still scan the entire table to find the correct result.
+      </p>
+
+      <div class="example"><h3 class="title sectiontitle">Interpreting EXPLAIN Output for HBase Queries</h3>
+
+        
+
+        <p class="p">
+          For example, here are some queries against the following Impala table, which is mapped to an HBase table.
+          The examples show excerpts from the output of the <code class="ph codeph">EXPLAIN</code> statement, demonstrating what
+          things to look for to indicate an efficient or inefficient query against an HBase table.
+        </p>
+
+        <p class="p">
+          The first column (<code class="ph codeph">cust_id</code>) was specified as the key column in the <code class="ph codeph">CREATE
+          EXTERNAL TABLE</code> statement; for performance, it is important to declare this column as
+          <code class="ph codeph">STRING</code>. Other columns, such as <code class="ph codeph">BIRTH_YEAR</code> and
+          <code class="ph codeph">NEVER_LOGGED_ON</code>, are also declared as <code class="ph codeph">STRING</code>, rather than their
+          <span class="q">"natural"</span> types of <code class="ph codeph">INT</code> or <code class="ph codeph">BOOLEAN</code>, because Impala can optimize
+          those types more effectively in HBase tables. For comparison, we leave one column,
+          <code class="ph codeph">YEAR_REGISTERED</code>, as <code class="ph codeph">INT</code> to show that filtering on this column is
+          inefficient.
+        </p>
+
+<pre class="pre codeblock"><code>describe hbase_table;
+Query: describe hbase_table
++-----------------------+--------+---------+
+| name                  | type   | comment |
++-----------------------+--------+---------+
+| cust_id               | <strong class="ph b">string</strong> |         |
+| birth_year            | <strong class="ph b">string</strong> |         |
+| never_logged_on       | <strong class="ph b">string</strong> |         |
+| private_email_address | string |         |
+| year_registered       | <strong class="ph b">int</strong>    |         |
++-----------------------+--------+---------+
+</code></pre>
+
+        <p class="p">
+          The best case for performance involves a single row lookup using an equality comparison on the column
+          defined as the row key:
+        </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where cust_id = 'some_user@example.com';
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=1.01GB VCores=1                            |
+| WARNING: The following tables are missing relevant table and/or column statistics. |
+| hbase.hbase_table                                                                  |
+|                                                                                    |
+| 03:AGGREGATE [MERGE FINALIZE]                                                      |
+| |  output: sum(count(*))                                                           |
+| |                                                                                  |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED]                                              |
+| |                                                                                  |
+| 01:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table]                                                  |</strong>
+<strong class="ph b">|    start key: some_user@example.com                                                |</strong>
+<strong class="ph b">|    stop key: some_user@example.com\0                                               |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+        <p class="p">
+          Another type of efficient query involves a range lookup on the row key column, using SQL operators such
+          as greater than (or equal), less than (or equal), or <code class="ph codeph">BETWEEN</code>. This example also includes
+          an equality test on a non-key column; because that column is a <code class="ph codeph">STRING</code>, Impala can let
+          HBase perform that test, indicated by the <code class="ph codeph">hbase filters:</code> line in the
+          <code class="ph codeph">EXPLAIN</code> output. Doing the filtering within HBase is more efficient than transmitting all
+          the data to Impala and doing the filtering on the Impala side.
+        </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where cust_id between 'a' and 'b'
+  and never_logged_on = 'true';
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table]                                                  |</strong>
+<strong class="ph b">|    start key: a                                                                    |</strong>
+<strong class="ph b">|    stop key: b\0                                                                   |</strong>
+<strong class="ph b">|    hbase filters: cols:never_logged_on EQUAL 'true'                                |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+        <p class="p">
+          The query is less efficient if Impala has to evaluate any of the predicates, because Impala must scan the
+          entire HBase table. Impala can only push down predicates to HBase for columns declared as
+          <code class="ph codeph">STRING</code>. This example tests a column declared as <code class="ph codeph">INT</code>, and the
+          <code class="ph codeph">predicates:</code> line in the <code class="ph codeph">EXPLAIN</code> output indicates that the test is
+          performed after the data is transmitted to Impala.
+        </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where year_registered = 2010;
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table]                                                  |</strong>
+<strong class="ph b">|    predicates: year_registered = 2010                                              |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+        <p class="p">
+          The same inefficiency applies if the key column is compared to any non-constant value. Here, even though
+          the key column is a <code class="ph codeph">STRING</code>, and is tested using an equality operator, Impala must scan
+          the entire HBase table because the key column is compared to another column value rather than a constant.
+        </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where cust_id = private_email_address;
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table]                                                  |</strong>
+<strong class="ph b">|    predicates: cust_id = private_email_address                                    |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+        <p class="p">
+          Currently, tests on the row key using <code class="ph codeph">OR</code> or <code class="ph codeph">IN</code> clauses are not
+          optimized into direct lookups either. Such limitations might be lifted in the future, so always check the
+          <code class="ph codeph">EXPLAIN</code> output to be sure whether a particular SQL construct results in an efficient
+          query or not for HBase tables.
+        </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where
+  cust_id = 'some_user@example.com' or cust_id = 'other_user@example.com';
++----------------------------------------------------------------------------------------+
+| Explain String                                                                         |
++----------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE                                                                           |
+| |  output: count(*)                                                                    |
+| |                                                                                      |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table]                                                      |</strong>
+<strong class="ph b">|    predicates: cust_id = 'some_user@example.com' OR cust_id = 'other_user@example.com' |</strong>
++----------------------------------------------------------------------------------------+
+
+explain select count(*) from hbase_table where
+  cust_id in ('some_user@example.com', 'other_user@example.com');
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table]                                                  |</strong>
+<strong class="ph b">|    predicates: cust_id IN ('some_user@example.com', 'other_user@example.com')      |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+        <p class="p">
+          Either rewrite into separate queries for each value and combine the results in the application, or
+          combine the single-row queries using <code class="ph codeph">UNION ALL</code>:
+        </p>
+
+<pre class="pre codeblock"><code>select count(*) from hbase_table where cust_id = 'some_user@example.com';
+select count(*) from hbase_table where cust_id = 'other_user@example.com';
+
+explain
+  select count(*) from hbase_table where cust_id = 'some_user@example.com'
+  union all
+  select count(*) from hbase_table where cust_id = 'other_user@example.com';
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+...
+
+| |  04:AGGREGATE                                                                    |
+| |  |  output: count(*)                                                             |
+| |  |                                                                               |
+<strong class="ph b">| |  03:SCAN HBASE [hbase.hbase_table]                                               |</strong>
+<strong class="ph b">| |     start key: other_user@example.com                                            |</strong>
+<strong class="ph b">| |     stop key: other_user@example.com\0                                           |</strong>
+| |                                                                                  |
+| 10:MERGE                                                                           |
+...
+
+| 02:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+<strong class="ph b">| 01:SCAN HBASE [hbase.hbase_table]                                                  |</strong>
+<strong class="ph b">|    start key: some_user@example.com                                                |</strong>
+<strong class="ph b">|    stop key: some_user@example.com\0                                               |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+      </div>
+
+      <div class="example"><h3 class="title sectiontitle">Configuration Options for Java HBase Applications</h3>
+
+        
+
+        <p class="p"> If you have an HBase Java application that calls the
+            <code class="ph codeph">setCacheBlocks</code> or <code class="ph codeph">setCaching</code>
+          methods of the class <a class="xref" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_blank">org.apache.hadoop.hbase.client.Scan</a>, you can set these same
+          caching behaviors through Impala query options, to control the memory
+          pressure on the HBase RegionServer. For example, when doing queries in
+          HBase that result in full-table scans (which by default are
+          inefficient for HBase), you can reduce memory usage and speed up the
+          queries by turning off the <code class="ph codeph">HBASE_CACHE_BLOCKS</code> setting
+          and specifying a large number for the <code class="ph codeph">HBASE_CACHING</code>
+          setting.
+        </p>
+
+        <p class="p">
+          To set these options, issue commands like the following in <span class="keyword cmdname">impala-shell</span>:
+        </p>
+
+<pre class="pre codeblock"><code>-- Same as calling setCacheBlocks(true) or setCacheBlocks(false).
+set hbase_cache_blocks=true;
+set hbase_cache_blocks=false;
+
+-- Same as calling setCaching(rows).
+set hbase_caching=1000;
+</code></pre>
+
+        <p class="p">
+          Alternatively, update the <span class="keyword cmdname">impalad</span> defaults file <span class="ph filepath">/etc/default/impala</span> and
+          include settings for <code class="ph codeph">HBASE_CACHE_BLOCKS</code> and/or <code class="ph codeph">HBASE_CACHING</code> in the
+          <code class="ph codeph">-default_query_options</code> setting for <code class="ph codeph">IMPALA_SERVER_ARGS</code>. See
+          <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details.
+        </p>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          In Impala 2.0 and later, these options are settable through the JDBC or ODBC interfaces using the
+          <code class="ph codeph">SET</code> statement.
+        </div>
+
+      </div>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="impala_hbase__hbase_scenarios">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Use Cases for Querying HBase through Impala</h2>
+    
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following are popular use cases for using Impala to query HBase tables:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Keeping large fact tables in Impala, and smaller dimension tables in HBase. The fact tables use Parquet
+          or another binary file format optimized for scan operations. Join queries scan through the large Impala
+          fact tables, and cross-reference the dimension tables using efficient single-row lookups in HBase.
+        </li>
+
+        <li class="li">
+          Using HBase to store rapidly incrementing counters, such as how many times a web page has been viewed, or
+          on a social network, how many connections a user has or how many votes a post received. HBase is
+          efficient for capturing such changeable data: the append-only storage mechanism is efficient for writing
+          each change to disk, and a query always returns the latest value. An application could query specific
+          totals like these from HBase, and combine the results with a broader set of data queried from Impala.
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Storing very wide tables in HBase. Wide tables have many columns, possibly thousands, typically
+            recording many attributes for an important subject such as a user of an online service. These tables
+            are also often sparse, that is, most of the column values are <code class="ph codeph">NULL</code>, 0,
+            <code class="ph codeph">false</code>, empty string, or other blank or placeholder value. (For example, any particular
+            web site user might have never used some site feature, filled in a certain field in their profile,
+            visited a particular part of the site, and so on.) A typical query against this kind of table is to
+            look up a single row to retrieve all the information about a specific subject, rather than summing,
+            averaging, or filtering millions of rows as in typical Impala-managed tables.
+          </p>
+          <p class="p">
+            Or the HBase table could be joined with a larger Impala-managed table. For example, analyze the large
+            Impala table representing web traffic for a site and pick out 50 users who view the most pages. Join
+            that result with the wide user table in HBase to look up attributes of those users. The HBase side of
+            the join would result in 50 efficient single-row lookups in HBase, rather than scanning the entire user
+            table.
+          </p>
+        </li>
+      </ul>
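+
+      <p class="p">
+        As a sketch of the first use case (the table and column names here are hypothetical), a join query
+        might scan a large Parquet fact table and cross-reference a smaller HBase dimension table through
+        efficient single-row key lookups:
+      </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical schema: SALES is a large Impala-managed Parquet fact table;
+-- CUSTOMER_DIM is an HBase table whose STRING row key is CUST_ID.
+select s.store_id, sum(s.amount) as total
+  from sales s join customer_dim c on (s.cust_id = c.cust_id)
+  where c.segment = 'premium'
+  group by s.store_id;
+</code></pre>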
+    </div>
+  </article>
+
+  
+
+  
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="impala_hbase__hbase_loading">
+
+    <h2 class="title topictitle2" id="ariaid-title7">Loading Data into an HBase Table</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala <code class="ph codeph">INSERT</code> statement works for HBase tables. The <code class="ph codeph">INSERT ... VALUES</code>
+        syntax is ideally suited to HBase tables, because inserting a single row is an efficient operation for an
+        HBase table. (For regular Impala tables, with data files in HDFS, the tiny data files produced by
+        <code class="ph codeph">INSERT ... VALUES</code> are extremely inefficient, so you would not use that technique with
+        tables containing any significant data volume.)
+      </p>
+
+      
+
+      <p class="p">
+        When you use the <code class="ph codeph">INSERT ... SELECT</code> syntax, the result in the HBase table could be fewer
+        rows than you expect. HBase only stores the most recent version of each unique row key, so if an
+        <code class="ph codeph">INSERT ... SELECT</code> statement copies over multiple rows containing the same value for the
+        key column, subsequent queries will only return one row with each key column value:
+      </p>
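+
+      <p class="p">
+        For example (using hypothetical table names), if the source table contains duplicate values in the
+        column mapped to the HBase row key, only one row per key survives the copy:
+      </p>
+
+<pre class="pre codeblock"><code>-- Suppose SOURCE_TABLE holds two rows with cust_id = 'a@example.com'.
+insert into hbase_table select cust_id, name from source_table;
+-- A subsequent query returns a single row for that key, holding whichever
+-- of the duplicate source rows happened to be written last.
+select count(*) from hbase_table where cust_id = 'a@example.com';
+</code></pre>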
+
+      <p class="p">
+        Although Impala does not have an <code class="ph codeph">UPDATE</code> statement, you can achieve the same effect by
+        doing successive <code class="ph codeph">INSERT</code> statements using the same value for the key column each time.
+      </p>
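+
+      <p class="p">
+        For example (using a hypothetical table), inserting twice with the same key value leaves only the
+        most recent row visible, which acts as an update:
+      </p>
+
+<pre class="pre codeblock"><code>insert into hbase_table values ('a@example.com', 'old value');
+insert into hbase_table values ('a@example.com', 'new value');
+-- Queries now see only the row containing 'new value' for this key.
+select * from hbase_table where cust_id = 'a@example.com';
+</code></pre>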
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="impala_hbase__hbase_limitations">
+
+    <h2 class="title topictitle2" id="ariaid-title8">Limitations and Restrictions of the Impala and HBase Integration</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala integration with HBase has the following limitations and restrictions, some inherited from the
+        integration between HBase and Hive, and some unique to Impala:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            If you issue a <code class="ph codeph">DROP TABLE</code> for an internal (Impala-managed) table that is mapped to an
+            HBase table, the underlying table is not removed in HBase. The Hive <code class="ph codeph">DROP TABLE</code>
+            statement, by contrast, does remove the HBase table in this case.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">INSERT OVERWRITE</code> statement is not available for HBase tables. You can insert new
+            data, or modify an existing row by inserting a new row with the same key value, but not replace the
+            entire contents of the table. You can do an <code class="ph codeph">INSERT OVERWRITE</code> in Hive if you need this
+            capability.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you issue a <code class="ph codeph">CREATE TABLE LIKE</code> statement for a table mapped to an HBase table, the
+            new table is also an HBase table, but inherits the same underlying HBase table name as the original.
+            The new table is effectively an alias for the old one, not a new table with identical column structure.
+            Avoid using <code class="ph codeph">CREATE TABLE LIKE</code> for HBase tables, to prevent any confusion.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Copying data into an HBase table using the Impala <code class="ph codeph">INSERT ... SELECT</code> syntax might
+            produce fewer new rows than are in the query result set. If the result set contains multiple rows with
+            the same value for the key column, each row supersedes any previous rows with the same key value.
+            Because the order of the inserted rows is unpredictable, you cannot rely on this technique to preserve
+            the <span class="q">"latest"</span> version of a particular key value.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Because the complex data types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>)
+            available in <span class="keyword">Impala 2.3</span> and higher are currently only supported in Parquet tables, you cannot
+            use these types in HBase tables that are queried through Impala.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">LOAD DATA</code> statement cannot be used with HBase tables.
+          </p>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="impala_hbase__hbase_queries">
+
+    <h2 class="title topictitle2" id="ariaid-title9">Examples of Querying HBase Tables from Impala</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following examples create an HBase table with four column families,
+        create a corresponding table through Hive,
+        then insert and query the table through Impala.
+      </p>
+      <p class="p">
+        In HBase shell, the table
+        name is quoted in <code class="ph codeph">CREATE</code> and <code class="ph codeph">DROP</code> statements. Tables created in HBase
+        begin in <span class="q">"enabled"</span> state; before dropping them through the HBase shell, you must issue a
+        <code class="ph codeph">disable '<var class="keyword varname">table_name</var>'</code> statement.
+      </p>
+
+<pre class="pre codeblock"><code>$ hbase shell
+15/02/10 16:07:45
+HBase Shell; enter 'help&lt;RETURN&gt;' for list of supported commands.
+Type "exit&lt;RETURN&gt;" to leave the HBase Shell
+...
+
+hbase(main):001:0&gt; create 'hbasealltypessmall', 'boolsCF', 'intsCF', 'floatsCF', 'stringsCF'
+0 row(s) in 4.6520 seconds
+
+=&gt; Hbase::Table - hbasealltypessmall
+hbase(main):006:0&gt; quit
+</code></pre>
+
+        <p class="p">
+          Issue the following <code class="ph codeph">CREATE TABLE</code> statement in the Hive shell. (The Impala <code class="ph codeph">CREATE
+          TABLE</code> statement currently does not support the <code class="ph codeph">STORED BY</code> clause, so you switch into Hive to
+          create the table, then back to Impala and the <span class="keyword cmdname">impala-shell</span> interpreter to issue the
+          queries.)
+        </p>
+
+        <p class="p">
+          This example creates an external table mapped to the HBase table, usable by both Impala and Hive. It is
+          defined as an external table so that when dropped by Impala or Hive, the original HBase table is not touched at all.
+        </p>
+
+        <p class="p">
+          The <code class="ph codeph">WITH SERDEPROPERTIES</code> clause
+          specifies that the first column (<code class="ph codeph">ID</code>) represents the row key, and maps the remaining
+          columns of the SQL table to HBase column families. The mapping relies on the ordinal position of the
+          columns in the table, not the column names in the <code class="ph codeph">CREATE TABLE</code> statement.
+          The first column is defined to be the lookup key; the
+          <code class="ph codeph">STRING</code> data type produces the fastest key-based lookups for HBase tables.
+        </p>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          For Impala with HBase tables, the most important aspect to ensure good performance is to use a
+          <code class="ph codeph">STRING</code> column as the row key, as shown in this example.
+        </div>
+
+<pre class="pre codeblock"><code>$ hive
+...
+hive&gt; use hbase;
+OK
+Time taken: 4.095 seconds
+hive&gt; CREATE EXTERNAL TABLE hbasestringids (
+    &gt;   id string,
+    &gt;   bool_col boolean,
+    &gt;   tinyint_col tinyint,
+    &gt;   smallint_col smallint,
+    &gt;   int_col int,
+    &gt;   bigint_col bigint,
+    &gt;   float_col float,
+    &gt;   double_col double,
+    &gt;   date_string_col string,
+    &gt;   string_col string,
+    &gt;   timestamp_col timestamp)
+    &gt; STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+    &gt; WITH SERDEPROPERTIES (
+    &gt;   "hbase.columns.mapping" =
+    &gt;   ":key,boolsCF:bool_col,intsCF:tinyint_col,intsCF:smallint_col,intsCF:int_col,intsCF:\
+    &gt;   bigint_col,floatsCF:float_col,floatsCF:double_col,stringsCF:date_string_col,\
+    &gt;   stringsCF:string_col,stringsCF:timestamp_col"
+    &gt; )
+    &gt; TBLPROPERTIES("hbase.table.name" = "hbasealltypessmall");
+OK
+Time taken: 2.879 seconds
+hive&gt; quit;
+</code></pre>
+
+        <p class="p">
+          Once you have established the mapping to an HBase table, you can issue DML statements and queries
+          from Impala. The following example shows a series of <code class="ph codeph">INSERT</code>
+          statements followed by a query.
+          The ideal kind of query from a performance standpoint
+          retrieves a row from the table based on a row key
+          mapped to a string column.
+          An initial <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+          statement makes the table created through Hive visible to Impala.
+        </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost -d hbase
+Starting Impala Shell without Kerberos authentication
+Connected to localhost:21000
+...
+Query: use `hbase`
+[localhost:21000] &gt; invalidate metadata hbasestringids;
+Fetched 0 row(s) in 0.09s
+[localhost:21000] &gt; desc hbasestringids;
++-----------------+-----------+---------+
+| name            | type      | comment |
++-----------------+-----------+---------+
+| id              | string    |         |
+| bool_col        | boolean   |         |
+| double_col      | double    |         |
+| float_col       | float     |         |
+| bigint_col      | bigint    |         |
+| int_col         | int       |         |
+| smallint_col    | smallint  |         |
+| tinyint_col     | tinyint   |         |
+| date_string_col | string    |         |
+| string_col      | string    |         |
+| timestamp_col   | timestamp |         |
++-----------------+-----------+---------+
+Fetched 11 row(s) in 0.02s
+[localhost:21000] &gt; insert into hbasestringids values ('0001',true,3.141,9.94,1234567,32768,4000,76,'2014-12-31','Hello world',now());
+Inserted 1 row(s) in 0.26s
+[localhost:21000] &gt; insert into hbasestringids values ('0002',false,2.004,6.196,1500,8000,129,127,'2014-01-01','Foo bar',now());
+Inserted 1 row(s) in 0.12s
+[localhost:21000] &gt; select * from hbasestringids where id = '0001';
++------+----------+------------+-------------------+------------+---------+--------------+-------------+-----------------+-------------+-------------------------------+
+| id   | bool_col | double_col | float_col         | bigint_col | int_col | smallint_col | tinyint_col | date_string_col | string_col  | timestamp_col                 |
++------+----------+------------+-------------------+------------+---------+--------------+-------------+-----------------+-------------+-------------------------------+
+| 0001 | true     | 3.141      | 9.939999580383301 | 1234567    | 32768   | 4000         | 76          | 2014-12-31      | Hello world | 2015-02-10 16:36:59.764838000 |
++------+----------+------------+-------------------+------------+---------+--------------+-------------+-----------------+-------------+-------------------------------+
+Fetched 1 row(s) in 0.54s
+</code></pre>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        After you create a table in Hive, such as the HBase mapping table in this example, issue an
+        <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> statement the next time you connect to
+        Impala, to make Impala aware of the new table. (Prior to Impala 1.2.4, you could not specify the table name if
+        Impala was not aware of the table yet; in Impala 1.2.4 and higher, specifying the table name avoids
+        reloading the metadata for other tables that are not changed.)
+      </div>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_hbase_cache_blocks.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_hbase_cache_blocks.html b/docs/build/html/topics/impala_hbase_cache_blocks.html
new file mode 100644
index 0000000..ac76539
--- /dev/null
+++ b/docs/build/html/topics/impala_hbase_cache_blocks.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hbase_cache_blocks"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>HBASE_CACHE_BLOCKS Query Option</title></head><body id="hbase_cache_blocks"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">HBASE_CACHE_BLOCKS Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Setting this option is equivalent to calling the
+        <code class="ph codeph">setCacheBlocks</code> method of the class <a class="xref" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_blank">org.apache.hadoop.hbase.client.Scan</a>, in an HBase Java
+      application. This option helps control the memory pressure on the HBase
+      RegionServer, in conjunction with the <code class="ph codeph">HBASE_CACHING</code> query
+      option. </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
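+
+    <p class="p">
+      <strong class="ph b">Examples:</strong>
+    </p>
+
+    <p class="p">
+      A sketch of turning off block caching in <span class="keyword cmdname">impala-shell</span> before a
+      full-table scan of an HBase table (the table name is hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>-- Same as calling setCacheBlocks(false) in an HBase Java application.
+set hbase_cache_blocks=false;
+select count(*) from some_hbase_table;
+</code></pre>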
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>,
+      <a class="xref" href="impala_hbase_caching.html#hbase_caching">HBASE_CACHING Query Option</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_hbase_caching.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_hbase_caching.html b/docs/build/html/topics/impala_hbase_caching.html
new file mode 100644
index 0000000..f78d0d5
--- /dev/null
+++ b/docs/build/html/topics/impala_hbase_caching.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hbase_caching"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>HBASE_CACHING Query Option</title></head><body id="hbase_caching"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">HBASE_CACHING Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Setting this option is equivalent to calling the
+        <code class="ph codeph">setCaching</code> method of the class <a class="xref" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_blank">org.apache.hadoop.hbase.client.Scan</a>, in an HBase Java
+      application. This option helps control the memory pressure on the HBase
+      RegionServer, in conjunction with the <code class="ph codeph">HBASE_CACHE_BLOCKS</code>
+      query option. </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> integer
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0
+    </p>
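+
+    <p class="p">
+      <strong class="ph b">Examples:</strong>
+    </p>
+
+    <p class="p">
+      A sketch of raising the scanner caching value in <span class="keyword cmdname">impala-shell</span>
+      before a scan-heavy query (the table name is hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>-- Same as calling setCaching(1000) in an HBase Java application.
+set hbase_caching=1000;
+select count(*) from some_hbase_table;
+</code></pre>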
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>,
+      <a class="xref" href="impala_hbase_cache_blocks.html#hbase_cache_blocks">HBASE_CACHE_BLOCKS Query Option</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_hints.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_hints.html b/docs/build/html/topics/impala_hints.html
new file mode 100644
index 0000000..3b0c81d
--- /dev/null
+++ b/docs/build/html/topics/impala_hints.html
@@ -0,0 +1,306 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hints"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Query Hints in Impala SELECT Statements</title></head><body id="hints"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Query Hints in Impala SELECT Statements</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The Impala SQL dialect supports query hints for fine-tuning the inner workings of queries. Use hints as
+      a temporary workaround for expensive queries, where missing statistics or other factors cause inefficient
+      performance.
+    </p>
+
+    <p class="p">
+      Hints are most often used for the most resource-intensive kinds of Impala queries:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Join queries involving large tables, where intermediate result sets are transmitted across the network to
+        evaluate the join conditions.
+      </li>
+
+      <li class="li">
+        Inserting into partitioned Parquet tables, where many memory buffers could be allocated on each host to
+        hold intermediate results for each partition.
+      </li>
+    </ul>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      You can represent the hints as keywords surrounded by <code class="ph codeph">[]</code> square brackets; include the
+      brackets in the text of the SQL statement.
+    </p>
+
+<pre class="pre codeblock"><code>SELECT STRAIGHT_JOIN <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">join_left_hand_table</var>
+  JOIN [{BROADCAST|SHUFFLE}]
+<var class="keyword varname">join_right_hand_table</var>
+<var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+  [{SHUFFLE|NOSHUFFLE}]
+  SELECT <var class="keyword varname">remainder_of_query</var>;
+</code></pre>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.0</span> and higher, you can also specify the hints inside comments that use
+      either the <code class="ph codeph">/* */</code> or <code class="ph codeph">--</code> notation. Specify a <code class="ph codeph">+</code> symbol
+      immediately before the hint name.
+    </p>
+
+<pre class="pre codeblock"><code>SELECT STRAIGHT_JOIN <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">join_left_hand_table</var>
+  JOIN /* +BROADCAST|SHUFFLE */
+<var class="keyword varname">join_right_hand_table</var>
+<var class="keyword varname">remainder_of_query</var>;
+
+SELECT <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">join_left_hand_table</var>
+  JOIN -- +BROADCAST|SHUFFLE
+<var class="keyword varname">join_right_hand_table</var>
+<var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+  /* +SHUFFLE|NOSHUFFLE */
+  SELECT <var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+  -- +SHUFFLE|NOSHUFFLE
+  SELECT <var class="keyword varname">remainder_of_query</var>;
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      With both forms of hint syntax, include the <code class="ph codeph">STRAIGHT_JOIN</code>
+      keyword immediately after the <code class="ph codeph">SELECT</code> keyword to prevent Impala from
+      reordering the tables in a way that makes the hint ineffective.
+    </p>
+
+    <p class="p">
+      To reduce the need to use hints, run the <code class="ph codeph">COMPUTE STATS</code> statement against all tables involved
+      in joins, or used as the source tables for <code class="ph codeph">INSERT ... SELECT</code> operations where the
+      destination is a partitioned Parquet table. Do this operation after loading data or making substantial
+      changes to the data within each table. Having up-to-date statistics helps Impala choose more efficient query
+      plans without the need for hinting. See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details and
+      examples.
+    </p>
+
+    <p class="p">
+      To see which join strategy is used for a particular query, examine the <code class="ph codeph">EXPLAIN</code> output for
+      that query. See <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for details and examples.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Hints for join queries:</strong>
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">[BROADCAST]</code> and <code class="ph codeph">[SHUFFLE]</code> hints control the execution strategy for join
+      queries. Specify one of the following constructs immediately after the <code class="ph codeph">JOIN</code> keyword in a
+      query:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <code class="ph codeph">[SHUFFLE]</code> - Makes that join operation use the <span class="q">"partitioned"</span> technique, which divides
+        up corresponding rows from both tables using a hashing algorithm, sending subsets of the rows to other
+        nodes for processing. (The keyword <code class="ph codeph">SHUFFLE</code> indicates a <span class="q">"partitioned join"</span>;
+        this naming avoids confusion, because that join technique is unrelated to <span class="q">"partitioned tables"</span>.) Because the alternative
+        <span class="q">"broadcast"</span> join mechanism is the default when table and index statistics are unavailable, you might
+        use this hint for queries where broadcast joins are unsuitable; typically, partitioned joins are more
+        efficient for joins between large tables of similar size.
+
+      <li class="li">
+        <code class="ph codeph">[BROADCAST]</code> - Makes that join operation use the <span class="q">"broadcast"</span> technique that sends the
+        entire contents of the right-hand table to all nodes involved in processing the join. This is the default
+        mode of operation when table and index statistics are unavailable, so you would typically only need it if
+        stale metadata caused Impala to mistakenly choose a partitioned join operation. Typically, broadcast joins
+        are more efficient in cases where one table is much smaller than the other. (Put the smaller table on the
+        right side of the <code class="ph codeph">JOIN</code> operator.)
+      </li>
+    </ul>
+
+    <p class="p">
+      <strong class="ph b">Hints for INSERT ... SELECT queries:</strong>
+    </p>
+
+    <div class="p">
+        When inserting into partitioned tables, especially using the Parquet file format, you can include a hint in
+        the <code class="ph codeph">INSERT</code> statement to fine-tune the overall performance of the operation and its
+        resource usage:
+        <ul class="ul">
+          <li class="li">
+            These hints are available in Impala 1.2.2 and higher.
+          </li>
+
+          <li class="li">
+            You would only use these hints if an <code class="ph codeph">INSERT</code> into a partitioned Parquet table was
+            failing due to capacity limits, or if such an <code class="ph codeph">INSERT</code> was succeeding but with
+            less-than-optimal performance.
+          </li>
+
+          <li class="li">
+            To use these hints, put the hint keyword <code class="ph codeph">[SHUFFLE]</code> or <code class="ph codeph">[NOSHUFFLE]</code>
+            (including the square brackets) after the <code class="ph codeph">PARTITION</code> clause, immediately before the
+            <code class="ph codeph">SELECT</code> keyword.
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">[SHUFFLE]</code> selects an execution plan that minimizes the number of files being written
+            simultaneously to HDFS, and the number of memory buffers holding data for individual partitions. Thus
+            it reduces overall resource usage for the <code class="ph codeph">INSERT</code> operation, allowing some
+            <code class="ph codeph">INSERT</code> operations to succeed that otherwise would fail. It does involve some data
+            transfer between the nodes so that the data files for a particular partition are all constructed on the
+            same node.
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">[NOSHUFFLE]</code> selects an execution plan that might be faster overall, but might also
+            produce a larger number of small data files or exceed capacity limits, causing the
+            <code class="ph codeph">INSERT</code> operation to fail. Use <code class="ph codeph">[SHUFFLE]</code> in cases where an
+            <code class="ph codeph">INSERT</code> statement fails or runs inefficiently due to all nodes attempting to construct
+            data for all partitions.
+          </li>
+
+          <li class="li">
+            Impala automatically uses the <code class="ph codeph">[SHUFFLE]</code> method if any partition key column in the
+            source table, mentioned in the <code class="ph codeph">INSERT ... SELECT</code> query, does not have column
+            statistics. In this case, only the <code class="ph codeph">[NOSHUFFLE]</code> hint would have any effect.
+          </li>
+
+          <li class="li">
+            If column statistics are available for all partition key columns in the source table mentioned in the
+            <code class="ph codeph">INSERT ... SELECT</code> query, Impala chooses whether to use the <code class="ph codeph">[SHUFFLE]</code>
+            or <code class="ph codeph">[NOSHUFFLE]</code> technique based on the estimated number of distinct values in those
+            columns and the number of nodes involved in the <code class="ph codeph">INSERT</code> operation. In this case, you
+            might need the <code class="ph codeph">[SHUFFLE]</code> or the <code class="ph codeph">[NOSHUFFLE]</code> hint to override the
+            execution plan selected by Impala.
+          </li>
+        </ul>
+      </div>
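+
+    <p class="p">
+      For example, to reduce memory usage and the number of files written simultaneously when
+      inserting into a partitioned Parquet table, you could place the
+      <code class="ph codeph">[SHUFFLE]</code> hint between the <code class="ph codeph">PARTITION</code>
+      clause and the <code class="ph codeph">SELECT</code> keyword. (The table and column names here
+      are hypothetical.)
+    </p>
+
+<pre class="pre codeblock"><code>insert into sales_parquet partition (year, month)
+  [shuffle]
+  select txn_id, amount, year, month from sales_staging;</code></pre>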
+
+    <p class="p">
+      <strong class="ph b">Suggestions versus directives:</strong>
+    </p>
+
+    <p class="p">
+      In early Impala releases, hints were always obeyed and so acted more like directives. Once Impala gained join
+      order optimizations, sometimes join queries were automatically reordered in a way that made a hint
+      irrelevant. Therefore, the hints act more like suggestions in Impala 1.2.2 and higher.
+    </p>
+
+    <p class="p">
+      To force Impala to follow the hinted execution mechanism for a join query, include the
+      <code class="ph codeph">STRAIGHT_JOIN</code> keyword in the <code class="ph codeph">SELECT</code> statement. See
+      <a class="xref" href="impala_perf_joins.html#straight_join">Overriding Join Reordering with STRAIGHT_JOIN</a> for details. When you use this technique, Impala does not
+      reorder the joined tables at all, so you must be careful to arrange the join order to put the largest table
+      (or subquery result set) first, then the smallest, second smallest, third smallest, and so on. This ordering lets Impala do the
+      most I/O-intensive parts of the query using local reads on the DataNodes, and then reduce the size of the
+      intermediate result set as much as possible as each subsequent table or subquery result set is joined.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+      Queries that include subqueries in the <code class="ph codeph">WHERE</code> clause can be rewritten internally as join
+      queries. Currently, you cannot apply hints to the joins produced by these types of queries.
+    </p>
+
+    <p class="p">
+      Because hints can prevent queries from taking advantage of new metadata or improvements in query planning,
+      use them only when required to work around performance issues, and be prepared to remove them when they are
+      no longer required, such as after a new Impala release or bug fix.
+    </p>
+
+    <p class="p">
+      In particular, the <code class="ph codeph">[BROADCAST]</code> and <code class="ph codeph">[SHUFFLE]</code> hints are expected to be
+      needed much less frequently in Impala 1.2.2 and higher, because the join order optimization feature, in
+      combination with the <code class="ph codeph">COMPUTE STATS</code> statement, now automatically chooses the join order and join
+      mechanism without the need to rewrite the query and add hints. See
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Compatibility:</strong>
+      </p>
+
+    <p class="p">
+      The hints embedded within <code class="ph codeph">--</code> comments are compatible with Hive queries. The hints embedded
+      within <code class="ph codeph">/* */</code> comments or <code class="ph codeph">[ ]</code> square brackets are not recognized by or not
+      compatible with Hive. For example, Hive raises an error for Impala hints within <code class="ph codeph">/* */</code>
+      comments because it does not recognize the Impala hint names.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Considerations for views:</strong>
+      </p>
+
+    <p class="p">
+      If you use a hint in the query that defines a view, the hint is preserved when you query the view. Impala
+      internally rewrites all hints in views to use the <code class="ph codeph">--</code> comment notation, so that Hive can
+      query such views without errors due to unrecognized hint names.
+    </p>
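+
+    <p class="p">
+      For example (with hypothetical table names), a view defined with a bracketed hint:
+    </p>
+
+<pre class="pre codeblock"><code>create view big_join_view as
+  select t1.name, t2.id
+  from t1 join [shuffle] t2 on t1.id = t2.id;</code></pre>
+
+    <p class="p">
+      is stored internally with the hint in the <code class="ph codeph">-- +SHUFFLE</code> comment
+      notation, so Hive can query <code class="ph codeph">big_join_view</code> without errors.
+    </p>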
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      This query joins a large customer table with a small lookup table of fewer than 100 rows. The
+      right-hand table can be broadcast efficiently to all nodes involved in the join. Thus, you would use the
+      <code class="ph codeph">[broadcast]</code> hint to force a broadcast join strategy:
+    </p>
+
+<pre class="pre codeblock"><code>select straight_join customer.address, state_lookup.state_name
+  from customer join <strong class="ph b">[broadcast]</strong> state_lookup
+  on customer.state_id = state_lookup.state_id;</code></pre>
+
+    <p class="p">
+      This query joins two large tables of unpredictable size. You might benchmark the query with both kinds of
+      hints and find that it is more efficient to transmit portions of each table to other nodes for processing.
+      Thus, you would use the <code class="ph codeph">[shuffle]</code> hint to force a partitioned join strategy:
+    </p>
+
+<pre class="pre codeblock"><code>select straight_join weather.wind_velocity, geospatial.altitude
+  from weather join <strong class="ph b">[shuffle]</strong> geospatial
+  on weather.lat = geospatial.lat and weather.long = geospatial.long;</code></pre>
+
+    <p class="p">
+      For joins involving three or more tables, the hint applies to the tables on either side of that specific
+      <code class="ph codeph">JOIN</code> keyword. The <code class="ph codeph">STRAIGHT_JOIN</code> keyword ensures that joins are processed
+      in a predictable order from left to right. For example, this query joins
+      <code class="ph codeph">t1</code> and <code class="ph codeph">t2</code> using a partitioned join, then joins that result set to
+      <code class="ph codeph">t3</code> using a broadcast join:
+    </p>
+
+<pre class="pre codeblock"><code>select straight_join t1.name, t2.id, t3.price
+  from t1 join <strong class="ph b">[shuffle]</strong> t2 join <strong class="ph b">[broadcast]</strong> t3
+  on t1.id = t2.id and t2.id = t3.id;</code></pre>
+
+    
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      For more background information about join queries, see <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a>. For
+      performance considerations, see <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_identifiers.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_identifiers.html b/docs/build/html/topics/impala_identifiers.html
new file mode 100644
index 0000000..3a46f8e
--- /dev/null
+++ b/docs/build/html/topics/impala_identifiers.html
@@ -0,0 +1,110 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="identifiers"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Identifiers</title></head><body id="identifiers"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Identifiers</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Identifiers are the names of databases, tables, or columns that you specify in a SQL statement. The rules for
+      identifiers govern what names you can give to things you create, the notation for referring to names
+      containing unusual characters, and other aspects such as case sensitivity.
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+        The minimum length of an identifier is 1 character.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+        The maximum length of an identifier is currently 128 characters, enforced by the metastore database.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+        An identifier must start with an alphabetic character. The remainder can contain any combination of
+        alphanumeric characters and underscores. Quoting the identifier with backticks has no effect on the allowed
+        characters in the name.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+        An identifier can contain only ASCII characters.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+        To use an identifier name that matches one of the Impala reserved keywords (listed in
+        <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>), surround the identifier with <code class="ph codeph">``</code>
+        characters (backticks). Quote the reserved word even if it is part of a fully qualified name.
+        The following example shows how a reserved word can be used as a column name if it is quoted
+        with backticks in the <code class="ph codeph">CREATE TABLE</code> statement, and how the column name
+        must also be quoted with backticks in a query:
+        </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table reserved (`data` string);
+
+[localhost:21000] &gt; select data from reserved;
+ERROR: AnalysisException: Syntax error in line 1:
+select data from reserved
+       ^
+Encountered: DATA
+Expected: ALL, CASE, CAST, DISTINCT, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, STRAIGHT_JOIN, TRUE, IDENTIFIER
+CAUSED BY: Exception: Syntax error
+
+[localhost:21000] &gt; select reserved.data from reserved;
+ERROR: AnalysisException: Syntax error in line 1:
+select reserved.data from reserved
+                ^
+Encountered: DATA
+Expected: IDENTIFIER
+CAUSED BY: Exception: Syntax error
+
+[localhost:21000] &gt; select reserved.`data` from reserved;
+
+[localhost:21000] &gt;
+</code></pre>
+
+        <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+          Because the list of reserved words grows over time as new SQL syntax is added,
+          consider adopting coding conventions (especially for any automated scripts
+          or in packaged applications) to always quote all identifiers with backticks.
+          Quoting all identifiers protects your SQL from compatibility issues if
+          new reserved words are added in later releases.
+        </div>
+
+      </li>
+
+      <li class="li">
+        <p class="p">
+        Impala identifiers are always case-insensitive. That is, tables named <code class="ph codeph">t1</code> and
+        <code class="ph codeph">T1</code> always refer to the same table, regardless of quote characters. Internally, Impala
+        always folds all specified table and column names to lowercase. This is why the column headers in query
+        output are always displayed in lowercase.
+        </p>
+      </li>
+    </ul>
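+
+    <p class="p">
+      Following the convention above, a script that quotes every identifier with backticks
+      (the names here are hypothetical) keeps working even if one of the names later becomes
+      a reserved word:
+    </p>
+
+<pre class="pre codeblock"><code>create table `my_db`.`events` (`id` bigint, `data` string);
+select `id`, `data` from `my_db`.`events`;</code></pre>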
+
+    <p class="p">
+      See <a class="xref" href="impala_aliases.html#aliases">Overview of Impala Aliases</a> for how to define shorter or easier-to-remember aliases if the
+      original names are long or cryptic identifiers.
+      <span class="ph"> Aliases follow the same rules as identifiers when it comes to case
+        insensitivity. Aliases can be longer than identifiers (up to the maximum length of a Java string) and can
+        include additional characters such as spaces and dashes when they are quoted using backtick characters.
+        </span>
+    </p>
+
+    <p class="p">
+        Another way to define different names for the same tables or columns is to create views. See
+        <a class="xref" href="../shared/../topics/impala_views.html#views">Overview of Impala Views</a> for details.
+      </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_impala_shell.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_impala_shell.html b/docs/build/html/topics/impala_impala_shell.html
new file mode 100644
index 0000000..020a588
--- /dev/null
+++ b/docs/build/html/topics/impala_impala_shell.html
@@ -0,0 +1,87 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_shell_options.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_connecting.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_shell_running_commands.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_shell_commands.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_shell"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the Impala Shell (impala-shell Command)</title></head><body id=
 "impala_shell"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using the Impala Shell (impala-shell Command)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      You can use the Impala shell tool (<code class="ph codeph">impala-shell</code>) to set up databases and tables, insert
+      data, and issue queries. For ad hoc queries and exploration, you can submit SQL statements in an interactive
+      session. To automate your work, you can specify command-line options to process a single statement or a
+      script file. The <span class="keyword cmdname">impala-shell</span> interpreter accepts all the same SQL statements listed in
+      <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a>, plus some shell-only commands that you can use for tuning
+      performance and diagnosing problems.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">impala-shell</code> command fits into the familiar Unix toolchain:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        The <code class="ph codeph">-q</code> option lets you issue a single query from the command line, without starting the
+        interactive interpreter. You could use this option to run <code class="ph codeph">impala-shell</code> from inside a shell
+        script or with the command invocation syntax from a Python, Perl, or other kind of script.
+      </li>
+
+      <li class="li">
+        The <code class="ph codeph">-f</code> option lets you process a file containing multiple SQL statements,
+        such as a set of reports or DDL statements to create a group of tables and views.
+      </li>
+
+      <li class="li">
+        The <code class="ph codeph">--var</code> option lets you pass substitution variables to the statements that
+        are executed by that <span class="keyword cmdname">impala-shell</span> session, for example the statements
+        in a script file processed by the <code class="ph codeph">-f</code> option. You encode the substitution variable
+        on the command line using the notation
+        <code class="ph codeph">--var=<var class="keyword varname">variable_name</var>=<var class="keyword varname">value</var></code>.
+        Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>.
+        This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+      </li>
+
+      <li class="li">
+        The <code class="ph codeph">-o</code> option lets you save query output to a file.
+      </li>
+
+      <li class="li">
+        The <code class="ph codeph">-B</code> option turns off pretty-printing, so that you can produce comma-separated,
+        tab-separated, or other delimited text files as output. (Use the <code class="ph codeph">--output_delimiter</code> option
+        to choose the delimiter character; the default is the tab character.)
+      </li>
+
+      <li class="li">
+        In non-interactive mode, query output is printed to <code class="ph codeph">stdout</code> or to the file specified by the
+        <code class="ph codeph">-o</code> option, while incidental output is printed to <code class="ph codeph">stderr</code>, so that you can
+        process just the query output as part of a Unix pipeline.
+      </li>
+
+      <li class="li">
+        In interactive mode, <code class="ph codeph">impala-shell</code> uses the <code class="ph codeph">readline</code> facility to recall
+        and edit previous commands.
+      </li>
+    </ul>
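+
+    <p class="p">
+      For example, the following commands (with hypothetical file, table, and variable names)
+      show how these options combine with the familiar Unix toolchain:
+    </p>
+
+<pre class="pre codeblock"><code># Run a single query, with tab-separated output saved to a file.
+impala-shell -B -q 'select count(*) from web_logs' -o row_count.txt
+
+# Run a script file, substituting a variable referenced as ${var:db} inside it.
+impala-shell -f nightly_report.sql --var=db=sales
+
+# Produce comma-separated output and process it as part of a pipeline.
+impala-shell -B --output_delimiter=',' -q 'select * from t1' | head -5</code></pre>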
+
+    <p class="p">
+      For information on installing the Impala shell, see <a class="xref" href="impala_install.html#install">Installing Impala</a>.
+    </p>
+
+    <p class="p">
+      For information about establishing a connection to a DataNode running the <code class="ph codeph">impalad</code> daemon
+      through the <code class="ph codeph">impala-shell</code> command, see <a class="xref" href="impala_connecting.html#connecting">Connecting to impalad through impala-shell</a>.
+    </p>
+
+    <p class="p">
+      For a list of the <code class="ph codeph">impala-shell</code> command-line options, see
+      <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a>. For reference information about the
+      <code class="ph codeph">impala-shell</code> interactive commands, see
+      <a class="xref" href="impala_shell_commands.html#shell_commands">impala-shell Command Reference</a>.
+    </p>
+
+    <p class="p toc"></p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_shell_options.html">impala-shell Configuration Options</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_connecting.html">Connecting to impalad through impala-shell</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_shell_running_commands.html">Running Commands and SQL Statements in impala-shell</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_shell_commands.html">impala-shell Command Reference</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_kudu.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_kudu.html b/docs/build/html/topics/impala_kudu.html
new file mode 100644
index 0000000..78f6534
--- /dev/null
+++ b/docs/build/html/topics/impala_kudu.html
@@ -0,0 +1,1329 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_kudu"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala to Query Kudu Tables</title></head><body id="impala_kudu"><main role="main"><article role="article" aria-labelledby="impala_kudu__kudu">
+
+  <h1 class="title topictitle1" id="impala_kudu__kudu">Using Impala to Query Kudu Tables</h1>
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      You can use Impala to query tables stored by Apache Kudu. This capability
+      allows convenient access to a storage system that is tuned for different kinds of
+      workloads than the default with Impala.
+    </p>
+
+    <p class="p">
+      By default, Impala tables are stored on HDFS using data files with various file formats.
+      HDFS files are ideal for bulk loads (append operations) and queries using full-table scans,
+      but do not support in-place updates or deletes. Kudu is an alternative storage engine that
+      Impala can use, supporting both in-place updates (for mixed read/write workloads) and fast scans
+      (for data-warehouse/analytic operations). Using Kudu tables with Impala can simplify the
+      ETL pipeline by avoiding extra steps to segregate and reorganize newly arrived data.
+    </p>
+
+    <p class="p">
+      Certain Impala SQL statements and clauses, such as <code class="ph codeph">DELETE</code>,
+      <code class="ph codeph">UPDATE</code>, <code class="ph codeph">UPSERT</code>, and <code class="ph codeph">PRIMARY KEY</code> work
+      only with Kudu tables. Other statements and clauses, such as <code class="ph codeph">LOAD DATA</code>,
+      <code class="ph codeph">TRUNCATE TABLE</code>, and <code class="ph codeph">INSERT OVERWRITE</code>, are not applicable
+      to Kudu tables.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="impala_kudu__kudu_benefits">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Benefits of Using Kudu Tables with Impala</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The combination of Kudu and Impala works best for tables where scan performance is
+        important, but data arrives continuously, in small batches, or needs to be updated
+        without being completely replaced. HDFS-backed tables can require substantial overhead
+        to replace or reorganize data files as new data arrives. Impala can perform efficient
+        lookups and scans within Kudu tables, and Impala can also perform update or
+        delete operations efficiently. You can also use the Kudu Java, C++, and Python APIs to
+        do ingestion or transformation operations outside of Impala, and Impala can query the
+        current data at any time.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="impala_kudu__kudu_config">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Configuring Impala for Use with Kudu</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph">-kudu_master_hosts</code> configuration property must be set correctly
+        for the <span class="keyword cmdname">impalad</span> daemon so that <code class="ph codeph">CREATE TABLE ... STORED AS
+        KUDU</code> statements can connect to the appropriate Kudu server. Typically, the
+        required value for this setting is <code class="ph codeph"><var class="keyword varname">kudu_host</var>:7051</code>.
+        In a high-availability Kudu deployment, specify the names of multiple Kudu hosts separated by commas.
+      </p>
+
+      <p class="p">
+        If the <code class="ph codeph">-kudu_master_hosts</code> configuration property is not set, you can
+        still associate the appropriate value for each table by specifying a
+        <code class="ph codeph">TBLPROPERTIES('kudu.master_addresses')</code> clause in the <code class="ph codeph">CREATE TABLE</code> statement or
+        changing the <code class="ph codeph">TBLPROPERTIES('kudu.master_addresses')</code> value with an <code class="ph codeph">ALTER TABLE</code>
+        statement.
+      </p>
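+      <p class="p">
+        For example, the property can be set per-table at creation time and changed later
+        with <code class="ph codeph">ALTER TABLE</code>. (The hostnames below are hypothetical
+        placeholders; substitute the addresses of your own Kudu masters.)
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Associate the Kudu master address with a single table at creation time.
+CREATE TABLE kudu_table_with_prop (id BIGINT PRIMARY KEY, s STRING)
+  PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU
+  TBLPROPERTIES ('kudu.master_addresses'='kudu-master-1.example.com:7051');
+
+-- Point the table at a different set of Kudu masters later.
+ALTER TABLE kudu_table_with_prop
+  SET TBLPROPERTIES ('kudu.master_addresses'='kudu-master-2.example.com:7051,kudu-master-3.example.com:7051');
+</code></pre>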
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title4" id="kudu_config__kudu_topology">
+
+      <h3 class="title topictitle3" id="ariaid-title4">Cluster Topology for Kudu Tables</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          With HDFS-backed tables, you are typically concerned with the number of DataNodes in
+          the cluster, how many and how large HDFS data files are read during a query, and
+          therefore the amount of work performed by each DataNode and the network communication
+          to combine intermediate results and produce the final result set.
+        </p>
+
+        <p class="p">
+          With Kudu tables, the topology considerations are different, because:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              The underlying storage is managed and organized by Kudu, not represented as HDFS
+              data files.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              Kudu handles some of the underlying mechanics of partitioning the data. You can specify
+              the partitioning scheme with combinations of hash and range partitioning, so that you can
+              decide how much effort to expend to manage the partitions as new data arrives. For example,
+              you can construct partitions that apply to date ranges rather than a separate partition for each
+              day or each hour.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              Data is physically divided based on units of storage called <dfn class="term">tablets</dfn>. Tablets are
+              stored by <dfn class="term">tablet servers</dfn>. Each tablet server can store multiple tablets,
+              and each tablet is replicated across multiple tablet servers, managed automatically by Kudu.
+              Where practical, colocate the tablet servers on the same hosts as the DataNodes, although that is not required.
+            </p>
+          </li>
+        </ul>
+
+        <p class="p">
+          One consideration for the cluster topology is that the number of replicas for a Kudu table
+          must be odd.
+        </p>
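+        <p class="p">
+          As an illustrative sketch only, assuming the <code class="ph codeph">kudu.num_tablet_replicas</code>
+          table property is available in your release (check the documentation for your version),
+          you could request an odd replication factor like this:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical example: request 3 replicas (an odd number) per tablet.
+CREATE TABLE replicated_table (id BIGINT PRIMARY KEY, s STRING)
+  PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU
+  TBLPROPERTIES ('kudu.num_tablet_replicas'='3');
+</code></pre>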
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="impala_kudu__kudu_ddl">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Impala DDL Enhancements for Kudu Tables (CREATE TABLE and ALTER TABLE)</h2>
+
+    
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can use the Impala <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+        statements to create and fine-tune the characteristics of Kudu tables. Because Kudu
+        tables have features and properties that do not apply to other kinds of Impala tables,
+        familiarize yourself with Kudu-related concepts and syntax first.
+        For the general syntax of the <code class="ph codeph">CREATE TABLE</code>
+        statement for Kudu tables, see <a class="xref" href="impala_create_table.html">CREATE TABLE Statement</a>.
+      </p>
+
+      <p class="p toc inpage"></p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="kudu_ddl__kudu_primary_key">
+
+      <h3 class="title topictitle3" id="ariaid-title6">Primary Key Columns for Kudu Tables</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Kudu tables introduce the notion of primary keys to Impala for the first time. The
+          primary key is made up of one or more columns, whose values are combined and used as a
+          lookup key during queries. The tuple represented by these columns must be unique, cannot contain any
+          <code class="ph codeph">NULL</code> values, and can never be updated once it is inserted. For a
+          Kudu table, all the partition key columns must come from the set of
+          primary key columns.
+        </p>
+
+        <p class="p">
+          The primary key has both physical and logical aspects:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              On the physical side, it is used to map the data values to particular tablets for fast retrieval.
+              Because the tuples formed by the primary key values are unique, the primary key columns are typically
+              highly selective.
+            </p>
+          </li>
+          <li class="li">
+            <p class="p">
+              On the logical side, the uniqueness constraint allows you to avoid duplicate data in a table.
+              For example, if an <code class="ph codeph">INSERT</code> operation fails partway through, only some of the
+              new rows might be present in the table. You can re-run the same <code class="ph codeph">INSERT</code>, and
+              only the missing rows will be added. Or if data in the table is stale, you can run an
+              <code class="ph codeph">UPSERT</code> statement that brings the data up to date, without the possibility
+              of creating duplicate copies of existing rows.
+            </p>
+          </li>
+        </ul>
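+        <p class="p">
+          For example, with a hypothetical table keyed by <code class="ph codeph">id</code>,
+          re-running a statement does not create duplicate rows:
+        </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE pk_demo (id BIGINT PRIMARY KEY, s STRING)
+  PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+
+-- If this INSERT fails partway through, re-running it adds only the
+-- missing rows; rows whose primary key already exists are not duplicated.
+INSERT INTO pk_demo VALUES (1, 'one'), (2, 'two'), (3, 'three');
+
+-- UPSERT updates the existing row with id = 2 and inserts a new row with
+-- id = 4, without any possibility of duplicating an existing row.
+UPSERT INTO pk_demo VALUES (2, 'TWO'), (4, 'four');
+</code></pre>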
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          <p class="p">
+            Impala only allows <code class="ph codeph">PRIMARY KEY</code> clauses and <code class="ph codeph">NOT NULL</code>
+            constraints on columns for Kudu tables. These constraints are enforced on the Kudu side.
+          </p>
+        </div>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="kudu_ddl__kudu_column_attributes">
+
+      <h3 class="title topictitle3" id="ariaid-title7">Kudu-Specific Column Attributes for CREATE TABLE</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          For the general syntax of the <code class="ph codeph">CREATE TABLE</code>
+          statement for Kudu tables, see <a class="xref" href="impala_create_table.html">CREATE TABLE Statement</a>.
+          The following sections provide more detail for some of the
+          Kudu-specific keywords you can use in column definitions.
+        </p>
+
+        <p class="p">
+          The column list in a <code class="ph codeph">CREATE TABLE</code> statement can include the following
+          attributes, which only apply to Kudu tables:
+        </p>
+
+<pre class="pre codeblock"><code>
+  PRIMARY KEY
+| [NOT] NULL
+| ENCODING <var class="keyword varname">codec</var>
+| COMPRESSION <var class="keyword varname">algorithm</var>
+| DEFAULT <var class="keyword varname">constant_expression</var>
+| BLOCK_SIZE <var class="keyword varname">number</var>
+</code></pre>
+
+        <p class="p toc inpage">
+          See the following sections for details about each column attribute.
+        </p>
+
+      </div>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title8" id="kudu_column_attributes__kudu_primary_key_attribute">
+
+        <h4 class="title topictitle4" id="ariaid-title8">PRIMARY KEY Attribute</h4>
+
+        <div class="body conbody">
+
+          <p class="p">
+            The primary key for a Kudu table is a column, or set of columns, that uniquely
+            identifies every row. The primary key value also is used as the natural sort order
+            for the values from the table. The primary key value for each row is based on the
+            combination of values for the columns.
+          </p>
+
+          <p class="p">
+        Because all of the primary key columns must have non-null values, specifying a column
+        in the <code class="ph codeph">PRIMARY KEY</code> clause implicitly adds the <code class="ph codeph">NOT
+        NULL</code> attribute to that column.
+      </p>
+
+          <p class="p">
+            The primary key columns must be the first ones specified in the <code class="ph codeph">CREATE
+            TABLE</code> statement. For a single-column primary key, you can include a
+            <code class="ph codeph">PRIMARY KEY</code> attribute inline with the column definition. For a
+            multi-column primary key, you include a <code class="ph codeph">PRIMARY KEY (<var class="keyword varname">c1</var>,
+            <var class="keyword varname">c2</var>, ...)</code> clause as a separate entry at the end of the
+            column list.
+          </p>
+
+          <p class="p">
+            You can specify the <code class="ph codeph">PRIMARY KEY</code> attribute either inline in a single
+            column definition, or as a separate clause at the end of the column list:
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE pk_inline
+(
+  col1 BIGINT PRIMARY KEY,
+  col2 STRING,
+  col3 BOOLEAN
+) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;
+
+CREATE TABLE pk_at_end
+(
+  col1 BIGINT,
+  col2 STRING,
+  col3 BOOLEAN,
+  PRIMARY KEY (col1)
+) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+          <p class="p">
+            When the primary key is a single column, these two forms are equivalent. If the
+            primary key consists of more than one column, you must specify the primary key using
+            a separate entry in the column list:
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE pk_multiple_columns
+(
+  col1 BIGINT,
+  col2 STRING,
+  col3 BOOLEAN,
+  <strong class="ph b">PRIMARY KEY (col1, col2)</strong>
+) PARTITION BY HASH(col2) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+          <p class="p">
+            The <code class="ph codeph">SHOW CREATE TABLE</code> statement always represents the
+            <code class="ph codeph">PRIMARY KEY</code> specification as a separate item in the column list:
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE inline_pk_rewritten (id BIGINT <strong class="ph b">PRIMARY KEY</strong>, s STRING)
+  PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+
+SHOW CREATE TABLE inline_pk_rewritten;
++------------------------------------------------------------------------------+
+| result                                                                       |
++------------------------------------------------------------------------------+
+| CREATE TABLE user.inline_pk_rewritten (                                      |
+|   id BIGINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
+|   s STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION,      |
+|   <strong class="ph b">PRIMARY KEY (id)</strong>                                                           |
+| )                                                                            |
+| PARTITION BY HASH (id) PARTITIONS 2                                          |
+| STORED AS KUDU                                                               |
+| TBLPROPERTIES ('kudu.master_addresses'='host.example.com')                   |
++------------------------------------------------------------------------------+
+</code></pre>
+
+          <p class="p">
+            Within Impala, the notion of a primary key applies only to Kudu tables. Every Kudu
+            table requires a primary key, made up of one or more columns that must be specified
+            first in the column list.
+          </p>
+
+          <p class="p">
+            The contents of the primary key columns cannot be changed by an
+            <code class="ph codeph">UPDATE</code> or <code class="ph codeph">UPSERT</code> statement. Including too many
+            columns in the primary key (more than 5 or 6) can also reduce the performance of
+            write operations. Therefore, pick the most selective and most frequently
+            tested non-null columns for the primary key specification.
+            If a column must always have a value, but that value
+            might change later, leave it out of the primary key and use a <code class="ph codeph">NOT
+            NULL</code> clause for that column instead. If an existing row has an
+            incorrect or outdated key column value, delete the old row and insert an entirely
+            new row with the correct primary key.
+          </p>
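+          <p class="p">
+            For example, assuming a hypothetical Kudu table <code class="ph codeph">events</code>
+            keyed by <code class="ph codeph">id</code>, correcting a key value means replacing the
+            row rather than updating the key column:
+          </p>
+
+<pre class="pre codeblock"><code>
+-- UPDATE cannot change a primary key column, so replace the row instead.
+DELETE FROM events WHERE id = 999;                -- remove the row with the bad key
+INSERT INTO events VALUES (1000, 'corrected row'); -- re-insert with the correct key
+</code></pre>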
+
+        </div>
+
+      </article>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title9" id="kudu_column_attributes__kudu_not_null_attribute">
+
+        <h4 class="title topictitle4" id="ariaid-title9">NULL | NOT NULL Attribute</h4>
+
+        <div class="body conbody">
+
+          <p class="p">
+            For Kudu tables, you can specify which columns can contain nulls or not. This
+            constraint offers an extra level of consistency enforcement for Kudu tables. If an
+            application requires a field to always be specified, include a <code class="ph codeph">NOT
+            NULL</code> clause in the corresponding column definition, and Kudu prevents rows
+            from being inserted with a <code class="ph codeph">NULL</code> in that column.
+          </p>
+
+          <p class="p">
+            For example, a table containing geographic information might require the latitude
+            and longitude coordinates to always be specified. Other attributes might be allowed
+            to be <code class="ph codeph">NULL</code>. For example, a location might not have a designated
+            place name, its altitude might be unimportant, and its population might be initially
+            unknown, to be filled in later.
+          </p>
+
+          <p class="p">
+        Because all of the primary key columns must have non-null values, specifying a column
+        in the <code class="ph codeph">PRIMARY KEY</code> clause implicitly adds the <code class="ph codeph">NOT
+        NULL</code> attribute to that column.
+      </p>
+
+          <p class="p">
+            For non-Kudu tables, Impala allows any column to contain <code class="ph codeph">NULL</code>
+            values, because it is not practical to enforce a <span class="q">"not null"</span> constraint on HDFS
+            data files that could be prepared using external tools and ETL processes.
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE required_columns
+(
+  id BIGINT PRIMARY KEY,
+  latitude DOUBLE NOT NULL,
+  longitude DOUBLE NOT NULL,
+  place_name STRING,
+  altitude DOUBLE,
+  population BIGINT
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+          <p class="p">
+            During performance optimization, Kudu can use the knowledge that nulls are not
+            allowed to skip certain checks on each input row, speeding up queries and join
+            operations. Therefore, specify <code class="ph codeph">NOT NULL</code> constraints when
+            appropriate.
+          </p>
+
+          <p class="p">
+            The <code class="ph codeph">NULL</code> clause is the default condition for all columns that are not
+            part of the primary key. You can omit it, or specify it to clarify that you have made a
+            conscious design decision to allow nulls in a column.
+          </p>
+
+          <p class="p">
+            Because primary key columns cannot contain any <code class="ph codeph">NULL</code> values, the
+            <code class="ph codeph">NOT NULL</code> clause is not required for the primary key columns,
+            but you might still specify it to make your code self-describing.
+          </p>
+
+        </div>
+
+      </article>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title10" id="kudu_column_attributes__kudu_default_attribute">
+
+        <h4 class="title topictitle4" id="ariaid-title10">DEFAULT Attribute</h4>
+
+        <div class="body conbody">
+
+          <p class="p">
+            You can specify a default value for columns in Kudu tables. The default value can be
+            any constant expression, for example, a combination of literal values, arithmetic
+            and string operations. It cannot contain references to columns or non-deterministic
+            function calls.
+          </p>
+
+          <p class="p">
+            The following example shows different kinds of expressions for the
+            <code class="ph codeph">DEFAULT</code> clause. The requirement to use a constant value means that
+            you can fill in a placeholder value such as <code class="ph codeph">NULL</code>, empty string,
+            0, -1, <code class="ph codeph">'N/A'</code> and so on, but you cannot reference functions or
+            column names. Therefore, you cannot use <code class="ph codeph">DEFAULT</code> to do things such as
+            automatically making an uppercase copy of a string value, storing Boolean values based
+            on tests of other columns, or adding or subtracting one from another column representing a sequence number.
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE default_vals
+(
+  id BIGINT PRIMARY KEY,
+  name STRING NOT NULL DEFAULT 'unknown',
+  address STRING DEFAULT upper('no fixed address'),
+  age INT DEFAULT -1,
+  earthling BOOLEAN DEFAULT TRUE,
+  planet_of_origin STRING DEFAULT 'Earth',
+  optional_col STRING DEFAULT NULL
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            <p class="p">
+              When designing an entirely new schema, prefer to use <code class="ph codeph">NULL</code> as the
+              placeholder for any unknown or missing values, because that is the universal convention
+              among database systems. Null values can be stored efficiently, and easily checked with the
+              <code class="ph codeph">IS NULL</code> or <code class="ph codeph">IS NOT NULL</code> operators. The <code class="ph codeph">DEFAULT</code>
+              attribute is appropriate when ingesting data that already has an established convention for
+              representing unknown or missing values, or where the vast majority of rows have some common
+              non-null value.
+            </p>
+          </div>
+
+        </div>
+
+      </article>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title11" id="kudu_column_attributes__kudu_encoding_attribute">
+
+        <h4 class="title topictitle4" id="ariaid-title11">ENCODING Attribute</h4>
+
+        <div class="body conbody">
+
+          <p class="p">
+            Each column in a Kudu table can optionally use an encoding, a low-overhead form of
+            compression that reduces the size on disk, then requires additional CPU cycles to
+            reconstruct the original values during queries. Typically, highly compressible data
+            benefits from the reduced I/O to read the data back from disk. By default, each
+            column uses the <span class="q">"plain"</span> encoding where the data is stored unchanged.
+          </p>
+
+          <div class="p">
+            The encoding keywords that Impala recognizes are:
+
+            <ul class="ul">
+              <li class="li">
+                <p class="p">
+                  <code class="ph codeph">AUTO_ENCODING</code>: use the default encoding based on the column
+                  type; currently always the same as <code class="ph codeph">PLAIN_ENCODING</code>, but subject to
+                  change in the future.
+                </p>
+              </li>
+              <li class="li">
+                <p class="p">
+                  <code class="ph codeph">PLAIN_ENCODING</code>: leave the value in its original binary format.
+                </p>
+              </li>
+              
+              <li class="li">
+                <p class="p">
+                  <code class="ph codeph">RLE</code>: compress repeated values (when sorted in primary key
+                  order) by including a count.
+                </p>
+              </li>
+              <li class="li">
+                <p class="p">
+                  <code class="ph codeph">DICT_ENCODING</code>: when the number of different string values is
+                  low, replace the original string with a numeric ID.
+                </p>
+              </li>
+              <li class="li">
+                <p class="p">
+                  <code class="ph codeph">BIT_SHUFFLE</code>: rearrange the bits of the values to efficiently
+                  compress sequences of values that are identical or vary only slightly based
+                  on primary key order. The resulting encoded data is also compressed with LZ4.
+                </p>
+              </li>
+              <li class="li">
+                <p class="p">
+                  <code class="ph codeph">PREFIX_ENCODING</code>: compress common prefixes in string values; mainly for use internally within Kudu.
+                </p>
+              </li>
+            </ul>
+          </div>
+
+
+
+          <p class="p">
+            The following example shows the Impala keywords representing the encoding types.
+            (The Impala keywords match the symbolic names used within Kudu.)
+            For usage guidelines on the different kinds of encoding, see
+            <a class="xref" href="https://kudu.apache.org/docs/schema_design.html" target="_blank">the Kudu documentation</a>.
+            The <code class="ph codeph">DESCRIBE</code> output shows how the encoding is reported after
+            the table is created, and that omitting the encoding (in this case, for the
+            <code class="ph codeph">ID</code> column) is the same as specifying <code class="ph codeph">AUTO_ENCODING</code>.
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE various_encodings
+(
+  id BIGINT PRIMARY KEY,
+  c1 BIGINT ENCODING PLAIN_ENCODING,
+  c2 BIGINT ENCODING AUTO_ENCODING,
+  c3 TINYINT ENCODING BIT_SHUFFLE,
+  c4 DOUBLE ENCODING BIT_SHUFFLE,
+  c5 BOOLEAN ENCODING RLE,
+  c6 STRING ENCODING DICT_ENCODING,
+  c7 STRING ENCODING PREFIX_ENCODING
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+
+-- Some columns are omitted from the output for readability.
+describe various_encodings;
++------+---------+-------------+----------+-----------------+
+| name | type    | primary_key | nullable | encoding        |
++------+---------+-------------+----------+-----------------+
+| id   | bigint  | true        | false    | AUTO_ENCODING   |
+| c1   | bigint  | false       | true     | PLAIN_ENCODING  |
+| c2   | bigint  | false       | true     | AUTO_ENCODING   |
+| c3   | tinyint | false       | true     | BIT_SHUFFLE     |
+| c4   | double  | false       | true     | BIT_SHUFFLE     |
+| c5   | boolean | false       | true     | RLE             |
+| c6   | string  | false       | true     | DICT_ENCODING   |
+| c7   | string  | false       | true     | PREFIX_ENCODING |
++------+---------+-------------+----------+-----------------+
+</code></pre>
+
+        </div>
+
+      </article>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title12" id="kudu_column_attributes__kudu_compression_attribute">
+
+        <h4 class="title topictitle4" id="ariaid-title12">COMPRESSION Attribute</h4>
+
+        <div class="body conbody">
+
+          <p class="p">
+            You can specify a compression algorithm to use for each column in a Kudu table. This
+            attribute imposes more CPU overhead when retrieving the values than the
+            <code class="ph codeph">ENCODING</code> attribute does. Therefore, use it primarily for columns with
+            long strings that do not benefit much from the less-expensive <code class="ph codeph">ENCODING</code>
+            attribute.
+          </p>
+
+          <p class="p">
+            The choices for <code class="ph codeph">COMPRESSION</code> are <code class="ph codeph">LZ4</code>,
+            <code class="ph codeph">SNAPPY</code>, and <code class="ph codeph">ZLIB</code>.
+          </p>
+
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            <p class="p">
+              Columns that use the <code class="ph codeph">BIT_SHUFFLE</code> encoding are already compressed
+              using <code class="ph codeph">LZ4</code>, and so typically do not need any additional
+              <code class="ph codeph">COMPRESSION</code> attribute.
+            </p>
+          </div>
+
+          <p class="p">
+            The following example shows design considerations for several
+            <code class="ph codeph">STRING</code> columns with different distribution characteristics, leading
+            to choices for both the <code class="ph codeph">ENCODING</code> and <code class="ph codeph">COMPRESSION</code>
+            attributes. The <code class="ph codeph">country</code> values come from a specific set of strings,
+            therefore this column is a good candidate for dictionary encoding. The
+            <code class="ph codeph">post_id</code> column contains an ascending sequence of integers, where
+            several leading bits are likely to be all zeroes, therefore this column is a good
+            candidate for bitshuffle encoding. The <code class="ph codeph">body</code>
+            column and the corresponding columns for translated versions tend to be long unique
+            strings that are not practical to use with any of the encoding schemes, therefore
+            they employ the <code class="ph codeph">COMPRESSION</code> attribute instead. The ideal compression
+            codec in each case would require some experimentation to determine how much space
+            savings it provided and how much CPU overhead it added, based on real-world data.
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE blog_posts
+(
+  user_id STRING ENCODING DICT_ENCODING,
+  post_id BIGINT ENCODING BIT_SHUFFLE,
+  subject STRING ENCODING PLAIN_ENCODING,
+  body STRING COMPRESSION LZ4,
+  spanish_translation STRING COMPRESSION SNAPPY,
+  esperanto_translation STRING COMPRESSION ZLIB,
+  PRIMARY KEY (user_id, post_id)
+) PARTITION BY HASH(user_id, post_id) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+        </div>
+
+      </article>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title13" id="kudu_column_attributes__kudu_block_size_attribute">
+
+        <h4 class="title topictitle4" id="ariaid-title13">BLOCK_SIZE Attribute</h4>
+
+        <div class="body conbody">
+
+          <p class="p">
+            Although Kudu does not use HDFS files internally, and thus is not affected by
+            the HDFS block size, it does have an underlying unit of I/O called the
+            <dfn class="term">block size</dfn>. The <code class="ph codeph">BLOCK_SIZE</code> attribute lets you set the
+            block size for any column.
+          </p>
+
+          <p class="p">
+            The block size attribute is a relatively advanced feature. Refer to
+            <a class="xref" href="https://kudu.apache.org/docs/index.html" target="_blank">the Kudu documentation</a>
+            for usage details.
+          </p>
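+          <p class="p">
+            As an illustrative sketch only, the attribute appears inline in a column
+            definition, with the size specified in bytes; whether a given size is
+            beneficial depends on your workload and requires experimentation:
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE block_size_demo
+(
+  id BIGINT PRIMARY KEY,
+  -- Hypothetical choice: a 1 MiB block size for a column that is mostly scanned sequentially.
+  payload STRING BLOCK_SIZE 1048576
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</code></pre>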
+
+
+
+        </div>
+
+      </article>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="kudu_ddl__kudu_partitioning">
+
+      <h3 class="title topictitle3" id="ariaid-title14">Partitioning for Kudu Tables</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Kudu tables use special mechanisms to distribute data among the underlying
+          tablet servers. Although we refer to such tables as partitioned tables, they are
+          distinguished from traditional Impala partitioned tables by use of different clauses
+          on the <code class="ph codeph">CREATE TABLE</code> statement. Kudu tables use
+          <code class="ph codeph">PARTITION BY</code>, <code class="ph codeph">HASH</code>, <code class="ph codeph">RANGE</code>, and
+          range specification clauses rather than the <code class="ph codeph">PARTITIONED BY</code> clause
+          for HDFS-backed tables, which specifies only a column name and creates a new partition for each
+          different value.
+        </p>
+
+        <p class="p">
+          For background information and architectural details about the Kudu partitioning
+          mechanism, see
+          <a class="xref" href="https://kudu.apache.org/kudu.pdf" target="_blank">the Kudu white paper, section 3.2</a>.
+        </p>
+
+
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          <p class="p">
+            The Impala DDL syntax for Kudu tables is different from that in early Kudu versions,
+            which used an experimental fork of the Impala code. For example, the
+            <code class="ph codeph">DISTRIBUTE BY</code> clause is now <code class="ph codeph">PARTITION BY</code>, the
+            <code class="ph codeph">INTO <var class="keyword varname">n</var> BUCKETS</code> clause is now
+            <code class="ph codeph">PARTITIONS <var class="keyword varname">n</var></code>, and the range partitioning syntax
+            is reworked to replace the <code class="ph codeph">SPLIT ROWS</code> clause with more expressive
+            syntax involving comparison operators.
+          </p>
+        </div>
+
+        <p class="p toc inpage"></p>
+
+      </div>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title15" id="kudu_partitioning__kudu_hash_partitioning">
+        <h4 class="title topictitle4" id="ariaid-title15">Hash Partitioning</h4>
+        <div class="body conbody">
+
+          <p class="p">
+            Hash partitioning is the simplest type of partitioning for Kudu tables.
+            For hash-partitioned Kudu tables, inserted rows are divided up between a fixed number
+            of <span class="q">"buckets"</span> by applying a hash function to the values of the columns specified
+            in the <code class="ph codeph">HASH</code> clause.
+            Hashing ensures that rows with similar values are evenly distributed, instead of
+            all clumping together in the same bucket. Spreading new rows across the buckets this
+            way lets insertion operations work in parallel across multiple tablet servers.
+            Separating the hashed values can impose additional overhead on queries, however:
+            queries with range-based predicates might have to read multiple tablets to retrieve
+            all the relevant values.
+          </p>
+
+<pre class="pre codeblock"><code>
+-- 1M rows with 50 hash partitions = approximately 20,000 rows per partition.
+-- The values in each partition are not sequential, but rather based on a hash function.
+-- Rows 1, 99999, and 123456 might be in the same partition.
+CREATE TABLE million_rows (id string primary key, s string)
+  PARTITION BY HASH(id) PARTITIONS 50
+  STORED AS KUDU;
+
+-- Because the ID values are unique, we expect the rows to be roughly
+-- evenly distributed between the buckets in the destination table.
+INSERT INTO million_rows SELECT * FROM billion_rows ORDER BY id LIMIT 1000000;
+</code></pre>
+
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            <p class="p">
+              The largest number of buckets that you can create with a <code class="ph codeph">PARTITIONS</code>
+              clause varies depending on the number of tablet servers in the cluster, while the smallest is 2.
+              For simplicity, some of the <code class="ph codeph">CREATE TABLE</code> statements throughout this section
+              use <code class="ph codeph">PARTITIONS 2</code> to illustrate the minimum requirements for a Kudu table.
+              For large tables, prefer to use roughly 10 partitions per server in the cluster.
+            </p>
+          </div>
+
+        </div>
+      </article>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title16" id="kudu_partitioning__kudu_range_partitioning">
+        <h4 class="title topictitle4" id="ariaid-title16">Range Partitioning</h4>
+        <div class="body conbody">
+
+          <p class="p">
+            Range partitioning lets you specify partitioning precisely, based on single values or ranges
+            of values within one or more columns. You add one or more <code class="ph codeph">RANGE</code> clauses to the
+            <code class="ph codeph">CREATE TABLE</code> statement, following the <code class="ph codeph">PARTITION BY</code>
+            clause.
+          </p>
+
+          <p class="p">
+            Range-partitioned Kudu tables use one or more range clauses, which include a
+            combination of constant expressions, <code class="ph codeph">VALUE</code> or <code class="ph codeph">VALUES</code>
+            keywords, and comparison operators. (This syntax replaces the <code class="ph codeph">SPLIT
+            ROWS</code> clause used with early Kudu versions.)
+            For the full syntax, see <a class="xref" href="impala_create_table.html">CREATE TABLE Statement</a>.
+          </p>
+
+<pre class="pre codeblock"><code>
+-- 50 buckets, all for IDs beginning with a lowercase letter.
+-- Having only a single range enforces the allowed range of values
+-- but does not add any extra parallelism.
+create table million_rows_one_range (id string primary key, s string)
+  partition by hash(id) partitions 50,
+  range (partition 'a' &lt;= values &lt; '{')
+  stored as kudu;
+
+-- 50 buckets for IDs beginning with a lowercase letter
+-- plus 50 buckets for IDs beginning with an uppercase letter.
+-- Total number of buckets = number in the PARTITIONS clause x number of ranges.
+-- We are still enforcing constraints on the primary key values
+-- allowed in the table, and the 2 ranges provide better parallelism
+-- as rows are inserted or the table is scanned.
+create table million_rows_two_ranges (id string primary key, s string)
+  partition by hash(id) partitions 50,
+  range (partition 'a' &lt;= values &lt; '{', partition 'A' &lt;= values &lt; '[')
+  stored as kudu;
+
+-- Same as previous table, with an extra range covering the single key value '00000'.
+create table million_rows_three_ranges (id string primary key, s string)
+  partition by hash(id) partitions 50,
+  range (partition 'a' &lt;= values &lt; '{', partition 'A' &lt;= values &lt; '[', partition value = '00000')
+  stored as kudu;
+
+-- The range partitioning can be displayed with a SHOW command in impala-shell.
+show range partitions million_rows_three_ranges;
++---------------------+
+| RANGE (id)          |
++---------------------+
+| VALUE = "00000"     |
+| "A" &lt;= VALUES &lt; "[" |
+| "a" &lt;= VALUES &lt; "{" |
++---------------------+
+
+</code></pre>
+
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            <p class="p">
+              When defining ranges, be careful to avoid <span class="q">"fencepost errors"</span> where values at the
+              extreme ends might be included or omitted by accident. For example, in the tables defined
+              in the preceding code listings, the range <code class="ph codeph">"a" &lt;= VALUES &lt; "{"</code> ensures that
+              any values starting with <code class="ph codeph">z</code>, such as <code class="ph codeph">za</code> or <code class="ph codeph">zzz</code>
+              or <code class="ph codeph">zzz-ZZZ</code>, are all included, because the upper bound uses a less-than
+              operator with <code class="ph codeph">{</code>, the character immediately after <code class="ph codeph">z</code> in ASCII order.
+            </p>
+          </div>
+
+          <p class="p">
+            For range-partitioned Kudu tables, an appropriate range must exist before a data value can be created in the table.
+            Any <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, or <code class="ph codeph">UPSERT</code> statements fail if they try to
+            create column values that fall outside the specified ranges. The error checking for ranges is performed on the
+            Kudu side; Impala passes the specified range information to Kudu, and passes back any error or warning if the
+            ranges are not valid. (A nonsensical range specification causes an error for a DDL statement, but only a warning
+            for a DML statement.)
+          </p>
+
+          <p class="p">
+            Ranges can be non-contiguous:
+          </p>
+
+<pre class="pre codeblock"><code>
+partition by range (year) (partition 1885 &lt;= values &lt;= 1889, partition 1893 &lt;= values &lt;= 1897)
+
+partition by range (letter_grade) (partition value = 'A', partition value = 'B',
+  partition value = 'C', partition value = 'D', partition value = 'F')
+
+</code></pre>
+
+          <p class="p">
+            The <code class="ph codeph">ALTER TABLE</code> statement with the <code class="ph codeph">ADD RANGE PARTITION</code> or
+            <code class="ph codeph">DROP RANGE PARTITION</code> clauses can be used to add or remove ranges from an
+            existing Kudu table.
+          </p>
+
+<pre class="pre codeblock"><code>
+ALTER TABLE foo ADD RANGE PARTITION 30 &lt;= VALUES &lt; 50;
+ALTER TABLE foo DROP RANGE PARTITION 1 &lt;= VALUES &lt; 5;
+
+</code></pre>
+
+          <p class="p">
+            When a range is added, the new range must not overlap with any existing ranges;
+            that is, it can only fill in gaps between the existing ranges.
+          </p>
+
+<pre class="pre codeblock"><code>
+alter table test_scores add range partition value = 'E';
+
+alter table year_ranges add range partition 1890 &lt;= values &lt; 1893;
+
+</code></pre>
+
+          <p class="p">
+            When a range is removed, all the associated rows in the table are deleted. (This
+            is true whether the table is internal or external.)
+          </p>
+
+<pre class="pre codeblock"><code>
+alter table test_scores drop range partition value = 'E';
+
+alter table year_ranges drop range partition 1890 &lt;= values &lt; 1893;
+
+</code></pre>
+
+        <p class="p">
+          Kudu tables can also use a combination of hash and range partitioning.
+        </p>
+
+<pre class="pre codeblock"><code>
+partition by hash (school) partitions 10,
+  range (letter_grade) (partition value = 'A', partition value = 'B',
+    partition value = 'C', partition value = 'D', partition value = 'F')
+
+</code></pre>
+
+        </div>
+      </article>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title17" id="kudu_partitioning__kudu_partitioning_misc">
+        <h4 class="title topictitle4" id="ariaid-title17">Working with Partitioning in Kudu Tables</h4>
+        <div class="body conbody">
+
+          <p class="p">
+            To see the current partitioning scheme for a Kudu table, you can use the <code class="ph codeph">SHOW
+            CREATE TABLE</code> statement or the <code class="ph codeph">SHOW PARTITIONS</code> statement. The
+            <code class="ph codeph">CREATE TABLE</code> syntax displayed by <code class="ph codeph">SHOW CREATE TABLE</code> includes
+            any hash and range clauses, reflecting the original table structure plus any
+            subsequent <code class="ph codeph">ALTER TABLE</code> statements that changed the table structure.
+          </p>
+
+          <p class="p">
+            To see the underlying buckets and partitions for a Kudu table, use the
+            <code class="ph codeph">SHOW TABLE STATS</code> or <code class="ph codeph">SHOW PARTITIONS</code> statement.
+          </p>
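+
+          <p class="p">
+            For example, using the <code class="ph codeph">million_rows_three_ranges</code> table defined
+            earlier, the following statements display the partitioning details. (The exact output
+            depends on the cluster layout.)
+          </p>
+
+<pre class="pre codeblock"><code>
+-- Shows the full CREATE TABLE statement, including the PARTITION BY clauses.
+show create table million_rows_three_ranges;
+
+-- Shows the range partitions and how the table is divided into tablets.
+show partitions million_rows_three_ranges;
+
+-- Shows similar per-tablet details, such as size and replica placement.
+show table stats million_rows_three_ranges;
+</code></pre>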
+
+        </div>
+      </article>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title18" id="kudu_ddl__kudu_timestamps">
+
+      <h3 class="title topictitle3" id="ariaid-title18">Handling Date, Time, or Timestamp Data with Kudu</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Because a Kudu table currently cannot have a column with the Impala
+          <code class="ph codeph">TIMESTAMP</code> type, expect to store date/time information as the number
+          of seconds, milliseconds, or microseconds since the Unix epoch date of January 1,
+          1970. Specify the column as <code class="ph codeph">BIGINT</code> in the Impala <code class="ph codeph">CREATE
+          TABLE</code> statement, corresponding to an 8-byte integer (an
+          <code class="ph codeph">int64</code>) in the underlying Kudu table. Then use Impala date/time
+          conversion functions as necessary to produce a numeric, <code class="ph codeph">TIMESTAMP</code>,
+          or <code class="ph codeph">STRING</code> value depending on the context.
+        </p>
+
+        <p class="p">
+          For example, the <code class="ph codeph">unix_timestamp()</code> function returns an integer result
+          representing the number of seconds past the epoch. The <code class="ph codeph">now()</code> function
+          produces a <code class="ph codeph">TIMESTAMP</code> representing the current date and time, which can
+          be passed as an argument to <code class="ph codeph">unix_timestamp()</code>. And string literals
+          representing dates and date/times can be cast to <code class="ph codeph">TIMESTAMP</code>, and from there
+          converted to numeric values. The following examples show how you might store a date/time
+          column as <code class="ph codeph">BIGINT</code> in a Kudu table, but still use string literals and
+          <code class="ph codeph">TIMESTAMP</code> values for convenience.
+        </p>
+
+<pre class="pre codeblock"><code>
+-- now() returns a TIMESTAMP and shows the format for string literals you can cast to TIMESTAMP.
+select now();
++-------------------------------+
+| now()                         |
++-------------------------------+
+| 2017-01-25 23:50:10.132385000 |
++-------------------------------+
+
+-- unix_timestamp() accepts either a TIMESTAMP or an equivalent string literal.
+select unix_timestamp(now());
++-----------------------+
+| unix_timestamp(now()) |
++-----------------------+
+| 1485386670            |
++-----------------------+
+
+select unix_timestamp('2017-01-01');
++------------------------------+
+| unix_timestamp('2017-01-01') |
++------------------------------+
+| 1483228800                   |
++------------------------------+
+
+-- Make a table representing a date/time value as BIGINT.
+-- Construct 1 range partition and 20 associated hash partitions for each year.
+-- Use date/time conversion functions to express the ranges as human-readable dates.
+create table time_series(id bigint, when_exactly bigint, event string, primary key (id, when_exactly))
+	partition by hash (id) partitions 20,
+	range (when_exactly)
+	(
+		partition unix_timestamp('2015-01-01') &lt;= values &lt; unix_timestamp('2016-01-01'),
+		partition unix_timestamp('2016-01-01') &lt;= values &lt; unix_timestamp('2017-01-01'),
+		partition unix_timestamp('2017-01-01') &lt;= values &lt; unix_timestamp('2018-01-01')
+	)
+	stored as kudu;
+
+-- On insert, we can transform a human-readable date/time into a numeric value.
+insert into time_series values (12345, unix_timestamp('2017-01-25 23:24:56'), 'Working on doc examples');
+
+-- On retrieval, we can examine the numeric date/time value or turn it back into a string for readability.
+select id, when_exactly, from_unixtime(when_exactly) as 'human-readable date/time', event
+  from time_series order by when_exactly limit 100;
++-------+--------------+--------------------------+-------------------------+
+| id    | when_exactly | human-readable date/time | event                   |
++-------+--------------+--------------------------+-------------------------+
+| 12345 | 1485386696   | 2017-01-25 23:24:56      | Working on doc examples |
++-------+--------------+--------------------------+-------------------------+
+
+</code></pre>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          <p class="p">
+            When doing high-precision arithmetic involving numeric date/time values, such as
+            dividing millisecond values by 1000 or microsecond values by 1 million, always
+            cast the integer numerator to a <code class="ph codeph">DECIMAL</code> with sufficient precision
+            and scale to avoid any rounding or loss of precision.
+          </p>
+        </div>
+
+<pre class="pre codeblock"><code>
+-- 1 million and 1 microseconds = 1.000001 seconds.
+select microseconds,
+  cast (microseconds as decimal(20,7)) / 1e6 as fractional_seconds
+  from table_with_microsecond_column;
++--------------+----------------------+
+| microseconds | fractional_seconds   |
++--------------+----------------------+
+| 1000001      | 1.000001000000000000 |
++--------------+----------------------+
+
+</code></pre>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title19" id="kudu_ddl__kudu_metadata">
+
+      <h3 class="title topictitle3" id="ariaid-title19">How Impala Handles Kudu Metadata</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+        Much of the metadata for Kudu tables is handled by the underlying
+        storage layer. Kudu tables have less reliance on the metastore
+        database, and require less metadata caching on the Impala side.
+        For example, information about partitions in Kudu tables is managed
+        by Kudu, and Impala does not cache any block locality metadata
+        for Kudu tables.
+      </p>
+        <p class="p">
+        The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code>
+        statements are needed less frequently for Kudu tables than for
+        HDFS-backed tables. Neither statement is needed when data is
+        added to, removed, or updated in a Kudu table, even if the changes
+        are made directly to Kudu through a client program using the Kudu API.
+        Run <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+        <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+        for a Kudu table only after making a change to the Kudu table schema,
+        such as adding or dropping a column, by a mechanism other than
+        Impala.
+      </p>
+
+        <p class="p">
+          Because Kudu manages the metadata for its own tables separately from the metastore
+          database, each table has two names: one stored in the metastore database for Impala
+          to use, and one on the Kudu side. These names can be modified independently
+          through <code class="ph codeph">ALTER TABLE</code> statements.
+        </p>
+
+        <p class="p">
+          To avoid potential name conflicts, the prefix <code class="ph codeph">impala::</code>
+          and the Impala database name are encoded into the underlying Kudu
+          table name:
+        </p>
+
+<pre class="pre codeblock"><code>
+create database some_database;
+use some_database;
+
+create table table_name_demo (x int primary key, y int)
+  partition by hash (x) partitions 2 stored as kudu;
+
+describe formatted table_name_demo;
+...
+kudu.table_name  | impala::some_database.table_name_demo
+
+</code></pre>
+
+        <p class="p">
+          See <a class="xref" href="impala_tables.html">Overview of Impala Tables</a> for examples of how to change the name of
+          the Impala table in the metastore database, the name of the underlying Kudu
+          table, or both.
+        </p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title20" id="impala_kudu__kudu_etl">
+
+    <h2 class="title topictitle2" id="ariaid-title20">Loading Data into Kudu Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Kudu tables are well-suited to use cases where data arrives continuously, in small or
+        moderate volumes. To bring data into Kudu tables, use the Impala <code class="ph codeph">INSERT</code>
+        and <code class="ph codeph">UPSERT</code> statements. The <code class="ph codeph">LOAD DATA</code> statement does
+        not apply to Kudu tables.
+      </p>
+
+      <p class="p">
+        Because Kudu manages its own storage layer that is optimized for smaller block sizes than
+        HDFS, and performs its own housekeeping to keep data evenly distributed, it is not
+        subject to the <span class="q">"many small files"</span> issue and does not need explicit reorganization
+        and compaction as the data grows over time. The partitions within a Kudu table can be
+        specified to cover a variety of possible data distributions, instead of hardcoding a new
+        partition for each new day, hour, and so on, which can lead to inefficient,
+        hard-to-scale, and hard-to-manage partition schemes with HDFS tables.
+      </p>
+
+      <p class="p">
+        Your strategy for performing ETL or bulk updates on Kudu tables should take into account
+        the limitations on consistency for DML operations.
+      </p>
+
+      <p class="p">
+        Make <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, and <code class="ph codeph">UPSERT</code>
+        operations <dfn class="term">idempotent</dfn>: that is, able to be applied multiple times and still
+        produce an identical result.
+      </p>
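+
+      <p class="p">
+        For example, an <code class="ph codeph">UPSERT</code> statement is naturally idempotent, because
+        re-running it rewrites the same rows with the same values. (The following table and values
+        are illustrative.)
+      </p>
+
+<pre class="pre codeblock"><code>
+create table sensor_readings (id bigint primary key, temp double)
+  partition by hash (id) partitions 2 stored as kudu;
+
+-- Running this statement twice leaves the table in the same final state;
+-- the second run updates each row with identical values rather than
+-- creating duplicates.
+upsert into sensor_readings values (1, 72.5), (2, 68.0);
+</code></pre>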
+
+      <p class="p">
+        If a bulk operation is in danger of exceeding capacity limits due to timeouts or high
+        memory usage, split it into a series of smaller operations.
+      </p>
+
+      <p class="p">
+        Avoid running concurrent ETL operations where the end results depend on precise
+        ordering. In particular, do not rely on an <code class="ph codeph">INSERT ... SELECT</code> statement
+        that selects from the same table into which it is inserting, unless you include extra
+        conditions in the <code class="ph codeph">WHERE</code> clause to avoid reading the newly inserted rows
+        within the same statement.
+      </p>
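+
+      <p class="p">
+        For example, if new rows are assigned key values above a known cutoff, a bound in the
+        <code class="ph codeph">WHERE</code> clause keeps the <code class="ph codeph">SELECT</code> part from picking up
+        rows inserted by the same statement. (The table name and cutoff value are illustrative.)
+      </p>
+
+<pre class="pre codeblock"><code>
+-- The new rows all have id values above 1000000, and the WHERE clause
+-- excludes that range, so the statement cannot re-read its own output.
+insert into events select id + 1000000, event from events where id &lt;= 1000000;
+</code></pre>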
+
+      <p class="p">
+        Because relationships between tables cannot be enforced by Impala and Kudu, and cannot
+        be committed or rolled back together, do not expect transactional semantics for
+        multi-table operations.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title21" id="impala_kudu__kudu_dml">
+
+    <h2 class="title topictitle2" id="ariaid-title21">Impala DML Support for Kudu Tables (INSERT, UPDATE, DELETE, UPSERT)</h2>
+
+    
+
+    <div class="body conbody">
+
+      <p class="p">
+        Impala supports certain DML statements for Kudu tables only. The <code class="ph codeph">UPDATE</code>
+        and <code class="ph codeph">DELETE</code> statements let you modify data within Kudu tables without
+        rewriting substantial amounts of table data. The <code class="ph codeph">UPSERT</code> statement acts
+        as a combination of <code class="ph codeph">INSERT</code> and <code class="ph codeph">UPDATE</code>, inserting rows
+        where the primary key does not already exist, and updating the non-primary key columns
+        where the primary key does already exist in the table.
+      </p>
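+
+      <p class="p">
+        The following sketch shows the <code class="ph codeph">UPSERT</code> behavior for existing and
+        new primary key values. (The table and values are illustrative.)
+      </p>
+
+<pre class="pre codeblock"><code>
+create table upsert_demo (id bigint primary key, s string)
+  partition by hash (id) partitions 2 stored as kudu;
+
+insert into upsert_demo values (1, 'original');
+
+-- Row 1 already exists, so its non-primary-key column is updated.
+-- Row 2 does not exist, so it is inserted.
+upsert into upsert_demo values (1, 'revised'), (2, 'brand new');
+</code></pre>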
+
+      <p class="p">
+        The <code class="ph codeph">INSERT</code> statement for Kudu tables honors the unique and <code class="ph codeph">NOT
+        NULL</code> requirements for the primary key columns.
+      </p>
+
+      <p class="p">
+        Because Impala and Kudu do not support transactions, the effects of any
+        <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, or <code class="ph codeph">DELETE</code> statement
+        are immediately visible. For example, you cannot do a sequence of
+        <code class="ph codeph">UPDATE</code> statements and only make the changes visible after all the
+        statements are finished. Also, if a DML statement fails partway through, any rows that
+        were already inserted, deleted, or changed remain in the table; there is no rollback
+        mechanism to undo the changes.
+      </p>
+
+      <p class="p">
+        In particular, an <code class="ph codeph">INSERT ... SELECT</code> statement that refers to the table
+        being inserted into might insert more rows than expected, because the
+        <code class="ph codeph">SELECT</code> part of the statement sees some of the new rows being inserted
+        and processes them again.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          The <code class="ph codeph">LOAD DATA</code> statement, which involves manipulation of HDFS data files,
+          does not apply to Kudu tables.
+        </p>
+      </div>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title22" id="impala_kudu__kudu_consistency">
+
+    <h2 class="title topictitle2" id="ariaid-title22">Consistency Considerations for Kudu Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Kudu tables have consistency characteristics such as uniqueness, controlled by the
+        primary key columns, and non-nullable columns. The emphasis for consistency is on
+        preventing duplicate or incomplete data from being stored in a table.
+      </p>
+
+      <p class="p">
+        Currently, Kudu does not enforce strong consistency for order of operations, total
+        success or total failure of a multi-row statement, or data that is read while a write
+        operation is in progress. Changes are applied atomically to each row, but not applied
+        as a single unit to all rows affected by a multi-row DML statement. That is, Kudu does
+        not currently have atomic multi-row statements or isolation between statements.
+      </p>
+
+      <p class="p">
+        If some rows are rejected during a DML operation because of a mismatch with duplicate
+        primary key values, <code class="ph codeph">NOT NULL</code> constraints, and so on, the statement
+        succeeds with a warning. Impala still inserts, deletes, or updates the other rows that
+        are not affected by the constraint violation.
+      </p>
+
+      <p class="p">
+        Consequently, the number of rows affected by a DML operation on a Kudu table might be
+        different from what you expect.
+      </p>
+
+      <p class="p">
+        Because there is no strong consistency guarantee for information being inserted into,
+        deleted from, or updated across multiple tables simultaneously, consider denormalizing
+        the data where practical. That is, if you run separate <code class="ph codeph">INSERT</code>
+        statements to insert related rows into two different tables, one <code class="ph codeph">INSERT</code>
+        might fail while the other succeeds, leaving the data in an inconsistent state. Even if
+        both inserts succeed, a join query might happen during the interval between the
+        completion of the first and second statements, and the query would encounter incomplete,
+        inconsistent data. Denormalizing the data into a single wide table can reduce the
+        possibility of inconsistency due to multi-table operations.
+      </p>
+
+      <p class="p">
+        Information about the number of rows affected by a DML operation is reported in
+        <span class="keyword cmdname">impala-shell</span> output, and in the <code class="ph codeph">PROFILE</code> output, but
+        is not currently reported to HiveServer2 clients such as JDBC or ODBC applications.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title23" id="impala_kudu__kudu_security">
+
+    <h2 class="title topictitle2" id="ariaid-title23">Security Considerations for Kudu Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Security for Kudu tables involves:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Sentry authorization.
+          </p>
+          <p class="p">
+        Access to Kudu tables must be granted to and revoked from roles as usual.
+        Only users with <code class="ph codeph">ALL</code> privileges on <code class="ph codeph">SERVER</code> can create external Kudu tables.
+        Currently, access to a Kudu table is <span class="q">"all or nothing"</span>:
+        enforced at the table level rather than the column level, and applying to all
+        SQL operations rather than individual statements such as <code class="ph codeph">INSERT</code>.
+        Because non-SQL APIs can access Kudu data without going through Sentry
+        authorization, currently the Sentry support is considered preliminary
+        and subject to change.
+      </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Lineage tracking.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Auditing.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Redaction of sensitive information from log files.
+          </p>
+        </li>
+      </ul>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title24" id="impala_kudu__kudu_performance">
+
+    <h2 class="title topictitle2" id="ariaid-title24">Impala Query Performance for Kudu Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        For queries involving Kudu tables, Impala can delegate much of the work of filtering the
+        result set to Kudu, avoiding some of the I/O involved in full table scans of tables
+        containing HDFS data files. This type of optimization is especially effective for
+        partitioned Kudu tables, where the Impala query <code class="ph codeph">WHERE</code> clause refers to
+        one or more primary key columns that are also used as partition key columns. For
+        example, if a partitioned Kudu table uses a <code class="ph codeph">HASH</code> clause for
+        <code class="ph codeph">col1</code> and a <code class="ph codeph">RANGE</code> clause for <code class="ph codeph">col2</code>, a
+        query using a clause such as <code class="ph codeph">WHERE col1 IN (1,2,3) AND col2 &gt; 100</code>
+        can determine exactly which tablet servers contain relevant data, and therefore
+        parallelize the query very efficiently.
+      </p>
+
+      <p class="p">
+        See <a class="xref" href="impala_explain.html">EXPLAIN Statement</a> for examples of evaluating the effectiveness of
+        the predicate pushdown for a specific query against a Kudu table.
+      </p>
+
+      
+      
+
+    </div>
+
+    
+
+    
+
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_langref.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_langref.html b/docs/build/html/topics/impala_langref.html
new file mode 100644
index 0000000..b7f8546
--- /dev/null
+++ b/docs/build/html/topics/impala_langref.html
@@ -0,0 +1,66 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_comments.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_literals.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_operators.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_unsupported.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_porting.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="langref"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala SQL Language Reference</title></head><body id="langref"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala SQL Language Reference</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Impala uses SQL as its query language. To protect user investment in skills development and query
+      design, Impala provides a high degree of compatibility with the Hive Query Language (HiveQL):
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Because Impala uses the same metadata store as Hive to record information about table structure and
+        properties, Impala can access tables defined through the native Impala <code class="ph codeph">CREATE TABLE</code>
+        command, or tables created using the Hive data definition language (DDL).
+      </li>
+
+      <li class="li">
+        Impala supports data manipulation (DML) statements similar to the DML component of HiveQL.
+      </li>
+
+      <li class="li">
+        Impala provides many <a class="xref" href="impala_functions.html#builtins">built-in functions</a> with the same
+        names and parameter types as their HiveQL equivalents.
+      </li>
+    </ul>
+
+    <p class="p">
+      Impala supports most of the same <a class="xref" href="impala_langref_sql.html#langref_sql">statements and
+      clauses</a> as HiveQL, including, but not limited to, <code class="ph codeph">JOIN</code>, <code class="ph codeph">AGGREGATE</code>,
+      <code class="ph codeph">DISTINCT</code>, <code class="ph codeph">UNION ALL</code>, <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">LIMIT</code>, and
+      (uncorrelated) subqueries in the <code class="ph codeph">FROM</code> clause. Impala also supports <code class="ph codeph">INSERT
+      INTO</code> and <code class="ph codeph">INSERT OVERWRITE</code>.
+    </p>
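As an illustration of the shared constructs listed above, the following sketch combines `GROUP BY` aggregation, an uncorrelated subquery in the `FROM` clause, `ORDER BY`, and `LIMIT` in one query. It uses SQLite through Python purely as a convenient, runnable stand-in for demonstrating standard SQL behavior; the table name and data are invented for the example, and the same query shape is valid in Impala for ordinary tables.

```python
import sqlite3

# Build a tiny in-memory table to query against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10), ("east", 30), ("west", 5), ("west", 25)])

# GROUP BY inside an uncorrelated FROM subquery, then ORDER BY and LIMIT
# in the outer query: the combination of clauses described above.
rows = conn.execute("""
    SELECT region, total
    FROM (SELECT region, SUM(amount) AS total
          FROM sales GROUP BY region) t
    ORDER BY total DESC
    LIMIT 1
""").fetchall()
print(rows)  # -> [('east', 40)]
```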
+
+    <p class="p">
+      Impala supports data types with the same names and semantics as the equivalent Hive data types:
+      <code class="ph codeph">STRING</code>, <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>,
+      <code class="ph codeph">BIGINT</code>, <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, <code class="ph codeph">BOOLEAN</code>,
+      and <code class="ph codeph">TIMESTAMP</code>.
+    </p>
+
+    <p class="p">
+      For full details about Impala SQL syntax and semantics, see
+      <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a>.
+    </p>
+
+    <p class="p">
+      Most HiveQL <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT</code> statements run unmodified with Impala. For
+      information about Hive syntax not available in Impala, see
+      <a class="xref" href="impala_langref_unsupported.html#langref_hiveql_delta">SQL Differences Between Impala and Hive</a>.
+    </p>
+
+    <p class="p">
+      For a list of the built-in functions available in Impala queries, see
+      <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>.
+    </p>
+
+    <p class="p toc"></p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_comments.html">Comments</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_datatypes.html">Data Types</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_literals.html">Literals</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_operators.html">SQL Operators</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_langref_sql.html">Impala SQL Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_functions.html">Impala Built-In Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_langref_unsupported.html">SQL Diff
 erences Between Impala and Hive</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_porting.html">Porting SQL from Other Database Systems to Impala</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_langref_sql.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_langref_sql.html b/docs/build/html/topics/impala_langref_sql.html
new file mode 100644
index 0000000..9d6d33d
--- /dev/null
+++ b/docs/build/html/topics/impala_langref_sql.html
@@ -0,0 +1,28 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ddl.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_dml.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_alter_table.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_alter_view.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_compute_stats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_create_database.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_create_function.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_create_role.html"><meta name
 ="DC.Relation" scheme="URI" content="../topics/impala_create_table.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_create_view.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_delete.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_describe.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_database.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_function.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_role.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_stats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_table.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_view.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_grant.html"><meta name="DC.Relation" scheme="URI" cont
 ent="../topics/impala_insert.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_invalidate_metadata.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_load_data.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_refresh.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_revoke.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_show.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_truncate_table.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_update.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_upsert.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_use.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version"
  content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="langref_sql"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala SQL Statements</title></head><body id="langref_sql"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala SQL Statements</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The Impala SQL dialect supports a range of standard elements, plus some extensions for Big Data use cases
+      related to data loading and data warehousing.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        In the <span class="keyword cmdname">impala-shell</span> interpreter, a semicolon at the end of each statement is required.
+        Since the semicolon is not actually part of the SQL syntax, we do not include it in the syntax definition
+        of each statement, but we do show it in examples intended to be run in <span class="keyword cmdname">impala-shell</span>.
+      </p>
+    </div>
+
+    <p class="p toc all">
+      The following sections show the major SQL statements that you work with in Impala:
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_ddl.html">DDL Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_dml.html">DML Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_alter_table.html">ALTER TABLE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_alter_view.html">ALTER VIEW Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_compute_stats.html">COMPUTE STATS Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_database.html">CREATE DATABASE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_function.html">CREATE FUNCTION Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_role.h
 tml">CREATE ROLE Statement (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_table.html">CREATE TABLE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_view.html">CREATE VIEW Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_delete.html">DELETE Statement (Impala 2.8 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_describe.html">DESCRIBE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_database.html">DROP DATABASE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_function.html">DROP FUNCTION Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_role.html">DROP ROLE Statement (Impala 2.0 or higher only)</a></strong><br></li><li cla
 ss="link ulchildlink"><strong><a href="../topics/impala_drop_stats.html">DROP STATS Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_table.html">DROP TABLE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_view.html">DROP VIEW Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain.html">EXPLAIN Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_grant.html">GRANT Statement (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_insert.html">INSERT Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_invalidate_metadata.html">INVALIDATE METADATA Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_load_data.html">LOAD DATA Statement</a></strong><br></li><li class=
 "link ulchildlink"><strong><a href="../topics/impala_refresh.html">REFRESH Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_revoke.html">REVOKE Statement (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_select.html">SELECT Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_set.html">SET Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_show.html">SHOW Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_truncate_table.html">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_update.html">UPDATE Statement (Impala 2.8 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_upsert.html">UPSERT Statement (Impala 2.8 or higher on
 ly)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_use.html">USE Statement</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[22/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_mem_limit.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_mem_limit.html b/docs/build/html/topics/impala_mem_limit.html
new file mode 100644
index 0000000..704dae9
--- /dev/null
+++ b/docs/build/html/topics/impala_mem_limit.html
@@ -0,0 +1,206 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mem_limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MEM_LIMIT Query Option</title></head><body id="mem_limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">MEM_LIMIT Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      When resource management is not enabled, this option defines the maximum amount of memory a query can allocate on each node.
+      Therefore, the total memory that can be used by a query across the cluster is the <code class="ph codeph">MEM_LIMIT</code> value multiplied by the number of nodes involved in the query.
+    </p>
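The per-query arithmetic described above can be sketched as follows. This is illustrative only; the helper name is an invention for the example.

```python
# Illustrative arithmetic only: MEM_LIMIT caps memory per node, so the
# worst-case cluster-wide footprint of one query is the per-node limit
# multiplied by the number of nodes that participate in the query.
def total_query_memory_bytes(mem_limit_bytes, num_nodes):
    """Upper bound on cluster-wide memory one query can allocate."""
    return mem_limit_bytes * num_nodes

# Example: a 10 GB per-node MEM_LIMIT on a 20-node cluster allows up to
# 200 GB of memory cluster-wide for a single query.
limit = 10 * 1024**3  # 10 GB expressed in bytes
print(total_query_memory_bytes(limit, 20) // 1024**3)  # -> 200
```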
+
+    <p class="p">
+      There are two levels of memory limit for Impala.
+      The <code class="ph codeph">-mem_limit</code> startup option sets an overall limit for the <span class="keyword cmdname">impalad</span> process
+      (which handles multiple queries concurrently).
+      That limit is typically expressed in terms of a percentage of the RAM available on the host, such as <code class="ph codeph">-mem_limit=70%</code>.
+      The <code class="ph codeph">MEM_LIMIT</code> query option, which you set through <span class="keyword cmdname">impala-shell</span>
+      or the <code class="ph codeph">SET</code> statement in a JDBC or ODBC application, applies to each individual query.
+      The <code class="ph codeph">MEM_LIMIT</code> query option is usually expressed as a fixed size such as <code class="ph codeph">10gb</code>,
+      and must always be less than the <span class="keyword cmdname">impalad</span> memory limit.
+    </p>
+
+    <p class="p">
+      If query processing exceeds the specified memory limit on any node, either the per-query limit or the
+      <span class="keyword cmdname">impalad</span> limit, Impala cancels the query automatically.
+      Memory limits are checked periodically during query processing, so the actual memory in use
+      might briefly exceed the limit without the query being cancelled.
+    </p>
+
+    <p class="p">
+      When resource management is enabled, the mechanism for this option changes. If set, it overrides the
+      automatic memory estimate from Impala. Impala requests this amount of memory from YARN on each node, and the
+      query does not proceed until that much memory is available. The actual memory used by the query could be
+      lower, since some queries use much less memory than others. With resource management, the
+      <code class="ph codeph">MEM_LIMIT</code> setting acts both as a hard limit on the amount of memory a query can use on any
+      node (enforced by YARN) and a guarantee that that much memory will be available on each node while the query
+      is being executed. When resource management is enabled but no <code class="ph codeph">MEM_LIMIT</code> setting is
+      specified, Impala estimates the amount of memory needed on each node for each query, requests that much
+      memory from YARN before starting the query, and then internally sets the <code class="ph codeph">MEM_LIMIT</code> on each
+      node to the requested amount of memory during the query. Thus, if the query takes more memory than was
+      originally estimated, Impala detects that the <code class="ph codeph">MEM_LIMIT</code> is exceeded and cancels the query
+      itself.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Units:</strong> A numeric argument represents memory size in bytes; you can also use a suffix of <code class="ph codeph">m</code> or <code class="ph codeph">mb</code>
+      for megabytes, or more commonly <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you specify a value in an unrecognized
+      format, subsequent queries fail with an error.
+    </p>
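The accepted size formats can be sketched roughly as follows. This is not Impala's actual parser; the function name, regular expression, and error message are illustrative assumptions that mirror the behavior described above, including the rejection of unrecognized suffixes such as `tb`.

```python
import re

# Illustrative sketch of the MEM_LIMIT formats described above: a plain
# byte count, or a number with an m/mb (megabyte) or g/gb (gigabyte)
# suffix. Unrecognized suffixes are rejected with an error.
def parse_mem_limit(value):
    match = re.fullmatch(r"(\d+)\s*(mb?|gb?)?", value.strip(), re.IGNORECASE)
    if not match:
        raise ValueError("Failed to parse query memory limit from '%s'" % value)
    number, suffix = int(match.group(1)), (match.group(2) or "").lower()
    scale = {"": 1, "m": 2**20, "mb": 2**20, "g": 2**30, "gb": 2**30}[suffix]
    return number * scale

print(parse_mem_limit("3g"))  # -> 3221225472
```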
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0 (unlimited)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      The <code class="ph codeph">MEM_LIMIT</code> setting is primarily useful in a high-concurrency setting,
+      or on a cluster whose workload is shared between Impala and other data processing components.
+      It prevents any single query from accidentally allocating much more memory than expected,
+      which could otherwise negatively affect other Impala queries.
+    </p>
+
+    <p class="p">
+      Use the output of the <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span>
+      to get a report of memory used for each phase of your most heavyweight queries on each node,
+      and then set a <code class="ph codeph">MEM_LIMIT</code> somewhat higher than that.
+      See <a class="xref" href="impala_explain_plan.html#perf_summary">Using the SUMMARY Report for Performance Tuning</a> for usage information about
+      the <code class="ph codeph">SUMMARY</code> command.
+    </p>
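One way to turn a `SUMMARY` report into a starting value is sketched below. The helper name and the 50% headroom factor are illustrative assumptions, not an Impala-documented rule; choose headroom appropriate for your workload.

```python
# Hypothetical helper: given the largest "Peak Mem" figure reported by
# the SUMMARY command for a query (in bytes), suggest a MEM_LIMIT value
# somewhat higher than the observed peak.
def suggest_mem_limit(peak_mem_bytes, headroom=0.5):
    """Return a per-node memory limit with headroom above the observed peak."""
    return int(peak_mem_bytes * (1 + headroom))

# If the heaviest operator peaked at about 24 MB on a node, suggest a
# limit roughly 50% above that, expressed in bytes.
peak = int(24.19 * 2**20)
print(suggest_mem_limit(peak))
```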
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following examples show how to set the <code class="ph codeph">MEM_LIMIT</code> query option
+      using a fixed number of bytes, or suffixes representing gigabytes or megabytes.
+    </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] &gt; set mem_limit=3000000000;
+MEM_LIMIT set to 3000000000
+[localhost:21000] &gt; select 5;
+Query: select 5
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] &gt; set mem_limit=3g;
+MEM_LIMIT set to 3g
+[localhost:21000] &gt; select 5;
+Query: select 5
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] &gt; set mem_limit=3gb;
+MEM_LIMIT set to 3gb
+[localhost:21000] &gt; select 5;
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] &gt; set mem_limit=3m;
+MEM_LIMIT set to 3m
+[localhost:21000] &gt; select 5;
++---+
+| 5 |
++---+
+| 5 |
++---+
+[localhost:21000] &gt; set mem_limit=3mb;
+MEM_LIMIT set to 3mb
+[localhost:21000] &gt; select 5;
++---+
+| 5 |
++---+
+</code></pre>
+
+    <p class="p">
+      The following examples show how unrecognized <code class="ph codeph">MEM_LIMIT</code>
+      values lead to errors for subsequent queries.
+    </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] &gt; set mem_limit=3tb;
+MEM_LIMIT set to 3tb
+[localhost:21000] &gt; select 5;
+ERROR: Failed to parse query memory limit from '3tb'.
+
+[localhost:21000] &gt; set mem_limit=xyz;
+MEM_LIMIT set to xyz
+[localhost:21000] &gt; select 5;
+Query: select 5
+ERROR: Failed to parse query memory limit from 'xyz'.
+</code></pre>
+
+    <p class="p">
+      The following example shows the automatic query cancellation
+      when the <code class="ph codeph">MEM_LIMIT</code> value is exceeded
+      on any host involved in the Impala query. First it runs a
+      successful query and checks the largest amount of memory
+      used on any node for any stage of the query.
+      Then it sets an artificially low <code class="ph codeph">MEM_LIMIT</code>
+      setting so that the same query cannot run.
+    </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] &gt; select count(*) from customer;
+Query: select count(*) from customer
++----------+
+| count(*) |
++----------+
+| 150000   |
++----------+
+
+[localhost:21000] &gt; select count(distinct c_name) from customer;
+Query: select count(distinct c_name) from customer
++------------------------+
+| count(distinct c_name) |
++------------------------+
+| 150000                 |
++------------------------+
+
+[localhost:21000] &gt; summary;
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+| Operator     | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem | Est. Peak Mem | Detail        |
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+| 06:AGGREGATE | 1      | 230.00ms | 230.00ms | 1       | 1          | 16.00 KB | -1 B          | FINALIZE      |
+| 05:EXCHANGE  | 1      | 43.44us  | 43.44us  | 1       | 1          | 0 B      | -1 B          | UNPARTITIONED |
+| 02:AGGREGATE | 1      | 227.14ms | 227.14ms | 1       | 1          | 12.00 KB | 10.00 MB      |               |
+| 04:AGGREGATE | 1      | 126.27ms | 126.27ms | 150.00K | 150.00K    | 15.17 MB | 10.00 MB      |               |
+| 03:EXCHANGE  | 1      | 44.07ms  | 44.07ms  | 150.00K | 150.00K    | 0 B      | 0 B           | HASH(c_name)  |
+<strong class="ph b">| 01:AGGREGATE | 1      | 361.94ms | 361.94ms | 150.00K | 150.00K    | 23.04 MB | 10.00 MB      |               |</strong>
+| 00:SCAN HDFS | 1      | 43.64ms  | 43.64ms  | 150.00K | 150.00K    | 24.19 MB | 64.00 MB      | tpch.customer |
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+
+[localhost:21000] &gt; set mem_limit=15mb;
+MEM_LIMIT set to 15mb
+[localhost:21000] &gt; select count(distinct c_name) from customer;
+Query: select count(distinct c_name) from customer
+ERROR:
+Memory limit exceeded
+Query did not have enough memory to get the minimum required buffers in the block manager.
+</code></pre>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_min.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_min.html b/docs/build/html/topics/impala_min.html
new file mode 100644
index 0000000..88a39ab
--- /dev/null
+++ b/docs/build/html/topics/impala_min.html
@@ -0,0 +1,297 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="min"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MIN Function</title></head><body id="min"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">MIN Function</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      An aggregate function that returns the minimum value from a set of numbers. Opposite of the
+      <code class="ph codeph">MAX</code> function. Its single argument can be a numeric column, or the numeric result of a function
+      or expression applied to the column value. Rows with a <code class="ph codeph">NULL</code> value for the specified column
+      are ignored. If the table is empty, or all the values supplied to <code class="ph codeph">MIN</code> are
+      <code class="ph codeph">NULL</code>, <code class="ph codeph">MIN</code> returns <code class="ph codeph">NULL</code>.
+    </p>
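The `NULL`-handling rules above follow standard SQL aggregate semantics. The sketch below demonstrates them using SQLite through Python rather than Impala itself; SQLite behaves the same way for these cases, and the table and data are invented for the example.

```python
import sqlite3

# Demonstrates the MIN() semantics described above: rows with NULL in
# the aggregated column are ignored, and MIN() over an empty table (or
# all-NULL input) returns NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (c1 INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(7,), (None,), (3,)])

# NULL row ignored, so the minimum of {7, NULL, 3} is 3.
print(conn.execute("SELECT MIN(c1) FROM t1").fetchone()[0])  # -> 3

# Empty table: MIN() returns NULL (None in Python).
conn.execute("DELETE FROM t1")
print(conn.execute("SELECT MIN(c1) FROM t1").fetchone()[0])  # -> None
```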
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>MIN([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre>
+
+    <p class="p">
+      When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of
+      grouping values.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong> In Impala 2.0 and higher, this function can be used as an analytic function, but with restrictions on any window clause.
+        For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause is only allowed if the start
+        bound is <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+      </p>
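The `UNBOUNDED PRECEDING` restriction means that, as an analytic function, `MIN()` computes a running minimum from the start of each window partition up to the current row. That computation can be sketched in plain Python (not Impala code; the values are invented for the example):

```python
from itertools import accumulate

# Running minimum over an ordered sequence: the only window shape
# allowed for analytic MIN()/MAX() (start bound UNBOUNDED PRECEDING).
values = [4, 9, 1, 7, 3]
running_min = list(accumulate(values, min))
print(running_min)  # -> [4, 4, 1, 1, 1]
```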
+
+    <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+        arguments which produce a <code class="ph codeph">STRING</code> result
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+        If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+        <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+        query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+        See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+        for the kinds of queries that this option applies to, and slight differences in how partitions are
+        evaluated when this query option is enabled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+        To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        in an aggregation function, you unpack the individual elements using join notation in the query,
+        and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+        See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+      </p>
+
+    <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric aggregates such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed from the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+  from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name      | item.n_nationkey |
++-------------+------------------+
+| AFRICA      | 0                |
+| AFRICA      | 5                |
+| AFRICA      | 14               |
+| AFRICA      | 15               |
+| AFRICA      | 16               |
+| AMERICA     | 1                |
+| AMERICA     | 2                |
+| AMERICA     | 3                |
+| AMERICA     | 17               |
+| AMERICA     | 24               |
+| ASIA        | 8                |
+| ASIA        | 9                |
+| ASIA        | 12               |
+| ASIA        | 18               |
+| ASIA        | 21               |
+| EUROPE      | 6                |
+| EUROPE      | 7                |
+| EUROPE      | 19               |
+| EUROPE      | 22               |
+| EUROPE      | 23               |
+| MIDDLE EAST | 4                |
+| MIDDLE EAST | 10               |
+| MIDDLE EAST | 11               |
+| MIDDLE EAST | 13               |
+| MIDDLE EAST | 20               |
++-------------+------------------+
+
+select
+  r_name,
+  count(r_nations.item.n_nationkey) as count,
+  sum(r_nations.item.n_nationkey) as sum,
+  avg(r_nations.item.n_nationkey) as avg,
+  min(r_nations.item.n_name) as minimum,
+  max(r_nations.item.n_name) as maximum,
+  ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+  region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name      | count | sum | avg  | minimum   | maximum        | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA      | 5     | 50  | 10   | ALGERIA   | MOZAMBIQUE     | 5             |
+| AMERICA     | 5     | 47  | 9.4  | ARGENTINA | UNITED STATES  | 5             |
+| ASIA        | 5     | 68  | 13.6 | CHINA     | VIETNAM        | 5             |
+| EUROPE      | 5     | 77  | 15.4 | FRANCE    | UNITED KINGDOM | 5             |
+| MIDDLE EAST | 5     | 58  | 11.6 | EGYPT     | SAUDI ARABIA   | 5             |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>-- Find the smallest value for this column in the table.
+select min(c1) from t1;
+-- Find the smallest value for this column from a subset of the table.
+select min(c1) from t1 where month = 'January' and year = '2013';
+-- Find the smallest value from a set of numeric function results.
+select min(length(s)) from t1;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, min(purchase_price) from store_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select min(distinct x) from t1;
+</code></pre>
+
+    <div class="p">
+      The following examples show how to use <code class="ph codeph">MIN()</code> in an analytic context. They use a table
+      containing integers from 1 to 10. Notice how <code class="ph codeph">MIN()</code> is reported for each input value, in
+      contrast to a <code class="ph codeph">GROUP BY</code> clause, which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, min(x) over (partition by property) as min from int_t where property in ('odd','even');
++----+----------+-----+
+| x  | property | min |
++----+----------+-----+
+| 2  | even     | 2   |
+| 4  | even     | 2   |
+| 6  | even     | 2   |
+| 8  | even     | 2   |
+| 10 | even     | 2   |
+| 1  | odd      | 1   |
+| 3  | odd      | 1   |
+| 5  | odd      | 1   |
+| 7  | odd      | 1   |
+| 9  | odd      | 1   |
++----+----------+-----+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">MIN()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to display the smallest value of <code class="ph codeph">X</code>
+encountered up to each row in the result set. The examples use two columns in the <code class="ph codeph">ORDER BY</code>
+clause to produce a sequence of values that rises and falls, to illustrate how the <code class="ph codeph">MIN()</code>
+result only decreases or stays the same throughout each partition within the result set.
+The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+so all of these examples produce the same results:
+
+<pre class="pre codeblock"><code>select x, property, min(x) <strong class="ph b">over (order by property, x desc)</strong> as 'minimum to this point'
+  from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime    | 7                     |
+| 5 | prime    | 5                     |
+| 3 | prime    | 3                     |
+| 2 | prime    | 2                     |
+| 9 | square   | 2                     |
+| 4 | square   | 2                     |
+| 1 | square   | 1                     |
++---+----------+-----------------------+
+
+select x, property,
+  min(x) over
+  (
+    <strong class="ph b">order by property, x desc</strong>
+    <strong class="ph b">range between unbounded preceding and current row</strong>
+  ) as 'minimum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime    | 7                     |
+| 5 | prime    | 5                     |
+| 3 | prime    | 3                     |
+| 2 | prime    | 2                     |
+| 9 | square   | 2                     |
+| 4 | square   | 2                     |
+| 1 | square   | 1                     |
++---+----------+-----------------------+
+
+select x, property,
+  min(x) over
+  (
+    <strong class="ph b">order by property, x desc</strong>
+    <strong class="ph b">rows between unbounded preceding and current row</strong>
+  ) as 'minimum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime    | 7                     |
+| 5 | prime    | 5                     |
+| 3 | prime    | 3                     |
+| 2 | prime    | 2                     |
+| 9 | square   | 2                     |
+| 4 | square   | 2                     |
+| 1 | square   | 1                     |
++---+----------+-----------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running minimum taking into account all rows before
+and 1 row after the current row.
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code> clause.
+Because of an extra Impala restriction on the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> functions in an
+analytic context, the lower bound must be <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+<pre class="pre codeblock"><code>select x, property,
+  min(x) over
+  (
+    <strong class="ph b">order by property, x desc</strong>
+    <strong class="ph b">rows between unbounded preceding and 1 following</strong>
+  ) as 'local minimum'
+from int_t where property in ('prime','square');
++---+----------+---------------+
+| x | property | local minimum |
++---+----------+---------------+
+| 7 | prime    | 5             |
+| 5 | prime    | 3             |
+| 3 | prime    | 2             |
+| 2 | prime    | 2             |
+| 9 | square   | 2             |
+| 4 | square   | 1             |
+| 1 | square   | 1             |
++---+----------+---------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+  min(x) over
+  (
+    <strong class="ph b">order by property, x desc</strong>
+    <strong class="ph b">range between unbounded preceding and 1 following</strong>
+  ) as 'local minimum'
+from int_t where property in ('prime','square');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>, <a class="xref" href="impala_max.html#max">MAX Function</a>,
+      <a class="xref" href="impala_avg.html#avg">AVG Function</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_misc_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_misc_functions.html b/docs/build/html/topics/impala_misc_functions.html
new file mode 100644
index 0000000..bf8c990
--- /dev/null
+++ b/docs/build/html/topics/impala_misc_functions.html
@@ -0,0 +1,175 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="misc_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Miscellaneous Functions</title></head><body id="misc_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Miscellaneous Functions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Impala supports the following utility functions that do not operate on a particular column or data type:
+    </p>
+
+    <dl class="dl">
+      
+
+        <dt class="dt dlterm" id="misc_functions__current_database">
+          <code class="ph codeph">current_database()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the database that the session is currently using, either <code class="ph codeph">default</code>
+          if no database has been selected, or whatever database the session switched to through a
+          <code class="ph codeph">USE</code> statement or the <span class="keyword cmdname">impalad</span> <code class="ph codeph">-d</code> option.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
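+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following is an illustrative sketch; the database name <code class="ph codeph">tpch</code> is hypothetical.
+          </p>
+<pre class="pre codeblock"><code>-- Check which database the session is currently using.
+select current_database();
+
+-- After switching with USE, the function reflects the new database.
+use tpch;
+select current_database();
+</code></pre>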
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="misc_functions__effective_user">
+          <code class="ph codeph">effective_user()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Typically returns the same value as <code class="ph codeph">user()</code>,
+          except if delegation is enabled, in which case it returns the ID of the delegated user.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+            <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.5</span>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="misc_functions__pid">
+          <code class="ph codeph">pid()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the process ID of the <span class="keyword cmdname">impalad</span> daemon that the session is
+          connected to. You can use it during low-level debugging, for example to issue Linux commands that
+          trace the <span class="keyword cmdname">impalad</span> process, show its arguments, and so on.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
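+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            A brief sketch of one possible debugging workflow; the returned process ID is arbitrary,
+            and the <span class="keyword cmdname">strace</span> invocation is only one example of a Linux tracing command.
+          </p>
+<pre class="pre codeblock"><code>-- Find the process ID of the impalad daemon for this session.
+select pid();
+
+-- Then, from a Linux shell on that host, trace the process, for example:
+--   strace -p &lt;pid&gt;
+</code></pre>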
+        </dd>
+
+      
+
+      
+
+      
+
+        <dt class="dt dlterm" id="misc_functions__user">
+          <code class="ph codeph">user()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the username of the Linux user who is connected to the <span class="keyword cmdname">impalad</span>
+          daemon. Typically called a single time, in a query without any <code class="ph codeph">FROM</code> clause, to
+          understand how authorization settings apply in a security context. Once you know the logged-in username,
+          you can check which groups that user belongs to, and from the list of groups you can check which roles
+          are available to those groups through the authorization policy file.
+          <p class="p">
+        In Impala 2.0 and later, <code class="ph codeph">user()</code> returns the full Kerberos principal string, such as
+        <code class="ph codeph">user@example.com</code>, in a Kerberized environment.
+      </p>
+          <p class="p">
+            When delegation is enabled, consider calling the <code class="ph codeph">effective_user()</code> function instead.
+          </p>
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="misc_functions__uuid">
+          <code class="ph codeph">uuid()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a <a class="xref" href="https://en.wikipedia.org/wiki/Universally_unique_identifier" target="_blank">universally unique identifier</a>, a 128-bit value encoded as a string with groups of hexadecimal digits separated by dashes.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Ascending numeric sequences of type <code class="ph codeph">BIGINT</code> are often used
+            as identifiers within a table, and as join keys across multiple tables.
+            The <code class="ph codeph">uuid()</code> value is a convenient alternative that does not
+            require storing or querying the highest sequence number. For example, you
+            can use it to quickly construct new unique identifiers during a data import job,
+            or to combine data from different tables without the risk of ID collisions.
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+-- Each call to uuid() produces a new arbitrary value.
+select uuid();
++--------------------------------------+
+| uuid()                               |
++--------------------------------------+
+| c7013e25-1455-457f-bf74-a2046e58caea |
++--------------------------------------+
+
+-- If you get a UUID for each row of a result set, you can use it as a
+-- unique identifier within a table, or even a unique ID across tables.
+select uuid() from four_row_table;
++--------------------------------------+
+| uuid()                               |
++--------------------------------------+
+| 51d3c540-85e5-4cb9-9110-604e53999e2e |
+| 0bb40071-92f6-4a59-a6a4-60d46e9703e2 |
+| 5e9d7c36-9842-4a96-862d-c13cd0457c02 |
+| cae29095-0cc0-4053-a5ea-7fcd3c780861 |
++--------------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="misc_functions__version">
+          <code class="ph codeph">version()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns information such as the precise version number and build date for the
+          <code class="ph codeph">impalad</code> daemon that you are currently connected to. Typically used to confirm that you
+          are connected to the expected level of Impala to use a particular feature, or to connect to several nodes
+          and confirm they are all running the same level of <span class="keyword cmdname">impalad</span>.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> (with one or more embedded newlines)
+          </p>
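+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            A sketch only; the exact version string depends on the build of the daemon
+            you are connected to, so no sample output is shown.
+          </p>
+<pre class="pre codeblock"><code>-- Confirm the version and build date of the impalad daemon for this session.
+select version();
+</code></pre>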
+        </dd>
+
+      
+    </dl>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_mixed_security.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_mixed_security.html b/docs/build/html/topics/impala_mixed_security.html
new file mode 100644
index 0000000..b48fa8e
--- /dev/null
+++ b/docs/build/html/topics/impala_mixed_security.html
@@ -0,0 +1,26 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mixed_security"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Multiple Authentication Methods with Impala</title></head><body id="mixed_security"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using Multiple Authentication Methods with Impala</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Impala 2.0 and later automatically handles both Kerberos and LDAP authentication. Each
+      <span class="keyword cmdname">impalad</span> daemon can accept both Kerberos and LDAP requests through the same port. No
+      special actions need to be taken if some users authenticate through Kerberos and some through LDAP.
+    </p>
+
+    <p class="p">
+      Prior to Impala 2.0, you had to configure each <span class="keyword cmdname">impalad</span> to listen on a specific port
+      depending on the kind of authentication, then configure your network load balancer to forward each kind of
+      request to a DataNode that was set up with the appropriate authentication type. Once the initial request was
+      made using either Kerberos or LDAP authentication, Impala automatically handled the process of coordinating
+      the work across multiple nodes and transmitting intermediate results back to the coordinator node.
+    </p>
+
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_mt_dop.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_mt_dop.html b/docs/build/html/topics/impala_mt_dop.html
new file mode 100644
index 0000000..adb4f0f
--- /dev/null
+++ b/docs/build/html/topics/impala_mt_dop.html
@@ -0,0 +1,190 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mt_dop"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MT_DOP Query Option</title></head><body id="mt_dop"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">MT_DOP Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Sets the degree of intra-node parallelism used for certain operations that
+      can benefit from multithreaded execution. You can specify values
+      higher than zero to find the ideal balance of response time,
+      memory usage, and CPU usage during statement processing.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        The Impala execution engine is being revamped incrementally to add
+        additional parallelism within a single host for certain statements and
+        kinds of operations. The setting <code class="ph codeph">MT_DOP=0</code> uses the
+        <span class="q">"old"</span> code path with limited intra-node parallelism.
+      </p>
+
+      <p class="p">
+        Currently, the operations affected by the <code class="ph codeph">MT_DOP</code>
+        query option are:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">COMPUTE [INCREMENTAL] STATS</code>. Impala automatically sets
+            <code class="ph codeph">MT_DOP=4</code> for <code class="ph codeph">COMPUTE STATS</code> and
+            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statements on Parquet tables.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Queries with execution plans containing only scan and aggregation operators,
+            or local joins that do not need data exchanges (such as for nested types).
+            Other queries produce an error if <code class="ph codeph">MT_DOP</code> is set to a non-zero
+            value. Therefore, this query option is typically only set for the duration of
+            specific long-running, CPU-intensive queries.
+          </p>
+        </li>
+      </ul>
+
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> integer
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">0</code>
+      </p>
+    <p class="p">
+      Because <code class="ph codeph">COMPUTE STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+      statements for Parquet tables benefit substantially from extra intra-node
+      parallelism, Impala automatically sets <code class="ph codeph">MT_DOP=4</code> when computing stats
+      for Parquet tables.
+    </p>
+    <p class="p">
+      <strong class="ph b">Range:</strong> 0 to 64
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        Any timing figures in the following examples are on a small, lightly loaded development cluster.
+        Your mileage may vary. Speedups depend on many factors, including the number of rows, columns, and
+        partitions within each table.
+      </p>
+    </div>
+
+    <p class="p">
+      The following example shows how to run a <code class="ph codeph">COMPUTE STATS</code>
+      statement against a Parquet table with or without an explicit <code class="ph codeph">MT_DOP</code>
+      setting:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Explicitly setting MT_DOP to 0 selects the old code path.
+set mt_dop = 0;
+MT_DOP set to 0
+
+-- The analysis for the billion rows is distributed among hosts,
+-- but uses only a single core on each host.
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+drop stats billion_rows_parquet;
+
+-- Using 4 logical processors per host is faster.
+set mt_dop = 4;
+MT_DOP set to 4
+
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+drop stats billion_rows_parquet;
+
+-- Unsetting the option reverts back to its default.
+-- Which for COMPUTE STATS and a Parquet table is 4,
+-- so again it uses the fast path.
+unset MT_DOP;
+Unsetting option MT_DOP
+
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+</code></pre>
+
+    <p class="p">
+      The following example shows the effects of setting <code class="ph codeph">MT_DOP</code>
+      for a query involving only scan and aggregation operations for a Parquet table:
+    </p>
+
+<pre class="pre codeblock"><code>
+set mt_dop = 0;
+MT_DOP set to 0
+
+-- COUNT(DISTINCT) for a unique column is CPU-intensive.
+select count(distinct id) from billion_rows_parquet;
++--------------------+
+| count(distinct id) |
++--------------------+
+| 1000000000         |
++--------------------+
+Fetched 1 row(s) in 67.20s
+
+set mt_dop = 16;
+MT_DOP set to 16
+
+-- Introducing more intra-node parallelism for the aggregation
+-- speeds things up, and potentially reduces memory overhead by
+-- reducing the number of scanner threads.
+select count(distinct id) from billion_rows_parquet;
++--------------------+
+| count(distinct id) |
++--------------------+
+| 1000000000         |
++--------------------+
+Fetched 1 row(s) in 17.19s
+
+</code></pre>
+
+    <p class="p">
+      The following example shows how queries that are not compatible with non-zero
+      <code class="ph codeph">MT_DOP</code> settings produce an error when <code class="ph codeph">MT_DOP</code>
+      is set:
+    </p>
+
+<pre class="pre codeblock"><code>
+set mt_dop=1;
+MT_DOP set to 1
+
+select * from a1 inner join a2
+  on a1.id = a2.id limit 4;
+ERROR: NotImplementedException: MT_DOP not supported for plans with
+  base table joins or table sinks.
+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_compute_stats.html">COMPUTE STATS Statement</a>,
+      <a class="xref" href="impala_aggregate_functions.html">Impala Aggregate Functions</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_ndv.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_ndv.html b/docs/build/html/topics/impala_ndv.html
new file mode 100644
index 0000000..d6c3117
--- /dev/null
+++ b/docs/build/html/topics/impala_ndv.html
@@ -0,0 +1,226 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ndv"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>NDV Function</title></head><body id="ndv"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">NDV Function</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      An aggregate function that returns an approximate value similar to the result of <code class="ph codeph">COUNT(DISTINCT
+      <var class="keyword varname">col</var>)</code>, the <span class="q">"number of distinct values"</span>. It is much faster than the
+      combination of <code class="ph codeph">COUNT</code> and <code class="ph codeph">DISTINCT</code>, and uses a constant amount of memory,
+      making it less memory-intensive for columns with high cardinality.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>NDV([DISTINCT | ALL] <var class="keyword varname">expression</var>)</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      This is the mechanism used internally by the <code class="ph codeph">COMPUTE STATS</code> statement for computing the
+      number of distinct values in a column.
+    </p>
+
+    <p class="p">
+      Because this number is an estimate, it might not reflect the precise number of different values in the
+      column, especially if the cardinality is very low or very high. If the estimated number is higher than the
+      number of rows in the table, Impala adjusts the value internally during query planning.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> in Impala 2.0 and higher; <code class="ph codeph">STRING</code> in earlier
+        releases
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+        To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        in an aggregation function, you unpack the individual elements using join notation in the query,
+        and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+        See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+      </p>
+
+    <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric results such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed from the numeric <code class="ph codeph">R_NATIONKEY</code> field, while
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+results are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+  from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name      | item.n_nationkey |
++-------------+------------------+
+| AFRICA      | 0                |
+| AFRICA      | 5                |
+| AFRICA      | 14               |
+| AFRICA      | 15               |
+| AFRICA      | 16               |
+| AMERICA     | 1                |
+| AMERICA     | 2                |
+| AMERICA     | 3                |
+| AMERICA     | 17               |
+| AMERICA     | 24               |
+| ASIA        | 8                |
+| ASIA        | 9                |
+| ASIA        | 12               |
+| ASIA        | 18               |
+| ASIA        | 21               |
+| EUROPE      | 6                |
+| EUROPE      | 7                |
+| EUROPE      | 19               |
+| EUROPE      | 22               |
+| EUROPE      | 23               |
+| MIDDLE EAST | 4                |
+| MIDDLE EAST | 10               |
+| MIDDLE EAST | 11               |
+| MIDDLE EAST | 13               |
+| MIDDLE EAST | 20               |
++-------------+------------------+
+
+select
+  r_name,
+  count(r_nations.item.n_nationkey) as count,
+  sum(r_nations.item.n_nationkey) as sum,
+  avg(r_nations.item.n_nationkey) as avg,
+  min(r_nations.item.n_name) as minimum,
+  max(r_nations.item.n_name) as maximum,
+  ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+  region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name      | count | sum | avg  | minimum   | maximum        | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA      | 5     | 50  | 10   | ALGERIA   | MOZAMBIQUE     | 5             |
+| AMERICA     | 5     | 47  | 9.4  | ARGENTINA | UNITED STATES  | 5             |
+| ASIA        | 5     | 68  | 13.6 | CHINA     | VIETNAM        | 5             |
+| EUROPE      | 5     | 77  | 15.4 | FRANCE    | UNITED KINGDOM | 5             |
+| MIDDLE EAST | 5     | 58  | 11.6 | EGYPT     | SAUDI ARABIA   | 5             |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+        This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example queries a billion-row table to illustrate the relative performance of
+      <code class="ph codeph">COUNT(DISTINCT)</code> and <code class="ph codeph">NDV()</code>. It shows how <code class="ph codeph">COUNT(DISTINCT)</code>
+      gives a precise answer, but is inefficient for large-scale data where an approximate result is sufficient.
+      The <code class="ph codeph">NDV()</code> function gives an approximate result but is much faster.
+    </p>
+
+<pre class="pre codeblock"><code>select count(distinct col1) from sample_data;
++---------------------+
+| count(distinct col1)|
++---------------------+
+| 100000              |
++---------------------+
+Fetched 1 row(s) in 20.13s
+
+select cast(ndv(col1) as bigint) as col1 from sample_data;
++----------+
+| col1     |
++----------+
+| 139017   |
++----------+
+Fetched 1 row(s) in 8.91s
+</code></pre>
+
+    <p class="p">
+      The following example shows how you can code multiple <code class="ph codeph">NDV()</code> calls in a single query, to
+      easily learn which columns have substantially more or fewer distinct values. This technique is faster than
+      running a sequence of queries with <code class="ph codeph">COUNT(DISTINCT)</code> calls.
+    </p>
+
+<pre class="pre codeblock"><code>select cast(ndv(col1) as bigint) as col1, cast(ndv(col2) as bigint) as col2,
+    cast(ndv(col3) as bigint) as col3, cast(ndv(col4) as bigint) as col4
+  from sample_data;
++----------+-----------+------------+-----------+
+| col1     | col2      | col3       | col4      |
++----------+-----------+------------+-----------+
+| 139017   | 282       | 46         | 145636240 |
++----------+-----------+------------+-----------+
+Fetched 1 row(s) in 34.97s
+
+select count(distinct col1) from sample_data;
++---------------------+
+| count(distinct col1)|
++---------------------+
+| 100000              |
++---------------------+
+Fetched 1 row(s) in 20.13s
+
+select count(distinct col2) from sample_data;
++----------------------+
+| count(distinct col2) |
++----------------------+
+| 278                  |
++----------------------+
+Fetched 1 row(s) in 20.09s
+
+select count(distinct col3) from sample_data;
++-----------------------+
+| count(distinct col3)  |
++-----------------------+
+| 46                    |
++-----------------------+
+Fetched 1 row(s) in 19.12s
+
+select count(distinct col4) from sample_data;
++----------------------+
+| count(distinct col4) |
++----------------------+
+| 147135880            |
++----------------------+
+Fetched 1 row(s) in 266.95s
+</code></pre>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[13/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_porting.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_porting.html b/docs/build/html/topics/impala_porting.html
new file mode 100644
index 0000000..cc4cb29
--- /dev/null
+++ b/docs/build/html/topics/impala_porting.html
@@ -0,0 +1,603 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="porting"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Porting SQL from Other Database Systems to Impala</title></head><body id="porting"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Porting SQL from Other Database Systems to Impala</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Although Impala uses standard SQL for queries, you might need to modify SQL source when bringing applications
+      to Impala, due to variations in data types, built-in functions, vendor language extensions, and
+      Hadoop-specific syntax. Even when SQL is working correctly, you might make further minor modifications for
+      best performance.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="porting__porting_ddl_dml">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Porting DDL and DML Statements</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        When adapting SQL code from a traditional database system to Impala, expect to find a number of differences
+        in the DDL statements that you use to set up the schema. Clauses related to physical layout of files,
+        tablespaces, and indexes have no equivalent in Impala. You might restructure your schema considerably to
+        account for the Impala partitioning scheme and Hadoop file formats.
+      </p>
+
+      <p class="p">
+        Expect SQL queries to have a much higher degree of compatibility. With modest rewriting to address vendor
+        extensions and features not yet supported in Impala, you might be able to run identical or almost-identical
+        query text on both systems.
+      </p>
+
+      <p class="p">
+        Therefore, consider separating the DDL into its own Impala-specific setup script. Focus your reuse
+        and ongoing tuning efforts on the code for SQL queries.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="porting__porting_data_types">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Porting Data Types from Other Database Systems</h2>
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Change any <code class="ph codeph">VARCHAR</code>, <code class="ph codeph">VARCHAR2</code>, and <code class="ph codeph">CHAR</code> columns to
+            <code class="ph codeph">STRING</code>. Remove any length constraints from the column declarations; for example,
+            change <code class="ph codeph">VARCHAR(32)</code> or <code class="ph codeph">CHAR(1)</code> to <code class="ph codeph">STRING</code>. Impala is
+            very flexible about the length of string values; it does not impose any length constraints
+            or do any special processing (such as blank-padding) for <code class="ph codeph">STRING</code> columns.
+            (In Impala 2.0 and higher, there are data types <code class="ph codeph">VARCHAR</code> and <code class="ph codeph">CHAR</code>,
+            with length constraints for both types and blank-padding for <code class="ph codeph">CHAR</code>.
+            However, for performance reasons, it is still preferable to use <code class="ph codeph">STRING</code>
+            columns where practical.)
+          </p>
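+          <p class="p">
+            For example, a declaration from another system might be rewritten as follows
+            (the table and column names here are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Original system: CREATE TABLE customers (id INT, code CHAR(1), name VARCHAR(32));
+CREATE TABLE customers (id INT, code STRING, name STRING);</code></pre>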
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For national language character types such as <code class="ph codeph">NCHAR</code>, <code class="ph codeph">NVARCHAR</code>, or
+            <code class="ph codeph">NCLOB</code>, be aware that while Impala can store and query UTF-8 character data, currently
+            some string manipulation operations only work correctly with ASCII data. See
+            <a class="xref" href="impala_string.html#string">STRING Data Type</a> for details.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Change any <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>, or <code class="ph codeph">TIME</code> columns to
+            <code class="ph codeph">TIMESTAMP</code>. Remove any precision constraints. Remove any timezone clauses, and make
+            sure your application logic or ETL process accounts for the fact that Impala expects all
+            <code class="ph codeph">TIMESTAMP</code> values to be in
+            <a class="xref" href="http://en.wikipedia.org/wiki/Coordinated_Universal_Time" target="_blank">Coordinated
+            Universal Time (UTC)</a>. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for information about
+            the <code class="ph codeph">TIMESTAMP</code> data type, and
+            <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for conversion functions for different
+            date and time formats.
+          </p>
+          <p class="p">
+            You might also need to adapt date- and time-related literal values and format strings to use the
+            supported Impala date and time formats. If you have date and time literals with different separators or
+            different numbers of <code class="ph codeph">YY</code>, <code class="ph codeph">MM</code>, and so on placeholders than Impala
+            expects, consider using calls to <code class="ph codeph">regexp_replace()</code> to transform those values to the
+            Impala-compatible format. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for information about the
+            allowed formats for date and time literals, and
+            <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a> for string conversion functions such as
+            <code class="ph codeph">regexp_replace()</code>.
+          </p>
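+          <p class="p">
+            For example, a sketch of such a conversion, assuming hypothetical input strings in
+            <code class="ph codeph">MM/DD/YYYY</code> form:
+          </p>
+<pre class="pre codeblock"><code>-- Rearrange 'MM/DD/YYYY' strings into the 'YYYY-MM-DD' order Impala expects.
+select regexp_replace('12/31/2016', '(\\d+)/(\\d+)/(\\d+)', '\\3-\\1-\\2');</code></pre>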
+          <p class="p">
+            Instead of <code class="ph codeph">SYSDATE</code>, call the function <code class="ph codeph">NOW()</code>.
+          </p>
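+          <p class="p">
+            For example (the table and column names here are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Original system: SELECT id FROM orders WHERE order_date &lt; SYSDATE;
+SELECT id FROM orders WHERE order_date &lt; NOW();</code></pre>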
+          <p class="p">
+            Instead of adding or subtracting directly from a date value to produce a value <var class="keyword varname">N</var>
+            days in the past or future, use an <code class="ph codeph">INTERVAL</code> expression, for example <code class="ph codeph">NOW() +
+            INTERVAL 30 DAYS</code>.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Although Impala supports <code class="ph codeph">INTERVAL</code> expressions for datetime arithmetic, as shown in
+            <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>, <code class="ph codeph">INTERVAL</code> is not available as a column
+            data type in Impala. For any <code class="ph codeph">INTERVAL</code> values stored in tables, convert them to numeric
+            values that you can add or subtract using the functions in
+            <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>. For example, if you had a table
+            <code class="ph codeph">DEADLINES</code> with an <code class="ph codeph">INT</code> column <code class="ph codeph">TIME_PERIOD</code>, you could
+            construct dates N days in the future like so:
+          </p>
+<pre class="pre codeblock"><code>SELECT NOW() + INTERVAL time_period DAYS from deadlines;</code></pre>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For <code class="ph codeph">YEAR</code> columns, change to the smallest Impala integer type that has sufficient
+            range. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges, casting, and so on
+            for the various numeric data types.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Change any <code class="ph codeph">DECIMAL</code> and <code class="ph codeph">NUMBER</code> types. If fixed-point precision is not
+            required, you can use <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code> on the Impala side depending on
+            the range of values. For applications that require precise decimal values, such as financial data, you
+            might need to make more extensive changes to table structure and application logic, such as using
+            separate integer columns for dollars and cents, or encoding numbers as string values and writing UDFs
+            to manipulate them. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges,
+            casting, and so on for the various numeric data types.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, and <code class="ph codeph">REAL</code> types are supported in
+            Impala. Remove any precision and scale specifications. (In Impala, <code class="ph codeph">REAL</code> is just an
+            alias for <code class="ph codeph">DOUBLE</code>; columns declared as <code class="ph codeph">REAL</code> are turned into
+            <code class="ph codeph">DOUBLE</code> behind the scenes.) See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for
+            details about ranges, casting, and so on for the various numeric data types.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Most integer types from other systems have equivalents in Impala, perhaps under different names such as
+            <code class="ph codeph">BIGINT</code> instead of <code class="ph codeph">INT8</code>. For any that are unavailable, for example
+            <code class="ph codeph">MEDIUMINT</code>, switch to the smallest Impala integer type that has sufficient range.
+            Remove any precision specifications. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details
+            about ranges, casting, and so on for the various numeric data types.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Remove any <code class="ph codeph">UNSIGNED</code> constraints. All Impala numeric types are signed. See
+            <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges, casting, and so on for the
+            various numeric data types.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For any types holding bitwise values, use an integer type with enough range to hold all the relevant
+            bits within a positive integer. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about
+            ranges, casting, and so on for the various numeric data types.
+          </p>
+          <p class="p">
+            For example, <code class="ph codeph">TINYINT</code> has a maximum positive value of 127, not 255, so to manipulate
+            8-bit bitfields as positive numbers, switch to the next largest type, <code class="ph codeph">SMALLINT</code>.
+          </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select cast(127*2 as tinyint);
++--------------------------+
+| cast(127 * 2 as tinyint) |
++--------------------------+
+| -2                       |
++--------------------------+
+[localhost:21000] &gt; select cast(128 as tinyint);
++----------------------+
+| cast(128 as tinyint) |
++----------------------+
+| -128                 |
++----------------------+
+[localhost:21000] &gt; select cast(127*2 as smallint);
++---------------------------+
+| cast(127 * 2 as smallint) |
++---------------------------+
+| 254                       |
++---------------------------+</code></pre>
+          <p class="p">
+            Impala does not support notation such as <code class="ph codeph">b'0101'</code> for bit literals.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For BLOB values, use <code class="ph codeph">STRING</code> to represent <code class="ph codeph">CLOB</code> or
+            <code class="ph codeph">TEXT</code> types (character-based large objects) up to 32 KB in size. Binary large objects
+            such as <code class="ph codeph">BLOB</code>, <code class="ph codeph">RAW</code>, <code class="ph codeph">BINARY</code>, and
+            <code class="ph codeph">VARBINARY</code> do not currently have an equivalent in Impala.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For Boolean-like types such as <code class="ph codeph">BOOL</code>, use the Impala <code class="ph codeph">BOOLEAN</code> type.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Spatial data types in other database systems do not have direct equivalents in Impala. You could represent spatial values in string
+            format and write UDFs to process them. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details. Where
+            practical, separate spatial types into separate tables so that Impala can still work with the
+            non-spatial data.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Take out any <code class="ph codeph">DEFAULT</code> clauses. Impala can use data files produced from many different
+            sources, such as Pig, Hive, or MapReduce jobs. The fast import mechanisms of <code class="ph codeph">LOAD DATA</code>
+            and external tables mean that Impala is flexible about the format of data files, and Impala does not
+            necessarily validate or cleanse data before querying it. When copying data through Impala
+            <code class="ph codeph">INSERT</code> statements, you can use conditional functions such as <code class="ph codeph">CASE</code> or
+            <code class="ph codeph">NVL</code> to substitute some other value for <code class="ph codeph">NULL</code> fields; see
+            <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Take out any constraints from your <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+            statements, for example <code class="ph codeph">PRIMARY KEY</code>, <code class="ph codeph">FOREIGN KEY</code>,
+            <code class="ph codeph">UNIQUE</code>, <code class="ph codeph">NOT NULL</code>, <code class="ph codeph">UNSIGNED</code>, or
+            <code class="ph codeph">CHECK</code> constraints. Impala can use data files produced from many different sources,
+            such as Pig, Hive, or MapReduce jobs. Therefore, Impala expects initial data validation to happen
+            earlier during the ETL or ELT cycle. After data is loaded into Impala tables, you can perform queries
+            to test for <code class="ph codeph">NULL</code> values. When copying data through Impala <code class="ph codeph">INSERT</code>
+            statements, you can use conditional functions such as <code class="ph codeph">CASE</code> or <code class="ph codeph">NVL</code> to
+            substitute some other value for <code class="ph codeph">NULL</code> fields; see
+            <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+          </p>
+          <p class="p">
+            Do as much verification as practical before loading data into Impala. After data is loaded into Impala,
+            you can do further verification using SQL queries to check if values have expected ranges, if values
+            are <code class="ph codeph">NULL</code> or not, and so on. If there is a problem with the data, you will need to
+            re-run earlier stages of the ETL process, or do an <code class="ph codeph">INSERT ... SELECT</code> statement in
+            Impala to copy the faulty data to a new table and transform or filter out the bad values.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Take out any <code class="ph codeph">CREATE INDEX</code>, <code class="ph codeph">DROP INDEX</code>, and <code class="ph codeph">ALTER
+            INDEX</code> statements, and equivalent <code class="ph codeph">ALTER TABLE</code> statements. Remove any
+            <code class="ph codeph">INDEX</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">PRIMARY KEY</code> clauses from
+            <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements. Impala is optimized for bulk
+            read operations for data warehouse-style queries, and therefore does not support indexes for its
+            tables.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Calls to built-in functions with out-of-range or otherwise incorrect arguments return
+            <code class="ph codeph">NULL</code> in Impala, as opposed to raising exceptions. (This rule applies even when the
+            <code class="ph codeph">ABORT_ON_ERROR=true</code> query option is in effect.) Run small-scale queries using
+            representative data to double-check that calls to built-in functions are returning expected values
+            rather than <code class="ph codeph">NULL</code>. For example, unsupported <code class="ph codeph">CAST</code> operations do not
+            raise an error in Impala:
+          </p>
+<pre class="pre codeblock"><code>select cast('foo' as int);
++--------------------+
+| cast('foo' as int) |
++--------------------+
+| NULL               |
++--------------------+</code></pre>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For any other type not supported in Impala, you could represent their values in string format and write
+            UDFs to process them. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            To detect the presence of unsupported or unconvertible data types in data files, do initial testing
+            with the <code class="ph codeph">ABORT_ON_ERROR=true</code> query option in effect. This option causes queries to
+            fail immediately if they encounter disallowed type conversions. See
+            <a class="xref" href="impala_abort_on_error.html#abort_on_error">ABORT_ON_ERROR Query Option</a> for details. For example:
+          </p>
+<pre class="pre codeblock"><code>set abort_on_error=true;
+select count(*) from (select * from t1);
+-- The above query will fail if the data files for T1 contain any
+-- values that can't be converted to the expected Impala data types.
+-- For example, if T1.C1 is defined as INT but the column contains
+-- floating-point values like 1.1, the query will return an error.</code></pre>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="porting__porting_statements">
+
+    <h2 class="title topictitle2" id="ariaid-title4">SQL Statements to Remove or Adapt</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Some SQL statements or clauses that you might be familiar with are not currently supported in Impala:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Impala has no <code class="ph codeph">DELETE</code> statement. Impala is intended for data warehouse-style operations
+            where you do bulk moves and transforms of large quantities of data. Instead of using
+            <code class="ph codeph">DELETE</code>, use <code class="ph codeph">INSERT OVERWRITE</code> to entirely replace the contents of a
+            table or partition, or use <code class="ph codeph">INSERT ... SELECT</code> to copy a subset of data (everything but
+            the rows you intended to delete) from one table to another. See <a class="xref" href="impala_dml.html#dml">DML Statements</a> for
+            an overview of Impala DML statements.
+          </p>
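+          <p class="p">
+            For example, a sketch of the <code class="ph codeph">DELETE</code> workaround, using a
+            hypothetical table and filter condition:
+          </p>
+<pre class="pre codeblock"><code>-- Copy everything except the rows you intended to delete...
+CREATE TABLE t1_keep AS SELECT * FROM t1 WHERE status != 'obsolete';
+-- ...then replace the contents of the original table.
+INSERT OVERWRITE TABLE t1 SELECT * FROM t1_keep;</code></pre>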
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala has no <code class="ph codeph">UPDATE</code> statement. Impala is intended for data warehouse-style operations
+            where you do bulk moves and transforms of large quantities of data. Instead of using
+            <code class="ph codeph">UPDATE</code>, do all necessary transformations early in the ETL process, such as in the job
+            that generates the original data, or when copying from one table to another to convert to a particular
+            file format or partitioning scheme. See <a class="xref" href="impala_dml.html#dml">DML Statements</a> for an overview of Impala DML
+            statements.
+          </p>
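+          <p class="p">
+            For example, a sketch of the <code class="ph codeph">UPDATE</code> workaround: apply the
+            transformation while copying, using hypothetical table and column names:
+          </p>
+<pre class="pre codeblock"><code>-- Instead of: UPDATE t1 SET name = upper(name);
+CREATE TABLE t1_transformed AS SELECT id, upper(name) AS name FROM t1;</code></pre>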
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala has no transactional statements, such as <code class="ph codeph">COMMIT</code> or <code class="ph codeph">ROLLBACK</code>.
+            Impala effectively works like the <code class="ph codeph">AUTOCOMMIT</code> mode in some database systems, where
+            changes take effect as soon as they are made.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If your database, table, column, or other names conflict with Impala reserved words, use different
+            names or quote the names with backticks. See <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>
+            for the current list of Impala reserved words.
+          </p>
+          <p class="p">
+            Conversely, if you use a keyword that Impala does not recognize, it might be interpreted as a table or
+            column alias. For example, in <code class="ph codeph">SELECT * FROM t1 NATURAL JOIN t2</code>, Impala does not
+            recognize the <code class="ph codeph">NATURAL</code> keyword and interprets it as an alias for the table
+            <code class="ph codeph">t1</code>. If you experience any unexpected behavior with queries, check the list of reserved
+            words to make sure all keywords in join and <code class="ph codeph">WHERE</code> clauses are recognized.
+          </p>
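+          <p class="p">
+            For example, hypothetical names that collide with reserved words can be quoted with backticks:
+          </p>
+<pre class="pre codeblock"><code>-- SELECT and FROM are reserved words, so these names must be quoted.
+CREATE TABLE `select` (`from` STRING);
+SELECT `from` FROM `select`;</code></pre>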
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala supports subqueries only in the <code class="ph codeph">FROM</code> clause of a query, not within the
+            <code class="ph codeph">WHERE</code> clauses. Therefore, you cannot use clauses such as <code class="ph codeph">WHERE
+            <var class="keyword varname">column</var> IN (<var class="keyword varname">subquery</var>)</code>. Also, Impala does not allow
+            <code class="ph codeph">EXISTS</code> or <code class="ph codeph">NOT EXISTS</code> clauses (although <code class="ph codeph">EXISTS</code> is a
+            reserved keyword).
+          </p>
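+          <p class="p">
+            A common rewrite is to express the <code class="ph codeph">IN</code> condition as a join
+            against a subquery in the <code class="ph codeph">FROM</code> clause (table and column names
+            here are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Instead of: SELECT * FROM t1 WHERE c1 IN (SELECT c1 FROM t2);
+SELECT t1.* FROM t1 JOIN (SELECT DISTINCT c1 FROM t2) t2_keys ON t1.c1 = t2_keys.c1;</code></pre>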
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala supports <code class="ph codeph">UNION</code> and <code class="ph codeph">UNION ALL</code> set operators, but not
+            <code class="ph codeph">INTERSECT</code>. <span class="ph">Prefer <code class="ph codeph">UNION ALL</code> over <code class="ph codeph">UNION</code> when you know the
+        data sets are disjoint or duplicate values are not a problem; <code class="ph codeph">UNION ALL</code> is more efficient
+        because it avoids materializing and sorting the entire result set to eliminate duplicate values.</span>
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala requires an alias for every subquery in the <code class="ph codeph">FROM</code> clause of a query:
+          </p>
+<pre class="pre codeblock"><code>-- Without the alias 'contents_of_t1' at the end, query gives syntax error.
+select count(*) from (select * from t1) contents_of_t1;</code></pre>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            When an alias is declared for an expression in a query, that alias cannot be referenced again within
+            the <code class="ph codeph">SELECT</code> list of the same query block:
+          </p>
+<pre class="pre codeblock"><code>-- Can't reference AVERAGE twice in the SELECT list where it's defined.
+select avg(x) as average, average+1 from t1 group by x;
+ERROR: AnalysisException: couldn't resolve column reference: 'average'
+
+-- Although it can be referenced again later in the same query.
+select avg(x) as average from t1 group by x having average &gt; 3;</code></pre>
+          <p class="p">
+            For Impala, either repeat the expression, or abstract the expression into a <code class="ph codeph">WITH</code>
+            clause, creating named columns that can be referenced multiple times anywhere in the base query:
+          </p>
+<pre class="pre codeblock"><code>-- The following 2 query forms are equivalent.
+select avg(x) as average, avg(x)+1 from t1 group by x;
+with avg_t as (select avg(x) average from t1 group by x) select average, average+1 from avg_t;</code></pre>
+
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala does not support certain rarely used join types that are less appropriate for high-volume tables
+            used for data warehousing. In some cases, Impala supports a join type but requires explicit syntax to
+            ensure that you do not accidentally perform inefficient joins of huge tables. For example, Impala does not support
+            natural joins or anti-joins, and requires the <code class="ph codeph">CROSS JOIN</code> operator for Cartesian
+            products. See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details on the syntax for Impala join clauses.
+          </p>
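+          <p class="p">
+            For example, a Cartesian product of two hypothetical tables requires the explicit operator:
+          </p>
+<pre class="pre codeblock"><code>-- A comma-separated FROM list or a JOIN with no join condition is rejected; spell it out:
+select t1.x, t2.y from t1 cross join t2;</code></pre>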
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala has a limited choice of partitioning types. Partitions are defined based on each distinct
+            combination of values for one or more partition key columns. Impala does not redistribute or check data
+            to create evenly distributed partitions; you must choose partition key columns based on your knowledge
+            of the data volume and distribution. Adapt any tables that use range, list, hash, or key partitioning
+            to use the Impala partition syntax for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+            statements. Impala partitioning is similar to range partitioning where every range has exactly one
+            value, or key partitioning where the hash function produces a separate bucket for every combination of
+            key values. See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for usage details, and
+            <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and
+            <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax.
+          </p>
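+          <p class="p">
+            For example, a table partitioned by year (an illustrative sketch, not tied to any particular schema)
+            might be declared and extended like this:
+          </p>
+<pre class="pre codeblock"><code>create table census (name string) partitioned by (year smallint);
+alter table census add partition (year=2017);</code></pre>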
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            Because the number of separate partitions is potentially higher than in other database systems, keep a
+            close eye on the number of partitions and the volume of data in each one; scale back the number of
+            partition key columns if you end up with too many partitions with a small volume of data in each one.
+            Remember, to distribute work for a query across a cluster, you need at least one HDFS block per node.
+            HDFS blocks are typically multiple megabytes, <span class="ph">especially</span> for Parquet
+            files. Therefore, if each partition holds only a few megabytes of data, you are unlikely to see much
+            parallelism in the query because such a small amount of data is typically processed by a single node.
+          </div>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            For <span class="q">"top-N"</span> queries, Impala uses the <code class="ph codeph">LIMIT</code> clause rather than comparing against a
+            pseudocolumn named <code class="ph codeph">ROWNUM</code> or <code class="ph codeph">ROW_NUM</code>. See
+            <a class="xref" href="impala_limit.html#limit">LIMIT Clause</a> for details.
+          </p>
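+          <p class="p">
+            For example, instead of filtering on a <code class="ph codeph">ROWNUM</code> pseudocolumn, a top-10 query
+            against a hypothetical table looks like:
+          </p>
+<pre class="pre codeblock"><code>select x from t1 order by x desc limit 10;</code></pre>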
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="porting__porting_antipatterns">
+
+    <h2 class="title topictitle2" id="ariaid-title5">SQL Constructs to Double-Check</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Some supported SQL constructs have behavior or defaults oriented more toward convenience than optimal
+        performance. Also, machine-generated SQL, perhaps issued through JDBC or ODBC applications, might contain
+        inefficiencies or exceed internal Impala limits. As you port SQL code, watch for the following constructs
+        and change them where appropriate:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            A <code class="ph codeph">CREATE TABLE</code> statement with no <code class="ph codeph">STORED AS</code> clause creates data files
+            in plain text format, which is convenient for data interchange but not a good choice for high-volume
+            data with high-performance queries. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for why and
+            how to use specific file formats for compact data and high-performance queries. Especially see
+            <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>, for details about the file format most heavily optimized for
+            large-scale data warehouse queries.
+          </p>
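+          <p class="p">
+            For example, adding a single clause at table creation time switches to a binary columnar format
+            (a sketch using hypothetical tables):
+          </p>
+<pre class="pre codeblock"><code>-- Defaults to text format, convenient for interchange but slow at scale.
+create table sales_text (id bigint, s string);
+-- Parquet is typically the better choice for large analytic tables.
+create table sales_parquet (id bigint, s string) stored as parquet;</code></pre>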
+        </li>
+
+        <li class="li">
+          <p class="p">
+            A <code class="ph codeph">CREATE TABLE</code> statement with no <code class="ph codeph">PARTITIONED BY</code> clause stores all the
+            data files in the same physical location, which can lead to scalability problems when the data volume
+            becomes large.
+          </p>
+          <p class="p">
+            On the other hand, adapting tables that were already partitioned in a different database system could
+            produce an Impala table with a high number of partitions and not enough data in each one, leading to
+            underutilization of Impala's parallel query features.
+          </p>
+          <p class="p">
+            See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for details about setting up partitioning and
+            tuning the performance of queries on partitioned tables.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">INSERT ... VALUES</code> syntax is suitable for setting up toy tables with a few rows for
+            functional testing, but because each such statement creates a separate tiny file in HDFS, it is not a
+            scalable technique for loading megabytes or gigabytes (let alone petabytes) of data. Consider revising
+            your data load process to produce raw data files outside of Impala, then setting up Impala external
+            tables or using the <code class="ph codeph">LOAD DATA</code> statement to use those data files instantly in Impala
+            tables, with no conversion or indexing stage. See <a class="xref" href="impala_tables.html#external_tables">External Tables</a> and
+            <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> for details about the Impala techniques for working with
+            data files produced outside of Impala; see <a class="xref" href="impala_tutorial.html#tutorial_etl">Data Loading and Querying Examples</a> for examples
+            of ETL workflow for Impala.
+          </p>
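+          <p class="p">
+            For example (paths and names are hypothetical), point an external table at files produced outside
+            Impala, or move already-uploaded files into a table's directory with <code class="ph codeph">LOAD DATA</code>:
+          </p>
+<pre class="pre codeblock"><code>create external table ext_t (id bigint, s string)
+  row format delimited fields terminated by ','
+  location '/user/impala/staging/ext_t';
+load data inpath '/user/impala/staging/batch1' into table ext_t;</code></pre>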
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If your ETL process is not optimized for Hadoop, you might end up with highly fragmented small data
+            files, or a single giant data file that cannot take advantage of distributed parallel queries or
+            partitioning. In this case, use an <code class="ph codeph">INSERT ... SELECT</code> statement to copy the data into a
+            new table and reorganize into a more efficient layout in the same operation. See
+            <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for details about the <code class="ph codeph">INSERT</code> statement.
+          </p>
+          <p class="p">
+            You can do <code class="ph codeph">INSERT ... SELECT</code> into a table with a more efficient file format (see
+            <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>) or from an unpartitioned table into a partitioned
+            one (see <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>).
+          </p>
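+          <p class="p">
+            For example (with hypothetical tables), a single statement can convert the file format or introduce
+            partitioning while copying the data:
+          </p>
+<pre class="pre codeblock"><code>-- Copy a text-format table into Parquet format in one operation.
+create table sales_parquet stored as parquet as select * from sales_text;
+-- Copy an unpartitioned table into a partitioned one.
+insert into sales_by_year partition (year) select s, year from sales_text;</code></pre>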
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The number of expressions allowed in an Impala query might be smaller than for some other database
+            systems, causing failures for very complicated queries (typically produced by automated SQL
+            generators). Where practical, keep the number of expressions in <code class="ph codeph">WHERE</code> clauses to
+            approximately 2000 or fewer. As a workaround, set the query option
+            <code class="ph codeph">DISABLE_CODEGEN=true</code> if queries fail for this reason. See
+            <a class="xref" href="impala_disable_codegen.html#disable_codegen">DISABLE_CODEGEN Query Option</a> for details.
+          </p>
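+          <p class="p">
+            For example, in <code class="ph codeph">impala-shell</code>:
+          </p>
+<pre class="pre codeblock"><code>set DISABLE_CODEGEN=true;</code></pre>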
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If practical, rewrite <code class="ph codeph">UNION</code> queries to use the <code class="ph codeph">UNION ALL</code> operator
+            instead. <span class="ph">Prefer <code class="ph codeph">UNION ALL</code> over <code class="ph codeph">UNION</code> when you know the
+        data sets are disjoint or duplicate values are not a problem; <code class="ph codeph">UNION ALL</code> is more efficient
+        because it avoids materializing and sorting the entire result set to eliminate duplicate values.</span>
+          </p>
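+          <p class="p">
+            For example, with hypothetical tables whose data sets are known to be disjoint:
+          </p>
+<pre class="pre codeblock"><code>-- Instead of: select x from t1 union select x from t2;
+select x from t1 union all select x from t2;</code></pre>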
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="porting__porting_next">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Next Porting Steps after Verifying Syntax and Semantics</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Throughout this section, some of the decisions you make during the porting process also have a substantial
+        impact on performance. After your SQL code is ported and working correctly, double-check the
+        performance-related aspects of your schema design, physical layout, and queries to make sure that the
+        ported application is taking full advantage of Impala's parallelism, performance-related SQL features, and
+        integration with Hadoop components.
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Have you run the <code class="ph codeph">COMPUTE STATS</code> statement on each table involved in join queries? Have
+          you also run <code class="ph codeph">COMPUTE STATS</code> for each table used as the source table in an <code class="ph codeph">INSERT
+          ... SELECT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> statement?
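+          <p class="p">
+            For example, for each such table (the table name is hypothetical):
+          </p>
+<pre class="pre codeblock"><code>compute stats t1;
+show table stats t1;</code></pre>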
+        </li>
+
+        <li class="li">
+          Are you using the most efficient file format for your data volumes, table structure, and query
+          characteristics?
+        </li>
+
+        <li class="li">
+          Are you using partitioning effectively? That is, have you partitioned on columns that are often used for
+          filtering in <code class="ph codeph">WHERE</code> clauses? Have you partitioned at the right granularity so that there
+          is enough data in each partition to parallelize the work for each query?
+        </li>
+
+        <li class="li">
+          Does your ETL process produce a relatively small number of multi-megabyte data files (good) rather than a
+          huge number of small files (bad)?
+        </li>
+      </ul>
+
+      <p class="p">
+        See <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> for details about the whole performance tuning
+        process.
+      </p>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_ports.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_ports.html b/docs/build/html/topics/impala_ports.html
new file mode 100644
index 0000000..7becb40
--- /dev/null
+++ b/docs/build/html/topics/impala_ports.html
@@ -0,0 +1,421 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ports"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Ports Used by Impala</title></head><body id="ports"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Ports Used by Impala</h1>
+  
+
+  <div class="body conbody" id="ports__conbody_ports">
+
+    <p class="p">
+      
+      Impala uses the TCP ports listed in the following table. Before deploying Impala, ensure these ports are open
+      on each system.
+    </p>
+
+    <table class="table"><caption></caption><colgroup><col style="width:18.181818181818183%"><col style="width:27.27272727272727%"><col style="width:9.090909090909092%"><col style="width:18.181818181818183%"><col style="width:27.27272727272727%"></colgroup><thead class="thead">
+          <tr class="row">
+            <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__1">
+              Component
+            </th>
+            <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__2">
+              Service
+            </th>
+            <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__3">
+              Port
+            </th>
+            <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__4">
+              Access Requirement
+            </th>
+            <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__5">
+              Comment
+            </th>
+          </tr>
+        </thead><tbody class="tbody">
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Impala Daemon Frontend Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                21000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                External
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Used by <code class="ph codeph">impala-shell</code> and some ODBC drivers to transmit
+                commands and receive results.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Impala Daemon Frontend Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                21050
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                External
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Used by applications (such as Business Intelligence tools) through JDBC, by the Beeswax query
+                editor in Hue, and by some ODBC drivers to transmit commands and receive results.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Impala Daemon Backend Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                22000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. Impala daemons use this port to communicate with each other.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                StateStoreSubscriber Service Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                23000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. Impala daemons listen on this port for updates from the statestore daemon.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Catalog Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                StateStoreSubscriber Service Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                23020
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. The catalog daemon listens on this port for updates from the statestore daemon.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Impala Daemon HTTP Server Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                25000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                External
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Impala web interface for administrators to monitor and troubleshoot.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala StateStore Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                StateStore HTTP Server Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                25010
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                External
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                StateStore web interface for administrators to monitor and troubleshoot.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Catalog Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Catalog HTTP Server Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                25020
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                External
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Catalog service web interface for administrators to monitor and troubleshoot. New in Impala 1.2 and
+                higher.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala StateStore Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                StateStore Service Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                24000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. The statestore daemon listens on this port for registration/unregistration
+                requests.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Catalog Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                StateStore Service Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                26000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. The catalog service uses this port to communicate with the Impala daemons. New
+                in Impala 1.2 and higher.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Daemon
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Llama Callback Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                28000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. Impala daemons use this port to communicate with Llama. New in <span class="keyword">Impala 1.3</span> and higher.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Llama ApplicationMaster
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Llama Thrift Admin Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                15002
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. New in <span class="keyword">Impala 1.3</span> and higher.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Llama ApplicationMaster
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Llama Thrift Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                15000
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                Internal
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Internal use only. New in <span class="keyword">Impala 1.3</span> and higher.
+              </p>
+            </td>
+          </tr>
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+              <p class="p">
+                Impala Llama ApplicationMaster
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+              <p class="p">
+                Llama HTTP Port
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+              <p class="p">
+                15001
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+              <p class="p">
+                External
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+              <p class="p">
+                Llama service web interface for administrators to monitor and troubleshoot.
+                New in <span class="keyword">Impala 1.3</span> and higher.
+              </p>
+            </td>
+          </tr>
+        </tbody></table>
+  </div>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_prefetch_mode.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_prefetch_mode.html b/docs/build/html/topics/impala_prefetch_mode.html
new file mode 100644
index 0000000..ea13792
--- /dev/null
+++ b/docs/build/html/topics/impala_prefetch_mode.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="prefetch_mode"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PREFETCH_MODE Query Option (Impala 2.6 or higher only)</title></head><body id="prefetch_mode"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">PREFETCH_MODE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Determines whether the prefetching optimization is applied during
+      join query processing.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric (0, 1)
+      or corresponding mnemonic strings (<code class="ph codeph">NONE</code>, <code class="ph codeph">HT_BUCKET</code>).
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 1 (equivalent to <code class="ph codeph">HT_BUCKET</code>)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+    <p class="p">
+      The default mode is 1, which means that hash table buckets are
+      prefetched during join query processing.
+    </p>
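+    <p class="p">
+      For example, the following <span class="keyword cmdname">impala-shell</span> sketch turns prefetching off
+      for a join query and then restores the default. (The table and column names are placeholders for
+      illustration.)
+    </p>
+
+<pre class="pre codeblock"><code>set prefetch_mode=NONE;
+select count(*) from big_table b join small_table s on b.id = s.id;
+set prefetch_mode=HT_BUCKET;
+</code></pre>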
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a>,
+      <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>.
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_prereqs.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_prereqs.html b/docs/build/html/topics/impala_prereqs.html
new file mode 100644
index 0000000..e378343
--- /dev/null
+++ b/docs/build/html/topics/impala_prereqs.html
@@ -0,0 +1,275 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_planning.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="prereqs"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Requirements</title></head><body id="prereqs"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Requirements</h1>
+  
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      
+      To perform as expected, Impala depends on the availability of the software, hardware, and configurations
+      described in the following sections.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_planning.html">Planning for Impala Deployment</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="prereqs__prereqs_os">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Supported Operating Systems</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        
+        
+        
+        
+        
+        
+        
+        Apache Impala runs on Linux systems only. See the <span class="ph filepath">README.md</span>
+        file for more information.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="prereqs__prereqs_hive">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Hive Metastore and Related Configuration</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        
+        
+        Impala can interoperate with data stored in Hive, and uses the same infrastructure as Hive for tracking
+        metadata about schema objects such as tables and columns. The following components are prerequisites for
+        Impala:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          MySQL or PostgreSQL, to act as a metastore database for both Impala and Hive.
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            <p class="p">
+              Installing and configuring a Hive metastore is an Impala requirement. Impala does not work without
+              the metastore database. For the process of installing and configuring the metastore, see
+              <a class="xref" href="impala_install.html#install">Installing Impala</a>.
+            </p>
+
+            <p class="p">
+              Always configure a <strong class="ph b">Hive metastore service</strong> rather than connecting directly to the metastore
+              database. The Hive metastore service is required to interoperate between different levels of
+              metastore APIs if this is necessary for your environment, and using it avoids known issues with
+              connecting directly to the metastore database.
+            </p>
+
+            <p class="p">
+              A summary of the metastore installation process is as follows:
+            </p>
+            <ul class="ul">
+              <li class="li">
+                Install a MySQL or PostgreSQL database. Start the database if it is not started after installation.
+              </li>
+
+              <li class="li">
+                Download the
+                <a class="xref" href="http://www.mysql.com/products/connector/" target="_blank">MySQL
+                connector</a> or the
+                <a class="xref" href="http://jdbc.postgresql.org/download.html" target="_blank">PostgreSQL
+                connector</a> and place it in the <code class="ph codeph">/usr/share/java/</code> directory.
+              </li>
+
+              <li class="li">
+                Use the appropriate command line tool for your database to create the metastore database.
+              </li>
+
+              <li class="li">
+                Use the appropriate command line tool for your database to grant privileges for the metastore
+                database to the <code class="ph codeph">hive</code> user.
+              </li>
+
+              <li class="li">
+                Modify <code class="ph codeph">hive-site.xml</code> to include information matching your particular database: its
+                URL, username, and password. You will copy the <code class="ph codeph">hive-site.xml</code> file to the Impala
+                Configuration Directory later in the Impala installation process.
+              </li>
+            </ul>
+          </div>
+        </li>
+
+        <li class="li">
+          <strong class="ph b">Optional:</strong> Hive. Although only the Hive metastore database is required for Impala to function, you
+          might install Hive on some client machines to create and load data into tables that use certain file
+          formats. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. Hive does not need to be
+          installed on the same DataNodes as Impala; it just needs access to the same metastore database.
+        </li>
+      </ul>
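+      <p class="p">
+        As an illustration of the <code class="ph codeph">hive-site.xml</code> step in the metastore summary above,
+        a fragment for a MySQL-backed metastore might contain properties similar to the following. (The host name,
+        database name, and credentials shown are placeholders to substitute for your environment.)
+      </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;javax.jdo.option.ConnectionURL&lt;/name&gt;
+  &lt;value&gt;jdbc:mysql://metastore-host.example.com/metastore&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;javax.jdo.option.ConnectionDriverName&lt;/name&gt;
+  &lt;value&gt;com.mysql.jdbc.Driver&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;javax.jdo.option.ConnectionUserName&lt;/name&gt;
+  &lt;value&gt;hive&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;javax.jdo.option.ConnectionPassword&lt;/name&gt;
+  &lt;value&gt;your_password_here&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>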
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="prereqs__prereqs_java">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Java Dependencies</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        
+        Although Impala is primarily written in C++, it does use Java to communicate with various Hadoop
+        components:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The officially supported JVM for Impala is the Oracle JVM. Other JVMs might cause issues, typically
+          resulting in a failure at <span class="keyword cmdname">impalad</span> startup. In particular, the JamVM used by default on
+          certain levels of Ubuntu systems can cause <span class="keyword cmdname">impalad</span> to fail to start.
+        </li>
+
+        <li class="li">
+          Internally, the <span class="keyword cmdname">impalad</span> daemon relies on the <code class="ph codeph">JAVA_HOME</code> environment
+          variable to locate the system Java libraries. Make sure the <span class="keyword cmdname">impalad</span> service is not run
+          from an environment with an incorrect setting for this variable.
+        </li>
+
+        <li class="li">
+          All Java dependencies are packaged in the <code class="ph codeph">impala-dependencies.jar</code> file, which is located
+          at <code class="ph codeph">/usr/lib/impala/lib/</code>. These map to everything that is built under
+          <code class="ph codeph">fe/target/dependency</code>.
+        </li>
+      </ul>
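+      <p class="p">
+        As a quick sanity check before starting <span class="keyword cmdname">impalad</span>, you can confirm
+        that <code class="ph codeph">JAVA_HOME</code> is set in the service environment and points at a working JVM.
+        (The path shown is only an example; your JDK location will differ.)
+      </p>
+
+<pre class="pre codeblock"><code>$ echo $JAVA_HOME
+/usr/java/jdk1.8.0
+$ $JAVA_HOME/bin/java -version
+</code></pre>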
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="prereqs__prereqs_network">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Networking Configuration Requirements</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        As part of ensuring best performance, Impala attempts to complete tasks on local data, as opposed to using
+        network connections to work with remote data. To support this goal, Impala matches
+        the&nbsp;<strong class="ph b">hostname</strong>&nbsp;provided to each Impala daemon with the&nbsp;<strong class="ph b">IP address</strong>&nbsp;of each DataNode by
+        resolving the hostname flag to an IP address. For Impala to work with local data, use a single IP interface
+        for the DataNode and the Impala daemon on each machine. Ensure that the Impala daemon's hostname flag
+        resolves to the IP address of the DataNode. For single-homed machines, this is usually automatic, but for
+        multi-homed machines, ensure that the Impala daemon's hostname resolves to the correct interface. Impala
+        tries to detect the correct hostname at start-up, and prints the derived hostname at the start of the log
+        in a message of the form:
+      </p>
+
+<pre class="pre codeblock"><code>Using hostname: impala-daemon-1.example.com</code></pre>
+
+      <p class="p">
+        In the majority of cases, this automatic detection works correctly. If you need to explicitly set the
+        hostname, do so by setting the&nbsp;<code class="ph codeph">--hostname</code>&nbsp;flag.
+      </p>
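+      <p class="p">
+        For example, on a multi-homed machine you might add the flag to the
+        <code class="ph codeph">IMPALA_SERVER_ARGS</code> declaration in
+        <span class="ph filepath">/etc/default/impala</span>. (The host name shown is a placeholder.)
+      </p>
+
+<pre class="pre codeblock"><code>IMPALA_SERVER_ARGS=" \
+--hostname=impala-daemon-1.example.com \
+...
+</code></pre>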
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="prereqs__prereqs_hardware">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Hardware Requirements</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        
+        
+        
+        
+        
+        
+        
+        During join operations, portions of data from each joined table are loaded into memory. Data sets can be
+        very large, so ensure your hardware has sufficient memory to accommodate the joins you anticipate
+        completing.
+      </p>
+
+      <p class="p">
+        While requirements vary according to data set size, the following is generally recommended:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          CPU - Impala version 2.2 and higher uses the SSSE3 instruction set, which is included in newer processors.
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            This required level of processor is the same as in Impala version 1.x. The Impala 2.0 and 2.1 releases
+            had a stricter requirement for the SSE4.1 instruction set, which has now been relaxed.
+          </div>
+
+        </li>
+
+        <li class="li">
+          Memory - 128 GB or more recommended, ideally 256 GB or more. If the intermediate results during query
+          processing on a particular node exceed the amount of memory available to Impala on that node, the query
+          writes temporary work data to disk, which can lead to long query times. Note that because the work is
+          parallelized, and intermediate results for aggregate queries are typically smaller than the original
+          data, Impala can query and join tables that are much larger than the memory available on an individual
+          node.
+        </li>
+
+        <li class="li">
+          Storage - DataNodes with 12 or more disks each. I/O speeds are often the limiting factor for disk
+          performance with Impala. Ensure that you have sufficient disk space to store the data Impala will be
+          querying.
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="prereqs__prereqs_account">
+
+    <h2 class="title topictitle2" id="ariaid-title7">User Account Requirements</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        
+        
+        Impala creates and uses a user and group named <code class="ph codeph">impala</code>. Do not delete this account or group
+        and do not modify the account's or group's permissions and rights. Ensure no existing systems obstruct the
+        functioning of these accounts and groups. For example, if you have scripts that delete user accounts not in
+        a white-list, add these accounts to the list of permitted accounts.
+      </p>
+
+      <p class="p">
+        For correct file deletion during <code class="ph codeph">DROP TABLE</code> operations, Impala must be able to move files
+        to the HDFS trashcan. You might need to create an HDFS directory <span class="ph filepath">/user/impala</span>,
+        writeable by the <code class="ph codeph">impala</code> user, so that the trashcan can be created. Otherwise, data files
+        might remain behind after a <code class="ph codeph">DROP TABLE</code> statement.
+      </p>
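+      <p class="p">
+        A sketch of creating that directory with standard HDFS shell commands, run as a user with HDFS
+        administrator privileges (adjust the path and ownership for your environment):
+      </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -mkdir -p /user/impala
+$ hdfs dfs -chown impala /user/impala
+</code></pre>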
+
+      <p class="p">
+        Impala should not run as root. Best Impala performance is achieved using direct reads, but root is not
+        permitted to use direct reads. Therefore, running Impala as root negatively affects performance.
+      </p>
+
+      <p class="p">
+        By default, any user can connect to Impala and access all the associated databases and tables. You can
+        enable authorization and authentication based on the Linux OS user who connects to the Impala server, and
+        the associated groups for that user. See <a class="xref" href="impala_security.html#security">Impala Security</a> for details. These
+        security features do not change the underlying file permission requirements; the <code class="ph codeph">impala</code>
+        user still needs to be able to access the data files.
+      </p>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_processes.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_processes.html b/docs/build/html/topics/impala_processes.html
new file mode 100644
index 0000000..60bc3c4
--- /dev/null
+++ b/docs/build/html/topics/impala_processes.html
@@ -0,0 +1,115 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="processes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Starting Impala</title></head><body id="processes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Starting Impala</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      
+      
+      To activate Impala if it is installed but not yet started:
+    </p>
+
+    <ol class="ol">
+      <li class="li">
+        Set any necessary configuration options for the Impala services. See
+        <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details.
+      </li>
+
+      <li class="li">
+        Start one instance of the Impala statestore. The statestore helps Impala to distribute work efficiently,
+        and to continue running in the event of availability problems for other Impala nodes. If the statestore
+        becomes unavailable, Impala continues to function.
+      </li>
+
+      <li class="li">
+        Start one instance of the Impala catalog service.
+      </li>
+
+      <li class="li">
+        Start the main Impala service on one or more DataNodes, ideally on all DataNodes to maximize local
+        processing and avoid network traffic due to remote reads.
+      </li>
+    </ol>
+
+    <p class="p">
+      Once Impala is running, you can conduct interactive experiments using the instructions in
+      <a class="xref" href="impala_tutorial.html#tutorial">Impala Tutorials</a> and try <a class="xref" href="impala_impala_shell.html#impala_shell">Using the Impala Shell (impala-shell Command)</a>.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_config_options.html">Modifying Impala Startup Options</a></strong><br></li></ul></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="processes__starting_via_cmdline">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Starting Impala from the Command Line</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        To start the Impala statestore and Impala from the command line or a script, you can either use the
+        <span class="keyword cmdname">service</span> command or start the daemons directly through the
+        <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">statestored</span>, and <span class="keyword cmdname">catalogd</span> executables.
+      </p>
+
+      <p class="p">
+        Start the Impala statestore and then start <code class="ph codeph">impalad</code> instances. You can modify the values
+        the service initialization scripts use when starting the statestore and Impala by editing
+        <code class="ph codeph">/etc/default/impala</code>.
+      </p>
+
+      <p class="p">
+        Start the statestore service using a command similar to the following:
+      </p>
+
+      <div class="p">
+<pre class="pre codeblock"><code>$ sudo service impala-state-store start</code></pre>
+      </div>
+
+      <p class="p">
+        Start the catalog service using a command similar to the following:
+      </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-catalog start</code></pre>
+
+      <p class="p">
+        Start the Impala service on each DataNode using a command similar to the following:
+      </p>
+
+      <div class="p">
+<pre class="pre codeblock"><code>$ sudo service impala-server start</code></pre>
+      </div>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+        In <span class="keyword">Impala 2.5</span> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
+        Java UDFs are also persisted if they were created with the new <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs,
+        where the Java function argument and return types are omitted.
+        Java-based UDFs created with the old <code class="ph codeph">CREATE FUNCTION</code> syntax do not persist across restarts
+        because they are held in the memory of the <span class="keyword cmdname">catalogd</span> daemon.
+        Until you re-create such Java UDFs using the new <code class="ph codeph">CREATE FUNCTION</code> syntax,
+        you must reload those Java-based UDFs by running the original <code class="ph codeph">CREATE FUNCTION</code> statements again each time
+        you restart the <span class="keyword cmdname">catalogd</span> daemon.
+        Prior to <span class="keyword">Impala 2.5</span>, the requirement to reload functions after a restart applied to both C++ and Java functions.
+      </p>
+      </div>
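+      <p class="p">
+        A sketch of the two <code class="ph codeph">CREATE FUNCTION</code> variants that the note above refers to.
+        (The JAR path and class name are placeholders for illustration.)
+      </p>
+
+<pre class="pre codeblock"><code>-- Old syntax: the signature is spelled out. Such Java UDFs must be
+-- re-created each time the catalogd daemon restarts.
+create function my_func(string) returns string
+  location '/user/impala/udfs/udf-examples.jar' symbol='com.example.MyUdf';
+
+-- New syntax in Impala 2.5 and higher: argument and return types are
+-- omitted, and the function persists across catalogd restarts.
+create function my_func
+  location '/user/impala/udfs/udf-examples.jar' symbol='com.example.MyUdf';
+</code></pre>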
+
+      <div class="p">
+        If any of the services fail to start, review:
+        <ul class="ul">
+          <li class="li">
+            <a class="xref" href="impala_logging.html#logs_debug">Reviewing Impala Logs</a>
+          </li>
+
+          <li class="li">
+            <a class="xref" href="impala_troubleshooting.html#troubleshooting">Troubleshooting Impala</a>
+          </li>
+        </ul>
+      </div>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file


[46/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_authorization.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_authorization.html b/docs/build/html/topics/impala_authorization.html
new file mode 100644
index 0000000..13a8fb4
--- /dev/null
+++ b/docs/build/html/topics/impala_authorization.html
@@ -0,0 +1,1177 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="authorization"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Enabling Sentry Authorization for Impala</title></head><body id="authorization"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Enabling Sentry Authorization for Impala</h1>
+  
+
+  <div class="body conbody" id="authorization__sentry">
+
+    <p class="p">
+      Authorization determines which users are allowed to access which resources, and what operations they are
+      allowed to perform. In Impala 1.1 and higher, you use Apache Sentry for
+      authorization. Sentry adds a fine-grained authorization framework for Hadoop. By default (when authorization
+      is not enabled), Impala does all read and write operations with the privileges of the <code class="ph codeph">impala</code>
+      user, which is suitable for a development/test environment but not for a secure production environment. When
+      authorization is enabled, Impala uses the OS user ID of the user who runs <span class="keyword cmdname">impala-shell</span> or
+      other client program, and associates various privileges with each user.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Sentry is typically used in conjunction with Kerberos authentication, which defines which hosts are allowed
+      to connect to each server. Using the combination of Sentry and Kerberos prevents malicious users from being
+      able to connect by creating a named account on an untrusted machine. See
+      <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for details about Kerberos authentication.
+    </div>
+
+    <p class="p toc inpage">
+      See the following sections for details about using the Impala authorization features:
+    </p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="authorization__sentry_priv_model">
+
+    <h2 class="title topictitle2" id="ariaid-title2">The Sentry Privilege Model</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Privileges can be granted on different objects in the schema. Any privilege that can be granted is
+        associated with a level in the object hierarchy. If a privilege is granted on a container object in the
+        hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other
+        database systems such as MySQL.
+      </p>
+
+      <p class="p">
+        The object hierarchy for Impala covers Server, URI, Database, Table, and Column. (The Table privileges apply to views as well;
+        anywhere you specify a table name, you can specify a view name instead.)
+        Column-level authorization is available in <span class="keyword">Impala 2.3</span> and higher.
+        Previously, you constructed views to query specific columns and assigned privilege based on
+        the views rather than the base tables. Now, you can use Impala's <a class="xref" href="impala_grant.html">GRANT Statement (Impala 2.0 or higher only)</a> and
+        <a class="xref" href="impala_revoke.html">REVOKE Statement (Impala 2.0 or higher only)</a> statements to assign and revoke privileges from specific columns
+        in a table.
+      </p>
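+      <p class="p">
+        For example, the following statements grant and revoke column-level privileges.
+        (The role, table, and column names are hypothetical.)
+      </p>
+
+<pre class="pre codeblock"><code>GRANT SELECT(customer_name, customer_city) ON TABLE sales.customers TO ROLE analysts;
+REVOKE SELECT(customer_city) ON TABLE sales.customers FROM ROLE analysts;
+</code></pre>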
+
+      <p class="p">
+        A restricted set of privileges determines what you can do with each object:
+      </p>
+
+      <dl class="dl">
+        
+
+          <dt class="dt dlterm" id="sentry_priv_model__select_priv">
+            SELECT privilege
+          </dt>
+
+          <dd class="dd">
+            Lets you read data from a table or view, for example with the <code class="ph codeph">SELECT</code> statement, the
+            <code class="ph codeph">INSERT...SELECT</code> syntax, or <code class="ph codeph">CREATE TABLE...LIKE</code>. Also required to
+            issue the <code class="ph codeph">DESCRIBE</code> statement or the <code class="ph codeph">EXPLAIN</code> statement for a query
+            against a particular table. Only objects for which a user has this privilege are shown in the output
+            for <code class="ph codeph">SHOW DATABASES</code> and <code class="ph codeph">SHOW TABLES</code> statements. The
+            <code class="ph codeph">REFRESH</code> statement and <code class="ph codeph">INVALIDATE METADATA</code> statements only access
+            metadata for tables for which the user has this privilege.
+          </dd>
+
+        
+
+        
+
+          <dt class="dt dlterm" id="sentry_priv_model__insert_priv">
+            INSERT privilege
+          </dt>
+
+          <dd class="dd">
+            Lets you write data to a table. Applies to the <code class="ph codeph">INSERT</code> and <code class="ph codeph">LOAD DATA</code>
+            statements.
+          </dd>
+
+        
+
+        
+
+          <dt class="dt dlterm" id="sentry_priv_model__all_priv">
+            ALL privilege
+          </dt>
+
+          <dd class="dd">
+            Lets you create or modify the object. Required to run DDL statements such as <code class="ph codeph">CREATE
+            TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, or <code class="ph codeph">DROP TABLE</code> for a table,
+            <code class="ph codeph">CREATE DATABASE</code> or <code class="ph codeph">DROP DATABASE</code> for a database, or <code class="ph codeph">CREATE
+            VIEW</code>, <code class="ph codeph">ALTER VIEW</code>, or <code class="ph codeph">DROP VIEW</code> for a view. Also required for
+            the URI of the <span class="q">"location"</span> parameter for the <code class="ph codeph">CREATE EXTERNAL TABLE</code> and
+            <code class="ph codeph">LOAD DATA</code> statements.
+
+          </dd>
+
+        
+      </dl>
+
+      <p class="p">
+        Privileges can be specified for a table or view before that object actually exists. If you do not have
+        sufficient privilege to perform an operation, the error message does not disclose if the object exists or
+        not.
+      </p>
+
+      <p class="p">
+        Originally, privileges were encoded in a policy file, stored in HDFS. This mode of operation is still an
+        option, but the emphasis of privilege management is moving towards being SQL-based. In
+        <span class="keyword">Impala 2.0</span> and higher, Impala has its own <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements, and it can also make use of
+        privileges assigned through <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements issued through
+        Hive. The mode of operation with <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements instead of
+        the policy file requires that a special Sentry service be enabled; this service stores, retrieves, and
+        manipulates privilege information stored inside the metastore database.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="authorization__secure_startup">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Starting the impalad Daemon with Sentry Authorization Enabled</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        To run the <span class="keyword cmdname">impalad</span> daemon with authorization enabled, you add one or more options to the
+        <code class="ph codeph">IMPALA_SERVER_ARGS</code> declaration in the <span class="ph filepath">/etc/default/impala</span>
+        configuration file:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The <code class="ph codeph">-server_name</code> option turns on Sentry authorization for Impala. The authorization
+          rules refer to a symbolic server name, and you specify the name to use as the argument to the
+          <code class="ph codeph">-server_name</code> option.
+        </li>
+
+        <li class="li">
+          If you specify just <code class="ph codeph">-server_name</code>, Impala uses the Sentry service for authorization,
+          relying on the results of <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements issued through
+          Hive. (This mode of operation is available in Impala 1.4.0 and higher.) Prior to Impala 1.4.0, or if you
+          want to continue storing privilege rules in the policy file, also specify the
+          <code class="ph codeph">-authorization_policy_file</code> option as in the following item.
+        </li>
+
+        <li class="li">
+          Specifying the <code class="ph codeph">-authorization_policy_file</code> option in addition to
+          <code class="ph codeph">-server_name</code> makes Impala read privilege information from a policy file, rather than
+          from the metastore database. The argument to the <code class="ph codeph">-authorization_policy_file</code> option
+          specifies the HDFS path to the policy file that defines the privileges on different schema objects.
+        </li>
+      </ul>
+
+      <p class="p">
+        For example, you might adapt your <span class="ph filepath">/etc/default/impala</span> configuration to contain lines
+        like the following. To use the Sentry service rather than the policy file:
+      </p>
+
+<pre class="pre codeblock"><code>IMPALA_SERVER_ARGS=" \
+-server_name=server1 \
+...
+</code></pre>
+
+      <p class="p">
+        Or to use the policy file, as in releases prior to Impala 1.4:
+      </p>
+
+<pre class="pre codeblock"><code>IMPALA_SERVER_ARGS=" \
+-authorization_policy_file=/user/hive/warehouse/auth-policy.ini \
+-server_name=server1 \
+...
+</code></pre>
+
+      <p class="p">
+        The preceding examples set up a symbolic name of <code class="ph codeph">server1</code> to refer to the current instance
+        of Impala. This symbolic name is used in the following ways:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Specify the <code class="ph codeph">server1</code> value for the <code class="ph codeph">sentry.hive.server</code> property in the
+            <span class="ph filepath">sentry-site.xml</span> configuration file for Hive, as well as in the
+            <code class="ph codeph">-server_name</code> option for <span class="keyword cmdname">impalad</span>.
+          </p>
+          <p class="p">
+            If the <span class="keyword cmdname">impalad</span> daemon is not already running, start it as described in
+            <a class="xref" href="impala_processes.html#processes">Starting Impala</a>. If it is already running, restart it with the command
+            <code class="ph codeph">sudo /etc/init.d/impala-server restart</code>. Run the appropriate commands on all the nodes
+            where <span class="keyword cmdname">impalad</span> normally runs.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you use the mode of operation using the policy file, the rules in the <code class="ph codeph">[roles]</code>
+            section of the policy file refer to this same <code class="ph codeph">server1</code> name. For example, the following
+            rule sets up a role <code class="ph codeph">report_generator</code> that lets users with that role query any table in
+            a database named <code class="ph codeph">reporting_db</code> on a node where the <span class="keyword cmdname">impalad</span> daemon
+            was started up with the <code class="ph codeph">-server_name=server1</code> option:
+          </p>
+<pre class="pre codeblock"><code>[roles]
+report_generator = server=server1-&gt;db=reporting_db-&gt;table=*-&gt;action=SELECT
+</code></pre>
+        </li>
+      </ul>
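+
+      <p class="p">
+        The corresponding <span class="ph filepath">sentry-site.xml</span> entry for the symbolic server name might
+        look like the following sketch (other properties required by the Sentry service are omitted here):
+      </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;sentry.hive.server&lt;/name&gt;
+  &lt;value&gt;server1&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>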
+
+      <p class="p">
+        When <span class="keyword cmdname">impalad</span> is started with one or both of the <code class="ph codeph">-server_name=server1</code>
+        and <code class="ph codeph">-authorization_policy_file</code> options, Impala authorization is enabled. If Impala detects
+        any errors or inconsistencies in the authorization settings or the policy file, the daemon refuses to
+        start.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="authorization__sentry_service">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Using Impala with the Sentry Service (<span class="keyword">Impala 1.4</span> or higher only)</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        When you use the Sentry service rather than the policy file, you set up privileges through
+        <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statement in either Impala or Hive, then both components
+        use those same privileges automatically. (Impala added the <code class="ph codeph">GRANT</code> and
+        <code class="ph codeph">REVOKE</code> statements in <span class="keyword">Impala 2.0</span>.)
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="authorization__security_policy_file">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Using Impala with the Sentry Policy File</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The policy file is a text file that you put in a designated location in HDFS, and that is read during
+        startup of the <span class="keyword cmdname">impalad</span> daemon when you specify both the <code class="ph codeph">-server_name</code> and
+        <code class="ph codeph">-authorization_policy_file</code> startup options. It controls which objects (databases, tables,
+        and HDFS directory paths) can be accessed by the user who connects to <span class="keyword cmdname">impalad</span>, and what
+        operations that user can perform on those objects.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          The Sentry service, as described in <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>, stores
+          authorization metadata in a relational database. This means you can manage user privileges for Impala tables
+          using traditional <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> SQL statements, rather than the
+          policy file approach described here. If you are still using policy files, migrate to the
+          database-backed service whenever practical.
+        </p>
+      </div>
+
+      <p class="p">
+        The location of the policy file is listed in the <span class="ph filepath">auth-site.xml</span> configuration file. To
+        minimize overhead, the security information from this file is cached by each <span class="keyword cmdname">impalad</span>
+        daemon and refreshed automatically, with a default interval of 5 minutes. After making a substantial change
+        to security policies, restart all Impala daemons to pick up the changes immediately.
+      </p>
+
+      <p class="p toc inpage"></p>
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="security_policy_file__security_policy_file_details">
+
+      <h3 class="title topictitle3" id="ariaid-title6">Policy File Location and Format</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          The policy file uses the familiar <code class="ph codeph">.ini</code> format, divided into the major sections
+          <code class="ph codeph">[groups]</code> and <code class="ph codeph">[roles]</code>. There is also an optional
+          <code class="ph codeph">[databases]</code> section, which allows you to specify a specific policy file for a particular
+          database, as explained in <a class="xref" href="#security_multiple_policy_files">Using Multiple Policy Files for Different Databases</a>. Another optional section,
+          <code class="ph codeph">[users]</code>, allows you to override the OS-level mapping of users to groups; that is an
+          advanced technique primarily for testing and debugging, and is beyond the scope of this document.
+        </p>
+
+        <p class="p">
+          In the <code class="ph codeph">[groups]</code> section, you define various categories of users and select which roles
+          are associated with each category. The group and usernames correspond to Linux groups and users on the
+          server where the <span class="keyword cmdname">impalad</span> daemon runs.
+        </p>
+
+        <p class="p">
+          When you access Impala through the <span class="keyword cmdname">impala-shell</span> interpreter, for
+          purposes of authorization, the user is the logged-in Linux user and the groups are the Linux groups that
+          user is a member of. When you access Impala through the ODBC or JDBC interfaces, the user and password
+          specified through the connection string are used as login credentials for the Linux server, and
+          authorization is based on that username and the associated Linux group membership.
+        </p>
+
+        <div class="p">
+          In the <code class="ph codeph">[roles]</code> section, you define a set of roles. For each role, you specify
+          precisely which privileges are available: that is, which objects users with that role can access, and
+          what operations they can perform on those objects. This is the lowest-level category of security
+          information; the other sections in the policy file map the privileges to higher-level divisions of
+          groups and users. The privileges are specified using patterns like:
+<pre class="pre codeblock"><code>server=<var class="keyword varname">server_name</var>-&gt;db=<var class="keyword varname">database_name</var>-&gt;table=<var class="keyword varname">table_name</var>-&gt;action=SELECT
+server=<var class="keyword varname">server_name</var>-&gt;db=<var class="keyword varname">database_name</var>-&gt;table=<var class="keyword varname">table_name</var>-&gt;action=CREATE
+server=<var class="keyword varname">server_name</var>-&gt;db=<var class="keyword varname">database_name</var>-&gt;table=<var class="keyword varname">table_name</var>-&gt;action=ALL
+</code></pre>
+          For the <var class="keyword varname">server_name</var> value, substitute the same symbolic name you specify with the
+          <span class="keyword cmdname">impalad</span> <code class="ph codeph">-server_name</code> option. You can use <code class="ph codeph">*</code> wildcard
+          characters at each level of the privilege specification to allow access to all such objects. For example:
+<pre class="pre codeblock"><code>server=impala-host.example.com-&gt;db=default-&gt;table=t1-&gt;action=SELECT
+server=impala-host.example.com-&gt;db=*-&gt;table=*-&gt;action=CREATE
+server=impala-host.example.com-&gt;db=*-&gt;table=audit_log-&gt;action=SELECT
+server=impala-host.example.com-&gt;db=default-&gt;table=t1-&gt;action=*
+</code></pre>
+        </div>
+
+        <p class="p">
+          When authorization is enabled, Impala uses the policy file as a <em class="ph i">whitelist</em>, representing every
+          privilege available to any user on any object. That is, only operations specified for the appropriate
+          combination of object, role, group, and user are allowed; all other operations are not allowed. If a
+          group or role is defined multiple times in the policy file, the last definition takes precedence.
+        </p>
+
+        <p class="p">
+          To understand the notion of whitelisting, set up a minimal policy file that does not provide any
+          privileges for any object. When you connect to an Impala node where this policy file is in effect, you
+          get no results for <code class="ph codeph">SHOW DATABASES</code>, and an error when you issue any <code class="ph codeph">SHOW
+          TABLES</code>, <code class="ph codeph">USE <var class="keyword varname">database_name</var></code>, <code class="ph codeph">DESCRIBE
+          <var class="keyword varname">table_name</var></code>, <code class="ph codeph">SELECT</code>, and or other statements that expect to
+          access databases or tables, even if the corresponding databases and tables exist.
+        </p>
+
+        <p class="p">
+          The contents of the policy file are cached, to avoid a performance penalty for each query. The policy
+          file is re-checked by each <span class="keyword cmdname">impalad</span> node every 5 minutes. When you make a
+          non-time-sensitive change such as adding new privileges or new users, you can let the change take effect
+          automatically a few minutes later. If you remove or reduce privileges, and want the change to take effect
+          immediately, restart the <span class="keyword cmdname">impalad</span> daemon on all nodes, again specifying the
+          <code class="ph codeph">-server_name</code> and <code class="ph codeph">-authorization_policy_file</code> options so that the rules
+          from the updated policy file are applied.
+        </p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="security_policy_file__security_examples">
+
+      <h3 class="title topictitle3" id="ariaid-title7">Examples of Policy File Rules for Security Scenarios</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          The following examples show rules that might go in the policy file to deal with various
+          authorization-related scenarios. For illustration purposes, this section shows several very small policy
+          files with only a few rules each. In your environment, typically you would define many roles to cover all
+          the scenarios involving your own databases, tables, and applications, and a smaller number of groups,
+          whose members are given the privileges from one or more roles.
+        </p>
+
+        <div class="example" id="security_examples__sec_ex_unprivileged"><h4 class="title sectiontitle">A User with No Privileges</h4>
+
+          
+
+          <p class="p">
+            If a user has no privileges at all, that user cannot access any schema objects in the system. The error
+            messages do not disclose the names or existence of objects that the user is not authorized to read.
+          </p>
+
+          <p class="p">
+
+            This is the experience you want a user to have if they somehow log into a system where they are not an
+            authorized Impala user. In a real deployment with a filled-in policy file, a user might have no
+            privileges because they are not a member of any of the relevant groups mentioned in the policy file.
+          </p>
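+
+          <p class="p">
+            For illustration, with such a minimal policy file in effect, an <span class="keyword cmdname">impala-shell</span>
+            session might look like the following. (This is a sketch; the user and database names are hypothetical,
+            and the exact message text varies by release.)
+          </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; show databases;
+Query: show databases
+Returned 0 row(s) in 0.12s
+[localhost:21000] &gt; show tables in reporting_db;
+Query: show tables in reporting_db
+ERROR: AuthorizationException: User 'username' does not have privileges to access: reporting_db.*
+</code></pre>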
+
+
+
+        </div>
+
+        <div class="example" id="security_examples__sec_ex_superuser"><h4 class="title sectiontitle">Examples of Privileges for Administrative Users</h4>
+
+          
+
+          <p class="p">
+            When an administrative user has broad access to tables or databases, the associated rules in the
+            <code class="ph codeph">[roles]</code> section typically use wildcards and/or inheritance. For example, in the
+            following sample policy file, <code class="ph codeph">db=*</code> refers to all databases and
+            <code class="ph codeph">db=*-&gt;table=*</code> refers to all tables in all databases.
+          </p>
+
+          <p class="p">
+            Omitting the rightmost portion of a rule means that the privileges apply to all the objects that could
+            be specified there. For example, in the following sample policy file, the
+            <code class="ph codeph">all_databases</code> role has all privileges for all tables in all databases, while the
+            <code class="ph codeph">one_database</code> role has all privileges for all tables in one specific database. The
+            <code class="ph codeph">all_databases</code> role does not grant privileges on URIs, so a group with that role could
+            not issue a <code class="ph codeph">CREATE TABLE</code> statement with a <code class="ph codeph">LOCATION</code> clause. The
+            <code class="ph codeph">entire_server</code> role has all privileges on both databases and URIs within the server.
+          </p>
+
+<pre class="pre codeblock"><code>[groups]
+supergroup = all_databases
+
+[roles]
+read_all_tables = server=server1-&gt;db=*-&gt;table=*-&gt;action=SELECT
+all_tables = server=server1-&gt;db=*-&gt;table=*
+all_databases = server=server1-&gt;db=*
+one_database = server=server1-&gt;db=test_db
+entire_server = server=server1
+</code></pre>
+
+        </div>
+
+        <div class="example" id="security_examples__sec_ex_detailed"><h4 class="title sectiontitle">A User with Privileges for Specific Databases and Tables</h4>
+
+          
+
+          <p class="p">
+            If a user has privileges for specific tables in specific databases, the user can access those things
+            but nothing else. They can see the tables and their parent databases in the output of <code class="ph codeph">SHOW
+            TABLES</code> and <code class="ph codeph">SHOW DATABASES</code>, <code class="ph codeph">USE</code> the appropriate databases,
+            and perform the relevant actions (<code class="ph codeph">SELECT</code> and/or <code class="ph codeph">INSERT</code>) based on the
+            table privileges. To actually create a table requires the <code class="ph codeph">ALL</code> privilege at the
+            database level, so you might define separate roles for the user that sets up a schema and other users
+            or applications that perform day-to-day operations on the tables.
+          </p>
+
+          <p class="p">
+            The following sample policy file shows some of the syntax that is appropriate as the policy file grows,
+            such as the <code class="ph codeph">#</code> comment syntax, <code class="ph codeph">\</code> continuation syntax, and comma
+            separation for roles assigned to groups or privileges assigned to roles.
+          </p>
+
+<pre class="pre codeblock"><code>[groups]
+employee = training_sysadmin, instructor
+visitor = student
+
+[roles]
+training_sysadmin = server=server1-&gt;db=training, \
+server=server1-&gt;db=instructor_private, \
+server=server1-&gt;db=lesson_development
+instructor = server=server1-&gt;db=training-&gt;table=*-&gt;action=*, \
+server=server1-&gt;db=instructor_private-&gt;table=*-&gt;action=*, \
+server=server1-&gt;db=lesson_development-&gt;table=lesson*
+# This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP.
+student = server=server1-&gt;db=training-&gt;table=lesson_*-&gt;action=SELECT
+</code></pre>
+
+        </div>
+
+
+
+        <div class="example" id="security_examples__sec_ex_external_files"><h4 class="title sectiontitle">Privileges for Working with External Data Files</h4>
+
+          
+
+          <p class="p">
+            When data is being inserted through the <code class="ph codeph">LOAD DATA</code> statement, or is referenced from an
+            HDFS location outside the normal Impala database directories, the user also needs appropriate
+            permissions on the URIs corresponding to those HDFS locations.
+          </p>
+
+          <p class="p">
+            In this sample policy file:
+          </p>
+
+          <ul class="ul">
+            <li class="li">
+              The <code class="ph codeph">external_table</code> role lets us insert into and query the Impala table,
+              <code class="ph codeph">external_table.sample</code>.
+            </li>
+
+            <li class="li">
+              The <code class="ph codeph">staging_dir</code> role lets us specify the HDFS path
+              <span class="ph filepath">/user/username/external_data</span> with the <code class="ph codeph">LOAD DATA</code> statement.
+              Remember, when Impala queries or loads data files, it operates on all the files in that directory,
+              not just a single file, so any Impala <code class="ph codeph">LOCATION</code> parameters refer to a directory
+              rather than an individual file.
+            </li>
+
+            <li class="li">
+              We included the IP address and port of the Hadoop name node in the HDFS URI of the
+              <code class="ph codeph">staging_dir</code> rule. We found those details in
+              <span class="ph filepath">/etc/hadoop/conf/core-site.xml</span>, under the <code class="ph codeph">fs.default.name</code>
+              element. That is what we use in any roles that specify URIs (that is, the locations of directories in
+              HDFS).
+            </li>
+
+            <li class="li">
+              We start this example after the table <code class="ph codeph">external_table.sample</code> is already created. In
+              the policy file for the example, we have already taken away the <code class="ph codeph">external_table_admin</code>
+              role from the <code class="ph codeph">username</code> group, and replaced it with the lesser-privileged
+              <code class="ph codeph">external_table</code> role.
+            </li>
+
+            <li class="li">
+              We assign privileges to a subdirectory underneath <span class="ph filepath">/user/username</span> in HDFS,
+              because such privileges also apply to any subdirectories underneath. If we had assigned privileges to
+              the parent directory <span class="ph filepath">/user/username</span>, it would be too likely to mess up other
+              files by specifying a wrong location by mistake.
+            </li>
+
+            <li class="li">
+              The <code class="ph codeph">username</code> under the <code class="ph codeph">[groups]</code> section refers to the
+              <code class="ph codeph">username</code> group. (In this example, there is a <code class="ph codeph">username</code> user
+              that is a member of a <code class="ph codeph">username</code> group.)
+            </li>
+          </ul>
+
+          <p class="p">
+            Policy file:
+          </p>
+
+<pre class="pre codeblock"><code>[groups]
+username = external_table, staging_dir
+
+[roles]
+external_table_admin = server=server1-&gt;db=external_table
+external_table = server=server1-&gt;db=external_table-&gt;table=sample-&gt;action=*
+staging_dir = server=server1-&gt;uri=hdfs://127.0.0.1:8020/user/username/external_data-&gt;action=*
+</code></pre>
+
+          <p class="p">
+            <span class="keyword cmdname">impala-shell</span> session:
+          </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; use external_table;
+Query: use external_table
+[localhost:21000] &gt; show tables;
+Query: show tables
+Query finished, fetching results ...
++--------+
+| name   |
++--------+
+| sample |
++--------+
+Returned 1 row(s) in 0.02s
+
+[localhost:21000] &gt; select * from sample;
+Query: select * from sample
+Query finished, fetching results ...
++-----+
+| x   |
++-----+
+| 1   |
+| 5   |
+| 150 |
++-----+
+Returned 3 row(s) in 1.04s
+
+[localhost:21000] &gt; load data inpath '/user/username/external_data' into table sample;
+Query: load data inpath '/user/username/external_data' into table sample
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary                                                  |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 2 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.26s
+[localhost:21000] &gt; select * from sample;
+Query: select * from sample
+Query finished, fetching results ...
++-------+
+| x     |
++-------+
+| 2     |
+| 4     |
+| 6     |
+| 8     |
+| 64738 |
+| 49152 |
+| 1     |
+| 5     |
+| 150   |
++-------+
+Returned 9 row(s) in 0.22s
+
+[localhost:21000] &gt; load data inpath '/user/username/unauthorized_data' into table sample;
+Query: load data inpath '/user/username/unauthorized_data' into table sample
+ERROR: AuthorizationException: User 'username' does not have privileges to access: hdfs://127.0.0.1:8020/user/username/unauthorized_data
+</code></pre>
+
+        </div>
+
+        
+
+        <div class="example" id="security_examples__sec_sysadmin"><h4 class="title sectiontitle">Separating Administrator Responsibility from Read and Write Privileges</h4>
+
+          
+
+          <p class="p">
+            Remember that creating a database requires full privileges on that database, while day-to-day
+            operations on tables within that database can be performed with lower levels of privilege on specific
+            tables. Thus, you might set up separate roles for each database or application: an administrative one
+            that can create or drop the database, and a user-level one that can access only the relevant tables.
+          </p>
+
+          <p class="p">
+            For example, this policy file divides responsibilities between users in 3 different groups:
+          </p>
+
+          <ul class="ul">
+            <li class="li">
+              Members of the <code class="ph codeph">supergroup</code> group have the <code class="ph codeph">training_sysadmin</code> role and
+              so can set up a database named <code class="ph codeph">training</code>.
+            </li>
+
+            <li class="li"> Members of the <code class="ph codeph">employee</code> group have the
+                <code class="ph codeph">instructor</code> role and so can create, insert into,
+              and query any tables in the <code class="ph codeph">training</code> database,
+              but cannot create or drop the database itself. </li>
+
+            <li class="li">
+              Members of the <code class="ph codeph">visitor</code> group have the <code class="ph codeph">student</code> role and so can query
+              those tables in the <code class="ph codeph">training</code> database.
+            </li>
+          </ul>
+
+<pre class="pre codeblock"><code>[groups]
+supergroup = training_sysadmin
+employee = instructor
+visitor = student
+
+[roles]
+training_sysadmin = server=server1-&gt;db=training
+instructor = server=server1-&gt;db=training-&gt;table=*-&gt;action=*
+student = server=server1-&gt;db=training-&gt;table=*-&gt;action=SELECT
+</code></pre>
+
+        </div>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="security_policy_file__security_multiple_policy_files">
+
+      <h3 class="title topictitle3" id="ariaid-title8">Using Multiple Policy Files for Different Databases</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          For an Impala cluster with many databases being accessed by many users and applications, it might be
+          cumbersome to update the security policy file for each privilege change or each new database, table, or
+          view. You can allow security to be managed separately for individual databases, by setting up a separate
+          policy file for each database:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            Add the optional <code class="ph codeph">[databases]</code> section to the main policy file.
+          </li>
+
+          <li class="li">
+            Add entries in the <code class="ph codeph">[databases]</code> section for each database that has its own policy file.
+          </li>
+
+          <li class="li">
+            For each listed database, specify the HDFS path of the appropriate policy file.
+          </li>
+        </ul>
+
+        <p class="p">
+          For example:
+        </p>
+
+<pre class="pre codeblock"><code>[databases]
+# Defines the location of the per-DB policy files for the 'customers' and 'sales' databases.
+customers = hdfs://ha-nn-uri/etc/access/customers.ini
+sales = hdfs://ha-nn-uri/etc/access/sales.ini
+</code></pre>
+
+        <p class="p">
+          To enable URIs in per-DB policy files, the Java configuration option <code class="ph codeph">sentry.allow.uri.db.policyfile</code>
+          must be set to <code class="ph codeph">true</code>.
+	  For example:
+        </p>
+
+<pre class="pre codeblock"><code>JAVA_TOOL_OPTIONS="-Dsentry.allow.uri.db.policyfile=true"
+</code></pre>
+
+        <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+          Enabling URIs in per-DB policy files introduces a security risk: the owner of the db-level policy file
+          can grant themselves load privileges to anything the <code class="ph codeph">impala</code> user has read
+          permissions for in HDFS (including data in other databases controlled by different db-level policy
+          files).
+        </div>
+      </div>
+    </article>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="authorization__security_schema">
+
+    <h2 class="title topictitle2" id="ariaid-title9">Setting Up Schema Objects for a Secure Impala Deployment</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Remember that in your role definitions, you specify privileges at the level of individual databases and
+        tables, or all databases or all tables within a database. To simplify the structure of these rules, plan
+        ahead of time how to name your schema objects so that data with different authorization requirements is
+        divided into separate databases.
+      </p>
+
+      <p class="p">
+        If you are adding security on top of an existing Impala deployment, remember that you can rename tables or
+        even move them between databases using the <code class="ph codeph">ALTER TABLE</code> statement. In Impala, creating new
+        databases is a relatively inexpensive operation, basically just creating a new directory in HDFS.
+      </p>
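+
+      <p class="p">
+        For example, the following statements (the database and table names are hypothetical) move a table into a
+        database with stricter authorization rules, then rename it there:
+      </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE staging_db.raw_events RENAME TO secure_db.raw_events;
+ALTER TABLE secure_db.raw_events RENAME TO secure_db.events;
+</code></pre>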
+
+      <p class="p">
+        You can also plan the security scheme and set up the policy file before the actual schema objects named in
+        the policy file exist. Because the authorization capability is based on whitelisting, a user can only
+        create a new database or table if the required privilege is already in the policy file: either by listing
+        the exact name of the object being created, or a <code class="ph codeph">*</code> wildcard to match all the applicable
+        objects within the appropriate container.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="authorization__security_privileges">
+
+    <h2 class="title topictitle2" id="ariaid-title10">Privilege Model and Object Hierarchy</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Privileges can be granted on different objects in the schema. Any privilege that can be granted is
+        associated with a level in the object hierarchy. If a privilege is granted on a container object in the
+        hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other
+        database systems such as MySQL.
+      </p>
+
+      <p class="p">
+        The kinds of objects in the schema hierarchy are:
+      </p>
+
+<pre class="pre codeblock"><code>Server
+URI
+Database
+  Table
+</code></pre>
+
+      <p class="p">
+        The server name is specified by the <code class="ph codeph">-server_name</code> option when <span class="keyword cmdname">impalad</span>
+        starts. Specify the same name for all <span class="keyword cmdname">impalad</span> nodes in the cluster.
+      </p>
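+      <p class="p">
+        For example, the server name is an arbitrary label, shown here as <code class="ph codeph">server1</code>;
+        whatever value you choose, use the same one on every node:
+      </p>
+
+<pre class="pre codeblock"><code>impalad -server_name=server1 <var class="keyword varname">other_options</var>
+</code></pre>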
+
+      <p class="p">
+        URIs represent the HDFS paths you specify as part of statements such as <code class="ph codeph">CREATE EXTERNAL
+        TABLE</code> and <code class="ph codeph">LOAD DATA</code>. Typically, you specify what look like UNIX paths, but these
+        locations can also be prefixed with <code class="ph codeph">hdfs://</code> to make clear that they are really URIs. To
+        set privileges for a URI, specify the name of a directory, and the privilege applies to all the files in
+        that directory and any directories underneath it.
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.3</span> and higher, you can specify privileges for individual columns.
+        Formerly, to specify read privileges at this level, you created a view that queried specific columns
+        and/or partitions from a base table, and gave <code class="ph codeph">SELECT</code> privilege on the view but not
+        the underlying table. Now, you can use Impala's <a class="xref" href="impala_grant.html">GRANT Statement (Impala 2.0 or higher only)</a> and
+        <a class="xref" href="impala_revoke.html">REVOKE Statement (Impala 2.0 or higher only)</a> statements to assign and revoke privileges from specific columns
+        in a table.
+      </p>
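+      <p class="p">
+        For example, the following statements grant and then revoke <code class="ph codeph">SELECT</code> on
+        individual columns. The table, column, and role names are hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>GRANT SELECT (customer_id, order_total) ON TABLE sales.orders TO ROLE analyst_role;
+REVOKE SELECT (order_total) ON TABLE sales.orders FROM ROLE analyst_role;
+</code></pre>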
+
+      <div class="p">
+        URIs must start with either <code class="ph codeph">hdfs://</code> or <code class="ph codeph">file://</code>. If a URI starts with
+        anything else, it will cause an exception and the policy file will be invalid. When defining URIs for HDFS,
+        you must also specify the NameNode. For example:
+<pre class="pre codeblock"><code>data_read = server=server1-&gt;uri=file:///path/to/dir, \
+server=server1-&gt;uri=hdfs://namenode:port/path/to/dir
+</code></pre>
+        <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span> 
+          <p class="p">
+            Because the NameNode host and port must be specified, enable High Availability (HA) to ensure
+            that the URI will remain constant even if the NameNode changes.
+          </p>
+<pre class="pre codeblock"><code>data_read = server=server1-&gt;uri=file:///path/to/dir, \
+server=server1-&gt;uri=hdfs://ha-nn-uri/path/to/dir
+</code></pre>
+        </div>
+      </div>
+
+
+
+
+
+      <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">Valid privilege types and objects they apply to</span></caption><colgroup><col style="width:33.33333333333333%"><col style="width:66.66666666666666%"></colgroup><thead class="thead">
+            <tr class="row">
+              <th class="entry nocellnorowborder" id="security_privileges__entry__1"><strong class="ph b">Privilege</strong></th>
+              <th class="entry nocellnorowborder" id="security_privileges__entry__2"><strong class="ph b">Object</strong></th>
+            </tr>
+          </thead><tbody class="tbody">
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__1 ">INSERT</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__2 ">DB, TABLE</td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__1 ">SELECT</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__2 ">DB, TABLE, COLUMN</td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__1 ">ALL</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__2 ">SERVER, TABLE, DB, URI</td>
+            </tr>
+          </tbody></table>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          Although this document refers to the <code class="ph codeph">ALL</code> privilege, currently if you use the policy file
+          mode, you do not use the actual keyword <code class="ph codeph">ALL</code> in the policy file. When you code role
+          entries in the policy file:
+        </p>
+        <ul class="ul">
+          <li class="li">
+            To specify the <code class="ph codeph">ALL</code> privilege for a server, use a role like
+            <code class="ph codeph">server=<var class="keyword varname">server_name</var></code>.
+          </li>
+
+          <li class="li">
+            To specify the <code class="ph codeph">ALL</code> privilege for a database, use a role like
+            <code class="ph codeph">server=<var class="keyword varname">server_name</var>-&gt;db=<var class="keyword varname">database_name</var></code>.
+          </li>
+
+          <li class="li">
+            To specify the <code class="ph codeph">ALL</code> privilege for a table, use a role like
+            <code class="ph codeph">server=<var class="keyword varname">server_name</var>-&gt;db=<var class="keyword varname">database_name</var>-&gt;table=<var class="keyword varname">table_name</var>-&gt;action=*</code>.
+          </li>
+        </ul>
+      </div>
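+      <p class="p">
+        Putting those three forms together, a <code class="ph codeph">[roles]</code> section granting the
+        <code class="ph codeph">ALL</code> privilege at each level might look like the following (all database,
+        table, and role names are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>[roles]
+server_admin_role = server=server1
+db_admin_role = server=server1-&gt;db=sales
+table_admin_role = server=server1-&gt;db=sales-&gt;table=orders-&gt;action=*
+</code></pre>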
+      <table class="table"><caption></caption><colgroup><col style="width:29.241071428571423%"><col style="width:26.116071428571423%"><col style="width:22.32142857142857%"><col style="width:22.32142857142857%"></colgroup><thead class="thead">
+            <tr class="row">
+              <th class="entry nocellnorowborder" id="security_privileges__entry__9">
+                Operation
+              </th>
+              <th class="entry nocellnorowborder" id="security_privileges__entry__10">
+                Scope
+              </th>
+              <th class="entry nocellnorowborder" id="security_privileges__entry__11">
+                Privileges
+              </th>
+              <th class="entry nocellnorowborder" id="security_privileges__entry__12">
+                URI
+              </th>
+            </tr>
+          </thead><tbody class="tbody">
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">EXPLAIN</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE; COLUMN</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">LOAD DATA</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">INSERT</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 ">URI</td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">CREATE DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">SERVER</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DROP DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">CREATE TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DROP TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DESCRIBE TABLE<p class="p">-Output shows <em class="ph i">all</em> columns if the
+                  user has table-level privileges or the <code class="ph codeph">SELECT</code>
+                  privilege on at least one table column</p></td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT/INSERT</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. ADD COLUMNS</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. REPLACE COLUMNS</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. CHANGE column</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. RENAME</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET TBLPROPERTIES</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET FILEFORMAT</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET LOCATION</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 ">URI</td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. ADD PARTITION</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. ADD PARTITION location</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 ">URI</td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. DROP PARTITION</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. PARTITION SET FILEFORMAT</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET SERDEPROPERTIES</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">CREATE VIEW<p class="p">-This operation is allowed if you have
+                  column-level <code class="ph codeph">SELECT</code> access to the columns
+                  being used.</p></td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">DATABASE; SELECT on TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DROP VIEW</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">VIEW/TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row" id="security_privileges__alter_view_privs">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+                ALTER VIEW
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+                You need <code class="ph codeph">ALL</code> privilege on the named view <span class="ph">and the parent
+                database</span>, plus <code class="ph codeph">SELECT</code> privilege for any tables or views referenced by the
+                view query. Once the view is created or altered by a high-privileged system administrator, it can
+                be queried by a lower-privileged user who does not have full query privileges for the base tables.
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+                ALL, SELECT
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET LOCATION</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 ">URI</td>
+            </tr>
+            <tr class="row" id="security_privileges__create_external_table_privs">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+                CREATE EXTERNAL TABLE
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+                Database (ALL), URI (SELECT)
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+                ALL, SELECT
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">SELECT<p class="p">-You can grant the SELECT privilege on a view to
+                  give users access to specific columns of a table they do not
+                  otherwise have access to.</p><p class="p">-See
+                  <span class="xref">the documentation for Apache Sentry</span>
+                  for details on allowed column-level
+                operations.</p></td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">VIEW/TABLE; COLUMN</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">USE &lt;dbName&gt;</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">Any</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 "></td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">CREATE FUNCTION</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">SERVER</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DROP FUNCTION</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">SERVER</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">REFRESH &lt;table name&gt; or REFRESH &lt;table name&gt; PARTITION (&lt;partition_spec&gt;)</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT/INSERT</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">INVALIDATE METADATA</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">SERVER</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">INVALIDATE METADATA &lt;table name&gt;</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT/INSERT</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">COMPUTE STATS</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row" id="security_privileges__show_table_stats_privs">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+                SHOW TABLE STATS, SHOW PARTITIONS
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+                TABLE
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+                SELECT/INSERT
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" id="security_privileges__show_column_stats_privs" headers="security_privileges__entry__9 ">
+                SHOW COLUMN STATS
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+                TABLE
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+                SELECT/INSERT
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" id="security_privileges__show_functions_privs" headers="security_privileges__entry__9 ">
+                SHOW FUNCTIONS
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+                DATABASE
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+                SELECT
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row" id="security_privileges__show_tables_privs">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+                SHOW TABLES
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 "></td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+                No special privileges needed to issue the statement, but only shows objects you are authorized for
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+            <tr class="row" id="security_privileges__show_databases_privs">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+                SHOW DATABASES, SHOW SCHEMAS
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 "></td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+                No special privileges needed to issue the statement, but only shows objects you are authorized for
+              </td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+            </tr>
+          </tbody></table>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="authorization__sentry_debug">
+
+    <h2 class="title topictitle2" id="ariaid-title11"><span class="ph">Debugging Failed Sentry Authorization Requests</span></h2>
+
+    <div class="body conbody">
+
+      <div class="p">
+      Sentry logs all facts that lead up to authorization decisions at the debug level. If you do not understand
+      why Sentry is denying access, the best way to debug is to temporarily turn on debug logging:
+      <ul class="ul">
+        <li class="li">
+          Add <code class="ph codeph">log4j.logger.org.apache.sentry=DEBUG</code> to the <span class="ph filepath">log4j.properties</span>
+          file on each host in the cluster, in the appropriate configuration directory for each service.
+        </li>
+      </ul>
+      Specifically, look for exceptions and messages such as:
+<pre class="pre codeblock"><code>FilePermission server..., RequestPermission server...., result [true|false]</code></pre>
+      which indicate each evaluation Sentry makes. The <code class="ph codeph">FilePermission</code> is from the policy file,
+      while <code class="ph codeph">RequestPermission</code> is the privilege required for the query. A
+      <code class="ph codeph">RequestPermission</code> will iterate over all appropriate <code class="ph codeph">FilePermission</code>
+      settings until a match is found. If no matching privilege is found, Sentry returns <code class="ph codeph">false</code>
+      indicating <span class="q">"Access Denied"</span>.
+
+    </div>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="authorization__sec_ex_default">
+
+    <h2 class="title topictitle2" id="ariaid-title12">The DEFAULT Database in a Secure Deployment</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Because of the extra emphasis on granular access controls in a secure deployment, you should move any
+        important or sensitive information out of the <code class="ph codeph">DEFAULT</code> database into a named database whose
+        privileges are specified in the policy file. Sometimes you might need to give privileges on the
+        <code class="ph codeph">DEFAULT</code> database for administrative reasons; for example, as a place you can reliably
+        specify with a <code class="ph codeph">USE</code> statement when preparing to drop a database.
+      </p>
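+      <p class="p">
+        For example, because you cannot drop the database you are currently using, switching to
+        <code class="ph codeph">DEFAULT</code> first is a reliable pattern (the database name being dropped
+        here is hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>USE default;
+DROP DATABASE temp_analysis;
+</code></pre>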
+
+
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_avg.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_avg.html b/docs/build/html/topics/impala_avg.html
new file mode 100644
index 0000000..effb01b
--- /dev/null
+++ b/docs/build/html/topics/impala_avg.html
@@ -0,0 +1,318 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="avg"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>AVG Function</title></head><body id="avg"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">AVG Function</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      An aggregate function that returns the average value from a set of numbers or <code class="ph codeph">TIMESTAMP</code> values.
+      Its single argument can be a numeric column, or the numeric result of a function or expression applied to the
+      column value. Rows with a <code class="ph codeph">NULL</code> value for the specified column are ignored. If the table is empty,
+      or all the values supplied to <code class="ph codeph">AVG</code> are <code class="ph codeph">NULL</code>, <code class="ph codeph">AVG</code> returns
+      <code class="ph codeph">NULL</code>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>AVG([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]
+</code></pre>
+
+    <p class="p">
+      When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of
+      grouping values.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> for numeric values; <code class="ph codeph">TIMESTAMP</code> for
+      <code class="ph codeph">TIMESTAMP</code> values
+    </p>
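+    <p class="p">
+      For example, averaging <code class="ph codeph">TIMESTAMP</code> values yields a
+      <code class="ph codeph">TIMESTAMP</code> result. (The following table and values are hypothetical,
+      shown only to illustrate the return type.)
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical table holding two TIMESTAMP values two days apart.
+create table ts_demo (ts timestamp);
+insert into ts_demo values ('2017-01-01 00:00:00'), ('2017-01-03 00:00:00');
+-- The result should be a TIMESTAMP midway between the two values,
+-- in this case 2017-01-02 00:00:00.
+select avg(ts) from ts_demo;
+</code></pre>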
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+        To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        in an aggregation function, you unpack the individual elements using join notation in the query,
+        and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+        See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+      </p>
+
+    <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+  from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name      | item.n_nationkey |
++-------------+------------------+
+| AFRICA      | 0                |
+| AFRICA      | 5                |
+| AFRICA      | 14               |
+| AFRICA      | 15               |
+| AFRICA      | 16               |
+| AMERICA     | 1                |
+| AMERICA     | 2                |
+| AMERICA     | 3                |
+| AMERICA     | 17               |
+| AMERICA     | 24               |
+| ASIA        | 8                |
+| ASIA        | 9                |
+| ASIA        | 12               |
+| ASIA        | 18               |
+| ASIA        | 21               |
+| EUROPE      | 6                |
+| EUROPE      | 7                |
+| EUROPE      | 19               |
+| EUROPE      | 22               |
+| EUROPE      | 23               |
+| MIDDLE EAST | 4                |
+| MIDDLE EAST | 10               |
+| MIDDLE EAST | 11               |
+| MIDDLE EAST | 13               |
+| MIDDLE EAST | 20               |
++-------------+------------------+
+
+select
+  r_name,
+  count(r_nations.item.n_nationkey) as count,
+  sum(r_nations.item.n_nationkey) as sum,
+  avg(r_nations.item.n_nationkey) as avg,
+  min(r_nations.item.n_name) as minimum,
+  max(r_nations.item.n_name) as maximum,
+  ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+  region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name      | count | sum | avg  | minimum   | maximum        | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA      | 5     | 50  | 10   | ALGERIA   | MOZAMBIQUE     | 5             |
+| AMERICA     | 5     | 47  | 9.4  | ARGENTINA | UNITED STATES  | 5             |
+| ASIA        | 5     | 68  | 13.6 | CHINA     | VIETNAM        | 5             |
+| EUROPE      | 5     | 77  | 15.4 | FRANCE    | UNITED KINGDOM | 5             |
+| MIDDLE EAST | 5     | 58  | 11.6 | EGYPT     | SAUDI ARABIA   | 5             |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>-- Average all the non-NULL values in a column.
+insert overwrite avg_t values (2),(4),(6),(null),(null);
+-- The average of the above values is 4: (2+4+6) / 3. The 2 NULL values are ignored.
+select avg(x) from avg_t;
+-- Average only certain values from the column.
+select avg(x) from t1 where month = 'January' and year = '2013';
+-- Apply a calculation to the value of the column before averaging.
+select avg(x/3) from t1;
+-- Apply a function to the value of the column before averaging.
+-- Here we are substituting a value of 0 for all NULLs in the column,
+-- so that those rows do factor into the return value.
+select avg(isnull(x,0)) from t1;
+-- Apply some number-returning function to a string column and average the results.
+-- If column s contains any NULLs, length(s) also returns NULL and those rows are ignored.
+select avg(length(s)) from t1;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, avg(page_visits) from web_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select avg(distinct x) from t1;
+-- Filter the output after performing the calculation.
+select avg(x) from t1 group by y having avg(x) between 1 and 20;
+</code></pre>
+
+    <div class="p">
+      The following examples show how to use <code class="ph codeph">AVG()</code> in an analytic context. They use a table
+      containing integers from 1 to 10. Notice how <code class="ph codeph">AVG()</code> is reported for each input value, as
+      opposed to a <code class="ph codeph">GROUP BY</code> query, which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, avg(x) over (partition by property) as avg from int_t where property in ('odd','even');
++----+----------+-----+
+| x  | property | avg |
++----+----------+-----+
+| 2  | even     | 6   |
+| 4  | even     | 6   |
+| 6  | even     | 6   |
+| 8  | even     | 6   |
+| 10 | even     | 6   |
+| 1  | odd      | 5   |
+| 3  | odd      | 5   |
+| 5  | odd      | 5   |
+| 7  | odd      | 5   |
+| 9  | odd      | 5   |
++----+----------+-----+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">AVG()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to produce a running average of all the even values,
+then a running average of all the odd values. The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>;
+therefore, all of these examples produce the same results:
+<pre class="pre codeblock"><code>select x, property,
+  avg(x) over (partition by property <strong class="ph b">order by x</strong>) as 'cumulative average'
+  from int_t where property in ('odd','even');
++----+----------+--------------------+
+| x  | property | cumulative average |
++----+----------+--------------------+
+| 2  | even     | 2                  |
+| 4  | even     | 3                  |
+| 6  | even     | 4                  |
+| 8  | even     | 5                  |
+| 10 | even     | 6                  |
+| 1  | odd      | 1                  |
+| 3  | odd      | 2                  |
+| 5  | odd      | 3                  |
+| 7  | odd      | 4                  |
+| 9  | odd      | 5                  |
++----+----------+--------------------+
+
+select x, property,
+  avg(x) over
+  (
+    partition by property
+    <strong class="ph b">order by x</strong>
+    <strong class="ph b">range between unbounded preceding and current row</strong>
+  ) as 'cumulative average'
+from int_t where property in ('odd','even');
++----+----------+--------------------+
+| x  | property | cumulative average |
++----+----------+--------------------+
+| 2  | even     | 2                  |
+| 4  | even     | 3                  |
+| 6  | even     | 4                  |
+| 8  | even     | 5                  |
+| 10 | even     | 6                  |
+| 1  | odd      | 1                  |
+| 3  | odd      | 2                  |
+| 5  | odd      | 3                  |
+| 7  | odd      | 4                  |
+| 9  | odd      | 5                  |
++----+----------+--------------------+
+
+select x, property,
+  avg(x) over
+  (
+    partition by property
+    <strong class="ph b">order by x</strong>
+    <strong class="ph b">rows between unbounded preceding and current row</strong>
+  ) as 'cumulative average'
+  from int_t where property in ('odd','even');
++----+----------+--------------------+
+| x  | property | cumulative average |
++----+----------+--------------------+
+| 2  | even     | 2                  |
+| 4  | even     | 3                  |
+| 6  | even     | 4                  |
+| 8  | even     | 5                  |
+| 10 | even     | 6                  |
+| 1  | odd      | 1                  |
+| 3  | odd      | 2                  |
+| 5  | odd      | 3                  |
+| 7  | odd      | 4                  |
+| 9  | odd      | 5                  |
++----+----------+--------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running average taking into account 1 row before
+and 1 row after the current row, within the same partition (all the even values or all the odd values).
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code>
+clause:
+<pre class="pre codeblock"><code>select x, property,
+  avg(x) over
+  (
+    partition by property
+    <strong class="ph b">order by x</strong>
+    <strong class="ph b">rows between 1 preceding and 1 following</strong>
+  ) as 'moving average'
+  from int_t where property in ('odd','even');
++----+----------+----------------+
+| x  | property | moving average |
++----+----------+----------------+
+| 2  | even     | 3              |
+| 4  | even     | 4              |
+| 6  | even     | 6              |
+| 8  | even     | 8              |
+| 10 | even     | 9              |
+| 1  | odd      | 2              |
+| 3  | odd      | 3              |
+| 5  | odd      | 5              |
+| 7  | odd      | 7              |
+| 9  | odd      | 8              |
++----+----------+----------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+  avg(x) over
+  (
+    partition by property
+    <strong class="ph b">order by x</strong>
+    <strong class="ph b">range between 1 preceding and 1 following</strong>
+  ) as 'moving average'
+from int_t where property in ('odd','even');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+
+
+    <p class="p">
+        Due to the way arithmetic on <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns uses
+        high-performance hardware instructions, and distributed queries can perform these operations in different
+        order for each query, results can vary slightly for aggregate function calls such as <code class="ph codeph">SUM()</code>
+        and <code class="ph codeph">AVG()</code> for <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns, particularly on
+        large data sets where millions or billions of values are summed or averaged. For perfect consistency and
+        repeatability, use the <code class="ph codeph">DECIMAL</code> data type for such operations instead of
+        <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>, <a class="xref" href="impala_max.html#max">MAX Function</a>,
+      <a class="xref" href="impala_min.html#min">MIN Function</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[48/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_analytic_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_analytic_functions.html b/docs/build/html/topics/impala_analytic_functions.html
new file mode 100644
index 0000000..0e0d8da
--- /dev/null
+++ b/docs/build/html/topics/impala_analytic_functions.html
@@ -0,0 +1,1785 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Im
 pala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="an
 alytic_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Analytic Functions</title></head><body id="analytic_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Analytic Functions</h1>
+
+  
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+
+      
+      Analytic functions (also known as window functions) are a special category of built-in functions. Like
+      aggregate functions, they examine the contents of multiple input rows to compute each output value. However,
+      rather than being limited to one result value per <code class="ph codeph">GROUP BY</code> group, they operate on
+      <dfn class="term">windows</dfn> where the input rows are ordered and grouped using flexible conditions expressed through
+      an <code class="ph codeph">OVER()</code> clause.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+
+
+    <p class="p">
+      Some functions, such as <code class="ph codeph">LAG()</code> and <code class="ph codeph">RANK()</code>, can only be used in this analytic
+      context. Some aggregate functions do double duty: when you call the aggregation functions such as
+      <code class="ph codeph">MAX()</code>, <code class="ph codeph">SUM()</code>, <code class="ph codeph">AVG()</code>, and so on with an
+      <code class="ph codeph">OVER()</code> clause, they produce an output value for each row, based on computations across other
+      rows in the window.
+    </p>
+
+    <p class="p">
+      Although analytic functions often compute the same value you would see from an aggregate function in a
+      <code class="ph codeph">GROUP BY</code> query, the analytic functions produce a value for each row in the result set rather
+      than a single value for each group. This flexibility lets you include additional columns in the
+      <code class="ph codeph">SELECT</code> list, offering more opportunities for organizing and filtering the result set.
+    </p>
+
+    <p class="p">
+      Analytic function calls are only allowed in the <code class="ph codeph">SELECT</code> list and in the outermost
+      <code class="ph codeph">ORDER BY</code> clause of the query. During query processing, analytic functions are evaluated
+      after other query stages such as joins, <code class="ph codeph">WHERE</code>, and <code class="ph codeph">GROUP BY</code>,
+    </p>
+
+
+
+
+
+
+
+
+
+    <p class="p">
+      The rows that are part of each partition are analyzed by computations across an ordered or unordered set of
+      rows. For example, <code class="ph codeph">COUNT()</code> and <code class="ph codeph">SUM()</code> might be applied to all the rows in
+      the partition, in which case the order of analysis does not matter. The <code class="ph codeph">ORDER BY</code> clause
+      might be used inside the <code class="ph codeph">OVER()</code> clause to define the ordering that applies to functions
+      such as <code class="ph codeph">LAG()</code> and <code class="ph codeph">FIRST_VALUE()</code>.
+    </p>
+
+
+
+
+
+    <p class="p">
+      Analytic functions are frequently used in fields such as finance and science to provide trend, outlier, and
+      bucketed analysis for large data sets. You might also see the term <span class="q">"window functions"</span> in database
+      literature, referring to the sequence of rows (the <span class="q">"window"</span>) that the function call applies to,
+      particularly when the <code class="ph codeph">OVER</code> clause includes a <code class="ph codeph">ROWS</code> or <code class="ph codeph">RANGE</code>
+      keyword.
+    </p>
+
+    <p class="p">
+      The following sections describe the analytic query clauses and the pure analytic functions provided by
+      Impala. For usage information about aggregate functions in an analytic context, see
+      <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="analytic_functions__over">
+
+    <h2 class="title topictitle2" id="ariaid-title2">OVER Clause</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph">OVER</code> clause is required for calls to pure analytic functions such as
+        <code class="ph codeph">LEAD()</code>, <code class="ph codeph">RANK()</code>, and <code class="ph codeph">FIRST_VALUE()</code>. When you include an
+        <code class="ph codeph">OVER</code> clause with calls to aggregate functions such as <code class="ph codeph">MAX()</code>,
+        <code class="ph codeph">COUNT()</code>, or <code class="ph codeph">SUM()</code>, they operate as analytic functions.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>function(<var class="keyword varname">args</var>) OVER([<var class="keyword varname">partition_by_clause</var>] [<var class="keyword varname">order_by_clause</var> [<var class="keyword varname">window_clause</var>]])
+
+partition_by_clause ::= PARTITION BY <var class="keyword varname">expr</var> [, <var class="keyword varname">expr</var> ...]
+order_by_clause ::= ORDER BY <var class="keyword varname">expr</var>  [ASC | DESC] [NULLS FIRST | NULLS LAST] [, <var class="keyword varname">expr</var> [ASC | DESC] [NULLS FIRST | NULLS LAST] ...]
+window_clause: See <a class="xref" href="#window_clause">Window Clause</a>
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">PARTITION BY clause:</strong>
+      </p>
+
+      <p class="p">
+        The <code class="ph codeph">PARTITION BY</code> clause acts much like the <code class="ph codeph">GROUP BY</code> clause in the
+        outermost block of a query. It divides the rows into groups containing identical values in one or more
+        columns. These logical groups are known as <dfn class="term">partitions</dfn>. Throughout the discussion of analytic
+        functions, <span class="q">"partitions"</span> refers to the groups produced by the <code class="ph codeph">PARTITION BY</code> clause, not
+        to partitioned tables. However, note the following limitation that applies specifically to analytic function
+        calls involving partitioned tables.
+      </p>
+
+      <p class="p">
+        In queries involving both analytic functions and partitioned tables, partition pruning only occurs for columns named in the <code class="ph codeph">PARTITION BY</code>
+        clause of the analytic function call. For example, if an analytic function query has a clause such as <code class="ph codeph">WHERE year=2016</code>,
+        the way to make the query prune all other <code class="ph codeph">YEAR</code> partitions is to include <code class="ph codeph">PARTITION BY year</code> in the analytic function call;
+        for example, <code class="ph codeph">OVER (PARTITION BY year,<var class="keyword varname">other_columns</var> <var class="keyword varname">other_analytic_clauses</var>)</code>.
+
+      </p>
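+      <p class="p">
+        For example, in the following sketch (the partitioned table
+        <code class="ph codeph">sales_part</code> is hypothetical), naming <code class="ph codeph">YEAR</code> in both
+        the <code class="ph codeph">WHERE</code> clause and the <code class="ph codeph">PARTITION BY</code> clause allows
+        the query to prune all partitions other than <code class="ph codeph">year=2016</code>:
+      </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical table partitioned by the YEAR column.
+select customer, amount,
+  sum(amount) over (partition by year, customer) as total_for_customer
+from sales_part
+where year = 2016;
+</code></pre>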
+
+      <p class="p">
+        The sequence of results from an analytic function <span class="q">"resets"</span> for each new partition in the result set.
+        That is, the set of preceding or following rows considered by the analytic function always comes from a
+        single partition. Functions such as <code class="ph codeph">MAX()</code>, <code class="ph codeph">SUM()</code>, <code class="ph codeph">ROW_NUMBER()</code>, and so
+        on apply to each partition independently. Omit the <code class="ph codeph">PARTITION BY</code> clause to apply the
+        analytic operation to all the rows in the table.
+      </p>
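+      <p class="p">
+        For example (a sketch against a hypothetical table <code class="ph codeph">t1</code>):
+      </p>
+
+<pre class="pre codeblock"><code>-- The numbering restarts at 1 for each distinct value of Y.
+select x, y, row_number() over (partition by y order by x) as rn from t1;
+-- With no PARTITION BY clause, a single numbering sequence
+-- covers all the rows in the table.
+select x, y, row_number() over (order by x) as rn from t1;
+</code></pre>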
+
+      <p class="p">
+        <strong class="ph b">ORDER BY clause:</strong>
+      </p>
+
+      <p class="p">
+        The <code class="ph codeph">ORDER BY</code> clause works much like the <code class="ph codeph">ORDER BY</code> clause in the outermost
+        block of a query. It defines the order in which rows are evaluated for the entire input set, or for each
+        group produced by a <code class="ph codeph">PARTITION BY</code> clause. You can order by one or multiple expressions, and
+        for each expression optionally choose ascending or descending order and whether nulls come first or last in
+        the sort order. Because this <code class="ph codeph">ORDER BY</code> clause only defines the order in which rows are
+        evaluated, if you want the results to be output in a specific order, also include an <code class="ph codeph">ORDER
+        BY</code> clause in the outer block of the query.
+      </p>
+
+      <p class="p">
+        When the <code class="ph codeph">ORDER BY</code> clause is omitted, the analytic function applies to all items in the
+        group produced by the <code class="ph codeph">PARTITION BY</code> clause. When the <code class="ph codeph">ORDER BY</code> clause is
+        included, the analysis can apply to all or a subset of the items in the group, depending on the optional
+        window clause.
+      </p>
+
+      <p class="p">
+        The order in which the rows are analyzed is only defined for those columns specified in <code class="ph codeph">ORDER
+        BY</code> clauses.
+      </p>
+
+      <p class="p">
+        One difference between the analytic and outer uses of the <code class="ph codeph">ORDER BY</code> clause: inside the
+        <code class="ph codeph">OVER</code> clause, <code class="ph codeph">ORDER BY 1</code> or other integer value is interpreted as a
+        constant sort value (effectively a no-op) rather than referring to column 1.
+      </p>
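+      <p class="p">
+        For example, the following two queries are not equivalent (a sketch against a
+        hypothetical table <code class="ph codeph">t1</code>):
+      </p>
+
+<pre class="pre codeblock"><code>-- Numbers the rows according to the values of the column X.
+select x, row_number() over (order by x) as rn from t1;
+-- Inside OVER(), ORDER BY 1 is a constant sort value, not a reference
+-- to the first column, so the resulting order is effectively arbitrary.
+select x, row_number() over (order by 1) as rn from t1;
+</code></pre>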
+
+      <p class="p">
+        <strong class="ph b">Window clause:</strong>
+      </p>
+
+      <p class="p">
+        The window clause is only allowed in combination with an <code class="ph codeph">ORDER BY</code> clause. If the
+        <code class="ph codeph">ORDER BY</code> clause is specified but the window clause is not, the default window is
+        <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>. See
+        <a class="xref" href="impala_analytic_functions.html#window_clause">Window Clause</a> for full details.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">HBase considerations:</strong>
+      </p>
+
+      <p class="p">
+        Because HBase tables are optimized for single-row lookups rather than full scans, analytic functions using
+        the <code class="ph codeph">OVER()</code> clause are not recommended for HBase tables. Although such queries work, their
+        performance is lower than on comparable tables using HDFS data files.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Parquet considerations:</strong>
+      </p>
+
+      <p class="p">
+        Analytic functions are very efficient for Parquet tables. The data that is examined during evaluation of
+        the <code class="ph codeph">OVER()</code> clause comes from a specified set of columns, and the values for each column
+        are arranged sequentially within each data file.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Text table considerations:</strong>
+      </p>
+
+      <p class="p">
+        Analytic functions are convenient to use with text tables for exploratory business intelligence. When the
+        volume of data is substantial, prefer to use Parquet tables for performance-critical analytic queries.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example shows how to synthesize a numeric sequence corresponding to all the rows in a table.
+        The new table has the same columns as the old one, plus an additional column <code class="ph codeph">ID</code> containing
+        the integers 1, 2, 3, and so on, corresponding to the order of a <code class="ph codeph">TIMESTAMP</code> column in the
+        original table.
+      </p>
+
+
+
+<pre class="pre codeblock"><code>CREATE TABLE events_with_id AS
+  SELECT
+    row_number() OVER (ORDER BY date_and_time) AS id,
+    c1, c2, c3, c4
+  FROM events;
+</code></pre>
+
+      <p class="p">
+        The following example shows how to determine the number of rows containing each value for a column. Unlike
+        a corresponding <code class="ph codeph">GROUP BY</code> query, this one can analyze a single column and still return all
+        values (not just the distinct ones) from the other columns.
+      </p>
+
+
+
+<pre class="pre codeblock"><code>SELECT x, y, z,
+  count(*) OVER (PARTITION BY x) AS how_many_x
+FROM t1;
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+      <p class="p">
+        You cannot directly combine the <code class="ph codeph">DISTINCT</code> operator with analytic function calls. You can
+        put the analytic function call in a <code class="ph codeph">WITH</code> clause or an inline view, and apply the
+        <code class="ph codeph">DISTINCT</code> operator to its result set.
+      </p>
+
+<pre class="pre codeblock"><code>WITH t1 AS (SELECT x, sum(x) OVER (PARTITION BY x) AS total FROM t1)
+  SELECT DISTINCT x, total FROM t1;
+</code></pre>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="analytic_functions__window_clause">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Window Clause</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Certain analytic functions accept an optional <dfn class="term">window clause</dfn>, which makes the function analyze
+        only certain rows <span class="q">"around"</span> the current row rather than all rows in the partition. For example, you can
+        get a moving average by specifying some number of preceding and following rows, or a running count or
+        running total by specifying all rows up to the current position. This clause can result in different
+        analytic results for rows within the same partition.
+      </p>
+
+      <p class="p">
+        The window clause is supported with the <code class="ph codeph">AVG()</code>, <code class="ph codeph">COUNT()</code>,
+        <code class="ph codeph">FIRST_VALUE()</code>, <code class="ph codeph">LAST_VALUE()</code>, and <code class="ph codeph">SUM()</code> functions.
+
+        For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause is only allowed if the start bound is
+        <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+      </p>
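+      <p class="p">
+        For example (a sketch against the <code class="ph codeph">int_t</code> table used elsewhere in
+        this topic):
+      </p>
+
+<pre class="pre codeblock"><code>-- Allowed: the start bound for MAX() is UNBOUNDED PRECEDING.
+select x, max(x) over
+  (order by x rows between unbounded preceding and current row) as running_max
+from int_t;
+-- A sliding start bound such as ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
+-- is not allowed with MAX() or MIN().
+</code></pre>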
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>ROWS BETWEEN [ { <var class="keyword varname">m</var> | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | <var class="keyword varname">n</var> } FOLLOWING] ]
+RANGE BETWEEN [ {<var class="keyword varname">m</var> | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | <var class="keyword varname">n</var> } FOLLOWING] ]</code></pre>
+
+      <p class="p">
+        <code class="ph codeph">ROWS BETWEEN</code> defines the size of the window in terms of the indexes of the rows in the
+        result set. The size of the window is predictable based on these clauses and on the current row's position within the result set.
+      </p>
+
+      <p class="p">
+        <code class="ph codeph">RANGE BETWEEN</code> does not currently support numeric arguments to define a variable-size
+        sliding window.
+
+      </p>
+
+
+
+      <p class="p">
+        Currently, Impala supports only some combinations of arguments to the <code class="ph codeph">RANGE</code> clause:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code> (the default when <code class="ph codeph">ORDER
+          BY</code> is specified and the window clause is omitted)
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING</code>
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</code>
+        </li>
+      </ul>
+
+      <p class="p">
+        When <code class="ph codeph">RANGE</code> is used, <code class="ph codeph">CURRENT ROW</code> includes not just the current row but all
+        rows that are tied with the current row based on the <code class="ph codeph">ORDER BY</code> expressions.
+      </p>
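+
+      <p class="p">
+        As a sketch of this tie behavior (the <code class="ph codeph">scores</code> table and its values are
+        hypothetical): with a running total computed over <code class="ph codeph">RANGE</code>, rows that are tied
+        on the <code class="ph codeph">ORDER BY</code> value all receive the same result, because each tied row's
+        window extends through every row with that value.
+      </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical table scores(name STRING, points INT) containing two rows tied at 20.
+select name, points,
+  sum(points) over (order by points
+    range between unbounded preceding and current row) as running_total
+  from scores;
+-- Both rows with points = 20 get the same running_total, since with RANGE the
+-- window at CURRENT ROW includes all rows tied on the ORDER BY expression.
+</code></pre>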
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following examples show financial data for a fictional stock symbol <code class="ph codeph">JDR</code>. The closing
+        price moves up and down each day.
+      </p>
+
+<pre class="pre codeblock"><code>create table stock_ticker (stock_symbol string, closing_price decimal(8,2), closing_date timestamp);
+...load some data...
+select * from stock_ticker order by stock_symbol, closing_date;
++--------------+---------------+---------------------+
+| stock_symbol | closing_price | closing_date        |
++--------------+---------------+---------------------+
+| JDR          | 12.86         | 2014-10-02 00:00:00 |
+| JDR          | 12.89         | 2014-10-03 00:00:00 |
+| JDR          | 12.94         | 2014-10-04 00:00:00 |
+| JDR          | 12.55         | 2014-10-05 00:00:00 |
+| JDR          | 14.03         | 2014-10-06 00:00:00 |
+| JDR          | 14.75         | 2014-10-07 00:00:00 |
+| JDR          | 13.98         | 2014-10-08 00:00:00 |
++--------------+---------------+---------------------+
+</code></pre>
+
+      <p class="p">
+        The queries use analytic functions with window clauses to compute moving averages of the closing price. For
+        example, <code class="ph codeph">ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING</code> averages the values from a
+        3-day span, producing a different value for each row. The first row, which has no preceding row, only gets
+        averaged with the row following it. If the table contained more than one stock symbol, the
+        <code class="ph codeph">PARTITION BY</code> clause would limit the window for the moving average to only consider the
+        prices for a single stock.
+      </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+  avg(closing_price) over (partition by stock_symbol order by closing_date
+    rows between 1 preceding and 1 following) as moving_average
+  from stock_ticker;
++--------------+---------------------+---------------+----------------+
+| stock_symbol | closing_date        | closing_price | moving_average |
++--------------+---------------------+---------------+----------------+
+| JDR          | 2014-10-02 00:00:00 | 12.86         | 12.87          |
+| JDR          | 2014-10-03 00:00:00 | 12.89         | 12.89          |
+| JDR          | 2014-10-04 00:00:00 | 12.94         | 12.79          |
+| JDR          | 2014-10-05 00:00:00 | 12.55         | 13.17          |
+| JDR          | 2014-10-06 00:00:00 | 14.03         | 13.77          |
+| JDR          | 2014-10-07 00:00:00 | 14.75         | 14.25          |
+| JDR          | 2014-10-08 00:00:00 | 13.98         | 14.36          |
++--------------+---------------------+---------------+----------------+
+</code></pre>
+
+      <p class="p">
+        The clause <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code> produces a cumulative moving
+        average, from the earliest data up to the value for each day.
+      </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+  avg(closing_price) over (partition by stock_symbol order by closing_date
+    rows between unbounded preceding and current row) as moving_average
+  from stock_ticker;
++--------------+---------------------+---------------+----------------+
+| stock_symbol | closing_date        | closing_price | moving_average |
++--------------+---------------------+---------------+----------------+
+| JDR          | 2014-10-02 00:00:00 | 12.86         | 12.86          |
+| JDR          | 2014-10-03 00:00:00 | 12.89         | 12.87          |
+| JDR          | 2014-10-04 00:00:00 | 12.94         | 12.89          |
+| JDR          | 2014-10-05 00:00:00 | 12.55         | 12.81          |
+| JDR          | 2014-10-06 00:00:00 | 14.03         | 13.05          |
+| JDR          | 2014-10-07 00:00:00 | 14.75         | 13.33          |
+| JDR          | 2014-10-08 00:00:00 | 13.98         | 13.42          |
++--------------+---------------------+---------------+----------------+
+</code></pre>
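+
+      <p class="p">
+        A similar window produces a running maximum, the highest closing price seen so far for each day.
+        (A sketch; the window clause is allowed with <code class="ph codeph">MAX()</code> here because the start
+        bound is <code class="ph codeph">UNBOUNDED PRECEDING</code>.)
+      </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+  max(closing_price) over (partition by stock_symbol order by closing_date
+    rows between unbounded preceding and current row) as high_so_far
+  from stock_ticker;
+</code></pre>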
+
+
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="analytic_functions__avg_analytic">
+
+    <h2 class="title topictitle2" id="ariaid-title4">AVG Function - Analytic Context</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+        function. See <a class="xref" href="impala_avg.html#avg">AVG Function</a> for details and examples.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="analytic_functions__count_analytic">
+
+    <h2 class="title topictitle2" id="ariaid-title5">COUNT Function - Analytic Context</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+        function. See <a class="xref" href="impala_count.html#count">COUNT Function</a> for details and examples.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="analytic_functions__cume_dist">
+
+    <h2 class="title topictitle2" id="ariaid-title6">CUME_DIST Function (<span class="keyword">Impala 2.3</span> or higher only)</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Returns the cumulative distribution of a value. The value for each row in the result set is greater than 0
+        and less than or equal to 1.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>CUME_DIST (<var class="keyword varname">expr</var>)
+  OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)
+</code></pre>
+
+      <p class="p">
+        The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+        window clause is not allowed.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        Within each partition of the result set, the <code class="ph codeph">CUME_DIST()</code> value represents an ascending
+        sequence that ends at 1. Each value represents the proportion of rows in the partition whose values are
+        less than or equal to the value in the current row.
+      </p>
+
+      <p class="p">
+        If the sequence of input values contains ties, the <code class="ph codeph">CUME_DIST()</code> results are identical for the
+        tied values.
+      </p>
+
+      <p class="p">
+        Impala only supports the <code class="ph codeph">CUME_DIST()</code> function in an analytic context, not as a regular
+        aggregate function.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        This example uses a table with 9 rows. The <code class="ph codeph">CUME_DIST()</code>
+        function evaluates the entire table because there is no <code class="ph codeph">PARTITION BY</code> clause,
+        with the rows ordered by the weight of the animal.
+        The sequence of values shows that 1/9 of the values are less than or equal to the lightest
+        animal (mouse), 2/9 of the values are less than or equal to the second-lightest animal,
+        and so on up to the heaviest animal (elephant), where 9/9 of the rows are less than or
+        equal to its weight.
+      </p>
+
+<pre class="pre codeblock"><code>create table animals (name string, kind string, kilos decimal(9,3));
+insert into animals values
+  ('Elephant', 'Mammal', 4000), ('Giraffe', 'Mammal', 1200), ('Mouse', 'Mammal', 0.020),
+  ('Condor', 'Bird', 15), ('Horse', 'Mammal', 500), ('Owl', 'Bird', 2.5),
+  ('Ostrich', 'Bird', 145), ('Polar bear', 'Mammal', 700), ('Housecat', 'Mammal', 5);
+
+select name, cume_dist() over (order by kilos) from animals;
++------------+-----------------------+
+| name       | cume_dist() OVER(...) |
++------------+-----------------------+
+| Elephant   | 1                     |
+| Giraffe    | 0.8888888888888888    |
+| Polar bear | 0.7777777777777778    |
+| Horse      | 0.6666666666666666    |
+| Ostrich    | 0.5555555555555556    |
+| Condor     | 0.4444444444444444    |
+| Housecat   | 0.3333333333333333    |
+| Owl        | 0.2222222222222222    |
+| Mouse      | 0.1111111111111111    |
++------------+-----------------------+
+</code></pre>
+
+      <p class="p">
+        Using a <code class="ph codeph">PARTITION BY</code> clause produces a separate sequence for each partition
+        group, in this case one for mammals and one for birds. Because there are 3 birds and 6 mammals,
+        the sequence illustrates how 1/3 of the <span class="q">"Bird"</span> rows have a <code class="ph codeph">kilos</code> value that is less than or equal to
+        the lightest bird, 1/6 of the <span class="q">"Mammal"</span> rows have a <code class="ph codeph">kilos</code> value that is less than or equal to
+        the lightest mammal, and so on until both the heaviest bird and heaviest mammal have a <code class="ph codeph">CUME_DIST()</code>
+        value of 1.
+      </p>
+
+<pre class="pre codeblock"><code>select name, kind, cume_dist() over (partition by kind order by kilos) from animals;
++------------+--------+-----------------------+
+| name       | kind   | cume_dist() OVER(...) |
++------------+--------+-----------------------+
+| Ostrich    | Bird   | 1                     |
+| Condor     | Bird   | 0.6666666666666666    |
+| Owl        | Bird   | 0.3333333333333333    |
+| Elephant   | Mammal | 1                     |
+| Giraffe    | Mammal | 0.8333333333333334    |
+| Polar bear | Mammal | 0.6666666666666666    |
+| Horse      | Mammal | 0.5                   |
+| Housecat   | Mammal | 0.3333333333333333    |
+| Mouse      | Mammal | 0.1666666666666667    |
++------------+--------+-----------------------+
+</code></pre>
+
+      <p class="p">
+        We can reverse the ordering within each partition group by using an <code class="ph codeph">ORDER BY ... DESC</code>
+        clause within the <code class="ph codeph">OVER()</code> clause. Now the lightest (smallest value of <code class="ph codeph">kilos</code>)
+        animal of each kind has a <code class="ph codeph">CUME_DIST()</code> value of 1.
+      </p>
+
+<pre class="pre codeblock"><code>select name, kind, cume_dist() over (partition by kind order by kilos desc) from animals;
++------------+--------+-----------------------+
+| name       | kind   | cume_dist() OVER(...) |
++------------+--------+-----------------------+
+| Owl        | Bird   | 1                     |
+| Condor     | Bird   | 0.6666666666666666    |
+| Ostrich    | Bird   | 0.3333333333333333    |
+| Mouse      | Mammal | 1                     |
+| Housecat   | Mammal | 0.8333333333333334    |
+| Horse      | Mammal | 0.6666666666666666    |
+| Polar bear | Mammal | 0.5                   |
+| Giraffe    | Mammal | 0.3333333333333333    |
+| Elephant   | Mammal | 0.1666666666666667    |
++------------+--------+-----------------------+
+</code></pre>
+
+      <p class="p">
+        The following example manufactures some rows with identical values in the <code class="ph codeph">kilos</code> column,
+        to demonstrate how the results look in case of tie values. For simplicity, it only shows the <code class="ph codeph">CUME_DIST()</code>
+        sequence for the <span class="q">"Bird"</span> rows. Now with 3 rows all with a value of 15, all of those rows have the same
+        <code class="ph codeph">CUME_DIST()</code> value. 4/5 of the rows have a value for <code class="ph codeph">kilos</code> that is less than or
+        equal to 15.
+      </p>
+
+<pre class="pre codeblock"><code>insert into animals values ('California Condor', 'Bird', 15), ('Andean Condor', 'Bird', 15);
+
+select name, kind, cume_dist() over (order by kilos) from animals where kind = 'Bird';
++-------------------+------+-----------------------+
+| name              | kind | cume_dist() OVER(...) |
++-------------------+------+-----------------------+
+| Ostrich           | Bird | 1                     |
+| Condor            | Bird | 0.8                   |
+| California Condor | Bird | 0.8                   |
+| Andean Condor     | Bird | 0.8                   |
+| Owl               | Bird | 0.2                   |
++-------------------+------+-----------------------+
+</code></pre>
+
+      <p class="p">
+        The following example shows how to use an <code class="ph codeph">ORDER BY</code> clause in the outer block
+        to order the result set in case of ties. Here, all the <span class="q">"Bird"</span> rows are together, then in descending order
+        by the result of the <code class="ph codeph">CUME_DIST()</code> function, and all tied <code class="ph codeph">CUME_DIST()</code>
+        values are ordered by the animal name.
+      </p>
+
+<pre class="pre codeblock"><code>select name, kind, cume_dist() over (partition by kind order by kilos) as ordering
+  from animals
+where
+  kind = 'Bird'
+order by kind, ordering desc, name;
++-------------------+------+----------+
+| name              | kind | ordering |
++-------------------+------+----------+
+| Ostrich           | Bird | 1        |
+| Andean Condor     | Bird | 0.8      |
+| California Condor | Bird | 0.8      |
+| Condor            | Bird | 0.8      |
+| Owl               | Bird | 0.2      |
++-------------------+------+----------+
+</code></pre>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="analytic_functions__dense_rank">
+
+    <h2 class="title topictitle2" id="ariaid-title7">DENSE_RANK Function</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Returns an ascending sequence of integers, starting with 1. The output sequence produces duplicate integers
+        for duplicate values of the <code class="ph codeph">ORDER BY</code> expressions. After generating duplicate output values
+        for the <span class="q">"tied"</span> input values, the function continues the sequence with the next higher integer.
+        Therefore, the sequence contains duplicates but no gaps when the input contains duplicates. Starts the
+        sequence over for each group produced by the <code class="ph codeph">PARTITION BY</code> clause.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>DENSE_RANK() OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+      <p class="p">
+        The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+        window clause is not allowed.
+      </p>
+
+
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        Often used for top-N and bottom-N queries. For example, it could produce a <span class="q">"top 10"</span> report including
+        all the items with the 10 highest values, even if several items tied for 1st place.
+      </p>
+
+      <p class="p">
+        Similar to <code class="ph codeph">ROW_NUMBER</code> and <code class="ph codeph">RANK</code>. These functions differ in how they treat
+        duplicate combinations of values.
+      </p>
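+
+      <p class="p">
+        As a quick sketch of the difference (input values are hypothetical): for the values 10, 20, 20, 30
+        ordered ascending, the three functions number the tied rows differently.
+      </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical comparison over the values 10, 20, 20, 30:
+--   ROW_NUMBER(): 1, 2, 3, 4  (unique numbers, ties broken arbitrarily)
+--   RANK():       1, 2, 2, 4  (duplicates for ties, then a gap)
+--   DENSE_RANK(): 1, 2, 2, 3  (duplicates for ties, no gap)
+select x,
+  row_number() over (order by x) as row_num,
+  rank() over (order by x) as rank,
+  dense_rank() over (order by x) as dense_rank
+from t1;
+</code></pre>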
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example demonstrates how the <code class="ph codeph">DENSE_RANK()</code> function identifies where each
+        value <span class="q">"places"</span> in the result set, producing the same result for duplicate values, but with a strict
+        sequence from 1 to the number of groups. For example, when results are ordered by the <code class="ph codeph">X</code>
+        column, both <code class="ph codeph">1</code> values are tied for first; both <code class="ph codeph">2</code> values are tied for
+        second; and so on.
+      </p>
+
+<pre class="pre codeblock"><code>select x, dense_rank() over(order by x) as rank, property from int_t;
++----+------+----------+
+| x  | rank | property |
++----+------+----------+
+| 1  | 1    | square   |
+| 1  | 1    | odd      |
+| 2  | 2    | even     |
+| 2  | 2    | prime    |
+| 3  | 3    | prime    |
+| 3  | 3    | odd      |
+| 4  | 4    | even     |
+| 4  | 4    | square   |
+| 5  | 5    | odd      |
+| 5  | 5    | prime    |
+| 6  | 6    | even     |
+| 6  | 6    | perfect  |
+| 7  | 7    | lucky    |
+| 7  | 7    | lucky    |
+| 7  | 7    | lucky    |
+| 7  | 7    | odd      |
+| 7  | 7    | prime    |
+| 8  | 8    | even     |
+| 9  | 9    | square   |
+| 9  | 9    | odd      |
+| 10 | 10   | round    |
+| 10 | 10   | even     |
++----+------+----------+
+</code></pre>
+
+      <p class="p">
+        The following examples show how the <code class="ph codeph">DENSE_RANK()</code> function is affected by the
+        <code class="ph codeph">PARTITION BY</code> clause within the <code class="ph codeph">OVER()</code> clause.
+      </p>
+
+      <p class="p">
+        Partitioning by the <code class="ph codeph">PROPERTY</code> column groups all the even, odd, and so on values together,
+        and <code class="ph codeph">DENSE_RANK()</code> returns the place of each value within the group, producing several
+        ascending sequences.
+      </p>
+
+<pre class="pre codeblock"><code>select x, dense_rank() over(partition by property order by x) as rank, property from int_t;
++----+------+----------+
+| x  | rank | property |
++----+------+----------+
+| 2  | 1    | even     |
+| 4  | 2    | even     |
+| 6  | 3    | even     |
+| 8  | 4    | even     |
+| 10 | 5    | even     |
+| 7  | 1    | lucky    |
+| 7  | 1    | lucky    |
+| 7  | 1    | lucky    |
+| 1  | 1    | odd      |
+| 3  | 2    | odd      |
+| 5  | 3    | odd      |
+| 7  | 4    | odd      |
+| 9  | 5    | odd      |
+| 6  | 1    | perfect  |
+| 2  | 1    | prime    |
+| 3  | 2    | prime    |
+| 5  | 3    | prime    |
+| 7  | 4    | prime    |
+| 10 | 1    | round    |
+| 1  | 1    | square   |
+| 4  | 2    | square   |
+| 9  | 3    | square   |
++----+------+----------+
+</code></pre>
+
+      <p class="p">
+        Partitioning by the <code class="ph codeph">X</code> column groups all the duplicate numbers together and returns the
+        place of each value within the group; because each value occurs only 1 or 2 times,
+        <code class="ph codeph">DENSE_RANK()</code> designates each <code class="ph codeph">X</code> value as either first or second within its
+        group.
+      </p>
+
+<pre class="pre codeblock"><code>select x, dense_rank() over(partition by x order by property) as rank, property from int_t;
++----+------+----------+
+| x  | rank | property |
++----+------+----------+
+| 1  | 1    | odd      |
+| 1  | 2    | square   |
+| 2  | 1    | even     |
+| 2  | 2    | prime    |
+| 3  | 1    | odd      |
+| 3  | 2    | prime    |
+| 4  | 1    | even     |
+| 4  | 2    | square   |
+| 5  | 1    | odd      |
+| 5  | 2    | prime    |
+| 6  | 1    | even     |
+| 6  | 2    | perfect  |
+| 7  | 1    | lucky    |
+| 7  | 1    | lucky    |
+| 7  | 1    | lucky    |
+| 7  | 2    | odd      |
+| 7  | 3    | prime    |
+| 8  | 1    | even     |
+| 9  | 1    | odd      |
+| 9  | 2    | square   |
+| 10 | 1    | even     |
+| 10 | 2    | round    |
++----+------+----------+
+</code></pre>
+
+      <p class="p">
+        The following example shows how <code class="ph codeph">DENSE_RANK()</code> produces a continuous sequence while still
+        allowing for ties. In this case, Croesus and Midas both have the second largest fortune, while Crassus has
+        the third largest. (In <a class="xref" href="impala_analytic_functions.html#rank">RANK Function</a>, you see a similar query with the
+        <code class="ph codeph">RANK()</code> function that shows that while Crassus has the third largest fortune, he is the
+        fourth richest person.)
+      </p>
+
+<pre class="pre codeblock"><code>select dense_rank() over (order by net_worth desc) as placement, name, net_worth from wealth order by placement, name;
++-----------+---------+---------------+
+| placement | name    | net_worth     |
++-----------+---------+---------------+
+| 1         | Solomon | 2000000000.00 |
+| 2         | Croesus | 1000000000.00 |
+| 2         | Midas   | 1000000000.00 |
+| 3         | Crassus | 500000000.00  |
+| 4         | Scrooge | 80000000.00   |
++-----------+---------+---------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_analytic_functions.html#rank">RANK Function</a>, <a class="xref" href="impala_analytic_functions.html#row_number">ROW_NUMBER Function</a>
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="analytic_functions__first_value">
+
+    <h2 class="title topictitle2" id="ariaid-title8">FIRST_VALUE Function</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Returns the expression value from the first row in the window. The return value is <code class="ph codeph">NULL</code> if
+        the input expression is <code class="ph codeph">NULL</code>.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>FIRST_VALUE(<var class="keyword varname">expr</var>) OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var> [<var class="keyword varname">window_clause</var>])</code></pre>
+
+      <p class="p">
+        The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+        window clause is optional.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        If any duplicate values occur in the tuples evaluated by the <code class="ph codeph">ORDER BY</code> clause, the result
+        of this function is not deterministic. Consider adding additional <code class="ph codeph">ORDER BY</code> columns to
+        ensure consistent ordering.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example shows a table with a wide variety of country-appropriate greetings. For consistency,
+        we want to standardize on a single greeting for each country. The <code class="ph codeph">FIRST_VALUE()</code> function
+        helps to produce a mail merge report where every person from the same country is addressed with the same
+        greeting.
+      </p>
+
+<pre class="pre codeblock"><code>select name, country, greeting from mail_merge
++---------+---------+--------------+
+| name    | country | greeting     |
++---------+---------+--------------+
+| Pete    | USA     | Hello        |
+| John    | USA     | Hi           |
+| Boris   | Germany | Guten tag    |
+| Michael | Germany | Guten morgen |
+| Bjorn   | Sweden  | Hej          |
+| Mats    | Sweden  | Tja          |
++---------+---------+--------------+
+
+select country, name,
+  first_value(greeting)
+    over (partition by country order by name, greeting) as greeting
+  from mail_merge;
++---------+---------+-----------+
+| country | name    | greeting  |
++---------+---------+-----------+
+| Germany | Boris   | Guten tag |
+| Germany | Michael | Guten tag |
+| Sweden  | Bjorn   | Hej       |
+| Sweden  | Mats    | Hej       |
+| USA     | John    | Hi        |
+| USA     | Pete    | Hi        |
++---------+---------+-----------+
+</code></pre>
+
+      <p class="p">
+        Changing the order in which the names are evaluated changes which greeting is applied to each group.
+      </p>
+
+<pre class="pre codeblock"><code>select country, name,
+  first_value(greeting)
+    over (partition by country order by name desc, greeting) as greeting
+  from mail_merge;
++---------+---------+--------------+
+| country | name    | greeting     |
++---------+---------+--------------+
+| Germany | Michael | Guten morgen |
+| Germany | Boris   | Guten morgen |
+| Sweden  | Mats    | Tja          |
+| Sweden  | Bjorn   | Tja          |
+| USA     | Pete    | Hello        |
+| USA     | John    | Hello        |
++---------+---------+--------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_analytic_functions.html#last_value">LAST_VALUE Function</a>
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="analytic_functions__lag">
+
+    <h2 class="title topictitle2" id="ariaid-title9">LAG Function</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        This function returns the value of an expression using column values from a preceding row. You specify an
+        integer offset, which designates a row position some number of rows previous to the current row. Any column
+        references in the expression argument refer to column values from that prior row. Typically, the table
+        contains a time sequence or numeric sequence column that clearly distinguishes the ordering of the rows.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>LAG (<var class="keyword varname">expr</var> [, <var class="keyword varname">offset</var>] [, <var class="keyword varname">default</var>])
+  OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+      <p class="p">
+        The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+        window clause is not allowed.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        Sometimes used as an alternative to doing a self-join.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example uses the same stock data created in <a class="xref" href="#window_clause">Window Clause</a>. For each day, the
+        query prints the closing price alongside the previous day's closing price. The first row for each stock
+        symbol has no previous row, so that <code class="ph codeph">LAG()</code> value is <code class="ph codeph">NULL</code>.
+      </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+    lag(closing_price,1) over (partition by stock_symbol order by closing_date) as "yesterday closing"
+  from stock_ticker
+    order by closing_date;
++--------------+---------------------+---------------+-------------------+
+| stock_symbol | closing_date        | closing_price | yesterday closing |
++--------------+---------------------+---------------+-------------------+
+| JDR          | 2014-09-13 00:00:00 | 12.86         | NULL              |
+| JDR          | 2014-09-14 00:00:00 | 12.89         | 12.86             |
+| JDR          | 2014-09-15 00:00:00 | 12.94         | 12.89             |
+| JDR          | 2014-09-16 00:00:00 | 12.55         | 12.94             |
+| JDR          | 2014-09-17 00:00:00 | 14.03         | 12.55             |
+| JDR          | 2014-09-18 00:00:00 | 14.75         | 14.03             |
+| JDR          | 2014-09-19 00:00:00 | 13.98         | 14.75             |
++--------------+---------------------+---------------+-------------------+
+</code></pre>
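+
+      <p class="p">
+        The optional <var class="keyword varname">default</var> argument substitutes a value for the
+        <code class="ph codeph">NULL</code> produced when the offset reaches back before the start of the partition.
+        For example (a sketch using the same table):
+      </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+    lag(closing_price, 1, 0) over
+      (partition by stock_symbol order by closing_date) as "yesterday closing"
+  from stock_ticker
+    order by closing_date;
+-- The first row for each stock symbol now shows 0 instead of NULL.
+</code></pre>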
+
+      <p class="p">
+        The following example does an arithmetic operation between the current row and a value from the previous
+        row, to produce a delta value for each day. This example also demonstrates how <code class="ph codeph">ORDER BY</code>
+        works independently in the different parts of the query. The <code class="ph codeph">ORDER BY closing_date</code> in the
+        <code class="ph codeph">OVER</code> clause makes the query analyze the rows in chronological order. Then the outer query
+        block uses <code class="ph codeph">ORDER BY closing_date DESC</code> to present the results with the most recent date
+        first.
+      </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+    cast(
+      closing_price - lag(closing_price,1) over
+        (partition by stock_symbol order by closing_date)
+      as decimal(8,2)
+    )
+    as "change from yesterday"
+  from stock_ticker
+    order by closing_date desc;
++--------------+---------------------+---------------+-----------------------+
+| stock_symbol | closing_date        | closing_price | change from yesterday |
++--------------+---------------------+---------------+-----------------------+
+| JDR          | 2014-09-19 00:00:00 | 13.98         | -0.76                 |
+| JDR          | 2014-09-18 00:00:00 | 14.75         | 0.72                  |
+| JDR          | 2014-09-17 00:00:00 | 14.03         | 1.47                  |
+| JDR          | 2014-09-16 00:00:00 | 12.55         | -0.38                 |
+| JDR          | 2014-09-15 00:00:00 | 12.94         | 0.04                  |
+| JDR          | 2014-09-14 00:00:00 | 12.89         | 0.03                  |
+| JDR          | 2014-09-13 00:00:00 | 12.86         | NULL                  |
++--------------+---------------------+---------------+-----------------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        This function is the converse of <a class="xref" href="impala_analytic_functions.html#lead">LEAD Function</a>.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="analytic_functions__last_value">
+
+    <h2 class="title topictitle2" id="ariaid-title10">LAST_VALUE Function</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Returns the expression value from the last row in the window. This same value is repeated for all result
+        rows for the group. The return value is <code class="ph codeph">NULL</code> if the input expression is
+        <code class="ph codeph">NULL</code>.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>LAST_VALUE(<var class="keyword varname">expr</var>) OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var> [<var class="keyword varname">window_clause</var>])</code></pre>
+
+      <p class="p">
+        The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+        window clause is optional.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        If any duplicate values occur in the tuples evaluated by the <code class="ph codeph">ORDER BY</code> clause, the result
+        of this function is not deterministic. Consider adding additional <code class="ph codeph">ORDER BY</code> columns to
+        ensure consistent ordering.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example uses the same <code class="ph codeph">MAIL_MERGE</code> table as in the example for
+        <a class="xref" href="impala_analytic_functions.html#first_value">FIRST_VALUE Function</a>. Because the default window when <code class="ph codeph">ORDER
+        BY</code> is used is <code class="ph codeph">BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>, the query specifies
+        <code class="ph codeph">UNBOUNDED FOLLOWING</code> in the window clause to look ahead to subsequent rows and find the last value for each
+        country.
+      </p>
+
+<pre class="pre codeblock"><code>select country, name,
+  last_value(greeting) over (
+    partition by country order by name, greeting
+    rows between unbounded preceding and unbounded following
+  ) as greeting
+  from mail_merge;
++---------+---------+--------------+
+| country | name    | greeting     |
++---------+---------+--------------+
+| Germany | Boris   | Guten morgen |
+| Germany | Michael | Guten morgen |
+| Sweden  | Bjorn   | Tja          |
+| Sweden  | Mats    | Tja          |
+| USA     | John    | Hello        |
+| USA     | Pete    | Hello        |
++---------+---------+--------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_analytic_functions.html#first_value">FIRST_VALUE Function</a>
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="analytic_functions__lead">
+
+    <h2 class="title topictitle2" id="ariaid-title11">LEAD Function</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        This function returns the value of an expression using column values from a following row. You specify an
+        integer offset, which designates a row position some number of rows after the current row. Any column
+        references in the expression argument refer to column values from that later row. Typically, the table
+        contains a time sequence or numeric sequence column that clearly distinguishes the ordering of the rows.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>LEAD (<var class="keyword varname">expr</var> [, <var class="keyword varname">offset</var>] [, <var class="keyword varname">default</var>])
+  OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+      <p class="p">
+        The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+        window clause is not allowed.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        Sometimes used as an alternative to doing a self-join.
+      </p>
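+
+      <p class="p">
+        The optional <var class="keyword varname">default</var> argument supplies a value to return instead of
+        <code class="ph codeph">NULL</code> when the offset row falls outside the partition. The following sketch
+        (assuming the same <code class="ph codeph">stock_ticker</code> table as the example below) returns 0
+        rather than <code class="ph codeph">NULL</code> for the last trading day, where no following row exists:
+      </p>
+
+<pre class="pre codeblock"><code>-- The third argument, 0, replaces NULL for the final row of each partition.
+select stock_symbol, closing_date,
+  lead(closing_price, 1, 0) over
+    (partition by stock_symbol order by closing_date) as next_close
+  from stock_ticker;
+</code></pre>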
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example uses the same stock data created in <a class="xref" href="#window_clause">Window Clause</a>. The query analyzes
+        the closing price for a stock symbol, and for each day evaluates if the closing price for the following day
+        is higher or lower.
+      </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+  case
+    (lead(closing_price,1)
+      over (partition by stock_symbol order by closing_date)
+        - closing_price) &gt; 0
+    when true then "higher"
+    when false then "flat or lower"
+  end as "trending"
+from stock_ticker
+  order by closing_date;
++--------------+---------------------+---------------+---------------+
+| stock_symbol | closing_date        | closing_price | trending      |
++--------------+---------------------+---------------+---------------+
+| JDR          | 2014-09-13 00:00:00 | 12.86         | higher        |
+| JDR          | 2014-09-14 00:00:00 | 12.89         | higher        |
+| JDR          | 2014-09-15 00:00:00 | 12.94         | flat or lower |
+| JDR          | 2014-09-16 00:00:00 | 12.55         | higher        |
+| JDR          | 2014-09-17 00:00:00 | 14.03         | higher        |
+| JDR          | 2014-09-18 00:00:00 | 14.75         | flat or lower |
+| JDR          | 2014-09-19 00:00:00 | 13.98         | NULL          |
++--------------+---------------------+---------------+---------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        This function is the converse of <a class="xref" href="impala_analytic_functions.html#lag">LAG Function</a>.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="analytic_functions__max_analytic">
+
+    <h2 class="title topictitle2" id="ariaid-title12">MAX Function - Analytic Context</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+        function. See <a class="xref" href="impala_max.html#max">MAX Function</a> for details and examples.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="analytic_functions__min_analytic">
+
+    <h2 class="title topictitle2" id="ariaid-title13">MIN Function - Analytic Context</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+        function. See <a class="xref" href="impala_min.html#min">MIN Function</a> for details and examples.
+      </p>
+
+    </div>
+
+  </article>
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="analytic_functions__ntile">
+
+    <h2 class="title topictitle2" id="ariaid-title14">NTILE Function (<span class="keyword">Impala 2.3</span> or higher only)</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Returns the <span class="q">"bucket number"</span> associated with each row, between 1 and the value of an expression. For
+        example, creating 100 buckets puts the lowest 1% of values in the first bucket, while creating 10 buckets
+        puts the lowest 10% of values in the first bucket. Each partition can have a different number of buckets.
+
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>NTILE (<var class="keyword varname">expr</var> [, <var class="keyword varname">offset</var> ...])
+  OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+      <p class="p">
+        The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+        window clause is not allowed.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        The <span class="q">"ntile"</span> name is derived from the practice of dividing result sets into fourths (quartile), tenths
+        (decile), and so on. The <code class="ph codeph">NTILE()</code> function generalizes this practice by dividing the
+        result set into an arbitrary number of buckets.
+      </p>
+
+      <p class="p">
+        The number of buckets must be a positive integer.
+      </p>
+
+      <p class="p">
+        The number of items in each bucket is identical or almost so, varying by at most 1. If the number of items
+        does not divide evenly among the buckets, the remaining N items are distributed one apiece to the first N
+        buckets.
+      </p>
+
+      <p class="p">
+        If the number of buckets N is greater than the number of input rows in the partition, then the first N
+        buckets each contain one item, and the remaining buckets are empty.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example divides groups of animals into 4 buckets based on their weight. The
+        <code class="ph codeph">ORDER BY ... DESC</code> clause in the <code class="ph codeph">OVER()</code> clause means that the heaviest 25%
+        are in the first group, and the lightest 25% are in the fourth group. (The <code class="ph codeph">ORDER BY</code> in the
+        outermost part of the query shows how you can order the final result set independently from the order in
+        which the rows are evaluated by the <code class="ph codeph">OVER()</code> clause.) Because there are 9 rows in the group,
+        divided into 4 buckets, the first bucket receives the extra item.
+      </p>
+
+<pre class="pre codeblock"><code>create table animals (name string, kind string, kilos decimal(9,3));
+
+insert into animals values
+  ('Elephant', 'Mammal', 4000), ('Giraffe', 'Mammal', 1200), ('Mouse', 'Mammal', 0.020),
+  ('Condor', 'Bird', 15), ('Horse', 'Mammal', 500), ('Owl', 'Bird', 2.5),
+  ('Ostrich', 'Bird', 145), ('Polar bear', 'Mammal', 700), ('Housecat', 'Mammal', 5);
+
+select name, ntile(4) over (order by kilos desc) as quarter
+  from animals
+order by quarter desc;
++------------+---------+
+| name       | quarter |
++------------+---------+
+| Owl        | 4       |
+| Mouse      | 4       |
+| Condor     | 3       |
+| Housecat   | 3       |
+| Horse      | 2       |
+| Ostrich    | 2       |
+| Elephant   | 1       |
+| Giraffe    | 1       |
+| Polar bear | 1       |
++------------+---------+
+</code></pre>
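+
+      <p class="p">
+        As noted above, when the number of buckets exceeds the number of rows in the partition, the
+        trailing buckets are empty. A sketch using the same <code class="ph codeph">ANIMALS</code> table: only 3 of
+        the rows are birds, so asking for 5 buckets leaves buckets 4 and 5 with no items.
+      </p>
+
+<pre class="pre codeblock"><code>-- 3 rows into 5 buckets: each bird lands in its own bucket; buckets 4 and 5 stay empty.
+select name, ntile(5) over (order by kilos desc) as bucket
+  from animals where kind = 'Bird'
+order by bucket;
++---------+--------+
+| name    | bucket |
++---------+--------+
+| Ostrich | 1      |
+| Condor  | 2      |
+| Owl     | 3      |
++---------+--------+
+</code></pre>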
+
+      <p class="p">
+        The following examples show how the <code class="ph codeph">PARTITION BY</code> clause works for the
+        <code class="ph codeph">NTILE()</code> function. Here, we divide each kind of animal (mammal or bird) into 2 buckets,
+        the heavier half and the lighter half.
+      </p>
+
+<pre class="pre codeblock"><code>select name, kind, ntile(2) over (partition by kind order by kilos desc) as half
+  from animals
+order by kind;
++------------+--------+------+
+| name       | kind   | half |
++------------+--------+------+
+| Ostrich    | Bird   | 1    |
+| Condor     | Bird   | 1    |
+| Owl        | Bird   | 2    |
+| Elephant   | Mammal | 1    |
+| Giraffe    | Mammal | 1    |
+| Polar bear | Mammal | 1    |
+| Horse      | Mammal | 2    |
+| Housecat   | Mammal | 2    |
+| Mouse      | Mammal | 2    |
++------------+--------+------+
+</code></pre>
+
+      <p class="p">
+        Again, the result set can be ordered independently
+        from the analytic evaluation. This next example lists all the animals heaviest to lightest,
+        showing that elephant and giraffe are in the <span class="q">"top half"</span> of mammals by weight, while
+        housecat and mouse are in the <span class="q">"bottom half"</span>.
+      </p>
+
+<pre class="pre codeblock"><code>select name, kind, ntile(2) over (partition by kind order by kilos desc) as half
+  from animals
+order by kilos desc;
++------------+--------+------+
+| name       | kind   | half |
++------------+--------+------+
+| Elephant   | Mammal | 1    |
+| Giraffe    | Mammal | 1    |
+| Polar bear | Mammal | 1    |
+| Horse      | Mammal | 2    |
+| Ostrich    | Bird   | 1    |
+| Condor     | Bird   | 1    |
+| Housecat   | Mammal | 2    |
+| Owl        | Bird   | 2    |
+| Mouse      | Mammal | 2    |
++------------+--------+------+
+</code></pre>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="analytic_functions__percent_rank">
+
+    <h2 class="title topictitle2" id="ariaid-title15">PERCENT_RANK Function (<span class="keyword">Impala 2.3</span> or higher only)</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>PERCENT_RANK (<var class="keyword varname">expr</var>)
+  OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)
+</code></pre>
+
+      <p class="p">
+      Calculates the rank, expressed as a percentage, of each row within a group of rows.
+      If <code class="ph codeph">rank</code> is the value for that same row from the <code class="ph codeph">RANK()</code> function (from 1 to the total number of rows in the partition group),
+      then the <code class="ph codeph">PERCENT_RANK()</code> value is calculated as <code class="ph codeph">(<var class="keyword varname">rank</var> - 1) / (<var class="keyword varname">rows_in_group</var> - 1)</code>.
+      If there is only a single item in the partition group, its <code class="ph codeph">PERCENT_RANK()</code> value is 0.
+      </p>
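+
+      <p class="p">
+        For example, in a partition group of 5 rows, a row whose <code class="ph codeph">RANK()</code> value is 3
+        receives a <code class="ph codeph">PERCENT_RANK()</code> of:
+      </p>
+
+<pre class="pre codeblock"><code>(3 - 1) / (5 - 1) = 2 / 4 = 0.5</code></pre>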
+
+      <p class="p">
+        The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+        window clause is not allowed.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        This function is similar to the <code class="ph codeph">RANK()</code> and <code class="ph codeph">CUME_DIST()</code> functions: it returns an ascending sequence representing the position of each
+        row within the rows of the same partition group. The actual numeric sequence is calculated differently,
+        and the handling of duplicate (tied) values is different.
+      </p>
+
+      <p class="p">
+        The return values range from 0 to 1 inclusive.
+        The first row in each partition group always has the value 0.
+        A <code class="ph codeph">NULL</code> value is considered the lowest possible value.
+        In the case of duplicate input values, all the corresponding rows in the result set
+        have an identical value: the lowest <code class="ph codeph">PERCENT_RANK()</code> value of those
+        tied rows. (In contrast to <code class="ph codeph">CUME_DIST()</code>, where all tied rows have
+        the highest <code class="ph codeph">CUME_DIST()</code> value.)
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example uses the same <code class="ph codeph">ANIMALS</code> table as the examples for <code class="ph codeph">CUME_DIST()</code>
+        and <code class="ph codeph">NTILE()</code>, with a few additional rows to illustrate the results where some values are
+        <code class="ph codeph">NULL</code> or there is only a single row in a partition group.
+      </p>
+
+<pre class="pre codeblock"><code>insert into animals values ('Komodo dragon', 'Reptile', 70);
+insert into animals values ('Unicorn', 'Mythical', NULL);
+insert into animals values ('Fire-breathing dragon', 'Mythical', NULL);
+-- Two more condors, tied with 'Condor' at 15 kilos, so that the output below
+-- shows the three tied rows sharing the lowest PERCENT_RANK() value.
+insert into animals values ('California Condor', 'Bird', 15);
+insert into animals values ('Andean Condor', 'Bird', 15);
+</code></pre>
+
+      <p class="p">
+        As with <code class="ph codeph">CUME_DIST()</code>, there is an ascending sequence for each kind of animal.
+        For example, the <span class="q">"Birds"</span> and <span class="q">"Mammals"</span> rows each have a <code class="ph codeph">PERCENT_RANK()</code> sequence
+        that ranges from 0 to 1.
+        The <span class="q">"Reptile"</span> row has a <code class="ph codeph">PERCENT_RANK()</code> of 0 because that partition group contains only a single item.
+        Both <span class="q">"Mythical"</span> animals have a <code class="ph codeph">PERCENT_RANK()</code> of 0 because
+        a <code class="ph codeph">NULL</code> is considered the lowest value within its partition group.
+      </p>
+
+<pre class="pre codeblock"><code>select name, kind, percent_rank() over (partition by kind order by kilos) from animals;
++-----------------------+----------+--------------------------+
+| name                  | kind     | percent_rank() OVER(...) |
++-----------------------+----------+--------------------------+
+| Mouse                 | Mammal   | 0                        |
+| Housecat              | Mammal   | 0.2                      |
+| Horse                 | Mammal   | 0.4                      |
+| Polar bear            | Mammal   | 0.6                      |
+| Giraffe               | Mammal   | 0.8                      |
+| Elephant              | Mammal   | 1                        |
+| Komodo dragon         | Reptile  | 0                        |
+| Owl                   | Bird     | 0                        |
+| California Condor     | Bird     | 0.25                     |
+| Andean Condor         | Bird     | 0.25                     |
+| Condor                | Bird     | 0.25                     |
+| Ostrich               | Bird     | 1                        |
+| Fire-breathing dragon | Mythical | 0                        |
+| Unicorn               | Mythical | 0                        |
++-----------------------+----------+--------------------------+
+</code></pre>
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title16" id="analytic_functions__rank">
+
+    <h2 class="title topictitle2" id="ariaid-title16">RANK Function</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Returns an ascending sequence of integers, starting with 1. The output sequence produces duplicate integers
+        for duplicate values of the <code class="ph codeph">ORDER BY</code> expressions. After generating duplicate output values
+        for the <span class="q">"tied"</span> input values, the function increments the sequence by the number of tied values.
+        Therefore, the sequence contains both duplicates and gaps when the input contains duplicates. The
+        sequence starts over for each group produced by the <code class="ph codeph">PARTITION BY</code> clause.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>RANK() OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+      <p class="p">
+        The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+        window clause is not allowed.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+
+
+      <p class="p">
+        Often used for top-N and bottom-N queries. For example, it could produce a <span class="q">"top 10"</span> report including
+        several items that were tied for 10th place.
+      </p>
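+
+      <p class="p">
+        Because an analytic function cannot be referenced directly in a <code class="ph codeph">WHERE</code> clause,
+        a top-N query typically computes the rank in a subquery and filters in the outer query. A sketch
+        (assuming the same <code class="ph codeph">WEALTH</code> table as the final example below):
+      </p>
+
+<pre class="pre codeblock"><code>-- Top 3 by net worth; tied rows share a rank, so more than 3 rows could qualify.
+select rank, name, net_worth from
+  (select rank() over (order by net_worth desc) as rank, name, net_worth
+     from wealth) ranked
+where rank &lt;= 3
+order by rank, name;
+</code></pre>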
+
+      <p class="p">
+        Similar to <code class="ph codeph">ROW_NUMBER</code> and <code class="ph codeph">DENSE_RANK</code>. These functions differ in how they
+        treat duplicate combinations of values.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example demonstrates how the <code class="ph codeph">RANK()</code> function identifies where each value
+        <span class="q">"places"</span> in the result set, producing the same result for duplicate values, and skipping values in the
+        sequence to account for the number of duplicates. For example, when results are ordered by the
+        <code class="ph codeph">X</code> column, both <code class="ph codeph">1</code> values are tied for first; both <code class="ph codeph">2</code>
+        values are tied for third; and so on.
+      </p>
+
+<pre class="pre codeblock"><code>select x, rank() over(order by x) as rank, property from int_t;
++----+------+----------+
+| x  | rank | property |
++----+------+----------+
+| 1  | 1    | square   |
+| 1  | 1    | odd      |
+| 2  | 3    | even     |
+| 2  | 3    | prime    |
+| 3  | 5    | prime    |
+| 3  | 5    | odd      |
+| 4  | 7    | even     |
+| 4  | 7    | square   |
+| 5  | 9    | odd      |
+| 5  | 9    | prime    |
+| 6  | 11   | even     |
+| 6  | 11   | perfect  |
+| 7  | 13   | lucky    |
+| 7  | 13   | lucky    |
+| 7  | 13   | lucky    |
+| 7  | 13   | odd      |
+| 7  | 13   | prime    |
+| 8  | 18   | even     |
+| 9  | 19   | square   |
+| 9  | 19   | odd      |
+| 10 | 21   | round    |
+| 10 | 21   | even     |
++----+------+----------+
+</code></pre>
+
+      <p class="p">
+        The following examples show how the <code class="ph codeph">RANK()</code> function is affected by the
+        <code class="ph codeph">PARTITION BY</code> clause within the <code class="ph codeph">OVER()</code> clause.
+      </p>
+
+      <p class="p">
+        Partitioning by the <code class="ph codeph">PROPERTY</code> column groups all the even, odd, and so on values together,
+        and <code class="ph codeph">RANK()</code> returns the place of each value within the group, producing several ascending
+        sequences.
+      </p>
+
+<pre class="pre codeblock"><code>select x, rank() over(partition by property order by x) as rank, property from int_t;
++----+------+----------+
+| x  | rank | property |
++----+------+----------+
+| 2  | 1    | even     |
+| 4  | 2    | even     |
+| 6  | 3    | even     |
+| 8  | 4    | even     |
+| 10 | 5    | even     |
+| 7  | 1    | lucky    |
+| 7  | 1    | lucky    |
+| 7  | 1    | lucky    |
+| 1  | 1    | odd      |
+| 3  | 2    | odd      |
+| 5  | 3    | odd      |
+| 7  | 4    | odd      |
+| 9  | 5    | odd      |
+| 6  | 1    | perfect  |
+| 2  | 1    | prime    |
+| 3  | 2    | prime    |
+| 5  | 3    | prime    |
+| 7  | 4    | prime    |
+| 10 | 1    | round    |
+| 1  | 1    | square   |
+| 4  | 2    | square   |
+| 9  | 3    | square   |
++----+------+----------+
+</code></pre>
+
+      <p class="p">
+        Partitioning by the <code class="ph codeph">X</code> column groups all the duplicate numbers together and returns the
+        place of each value within the group; because most values occur only once or twice,
+        <code class="ph codeph">RANK()</code> designates most <code class="ph codeph">X</code> values as either first or second within their
+        group. (The value <code class="ph codeph">7</code>, which occurs five times, ranks as high as fifth.)
+      </p>
+
+<pre class="pre codeblock"><code>select x, rank() over(partition by x order by property) as rank, property from int_t;
++----+------+----------+
+| x  | rank | property |
++----+------+----------+
+| 1  | 1    | odd      |
+| 1  | 2    | square   |
+| 2  | 1    | even     |
+| 2  | 2    | prime    |
+| 3  | 1    | odd      |
+| 3  | 2    | prime    |
+| 4  | 1    | even     |
+| 4  | 2    | square   |
+| 5  | 1    | odd      |
+| 5  | 2    | prime    |
+| 6  | 1    | even     |
+| 6  | 2    | perfect  |
+| 7  | 1    | lucky    |
+| 7  | 1    | lucky    |
+| 7  | 1    | lucky    |
+| 7  | 4    | odd      |
+| 7  | 5    | prime    |
+| 8  | 1    | even     |
+| 9  | 1    | odd      |
+| 9  | 2    | square   |
+| 10 | 1    | even     |
+| 10 | 2    | round    |
++----+------+----------+
+</code></pre>
+
+      <p class="p">
+        The following example shows how a magazine might prepare a list of history's wealthiest people. Croesus and
+        Midas are tied for second, then Crassus is fourth.
+      </p>
+
+<pre class="pre codeblock"><code>select rank() over (order by net_worth desc) as rank, name, net_worth from wealth order by rank, name;
++------+---------+---------------+
+| rank | name    | net_worth     |
++------+---------+---------------+
+| 1    | Solomon | 2000000000.00 |
+| 2    | Croesus | 1000000000.00 |
+| 2    | Midas   | 1000000000.00 |
+| 4    | Crassus | 500000000.00  |
+| 5    | Scrooge | 80000000.00   |
++------+---------+---------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_analytic_functions.html#dense_rank">DENSE_RANK Function</a>,
+        <a class="xref" href="impala_analytic_functions.html#row_number">ROW_NUMBER Function</a>
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title17" id="analytic_functions__row_number">
+
+    <h2 class="title topictitle2" id="ariaid-title17">ROW_NUMBER Function</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Returns an ascending sequence of integers, starting with 1. The sequence starts over for each group
+        produced by the <code class="ph codeph">PARTITION BY</code> clause. The output sequence includes different values for
+        duplicate input values. Therefore, the sequence never contains any duplicates or gaps, regardless of
+        duplicate input values.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>ROW_NUMBER() OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+      <p class="p">
+        The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+        window clause is not allowed.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        Often used for top-N and bottom-N queries where the input values are known to be unique, or precisely N
+        rows are needed regardless of duplicate values.
+      </p>
+
+      <p class="p">
+        Because its result value is different for each row in the result set (when used without a <code class="ph codeph">PARTITION
+        BY</code> clause), <code class="ph codeph">ROW_NUMBER()</code> can be used to synthesize unique numeric ID values, for
+        example for result sets involving unique values or tuples.
+      </p>
+
+      <p class="p">
+        Similar to <code class="ph codeph">RANK</code> and <code class="ph codeph">DENSE_RANK</code>. These functions differ in how they treat
+        duplicate combinations of values.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following example demonstrates how <code class="ph codeph">ROW_NUMBER()</code> produces a continuous numeric
+        sequence, even though some values of <code class="ph codeph">X</code> are repeated.
+      </p>
+
+<pre class="pre codeblock"><code>select x, row_number() over(order by x, property) as row_number, property from int_t;
++----+------------+----------+
+| x  | row_number | property |
++----+------------+----------+
+| 1  | 1          | odd      |
+| 1  | 2          | square   |
+| 2  | 3          | even     |
+| 2  | 4          | prime    |
+| 3  | 5          | odd      |
+| 3  | 6          | prime    |
+| 4  | 7          | even     |
+| 4  | 8          | square   |
+| 5  | 9          | odd      |
+| 5  | 10         | prime    |
+| 6  | 11         | even     |
+| 6  | 12         | perfect  |
+| 7  | 13         | lucky    |
+| 7  | 14         | lucky    |
+| 7  | 15         | lucky    |
+| 7  | 16         | odd      |
+| 7  | 17         | prime    |
+| 8  | 18         | even     |
+| 9  | 19         | odd      |
+| 9  | 20         | square   |
+| 10 | 21         | even     |
+| 10 | 22         | round    |
++----+------------+----------+
+</code></pre>
+
+      <p class="p">
+        The following example shows how a financial institution might assign customer IDs to some of history's
+        wealthiest figures. Although two of the people have identical net worth figures, unique IDs are required
+        for this purpose. <code class="ph codeph">ROW_NUMBER()</code> produces a sequence of five different values for the five
+        input rows.
+      </p>
+
+<pre class="pre codeblock"><code>select row_number() over (order by net_worth desc) as account_id, name, net_worth
+  from wealth order by account_id, name;
++------------+---------+---------------+
+| account_id | name    | net_worth     |
++------------+---------+---------------+
+| 1          | Solomon | 2000000000.00 |
+| 2          | Croesus | 1000000000.00 |
+| 3          | Midas   | 1000000000.00 |
+| 4          | Crassus | 500000000.00  |
+| 5          | Scrooge | 80000000.00   |
++------------+---------+---------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_analytic_functions.html#rank">RANK Function</a>, <a class="xref" href="impala_analytic_functions.html#dense_rank">DENSE_RANK Function</a>
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title18" id="analytic_functions__sum_analytic">
+
+    <h2 class="title topictitle2" id="ariaid-title18">SUM Function - Analytic Context</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+        function. See <a class="xref" href="impala_sum.html#sum">SUM Function</a> for details and examples.
+      </p>
+
+    </div>
+
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_appx_count_distinct.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_appx_count_distinct.html b/docs/build/html/topics/impala_appx_count_distinct.html
new file mode 100644
index 0000000..efb5004
--- /dev/null
+++ b/docs/build/html/topics/impala_appx_count_distinct.html
@@ -0,0 +1,82 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="appx_count_distinct"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</title></head><body id="appx_count_distinct"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">APPX_COUNT_DISTINCT Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Allows multiple <code class="ph codeph">COUNT(DISTINCT)</code> operations within a single query, by internally rewriting
+      each <code class="ph codeph">COUNT(DISTINCT)</code> to use the <code class="ph codeph">NDV()</code> function. The resulting count is
+      approximate rather than precise.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value is interpreted as <code class="ph codeph">false</code>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following examples show how the <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option lets you work around the restriction
+      that a query can only evaluate <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">col_name</var>)</code> for a single
+      column. By default, you can count the distinct values of one column or another, but not both in a single
+      query:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select count(distinct x) from int_t;
++-------------------+
+| count(distinct x) |
++-------------------+
+| 10                |
++-------------------+
+[localhost:21000] &gt; select count(distinct property) from int_t;
++--------------------------+
+| count(distinct property) |
++--------------------------+
+| 7                        |
++--------------------------+
+[localhost:21000] &gt; select count(distinct x), count(distinct property) from int_t;
+ERROR: AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters
+as count(DISTINCT x); deviating function: count(DISTINCT property)
+</code></pre>
+
+    <p class="p">
+      When you enable the <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option, the query with multiple
+      <code class="ph codeph">COUNT(DISTINCT)</code> expressions succeeds. The reason this behavior requires a query option is that each
+      <code class="ph codeph">COUNT(DISTINCT)</code> is rewritten internally to use the <code class="ph codeph">NDV()</code> function instead,
+      which provides an approximate result rather than a precise count.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set APPX_COUNT_DISTINCT=true;
+[localhost:21000] &gt; select count(distinct x), count(distinct property) from int_t;
++-------------------+--------------------------+
+| count(distinct x) | count(distinct property) |
++-------------------+--------------------------+
+| 10                | 7                        |
++-------------------+--------------------------+
+</code></pre>
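+
+    <p class="p">
+      Because the rewrite uses <code class="ph codeph">NDV()</code> internally, calling
+      <code class="ph codeph">NDV()</code> directly produces the same approximate counts without
+      changing any query option. (Illustrative example; output not from a live session.)
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select ndv(x), ndv(property) from int_t;
++--------+---------------+
+| ndv(x) | ndv(property) |
++--------+---------------+
+| 10     | 7             |
++--------+---------------+
+</code></pre>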
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_count.html#count">COUNT Function</a>,
+      <a class="xref" href="impala_distinct.html#distinct">DISTINCT Operator</a>,
+      <a class="xref" href="impala_ndv.html#ndv">NDV Function</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[38/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_datetime_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_datetime_functions.html b/docs/build/html/topics/impala_datetime_functions.html
new file mode 100644
index 0000000..222ae8c
--- /dev/null
+++ b/docs/build/html/topics/impala_datetime_functions.html
@@ -0,0 +1,2657 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="datetime_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Date and Time Functions</title></head><body id="datetime_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Date and Time Functions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The underlying Impala data type for date and time data is
+      <code class="ph codeph"><a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP</a></code>, which has both a date and a
+      time portion. Functions that extract a single field, such as <code class="ph codeph">hour()</code> or
+      <code class="ph codeph">minute()</code>, typically return an integer value. Functions that format the date portion, such as
+      <code class="ph codeph">date_add()</code> or <code class="ph codeph">to_date()</code>, typically return a string value.
+    </p>
+
+    <p class="p">
+      You can also adjust a <code class="ph codeph">TIMESTAMP</code> value by adding or subtracting an <code class="ph codeph">INTERVAL</code>
+      expression. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details. <code class="ph codeph">INTERVAL</code>
+      expressions are also allowed as the second argument for the <code class="ph codeph">date_add()</code> and
+      <code class="ph codeph">date_sub()</code> functions, rather than integers.
+    </p>
+
+    <p class="p">
+      Some of these functions are affected by the setting of the
+      <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions</code> startup flag for the
+      <span class="keyword cmdname">impalad</span> daemon. This setting is off by default, meaning that
+      functions such as <code class="ph codeph">from_unixtime()</code> and <code class="ph codeph">unix_timestamp()</code>
+      consider the input values to always represent the UTC time zone.
+      This setting also applies when you <code class="ph codeph">CAST()</code> a <code class="ph codeph">BIGINT</code>
+      value to <code class="ph codeph">TIMESTAMP</code>, or a <code class="ph codeph">TIMESTAMP</code>
+      value to <code class="ph codeph">BIGINT</code>.
+      When this setting is enabled, these functions and operations convert to and from
+      values representing the local time zone.
+      See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details about how
+      Impala handles time zone considerations for the <code class="ph codeph">TIMESTAMP</code> data type.
+    </p>
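+
+    <p class="p">
+      For example, with the default setting (conversions treated as UTC), casting the
+      integer 0 to a <code class="ph codeph">TIMESTAMP</code> yields the Unix epoch.
+      (Illustrative example; not from a live session.)
+    </p>
+
+<pre class="pre codeblock"><code>
+select cast(0 as timestamp) as epoch;
++---------------------+
+| epoch               |
++---------------------+
+| 1970-01-01 00:00:00 |
++---------------------+
+</code></pre>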
+
+    <p class="p">
+      <strong class="ph b">Function reference:</strong>
+    </p>
+
+    <p class="p">
+      Impala supports the following date and time functions:
+    </p>
+
+
+
+    <dl class="dl">
+      
+
+        <dt class="dt dlterm" id="datetime_functions__add_months">
+          <code class="ph codeph">add_months(timestamp date, int months)</code>, <code class="ph codeph">add_months(timestamp date, bigint
+          months)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of months.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Same as <code class="ph codeph"><a class="xref" href="#datetime_functions__months_add">months_add()</a></code>.
+            Available in Impala 1.4 and higher. Included for
+            compatibility when porting code that uses vendor extensions.
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples demonstrate how adding months produces the same
+            day of the month in a different month; how, if that day of the month
+            does not exist in the target month, the last day of that month is substituted;
+            and how a negative argument produces a return value from a previous month.
+          </p>
+<pre class="pre codeblock"><code>
+select now(), add_months(now(), 2);
++-------------------------------+-------------------------------+
+| now()                         | add_months(now(), 2)          |
++-------------------------------+-------------------------------+
+| 2016-05-31 10:47:00.429109000 | 2016-07-31 10:47:00.429109000 |
++-------------------------------+-------------------------------+
+
+select now(), add_months(now(), 1);
++-------------------------------+-------------------------------+
+| now()                         | add_months(now(), 1)          |
++-------------------------------+-------------------------------+
+| 2016-05-31 10:47:14.540226000 | 2016-06-30 10:47:14.540226000 |
++-------------------------------+-------------------------------+
+
+select now(), add_months(now(), -1);
++-------------------------------+-------------------------------+
+| now()                         | add_months(now(), -1)         |
++-------------------------------+-------------------------------+
+| 2016-05-31 10:47:31.732298000 | 2016-04-30 10:47:31.732298000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__adddate">
+          <code class="ph codeph">adddate(timestamp startdate, int days)</code>, <code class="ph codeph">adddate(timestamp startdate, bigint
+          days)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Adds a specified number of days to a <code class="ph codeph">TIMESTAMP</code> value. Similar to
+          <code class="ph codeph">date_add()</code>, but accepts only an integer number of days as the
+          second argument, not an <code class="ph codeph">INTERVAL</code> expression.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show how to add a number of days to a <code class="ph codeph">TIMESTAMP</code>.
+            The number of days can also be negative, which gives the same effect as the <code class="ph codeph">subdate()</code> function.
+          </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, adddate(now(), 30) as now_plus_30;
++-------------------------------+-------------------------------+
+| right_now                     | now_plus_30                   |
++-------------------------------+-------------------------------+
+| 2016-05-20 10:23:08.640111000 | 2016-06-19 10:23:08.640111000 |
++-------------------------------+-------------------------------+
+
+select now() as right_now, adddate(now(), -15) as now_minus_15;
++-------------------------------+-------------------------------+
+| right_now                     | now_minus_15                  |
++-------------------------------+-------------------------------+
+| 2016-05-20 10:23:38.214064000 | 2016-05-05 10:23:38.214064000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__current_timestamp">
+          <code class="ph codeph">current_timestamp()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Alias for the <code class="ph codeph">now()</code> function.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now(), current_timestamp();
++-------------------------------+-------------------------------+
+| now()                         | current_timestamp()           |
++-------------------------------+-------------------------------+
+| 2016-05-19 16:10:14.237849000 | 2016-05-19 16:10:14.237849000 |
++-------------------------------+-------------------------------+
+
+select current_timestamp() as right_now,            
+  current_timestamp() + interval 3 hours as in_three_hours;
++-------------------------------+-------------------------------+
+| right_now                     | in_three_hours                |
++-------------------------------+-------------------------------+
+| 2016-05-19 16:13:20.017117000 | 2016-05-19 19:13:20.017117000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__date_add">
+          <code class="ph codeph">date_add(timestamp startdate, int days)</code>, <code class="ph codeph">date_add(timestamp startdate,
+          <var class="keyword varname">interval_expression</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Adds a specified number of days to a <code class="ph codeph">TIMESTAMP</code> value.
+          
+          With an <code class="ph codeph">INTERVAL</code>
+          expression as the second argument, you can calculate a delta value using other units such as weeks,
+          years, hours, seconds, and so on; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following example shows the simplest usage, of adding a specified number of days
+            to a <code class="ph codeph">TIMESTAMP</code> value:
+          </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, date_add(now(), 7) as next_week;
++-------------------------------+-------------------------------+
+| right_now                     | next_week                     |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:03:48.687055000 | 2016-05-27 11:03:48.687055000 |
++-------------------------------+-------------------------------+
+</code></pre>
+
+          <p class="p">
+            The following examples show the shorthand notation of an <code class="ph codeph">INTERVAL</code>
+            expression, instead of specifying the precise number of days.
+            The <code class="ph codeph">INTERVAL</code> notation also lets you work with units smaller than
+            a single day.
+          </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, date_add(now(), interval 3 weeks) as in_3_weeks;
++-------------------------------+-------------------------------+
+| right_now                     | in_3_weeks                    |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:05:39.173331000 | 2016-06-10 11:05:39.173331000 |
++-------------------------------+-------------------------------+
+
+select now() as right_now, date_add(now(), interval 6 hours) as in_6_hours;
++-------------------------------+-------------------------------+
+| right_now                     | in_6_hours                    |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:13:51.492536000 | 2016-05-20 17:13:51.492536000 |
++-------------------------------+-------------------------------+
+</code></pre>
+
+          <p class="p">
+            Like all date/time functions that deal with months, <code class="ph codeph">date_add()</code>
+            handles nonexistent dates past the end of a month by setting the date to the
+            last day of the month. The following example shows how the nonexistent date
+            April 31st is normalized to April 30th:
+          </p>
+<pre class="pre codeblock"><code>
+select date_add(cast('2016-01-31' as timestamp), interval 3 months) as 'april_31st';
++---------------------+
+| april_31st          |
++---------------------+
+| 2016-04-30 00:00:00 |
++---------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__date_part">
+          <code class="ph codeph">date_part(string, timestamp)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Similar to
+          <a class="xref" href="impala_datetime_functions.html#datetime_functions__extract"><code class="ph codeph">EXTRACT()</code></a>,
+          with the argument order reversed. Supports the same date and time units as <code class="ph codeph">EXTRACT()</code>.
+          For compatibility with SQL code containing vendor extensions.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select date_part('year',now()) as current_year;
++--------------+
+| current_year |
++--------------+
+| 2016         |
++--------------+
+
+select date_part('hour',now()) as hour_of_day;
++-------------+
+| hour_of_day |
++-------------+
+| 11          |
++-------------+
+</code></pre>
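+          <p class="p">
+            The following example is illustrative (not from a live session), showing how
+            <code class="ph codeph">date_part()</code> corresponds to <code class="ph codeph">EXTRACT()</code>
+            with the arguments reversed:
+          </p>
+<pre class="pre codeblock"><code>
+select date_part('year', now()) as via_date_part,
+  extract(year from now()) as via_extract;
++---------------+-------------+
+| via_date_part | via_extract |
++---------------+-------------+
+| 2016          | 2016        |
++---------------+-------------+
+</code></pre>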
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__date_sub">
+          <code class="ph codeph">date_sub(timestamp startdate, int days)</code>, <code class="ph codeph">date_sub(timestamp startdate,
+          <var class="keyword varname">interval_expression</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Subtracts a specified number of days from a <code class="ph codeph">TIMESTAMP</code> value.
+          
+          With an
+          <code class="ph codeph">INTERVAL</code> expression as the second argument, you can calculate a delta value using other
+          units such as weeks, years, hours, seconds, and so on; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>
+          for details.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following example shows the simplest usage, of subtracting a specified number of days
+            from a <code class="ph codeph">TIMESTAMP</code> value:
+          </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, date_sub(now(), 7) as last_week;
++-------------------------------+-------------------------------+
+| right_now                     | last_week                     |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:21:30.491011000 | 2016-05-13 11:21:30.491011000 |
++-------------------------------+-------------------------------+
+</code></pre>
+          <p class="p">
+            The following examples show the shorthand notation of an <code class="ph codeph">INTERVAL</code>
+            expression, instead of specifying the precise number of days.
+            The <code class="ph codeph">INTERVAL</code> notation also lets you work with units smaller than
+            a single day.
+          </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, date_sub(now(), interval 3 weeks) as 3_weeks_ago;
++-------------------------------+-------------------------------+
+| right_now                     | 3_weeks_ago                   |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:23:05.176953000 | 2016-04-29 11:23:05.176953000 |
++-------------------------------+-------------------------------+
+
+select now() as right_now, date_sub(now(), interval 6 hours) as 6_hours_ago;
++-------------------------------+-------------------------------+
+| right_now                     | 6_hours_ago                   |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:23:35.439631000 | 2016-05-20 05:23:35.439631000 |
++-------------------------------+-------------------------------+
+</code></pre>
+
+          <p class="p">
+            Like all date/time functions that deal with months, <code class="ph codeph">date_sub()</code>
+            handles nonexistent dates past the end of a month by setting the date to the
+            last day of the month. The following example shows how the nonexistent date
+            April 31st is normalized to April 30th:
+          </p>
+<pre class="pre codeblock"><code>
+select date_sub(cast('2016-05-31' as timestamp), interval 1 months) as 'april_31st';
++---------------------+
+| april_31st          |
++---------------------+
+| 2016-04-30 00:00:00 |
++---------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__datediff">
+          <code class="ph codeph">datediff(timestamp enddate, timestamp startdate)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the number of days between two <code class="ph codeph">TIMESTAMP</code> values.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            If the first argument represents a later date than the second argument,
+            the return value is positive. If both arguments represent the same date,
+            the return value is zero. The time portions of the <code class="ph codeph">TIMESTAMP</code>
+            values are irrelevant. For example, 11:59 PM on one day and 12:01 AM on the next
+            day produce a <code class="ph codeph">datediff()</code> of -1 because the date/time values
+            represent different days, even though the <code class="ph codeph">TIMESTAMP</code> values differ by only 2 minutes.
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following example shows how comparing a <span class="q">"late"</span> value with 
+            an <span class="q">"earlier"</span> value produces a positive number. In this case,
+            the result is (365 * 5) + 1, because the five-year span includes one
+            leap day.
+          </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, datediff(now() + interval 5 years, now()) as in_5_years;
++-------------------------------+------------+
+| right_now                     | in_5_years |
++-------------------------------+------------+
+| 2016-05-20 13:43:55.873826000 | 1826       |
++-------------------------------+------------+
+</code></pre>
+          <p class="p">
+            The following examples show how the return value represents the number of days
+            between the associated dates, regardless of the time portion of each <code class="ph codeph">TIMESTAMP</code>.
+            For example, different times on the same day produce a <code class="ph codeph">datediff()</code> of 0,
+            regardless of which one is earlier or later. But if the arguments represent different dates,
+            <code class="ph codeph">datediff()</code> returns a non-zero integer value, regardless of the time portions
+            of the dates.
+          </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, datediff(now(), now() + interval 4 hours) as in_4_hours;
++-------------------------------+------------+
+| right_now                     | in_4_hours |
++-------------------------------+------------+
+| 2016-05-20 13:42:05.302747000 | 0          |
++-------------------------------+------------+
+
+select now() as right_now, datediff(now(), now() - interval 4 hours) as 4_hours_ago;
++-------------------------------+-------------+
+| right_now                     | 4_hours_ago |
++-------------------------------+-------------+
+| 2016-05-20 13:42:21.134958000 | 0           |
++-------------------------------+-------------+
+
+select now() as right_now, datediff(now(), now() + interval 12 hours) as in_12_hours;
++-------------------------------+-------------+
+| right_now                     | in_12_hours |
++-------------------------------+-------------+
+| 2016-05-20 13:42:44.765873000 | -1          |
++-------------------------------+-------------+
+
+select now() as right_now, datediff(now(), now() - interval 18 hours) as 18_hours_ago;
++-------------------------------+--------------+
+| right_now                     | 18_hours_ago |
++-------------------------------+--------------+
+| 2016-05-20 13:54:38.829827000 | 1            |
++-------------------------------+--------------+
+</code></pre>
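+          <p class="p">
+            The two-minute scenario from the usage notes can be shown with explicit
+            literals. Here the first argument falls on the earlier calendar day, so the
+            result is -1. (Illustrative example; not from a live session.)
+          </p>
+<pre class="pre codeblock"><code>
+select datediff(cast('2016-05-20 23:59:00' as timestamp),
+  cast('2016-05-21 00:01:00' as timestamp)) as two_minutes_apart;
++-------------------+
+| two_minutes_apart |
++-------------------+
+| -1                |
++-------------------+
+</code></pre>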
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__day">
+          
+          <code class="ph codeph">day(timestamp date), <span class="ph" id="datetime_functions__dayofmonth">dayofmonth(timestamp date)</span></code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the day field from the date portion of a <code class="ph codeph">TIMESTAMP</code>.
+          The value represents the day of the month, and is therefore in the range 1-31, or less for
+          months without 31 days.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show how the day value corresponds to the day
+            of the month, resetting back to 1 at the start of each month.
+          </p>
+<pre class="pre codeblock"><code>
+select now(), day(now());
++-------------------------------+------------+
+| now()                         | day(now()) |
++-------------------------------+------------+
+| 2016-05-20 15:01:51.042185000 | 20         |
++-------------------------------+------------+
+
+select now() + interval 11 days, day(now() + interval 11 days);
++-------------------------------+-------------------------------+
+| now() + interval 11 days      | day(now() + interval 11 days) |
++-------------------------------+-------------------------------+
+| 2016-05-31 15:05:56.843139000 | 31                            |
++-------------------------------+-------------------------------+
+
+select now() + interval 12 days, day(now() + interval 12 days);
++-------------------------------+-------------------------------+
+| now() + interval 12 days      | day(now() + interval 12 days) |
++-------------------------------+-------------------------------+
+| 2016-06-01 15:06:05.074236000 | 1                             |
++-------------------------------+-------------------------------+
+</code></pre>
+          <p class="p">
+            The following examples show how the day value is <code class="ph codeph">NULL</code>
+            for nonexistent dates or misformatted date strings.
+          </p>
+<pre class="pre codeblock"><code>
+-- 2016 is a leap year, so it has a Feb. 29.
+select day('2016-02-29');
++-------------------+
+| day('2016-02-29') |
++-------------------+
+| 29                |
++-------------------+
+
+-- 2015 is not a leap year, so Feb. 29 is nonexistent.
+select day('2015-02-29');
++-------------------+
+| day('2015-02-29') |
++-------------------+
+| NULL              |
++-------------------+
+
+-- A string that does not match the expected YYYY-MM-DD format
+-- produces an invalid TIMESTAMP, causing day() to return NULL.
+select day('2016-02-028');
++--------------------+
+| day('2016-02-028') |
++--------------------+
+| NULL               |
++--------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__dayname">
+          <code class="ph codeph">dayname(timestamp date)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the day field from a <code class="ph codeph">TIMESTAMP</code> value, converted to the string
+          corresponding to that day name. The range of return values is <code class="ph codeph">'Sunday'</code> to
+          <code class="ph codeph">'Saturday'</code>. Used in report-generating queries, as an alternative to calling
+          <code class="ph codeph">dayofweek()</code> and turning that numeric return value into a string using a
+          <code class="ph codeph">CASE</code> expression.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show the day name associated with
+            <code class="ph codeph">TIMESTAMP</code> values representing different days.
+          </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  dayofweek(now()) as todays_day_of_week,
+  dayname(now()) as todays_day_name;
++-------------------------------+--------------------+-----------------+
+| right_now                     | todays_day_of_week | todays_day_name |
++-------------------------------+--------------------+-----------------+
+| 2016-05-31 10:57:03.953670000 | 3                  | Tuesday         |
++-------------------------------+--------------------+-----------------+
+
+select now() + interval 1 day as tomorrow,
+  dayname(now() + interval 1 day) as tomorrows_day_name;
++-------------------------------+--------------------+
+| tomorrow                      | tomorrows_day_name |
++-------------------------------+--------------------+
+| 2016-06-01 10:58:53.945761000 | Wednesday          |
++-------------------------------+--------------------+
+</code></pre>
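+          <p class="p">
+            For illustration, the following sketch shows the kind of <code class="ph codeph">CASE</code>
+            expression that <code class="ph codeph">dayname()</code> replaces (the result depends on the
+            current date, so no output is shown):
+          </p>
+<pre class="pre codeblock"><code>
+-- dayname(now()) is shorthand for a CASE expression such as:
+select case dayofweek(now())
+    when 1 then 'Sunday'
+    when 2 then 'Monday'
+    when 3 then 'Tuesday'
+    when 4 then 'Wednesday'
+    when 5 then 'Thursday'
+    when 6 then 'Friday'
+    when 7 then 'Saturday'
+  end as todays_day_name;
+</code></pre>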
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__dayofweek">
+          
+          <code class="ph codeph">dayofweek(timestamp date)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the day field from the date portion of a <code class="ph codeph">TIMESTAMP</code>, corresponding to the day of
+          the week. The range of return values is 1 (Sunday) to 7 (Saturday).
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  dayofweek(now()) as todays_day_of_week,
+  dayname(now()) as todays_day_name;
++-------------------------------+--------------------+-----------------+
+| right_now                     | todays_day_of_week | todays_day_name |
++-------------------------------+--------------------+-----------------+
+| 2016-05-31 10:57:03.953670000 | 3                  | Tuesday         |
++-------------------------------+--------------------+-----------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__dayofyear">
+          <code class="ph codeph">dayofyear(timestamp date)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the day field from a <code class="ph codeph">TIMESTAMP</code> value, corresponding to the day
+          of the year. The range of return values is 1 (January 1) to 366 (December 31 of a leap year).
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show return values from the
+            <code class="ph codeph">dayofyear()</code> function. The same date
+            in different years returns a different day number
+            for all dates after February 28,
+            because 2016 is a leap year while 2015 is not.
+          </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  dayofyear(now()) as today_day_of_year;
++-------------------------------+-------------------+
+| right_now                     | today_day_of_year |
++-------------------------------+-------------------+
+| 2016-05-31 11:05:48.314932000 | 152               |
++-------------------------------+-------------------+
+
+select now() - interval 1 year as last_year,
+  dayofyear(now() - interval 1 year) as year_ago_day_of_year;
++-------------------------------+----------------------+
+| last_year                     | year_ago_day_of_year |
++-------------------------------+----------------------+
+| 2015-05-31 11:07:03.733689000 | 151                  |
++-------------------------------+----------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__days_add">
+          <code class="ph codeph">days_add(timestamp startdate, int days)</code>, <code class="ph codeph">days_add(timestamp startdate, bigint
+          days)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Adds a specified number of days to a <code class="ph codeph">TIMESTAMP</code> value. Similar to
+          <code class="ph codeph">date_add()</code>, but starts with an actual <code class="ph codeph">TIMESTAMP</code> value instead of a
+          string that is converted to a <code class="ph codeph">TIMESTAMP</code>.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, days_add(now(), 31) as 31_days_later;
++-------------------------------+-------------------------------+
+| right_now                     | 31_days_later                 |
++-------------------------------+-------------------------------+
+| 2016-05-31 11:12:32.216764000 | 2016-07-01 11:12:32.216764000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__days_sub">
+          <code class="ph codeph">days_sub(timestamp startdate, int days)</code>, <code class="ph codeph">days_sub(timestamp startdate, bigint
+          days)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Subtracts a specified number of days from a <code class="ph codeph">TIMESTAMP</code> value. Similar to
+          <code class="ph codeph">date_sub()</code>, but starts with an actual <code class="ph codeph">TIMESTAMP</code> value instead of a
+          string that is converted to a <code class="ph codeph">TIMESTAMP</code>.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, days_sub(now(), 31) as 31_days_ago;
++-------------------------------+-------------------------------+
+| right_now                     | 31_days_ago                   |
++-------------------------------+-------------------------------+
+| 2016-05-31 11:13:42.163905000 | 2016-04-30 11:13:42.163905000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__extract">
+          <code class="ph codeph">extract(timestamp, string unit)</code>, <code class="ph codeph">extract(unit FROM timestamp)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns one of the numeric date or time fields from a <code class="ph codeph">TIMESTAMP</code> value.
+          <p class="p">
+            <strong class="ph b">Unit argument:</strong> The <code class="ph codeph">unit</code> string can be one of <code class="ph codeph">year</code>,
+            <code class="ph codeph">month</code>, <code class="ph codeph">day</code>, <code class="ph codeph">hour</code>, <code class="ph codeph">minute</code>,
+            <code class="ph codeph">second</code>, or <code class="ph codeph">millisecond</code>. This argument value is case-insensitive.
+          </p>
+          <div class="p">
+            In Impala 2.0 and higher, you can use special syntax rather than a regular function call, for
+            compatibility with code that uses the SQL-99 format with the <code class="ph codeph">FROM</code> keyword. With this
+            style, the unit names are identifiers rather than <code class="ph codeph">STRING</code> literals. For example, the
+            following calls are both equivalent:
+<pre class="pre codeblock"><code>extract(year from now());
+extract(now(), "year");
+</code></pre>
+          </div>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Typically used in <code class="ph codeph">GROUP BY</code> queries to arrange results by hour,
+            day, month, and so on. You can also use this function in an <code class="ph codeph">INSERT ... SELECT</code> into a
+            partitioned table to split up <code class="ph codeph">TIMESTAMP</code> values into individual parts, if the
+            partitioned table has separate partition key columns representing year, month, day, and so on. If you
+            need to divide by more complex units of time, such as by week or by quarter, use the
+            <code class="ph codeph">TRUNC()</code> function instead.
+          </p>
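+          <p class="p">
+            As a sketch of the partitioning technique described above, assuming hypothetical tables
+            <code class="ph codeph">raw_events</code> and <code class="ph codeph">events_by_day</code>, where
+            <code class="ph codeph">events_by_day</code> is partitioned by year, month, and day columns:
+          </p>
+<pre class="pre codeblock"><code>
+-- Split each TIMESTAMP into separate partition key columns
+-- during a dynamic partition insert.
+insert into events_by_day partition (year, month, day)
+  select event_time,
+    extract(year from event_time) as year,
+    extract(month from event_time) as month,
+    extract(day from event_time) as day
+  from raw_events;
+</code></pre>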
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  extract(year from now()) as this_year,
+  extract(month from now()) as this_month;
++-------------------------------+-----------+------------+
+| right_now                     | this_year | this_month |
++-------------------------------+-----------+------------+
+| 2016-05-31 11:18:43.310328000 | 2016      | 5          |
++-------------------------------+-----------+------------+
+
+select now() as right_now,
+  extract(day from now()) as this_day,  
+  extract(hour from now()) as this_hour;  
++-------------------------------+----------+-----------+
+| right_now                     | this_day | this_hour |
++-------------------------------+----------+-----------+
+| 2016-05-31 11:19:24.025303000 | 31       | 11        |
++-------------------------------+----------+-----------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__from_unixtime">
+          <code class="ph codeph">from_unixtime(bigint unixtime[, string format])</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Converts a number of seconds past the Unix epoch to a date-and-time
+          string in the local time zone.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        In Impala 2.2.0 and higher, built-in functions that accept or return integers representing <code class="ph codeph">TIMESTAMP</code> values
+        use the <code class="ph codeph">BIGINT</code> type for parameters and return values, rather than <code class="ph codeph">INT</code>.
+        This change lets the date and time functions avoid an overflow error that would otherwise occur
+        on January 19th, 2038 (known as the
+        <a class="xref" href="http://en.wikipedia.org/wiki/Year_2038_problem" target="_blank"><span class="q">"Year 2038 problem"</span> or <span class="q">"Y2K38 problem"</span></a>).
+        This change affects the <code class="ph codeph">from_unixtime()</code> and <code class="ph codeph">unix_timestamp()</code> functions.
+        You might need to change application code that interacts with these functions, change the types of
+        columns that store the return values, or add <code class="ph codeph">CAST()</code> calls to SQL statements that
+        call these functions.
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            The format string accepts the variations allowed for the <code class="ph codeph">TIMESTAMP</code>
+            data type: date plus time, date by itself, time by itself, and optional fractional seconds for the
+            time. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details.
+          </p>
+          <p class="p">
+            Currently, the format string is case-sensitive, especially to distinguish <code class="ph codeph">m</code> for
+            minutes and <code class="ph codeph">M</code> for months. In Impala 1.3 and later, you can switch the order of
+            elements, use alternative separator characters, and use a different number of placeholders for each
+            unit. Adding more instances of <code class="ph codeph">y</code>, <code class="ph codeph">d</code>, <code class="ph codeph">H</code>, and so on
+            produces output strings zero-padded to the requested number of characters. The exception is
+            <code class="ph codeph">M</code> for months, where <code class="ph codeph">M</code> produces a non-padded value such as
+            <code class="ph codeph">3</code>, <code class="ph codeph">MM</code> produces a zero-padded value such as <code class="ph codeph">03</code>,
+            <code class="ph codeph">MMM</code> produces an abbreviated month name such as <code class="ph codeph">Mar</code>, and sequences of
+            4 or more <code class="ph codeph">M</code> are not allowed. A date string including all fields could be
+            <code class="ph codeph">"yyyy-MM-dd HH:mm:ss.SSSSSS"</code>, <code class="ph codeph">"dd/MM/yyyy HH:mm:ss.SSSSSS"</code>,
+            <code class="ph codeph">"MMM dd, yyyy HH.mm.ss (SSSSSS)"</code> or other combinations of placeholders and separator
+            characters.
+          </p>
+          <p class="p">
+        The way this function deals with time zones when converting to or from <code class="ph codeph">TIMESTAMP</code>
+        values is affected by the <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions</code> startup flag for the
+        <span class="keyword cmdname">impalad</span> daemon. See <a class="xref" href="../shared/../topics/impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details about
+        how Impala handles time zone considerations for the <code class="ph codeph">TIMESTAMP</code> data type.
+      </p>
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            <p class="p">
+              The more flexible format strings allowed with the built-in functions do not change the rules about
+              using <code class="ph codeph">CAST()</code> to convert from a string to a <code class="ph codeph">TIMESTAMP</code> value. Strings
+              being converted through <code class="ph codeph">CAST()</code> must still have the elements in the specified order and use the specified delimiter
+              characters, as described in <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>.
+            </p>
+          </div>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>select from_unixtime(1392394861,"yyyy-MM-dd HH:mm:ss.SSSS");
++-------------------------------------------------------+
+| from_unixtime(1392394861, 'yyyy-mm-dd hh:mm:ss.ssss') |
++-------------------------------------------------------+
+| 2014-02-14 16:21:01.0000                              |
++-------------------------------------------------------+
+
+select from_unixtime(1392394861,"yyyy-MM-dd");
++-----------------------------------------+
+| from_unixtime(1392394861, 'yyyy-mm-dd') |
++-----------------------------------------+
+| 2014-02-14                              |
++-----------------------------------------+
+
+select from_unixtime(1392394861,"HH:mm:ss.SSSS");
++--------------------------------------------+
+| from_unixtime(1392394861, 'hh:mm:ss.ssss') |
++--------------------------------------------+
+| 16:21:01.0000                              |
++--------------------------------------------+
+
+select from_unixtime(1392394861,"HH:mm:ss");
++---------------------------------------+
+| from_unixtime(1392394861, 'hh:mm:ss') |
++---------------------------------------+
+| 16:21:01                              |
++---------------------------------------+</code></pre>
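+          <p class="p">
+            Because the argument is a <code class="ph codeph">BIGINT</code>, values past January 19th, 2038
+            convert without overflow. A sketch (the result string is rendered in the server's local time
+            zone, so the exact value can differ):
+          </p>
+<pre class="pre codeblock"><code>
+-- 2208988800 seconds past the epoch (early 2040 in UTC) is beyond
+-- the range of a 32-bit signed integer.
+select from_unixtime(2208988800, "yyyy-MM-dd HH:mm:ss");
+</code></pre>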
+          <div class="p">
+        <code class="ph codeph">unix_timestamp()</code> and <code class="ph codeph">from_unixtime()</code> are often used in combination to
+        convert a <code class="ph codeph">TIMESTAMP</code> value into a particular string format. For example:
+<pre class="pre codeblock"><code>select from_unixtime(unix_timestamp(now() + interval 3 days),
+  'yyyy/MM/dd HH:mm') as yyyy_mm_dd_hh_mm;
++------------------+
+| yyyy_mm_dd_hh_mm |
++------------------+
+| 2016/06/03 11:38 |
++------------------+
+</code></pre>
+      </div>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__from_utc_timestamp">
+          <code class="ph codeph">from_utc_timestamp(timestamp, string timezone)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Converts a specified UTC timestamp value into the appropriate value for a specified time
+          zone.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> Often used to translate UTC time zone data stored in a table back to the local
+            date and time for reporting. The opposite of the <code class="ph codeph">to_utc_timestamp()</code> function.
+          </p>
+          <p class="p">
+        To determine the time zone of the server you are connected to, in <span class="keyword">Impala 2.3</span> and
+        higher you can call the <code class="ph codeph">timeofday()</code> function, which includes the time zone
+        specifier in its return value. Remember that with cloud computing, the server you interact
+        with might be in a different time zone than you are, or different sessions might connect to
+        servers in different time zones, or a cluster might include servers in more than one time zone.
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            See discussion of time zones in <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>
+            for information about using this function for conversions between the local time zone and UTC.
+          </p>
+          <p class="p">
+            The following example shows how when <code class="ph codeph">TIMESTAMP</code> values representing the UTC time zone
+            are stored in a table, a query can display the equivalent local date and time for a different time zone.
+          </p>
+<pre class="pre codeblock"><code>
+with t1 as (select cast('2016-06-02 16:25:36.116143000' as timestamp) as utc_datetime)
+  select utc_datetime as 'Date/time in Greenwich UK',
+    from_utc_timestamp(utc_datetime, 'PDT')
+      as 'Equivalent in California USA'
+  from t1;
++-------------------------------+-------------------------------+
+| date/time in greenwich uk     | equivalent in california usa  |
++-------------------------------+-------------------------------+
+| 2016-06-02 16:25:36.116143000 | 2016-06-02 09:25:36.116143000 |
++-------------------------------+-------------------------------+
+</code></pre>
+          <p class="p">
+            The following example shows that when daylight saving time is in effect
+            (<code class="ph codeph">PDT</code>), the UTC time
+            is 7 hours ahead of the local California time; when daylight saving time
+            is not in effect (<code class="ph codeph">PST</code>), the UTC time is 8 hours ahead of
+            the local California time.
+          </p>
+<pre class="pre codeblock"><code>
+select now() as local_datetime,
+  to_utc_timestamp(now(), 'PDT') as utc_datetime;
++-------------------------------+-------------------------------+
+| local_datetime                | utc_datetime                  |
++-------------------------------+-------------------------------+
+| 2016-05-31 11:50:02.316883000 | 2016-05-31 18:50:02.316883000 |
++-------------------------------+-------------------------------+
+
+select '2016-01-05' as local_datetime,
+  to_utc_timestamp('2016-01-05', 'PST') as utc_datetime;
++----------------+---------------------+
+| local_datetime | utc_datetime        |
++----------------+---------------------+
+| 2016-01-05     | 2016-01-05 08:00:00 |
++----------------+---------------------+
+</code></pre>
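+          <p class="p">
+            Because <code class="ph codeph">from_utc_timestamp()</code> and
+            <code class="ph codeph">to_utc_timestamp()</code> are inverses, converting a value to UTC and back
+            with the same time zone specifier returns the original value, as in this sketch (the output
+            depends on the current time, so none is shown):
+          </p>
+<pre class="pre codeblock"><code>
+-- Round trip: converting to UTC and back leaves the value unchanged.
+select now() as t1,
+  from_utc_timestamp(to_utc_timestamp(now(), 'PST'), 'PST') as t2;
+</code></pre>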
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__hour">
+          <code class="ph codeph">hour(timestamp date)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the hour field from a <code class="ph codeph">TIMESTAMP</code> field.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, hour(now()) as current_hour;
++-------------------------------+--------------+
+| right_now                     | current_hour |
++-------------------------------+--------------+
+| 2016-06-01 14:14:12.472846000 | 14           |
++-------------------------------+--------------+
+
+select now() + interval 12 hours as 12_hours_from_now,
+  hour(now() + interval 12 hours) as hour_in_12_hours;
++-------------------------------+-------------------+
+| 12_hours_from_now             | hour_in_12_hours  |
++-------------------------------+-------------------+
+| 2016-06-02 02:15:32.454750000 | 2                 |
++-------------------------------+-------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__hours_add">
+          <code class="ph codeph">hours_add(timestamp date, int hours)</code>, <code class="ph codeph">hours_add(timestamp date, bigint
+          hours)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of hours.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  hours_add(now(), 12) as in_12_hours;
++-------------------------------+-------------------------------+
+| right_now                     | in_12_hours                   |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:19:48.948107000 | 2016-06-02 02:19:48.948107000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__hours_sub">
+          <code class="ph codeph">hours_sub(timestamp date, int hours)</code>, <code class="ph codeph">hours_sub(timestamp date, bigint
+          hours)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of hours.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  hours_sub(now(), 18) as 18_hours_ago;
++-------------------------------+-------------------------------+
+| right_now                     | 18_hours_ago                  |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:23:13.868150000 | 2016-05-31 20:23:13.868150000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__int_months_between">
+          <code class="ph codeph">int_months_between(timestamp newer, timestamp older)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the number of months between the date portions of two <code class="ph codeph">TIMESTAMP</code> values,
+          as an <code class="ph codeph">INT</code> representing only the full months that passed.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Typically used in business contexts, for example to determine whether
+            a specified number of months have passed or whether some end-of-month deadline was reached.
+          </p>
+          <p class="p">
+            The method of determining the number of elapsed months includes some special handling of
+            months with different numbers of days that creates edge cases for dates between the
+            28th and 31st days of certain months. See <code class="ph codeph">months_between()</code> for details.
+            The <code class="ph codeph">int_months_between()</code> result is essentially the <code class="ph codeph">floor()</code>
+            of the <code class="ph codeph">months_between()</code> result.
+          </p>
+          <p class="p">
+            If either value is <code class="ph codeph">NULL</code>, which could happen for example when converting a
+            nonexistent date string such as <code class="ph codeph">'2015-02-29'</code> to a <code class="ph codeph">TIMESTAMP</code>,
+            the result is also <code class="ph codeph">NULL</code>.
+          </p>
+          <p class="p">
+            If the first argument represents an earlier time than the second argument, the result is negative.
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>/* Less than a full month = 0. */
+select int_months_between('2015-02-28', '2015-01-29');
++------------------------------------------------+
+| int_months_between('2015-02-28', '2015-01-29') |
++------------------------------------------------+
+| 0                                              |
++------------------------------------------------+
+
+/* Last day of month to last day of next month = 1. */
+select int_months_between('2015-02-28', '2015-01-31');
++------------------------------------------------+
+| int_months_between('2015-02-28', '2015-01-31') |
++------------------------------------------------+
+| 1                                              |
++------------------------------------------------+
+
+/* Slightly less than 2 months = 1. */
+select int_months_between('2015-03-28', '2015-01-31');
++------------------------------------------------+
+| int_months_between('2015-03-28', '2015-01-31') |
++------------------------------------------------+
+| 1                                              |
++------------------------------------------------+
+
+/* 2 full months (identical days of the month) = 2. */
+select int_months_between('2015-03-31', '2015-01-31');
++------------------------------------------------+
+| int_months_between('2015-03-31', '2015-01-31') |
++------------------------------------------------+
+| 2                                              |
++------------------------------------------------+
+
+/* From the 30th of one month to the last day of the month after next = 2. */
+select int_months_between('2015-03-31', '2015-01-30');
++------------------------------------------------+
+| int_months_between('2015-03-31', '2015-01-30') |
++------------------------------------------------+
+| 2                                              |
++------------------------------------------------+
+</code></pre>
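+          <p class="p">
+            Reversing the argument order of one of the earlier examples, so that the first argument is the
+            earlier date, illustrates the negative result:
+          </p>
+<pre class="pre codeblock"><code>/* Earlier date as the first argument produces a negative result. */
+select int_months_between('2015-01-31', '2015-03-31');
++------------------------------------------------+
+| int_months_between('2015-01-31', '2015-03-31') |
++------------------------------------------------+
+| -2                                             |
++------------------------------------------------+
+</code></pre>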
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__microseconds_add">
+          <code class="ph codeph">microseconds_add(timestamp date, int microseconds)</code>, <code class="ph codeph">microseconds_add(timestamp
+          date, bigint microseconds)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of microseconds.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  microseconds_add(now(), 500000) as half_a_second_from_now;
++-------------------------------+-------------------------------+
+| right_now                     | half_a_second_from_now        |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:25:11.455051000 | 2016-06-01 14:25:11.955051000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__microseconds_sub">
+          <code class="ph codeph">microseconds_sub(timestamp date, int microseconds)</code>, <code class="ph codeph">microseconds_sub(timestamp
+          date, bigint microseconds)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of microseconds.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  microseconds_sub(now(), 500000) as half_a_second_ago;
++-------------------------------+-------------------------------+
+| right_now                     | half_a_second_ago             |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:26:16.509990000 | 2016-06-01 14:26:16.009990000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__millisecond">
+          <code class="ph codeph">millisecond(timestamp)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the millisecond portion of a <code class="ph codeph">TIMESTAMP</code> value.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            The millisecond value is truncated, not rounded, if the <code class="ph codeph">TIMESTAMP</code>
+            value contains more than 3 significant digits to the right of the decimal point.
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+-- 252.4 milliseconds truncated to 252.
+
+select now(), millisecond(now());
++-------------------------------+--------------------+
+| now()                         | millisecond(now()) |
++-------------------------------+--------------------+
+| 2016-03-14 22:30:25.252400000 | 252                |
++-------------------------------+--------------------+
+
+761.767 milliseconds truncated to 761.
+
+select now(), millisecond(now());
++-------------------------------+--------------------+
+| now()                         | millisecond(now()) |
++-------------------------------+--------------------+
+| 2016-03-14 22:30:58.761767000 | 761                |
++-------------------------------+--------------------+
+</code></pre>
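The truncation behavior described above can be sketched in a few lines of Python. This is an illustrative model, not Impala's implementation; Python's `datetime` carries microseconds rather than Impala's nanoseconds, but the truncate-don't-round principle is the same, and the `millisecond` helper below is a hypothetical stand-in for the Impala function.

```python
from datetime import datetime

def millisecond(ts: datetime) -> int:
    # Integer division truncates toward zero, so 761.767 ms -> 761,
    # never rounding up to 762.
    return ts.microsecond // 1000

# .252400 seconds -> 252 milliseconds
print(millisecond(datetime(2016, 3, 14, 22, 30, 25, 252400)))  # 252
# .761767 seconds -> 761 milliseconds, not 762
print(millisecond(datetime(2016, 3, 14, 22, 30, 58, 761767)))  # 761
```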
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__milliseconds_add">
+          <code class="ph codeph">milliseconds_add(timestamp date, int milliseconds)</code>, <code class="ph codeph">milliseconds_add(timestamp
+          date, bigint milliseconds)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of milliseconds.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  milliseconds_add(now(), 1500) as 1_point_5_seconds_from_now;
++-------------------------------+-------------------------------+
+| right_now                     | 1_point_5_seconds_from_now    |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:30:30.067366000 | 2016-06-01 14:30:31.567366000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__milliseconds_sub">
+          <code class="ph codeph">milliseconds_sub(timestamp date, int milliseconds)</code>, <code class="ph codeph">milliseconds_sub(timestamp
+          date, bigint milliseconds)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of milliseconds.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  milliseconds_sub(now(), 1500) as 1_point_5_seconds_ago;
++-------------------------------+-------------------------------+
+| right_now                     | 1_point_5_seconds_ago         |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:30:53.467140000 | 2016-06-01 14:30:51.967140000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__minute">
+          <code class="ph codeph">minute(timestamp date)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the minute field from a <code class="ph codeph">TIMESTAMP</code> value.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, minute(now()) as current_minute;
++-------------------------------+----------------+
+| right_now                     | current_minute |
++-------------------------------+----------------+
+| 2016-06-01 14:34:08.051702000 | 34             |
++-------------------------------+----------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__minutes_add">
+          <code class="ph codeph">minutes_add(timestamp date, int minutes)</code>, <code class="ph codeph">minutes_add(timestamp date, bigint
+          minutes)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of minutes.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, minutes_add(now(), 90) as 90_minutes_from_now;
++-------------------------------+-------------------------------+
+| right_now                     | 90_minutes_from_now           |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:36:04.887095000 | 2016-06-01 16:06:04.887095000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__minutes_sub">
+          <code class="ph codeph">minutes_sub(timestamp date, int minutes)</code>, <code class="ph codeph">minutes_sub(timestamp date, bigint
+          minutes)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of minutes.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, minutes_sub(now(), 90) as 90_minutes_ago;
++-------------------------------+-------------------------------+
+| right_now                     | 90_minutes_ago                |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:36:32.643061000 | 2016-06-01 13:06:32.643061000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__month">
+          
+          <code class="ph codeph">month(timestamp date)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the month field, represented as an integer, from the date portion of a <code class="ph codeph">TIMESTAMP</code>.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, month(now()) as current_month;
++-------------------------------+---------------+
+| right_now                     | current_month |
++-------------------------------+---------------+
+| 2016-06-01 14:43:37.141542000 | 6             |
++-------------------------------+---------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__months_add">
+          <code class="ph codeph">months_add(timestamp date, int months)</code>, <code class="ph codeph">months_add(timestamp date, bigint
+          months)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of months.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following example shows the effects of adding some number of
+            months to a <code class="ph codeph">TIMESTAMP</code> value, using both the
+            <code class="ph codeph">months_add()</code> function and its <code class="ph codeph">add_months()</code>
+            alias. These examples use <code class="ph codeph">trunc()</code> to strip off the time portion
+            and leave just the date.
+          </p>
+<pre class="pre codeblock"><code>
+with t1 as (select trunc(now(), 'dd') as today)
+  select today, months_add(today,1) as next_month from t1;
++---------------------+---------------------+
+| today               | next_month          |
++---------------------+---------------------+
+| 2016-05-19 00:00:00 | 2016-06-19 00:00:00 |
++---------------------+---------------------+
+
+with t1 as (select trunc(now(), 'dd') as today)
+  select today, add_months(today,1) as next_month from t1;
++---------------------+---------------------+
+| today               | next_month          |
++---------------------+---------------------+
+| 2016-05-19 00:00:00 | 2016-06-19 00:00:00 |
++---------------------+---------------------+
+</code></pre>
+          <p class="p">
+            The following examples show that if <code class="ph codeph">months_add()</code>
+            would otherwise produce a nonexistent date, because months have
+            different numbers of days, the function returns a <code class="ph codeph">TIMESTAMP</code>
+            for the last day of the relevant month. For example, adding one month
+            to January 31st produces February 29th in 2016 (a leap year),
+            and February 28th in 2015 (a non-leap year).
+          </p>
+<pre class="pre codeblock"><code>
+with t1 as (select cast('2016-01-31' as timestamp) as jan_31)
+  select jan_31, months_add(jan_31,1) as feb_31 from t1;
++---------------------+---------------------+
+| jan_31              | feb_31              |
++---------------------+---------------------+
+| 2016-01-31 00:00:00 | 2016-02-29 00:00:00 |
++---------------------+---------------------+
+
+with t1 as (select cast('2015-01-31' as timestamp) as jan_31)
+  select jan_31, months_add(jan_31,1) as feb_31 from t1;
++---------------------+---------------------+
+| jan_31              | feb_31              |
++---------------------+---------------------+
+| 2015-01-31 00:00:00 | 2015-02-28 00:00:00 |
++---------------------+---------------------+
+</code></pre>
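The end-of-month clamping shown above can be modeled with a short Python sketch. This is an assumed reconstruction of the documented behavior, not Impala's source: the target day-of-month is clamped to the last valid day of the target month.

```python
import calendar
from datetime import date

def months_add(d: date, n: int) -> date:
    # Compute the target year/month by carrying whole years.
    total = d.month - 1 + n
    year, month = d.year + total // 12, total % 12 + 1
    # Clamp the day to the last day of the (possibly shorter) target month.
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

print(months_add(date(2016, 1, 31), 1))  # 2016-02-29 (leap year)
print(months_add(date(2015, 1, 31), 1))  # 2015-02-28 (non-leap year)
print(months_add(date(2016, 5, 19), 1))  # 2016-06-19
```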
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__months_between">
+          <code class="ph codeph">months_between(timestamp newer, timestamp older)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the number of months between the date portions of two <code class="ph codeph">TIMESTAMP</code> values.
+          Can include a fractional part representing extra days in addition to the full months
+          between the dates. The fractional component is computed by dividing the difference in days by 31 (regardless of the month).
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Typically used in business contexts, for example to determine whether
+            a specified number of months have passed or whether some end-of-month deadline was reached.
+          </p>
+          <p class="p">
+            If the only consideration is the number of full months and any fractional value is
+            not significant, use <code class="ph codeph">int_months_between()</code> instead.
+          </p>
+          <p class="p">
+            The calculation includes special handling for months with different
+            numbers of days, which creates edge cases for dates falling on the
+            28th through 31st days of certain months.
+          </p>
+          <p class="p">
+            If either value is <code class="ph codeph">NULL</code>, which could happen for example when converting a
+            nonexistent date string such as <code class="ph codeph">'2015-02-29'</code> to a <code class="ph codeph">TIMESTAMP</code>,
+            the result is also <code class="ph codeph">NULL</code>.
+          </p>
+          <p class="p">
+            If the first argument represents an earlier time than the second argument, the result is negative.
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show how dates that are on the same day of the month
+            are considered to be exactly N months apart, even if the months have different
+            numbers of days.
+          </p>
+<pre class="pre codeblock"><code>select months_between('2015-02-28', '2015-01-28');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-01-28') |
++--------------------------------------------+
+| 1                                          |
++--------------------------------------------+
+
+select months_between(now(), now() + interval 1 month);
++-------------------------------------------------+
+| months_between(now(), now() + interval 1 month) |
++-------------------------------------------------+
+| -1                                              |
++-------------------------------------------------+
+
+select months_between(now() + interval 1 year, now());
++------------------------------------------------+
+| months_between(now() + interval 1 year, now()) |
++------------------------------------------------+
+| 12                                             |
++------------------------------------------------+
+</code></pre>
+          <p class="p">
+            The following examples show how dates that are on the last day of the month
+            are considered to be exactly N months apart, even if the months have different
+            numbers of days. For example, from January 28th to February 28th is exactly one
+            month because the day of the month is identical; January 31st to February 28th
+            is exactly one month because in both cases it is the last day of the month;
+            but January 29th or 30th to February 28th is considered a fractional month.
+          </p>
+<pre class="pre codeblock"><code>select months_between('2015-02-28', '2015-01-31');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-01-31') |
++--------------------------------------------+
+| 1                                          |
++--------------------------------------------+
+
+select months_between('2015-02-28', '2015-01-29');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-01-29') |
++--------------------------------------------+
+| 0.967741935483871                          |
++--------------------------------------------+
+
+select months_between('2015-02-28', '2015-01-30');;
++--------------------------------------------+
+| months_between('2015-02-28', '2015-01-30') |
++--------------------------------------------+
+| 0.935483870967742                          |
++--------------------------------------------+
+</code></pre>
+          <p class="p">
+            The following examples show how dates that are not a precise number
+            of months apart result in a fractional return value.
+          </p>
+<pre class="pre codeblock"><code>select months_between('2015-03-01', '2015-01-28');
++--------------------------------------------+
+| months_between('2015-03-01', '2015-01-28') |
++--------------------------------------------+
+| 1.129032258064516                          |
++--------------------------------------------+
+
+select months_between('2015-03-01', '2015-02-28');
++--------------------------------------------+
+| months_between('2015-03-01', '2015-02-28') |
++--------------------------------------------+
+| 0.1290322580645161                         |
++--------------------------------------------+
+
+select months_between('2015-06-02', '2015-05-29');
++--------------------------------------------+
+| months_between('2015-06-02', '2015-05-29') |
++--------------------------------------------+
+| 0.1290322580645161                         |
++--------------------------------------------+
+
+select months_between('2015-03-01', '2015-01-25');
++--------------------------------------------+
+| months_between('2015-03-01', '2015-01-25') |
++--------------------------------------------+
+| 1.225806451612903                          |
++--------------------------------------------+
+
+select months_between('2015-03-01', '2015-02-25');
++--------------------------------------------+
+| months_between('2015-03-01', '2015-02-25') |
++--------------------------------------------+
+| 0.2258064516129032                         |
++--------------------------------------------+
+
+select months_between('2015-02-28', '2015-02-01');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-02-01') |
++--------------------------------------------+
+| 0.8709677419354839                         |
++--------------------------------------------+
+
+select months_between('2015-03-28', '2015-03-01');
++--------------------------------------------+
+| months_between('2015-03-28', '2015-03-01') |
++--------------------------------------------+
+| 0.8709677419354839                         |
++--------------------------------------------+
+</code></pre>
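The results above are consistent with the Oracle-style rule the Purpose section describes: a whole-month difference plus any leftover days divided by 31, with same-day-of-month and last-day-of-month pairs counting as exact integers. The sketch below is an illustrative model of that rule, not Impala's actual implementation.

```python
import calendar
from datetime import date

def months_between(newer: date, older: date) -> float:
    # Whole months between the two dates, ignoring day-of-month.
    whole = (newer.year - older.year) * 12 + (newer.month - older.month)
    last_newer = calendar.monthrange(newer.year, newer.month)[1]
    last_older = calendar.monthrange(older.year, older.month)[1]
    # Same day of month, or both dates on the last day of their month:
    # the dates are exactly N months apart.
    if newer.day == older.day or (newer.day == last_newer and older.day == last_older):
        return float(whole)
    # Otherwise the day difference contributes a fraction over 31 days,
    # regardless of the actual month lengths.
    return whole + (newer.day - older.day) / 31.0

print(months_between(date(2015, 2, 28), date(2015, 1, 31)))  # 1.0
print(months_between(date(2015, 2, 28), date(2015, 1, 29)))  # 0.967741935483871
# 2 whole months minus 27/31: matches 1.129032258064516 above
# (within float formatting).
print(months_between(date(2015, 3, 1), date(2015, 1, 28)))
```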
+          <p class="p">
+            The following examples show that the time portions of the
+            <code class="ph codeph">TIMESTAMP</code> values are irrelevant when
+            calculating the month interval. Even the fractional part of the result
+            depends only on the number of full days between the argument values,
+            regardless of the time of day.
+          </p>
+<pre class="pre codeblock"><code>select months_between('2015-05-28 23:00:00', '2015-04-28 11:45:00');
++--------------------------------------------------------------+
+| months_between('2015-05-28 23:00:00', '2015-04-28 11:45:00') |
++--------------------------------------------------------------+
+| 1                                                            |
++--------------------------------------------------------------+
+
+select months_between('2015-03-28', '2015-03-01');
++--------------------------------------------+
+| months_between('2015-03-28', '2015-03-01') |
++--------------------------------------------+
+| 0.8709677419354839                         |
++--------------------------------------------+
+
+select months_between('2015-03-28 23:00:00', '2015-03-01 11:45:00');
++--------------------------------------------------------------+
+| months_between('2015-03-28 23:00:00', '2015-03-01 11:45:00') |
++--------------------------------------------------------------+
+| 0.8709677419354839                                           |
++--------------------------------------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__months_sub">
+          <code class="ph codeph">months_sub(timestamp date, int months)</code>, <code class="ph codeph">months_sub(timestamp date, bigint
+          months)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of months.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+with t1 as (select trunc(now(), 'dd') as today)
+  select today, months_sub(today,1) as last_month from t1;
++---------------------+---------------------+
+| today               | last_month          |
++---------------------+---------------------+
+| 2016-06-01 00:00:00 | 2016-05-01 00:00:00 |
++---------------------+---------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__nanoseconds_add">
+          <code class="ph codeph">nanoseconds_add(timestamp date, int nanoseconds)</code>, <code class="ph codeph">nanoseconds_add(timestamp
+          date, bigint nanoseconds)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of nanoseconds.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, nanoseconds_add(now(), 1) as 1_nanosecond_later;
++-------------------------------+-------------------------------+
+| right_now                     | 1_nanosecond_later            |
++-------------------------------+-------------------------------+
+| 2016-06-01 15:42:00.361026000 | 2016-06-01 15:42:00.361026001 |
++-------------------------------+-------------------------------+
+
+-- 1 billion nanoseconds = 1 second.
+select now() as right_now, nanoseconds_add(now(), 1e9) as 1_second_later;
++-------------------------------+-------------------------------+
+| right_now                     | 1_second_later                |
++-------------------------------+-------------------------------+
+| 2016-06-01 15:42:52.926706000 | 2016-06-01 15:42:53.926706000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__nanoseconds_sub">
+          <code class="ph codeph">nanoseconds_sub(timestamp date, int nanoseconds)</code>, <code class="ph codeph">nanoseconds_sub(timestamp
+          date, bigint nanoseconds)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of nanoseconds.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, nanoseconds_sub(now(), 1) as 1_nanosecond_earlier;
++-------------------------------+-------------------------------+
+| right_now                     | 1_nanosecond_earlier          |
++-------------------------------+-------------------------------+
+| 2016-06-01 15:44:14.355837000 | 2016-06-01 15:44:14.355836999 |
++-------------------------------+-------------------------------+
+
+-- 1 billion nanoseconds = 1 second.
+select now() as right_now, nanoseconds_sub(now(), 1e9) as 1_second_earlier;
++-------------------------------+-------------------------------+
+| right_now                     | 1_second_earlier              |
++-------------------------------+-------------------------------+
+| 2016-06-01 15:44:54.474929000 | 2016-06-01 15:44:53.474929000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__now">
+          <code class="ph codeph">now()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the current date and time (in the local time zone) as a
+          <code class="ph codeph">TIMESTAMP</code> value.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            To find a date/time value in the future or the past relative to the current date
+            and time, add or subtract an <code class="ph codeph">INTERVAL</code> expression to the return value of
+            <code class="ph codeph">now()</code>. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for examples.
+          </p>
+          <p class="p">
+            To produce a <code class="ph codeph">TIMESTAMP</code> representing the current date and time that can be
+            shared or stored without interoperability problems due to time zone differences, use the
+            <code class="ph codeph">to_utc_timestamp()</code> function and specify the time zone of the server.
+            When <code class="ph codeph">TIMESTAMP</code> data is stored in UTC form, any application that queries
+            those values can convert them to the appropriate local time zone by calling the inverse
+            function, <code class="ph codeph">from_utc_timestamp()</code>.
+          </p>
+          <p class="p">
+        To determine the time zone of the server you are connected to, in <span class="keyword">Impala 2.3</span> and
+        higher you can call the <code class="ph codeph">timeofday()</code> function, which includes the time zone
+        specifier in its return value. Remember that with cloud computing, the server you interact
+        with might be in a different time zone than you are, or different sessions might connect to
+        servers in different time zones, or a cluster might include servers in more than one time zone.
+      </p>
+          <p class="p">
+            Any references to the <code class="ph codeph">now()</code> function are evaluated at the start of a query.
+            All calls to <code class="ph codeph">now()</code> within the same query return the same value,
+            and the value does not depend on how long the query takes.
+          </p>
+
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as 'Current time in California USA',
+  to_utc_timestamp(now(), 'PDT') as 'Current time in Greenwich UK';
++--------------------------------+-------------------------------+
+| current time in california usa | current time in greenwich uk  |
++--------------------------------+-------------------------------+
+| 2016-06-01 15:52:08.980072000  | 2016-06-01 22:52:08.980072000 |
++--------------------------------+-------------------------------+
+
+select now() as right_now,
+  now() + interval 1 day as tomorrow,
+  now() + interval 1 week - interval 3 hours as almost_a_week_from_now;
++-------------------------------+-------------------------------+-------------------------------+
+| right_now                     | tomorrow                      | almost_a_week_from_now        |
++-------------------------------+-------------------------------+-------------------------------+
+| 2016-06-01 15:55:39.671690000 | 2016-06-02 15:55:39.671690000 | 2016-06-08 12:55:39.671690000 |
++-------------------------------+-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__second">
+          <code class="ph codeph">second(timestamp date)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the second field from a <code class="ph codeph">TIMESTAMP</code> value.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  second(now()) as seconds_in_current_minute;
++-------------------------------+---------------------------+
+| right_now                     | seconds_in_current_minute |
++-------------------------------+---------------------------+
+| 2016-06-01 16:03:57.006603000 | 57                        |
++-------------------------------+---------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__seconds_add">
+          <code class="ph codeph">seconds_add(timestamp date, int seconds)</code>, <code class="ph codeph">seconds_add(timestamp date, bigint
+          seconds)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of seconds.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  seconds_add(now(), 10) as 10_seconds_from_now;
++-------------------------------+-------------------------------+
+| right_now                     | 10_seconds_from_now           |
++-------------------------------+-------------------------------+
+| 2016-06-01 16:05:21.573935000 | 2016-06-01 16:05:31.573935000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__seconds_sub">
+          <code class="ph codeph">seconds_sub(timestamp date, int seconds)</code>, <code class="ph codeph">seconds_sub(timestamp date, bigint
+          seconds)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of seconds.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+  seconds_sub(now(), 10) as 10_seconds_ago;
++-------------------------------+-------------------------------+
+| right_now                     | 10_seconds_ago                |
++-------------------------------+-------------------------------+
+| 2016-06-01 16:06:03.467931000 | 2016-06-01 16:05:53.467931000 |
++-------------------------------+-------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="datetime_functions__subdate">
+          <code class="ph codeph">subdate(timestamp startdate, int days)</code>, <code class="ph codeph">subdate(timestamp startdate, bigint
+          days)</code>,
+        <

<TRUNCATED>


[37/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_ddl.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_ddl.html b/docs/build/html/topics/impala_ddl.html
new file mode 100644
index 0000000..d7cf482
--- /dev/null
+++ b/docs/build/html/topics/impala_ddl.html
@@ -0,0 +1,141 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ddl"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DDL Statements</title></head><body id="ddl"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DDL Statements</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      DDL refers to <span class="q">"Data Definition Language"</span>, a subset of SQL statements that change the structure of the
+      database schema in some way, typically by creating, deleting, or modifying schema objects such as databases,
+      tables, and views. Most Impala DDL statements start with the keywords <code class="ph codeph">CREATE</code>,
+      <code class="ph codeph">DROP</code>, or <code class="ph codeph">ALTER</code>.
+    </p>
+
+    <p class="p">
+      The Impala DDL statements are:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_create_view.html#create_view">CREATE VIEW Statement</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a>
+      </li>
+    </ul>
+
+    <p class="p">
+      After Impala executes a DDL command, information about available tables, columns, views, partitions, and so
+      on is automatically synchronized between all the Impala nodes in a cluster. (Prior to Impala 1.2, you had to
+      issue a <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE METADATA</code> statement manually on the other
+      nodes to make them aware of the changes.)
+    </p>
+
+    <p class="p">
+      If the timing of metadata updates is significant, for example if you use round-robin scheduling where each
+      query could be issued through a different Impala node, you can enable the
+      <a class="xref" href="impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query option to make the DDL statement wait until
+      all nodes have been notified about the metadata changes.
+    </p>
+
+    <p class="p">
+      See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about how Impala DDL statements interact with
+      tables and partitions stored in the Amazon S3 filesystem.
+    </p>
+
+    <p class="p">
+      Although the <code class="ph codeph">INSERT</code> statement is officially classified as a DML (data manipulation language)
+      statement, it also involves metadata changes that must be broadcast to all Impala nodes, and so is also
+      affected by the <code class="ph codeph">SYNC_DDL</code> query option.
+    </p>
+
+    <p class="p">
+      Because the <code class="ph codeph">SYNC_DDL</code> query option makes each DDL operation take longer than normal, you
+      might enable it only before the last DDL operation in a sequence. For example, if you are running a script
+      that issues multiple DDL operations to set up an entire new schema, add several new partitions, and so on,
+      you might minimize the performance overhead by enabling the query option only before the last
+      <code class="ph codeph">CREATE</code>, <code class="ph codeph">DROP</code>, <code class="ph codeph">ALTER</code>, or <code class="ph codeph">INSERT</code> statement.
+      The script only finishes when all the relevant metadata changes are recognized by all the Impala nodes, so
+      you could connect to any node and issue queries through it.
+    </p>
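+
+    <p class="p">
+      As a sketch of this technique (the database, table, and column names here are hypothetical),
+      a setup script might enable <code class="ph codeph">SYNC_DDL</code> only before its final statement:
+    </p>
+
+<pre class="pre codeblock"><code>create database staging_db;
+create table staging_db.t1 (x int);
+create table staging_db.t2 (y string);
+set sync_ddl=1;
+-- Only the final DDL statement waits until all Impala nodes see the new metadata.
+alter table staging_db.t2 add columns (z bigint);
+</code></pre>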
+
+    <p class="p">
+      The classification of DDL, DML, and other statements is not necessarily the same between Impala and Hive.
+      Impala organizes these statements in a way intended to be familiar to people experienced with relational
+      databases or data warehouse products. Statements that modify the metastore database, such as <code class="ph codeph">COMPUTE
+      STATS</code>, are classified as DDL. Statements that only query the metastore database, such as
+      <code class="ph codeph">SHOW</code> or <code class="ph codeph">DESCRIBE</code>, are put into a separate category of utility statements.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      The query types shown in the Impala debug web user interface might not match exactly the categories listed
+      here. For example, currently the <code class="ph codeph">USE</code> statement is shown as DDL in the debug web UI. The
+      query types shown in the debug web UI are subject to change, for improved consistency.
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      The other major classifications of SQL statements are data manipulation language (see
+      <a class="xref" href="impala_dml.html#dml">DML Statements</a>) and queries (see <a class="xref" href="impala_select.html#select">SELECT Statement</a>).
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_debug_action.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_debug_action.html b/docs/build/html/topics/impala_debug_action.html
new file mode 100644
index 0000000..ce4ef7c
--- /dev/null
+++ b/docs/build/html/topics/impala_debug_action.html
@@ -0,0 +1,24 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="debug_action"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DEBUG_ACTION Query Option</title></head><body id="debug_action"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DEBUG_ACTION Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Introduces artificial problem conditions within queries. For internal debugging and troubleshooting.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code>
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> empty string
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_decimal.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_decimal.html b/docs/build/html/topics/impala_decimal.html
new file mode 100644
index 0000000..8cec53e
--- /dev/null
+++ b/docs/build/html/topics/impala_decimal.html
@@ -0,0 +1,826 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="decimal"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DECIMAL Data Type (Impala 1.4 or higher only)</title></head><body id="decimal"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DECIMAL Data Type (<span class="keyword">Impala 1.4</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A numeric data type with fixed scale and precision, used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER
+      TABLE</code> statements. Suitable for financial and other arithmetic calculations where the imprecise
+      representation and rounding behavior of <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> make those types
+      impractical.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> DECIMAL[(<var class="keyword varname">precision</var>[,<var class="keyword varname">scale</var>])]</code></pre>
+
+    <p class="p">
+      <code class="ph codeph">DECIMAL</code> with no precision or scale values is equivalent to <code class="ph codeph">DECIMAL(9,0)</code>.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Precision and Scale:</strong>
+    </p>
+
+    <p class="p">
+      <var class="keyword varname">precision</var> represents the total number of digits that can be represented by the column,
+      regardless of the location of the decimal point. This value must be between 1 and 38. For example,
+      representing integer values up to 9999, and floating-point values up to 99.99, both require a precision of 4.
+      You can also represent corresponding negative values, without any change in the precision. For example, the
+      range -9999 to 9999 still only requires a precision of 4.
+    </p>
+
+    <p class="p">
+      <var class="keyword varname">scale</var> represents the number of fractional digits. This value must be less than or equal to
+      <var class="keyword varname">precision</var>. A scale of 0 produces integral values, with no fractional part. If precision
+      and scale are equal, all the digits come after the decimal point, making all the values between 0 and
+      0.999... or 0 and -0.999...
+    </p>
+
+    <p class="p">
+      When <var class="keyword varname">precision</var> and <var class="keyword varname">scale</var> are omitted, a <code class="ph codeph">DECIMAL</code> value
+      is treated as <code class="ph codeph">DECIMAL(9,0)</code>, that is, an integer value ranging from
+      <code class="ph codeph">-999,999,999</code> to <code class="ph codeph">999,999,999</code>. This is the largest <code class="ph codeph">DECIMAL</code>
+      value that can still be represented in 4 bytes. If precision is specified but scale is omitted, Impala uses a
+      value of zero for the scale.
+    </p>
+
+    <p class="p">
+      Both <var class="keyword varname">precision</var> and <var class="keyword varname">scale</var> must be specified as integer literals, not any
+      other kind of constant expressions.
+    </p>
+
+    <p class="p">
+      To check the precision or scale for arbitrary values, you can call the
+      <a class="xref" href="impala_math_functions.html#math_functions"><code class="ph codeph">precision()</code> and
+      <code class="ph codeph">scale()</code> built-in functions</a>. For example, you might use these values to figure out how
+      many characters are required for various fields in a report, or to understand the rounding characteristics of
+      a formula as applied to a particular <code class="ph codeph">DECIMAL</code> column.
+    </p>
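+
+    <p class="p">
+      For example (a hypothetical impala-shell session; the literal <code class="ph codeph">9.9</code> is treated as
+      <code class="ph codeph">DECIMAL(2,1)</code>, so the functions report a precision of 2 and a scale of 1):
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select precision(9.9) as p, scale(9.9) as s;
++---+---+
+| p | s |
++---+---+
+| 2 | 1 |
++---+---+
+</code></pre>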
+
+    <p class="p">
+      <strong class="ph b">Range:</strong>
+    </p>
+
+    <p class="p">
+      The maximum precision value is 38. Thus, the largest integral value is represented by
+      <code class="ph codeph">DECIMAL(38,0)</code> (999... with 9 repeated 38 times). The most precise fractional value (between
+      0 and 1, or 0 and -1) is represented by <code class="ph codeph">DECIMAL(38,38)</code>, with 38 digits to the right of the
+      decimal point. The value closest to 0 would be .0000...1 (37 zeros and the final 1). The value closest to 1
+      would be .999... (9 repeated 38 times).
+    </p>
+
+    <p class="p">
+      For a given precision and scale, the range of <code class="ph codeph">DECIMAL</code> values is the same in the positive and
+      negative directions. For example, <code class="ph codeph">DECIMAL(4,2)</code> can represent from -99.99 to 99.99. This is
+      different from the integer numeric types, where the positive and negative bounds differ slightly.
+    </p>
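+
+    <p class="p">
+      For example, the extreme values of <code class="ph codeph">DECIMAL(4,2)</code> can be produced directly
+      (a hypothetical impala-shell session, shown as a sketch):
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select cast(-99.99 as decimal(4,2)) as min_val, cast(99.99 as decimal(4,2)) as max_val;
++---------+---------+
+| min_val | max_val |
++---------+---------+
+| -99.99  | 99.99   |
++---------+---------+
+</code></pre>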
+
+    <p class="p">
+      When you use <code class="ph codeph">DECIMAL</code> values in arithmetic expressions, the precision and scale of the result
+      value are determined as follows:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          For addition and subtraction, the precision and scale are based on the maximum possible result, that is,
+          if all the digits of the input values were 9s and the absolute values were added together.
+        </p>
+
+
+      </li>
+
+      <li class="li">
+        <p class="p">
+          For multiplication, the precision is the sum of the precisions of the input values. The scale is the sum
+          of the scales of the input values.
+        </p>
+      </li>
+
+
+
+      <li class="li">
+        <p class="p">
+          For division, Impala sets the precision and scale to values large enough to represent the whole and
+          fractional parts of the result.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          For <code class="ph codeph">UNION</code>, the scale is the larger of the scales of the input values, and the precision
+          is increased if necessary to accommodate any additional fractional digits. If the same input value has
+          the largest precision and the largest scale, the result value has the same precision and scale. If one
+          value has a larger precision but smaller scale, the scale of the result value is increased. For example,
+          <code class="ph codeph">DECIMAL(20,2) UNION DECIMAL(8,6)</code> produces a result of type
+          <code class="ph codeph">DECIMAL(24,6)</code>. The extra 4 fractional digits of scale (6-2) are accommodated by
+          extending the precision by the same amount (20+4).
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          To double-check, you can always call the <code class="ph codeph">PRECISION()</code> and <code class="ph codeph">SCALE()</code>
+          functions on the results of an arithmetic expression to see the relevant values, or use a <code class="ph codeph">CREATE
+          TABLE AS SELECT</code> statement to define a column based on the return type of the expression.
+        </p>
+      </li>
+    </ul>
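+
+    <p class="p">
+      As an illustration of these rules (a hypothetical impala-shell session), multiplying two
+      <code class="ph codeph">DECIMAL(5,2)</code> values adds the precisions (5+5) and the scales (2+2):
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select precision(cast(1.50 as decimal(5,2)) * cast(2.50 as decimal(5,2))) as p,
+                  &gt;        scale(cast(1.50 as decimal(5,2)) * cast(2.50 as decimal(5,2))) as s;
++----+---+
+| p  | s |
++----+---+
+| 10 | 4 |
++----+---+
+</code></pre>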
+
+    <p class="p">
+        <strong class="ph b">Compatibility:</strong>
+      </p>
+
+    <ul class="ul">
+      <li class="li">
+        Using the <code class="ph codeph">DECIMAL</code> type is only supported under <span class="keyword">Impala 1.4</span> and higher.
+      </li>
+
+      <li class="li">
+        Use the <code class="ph codeph">DECIMAL</code> data type in Impala for applications where you used the
+        <code class="ph codeph">NUMBER</code> data type in Oracle. The Impala <code class="ph codeph">DECIMAL</code> type does not support the
+        Oracle idioms of <code class="ph codeph">*</code> for scale or negative values for precision.
+      </li>
+    </ul>
+
+    <p class="p">
+      <strong class="ph b">Conversions and casting:</strong>
+    </p>
+
+    <p class="p">
+      <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to
+        <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+        date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+        If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+        the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span>
+    </p>
+
+    <p class="p">
+      Impala automatically converts between <code class="ph codeph">DECIMAL</code> and other numeric types where possible. A
+      <code class="ph codeph">DECIMAL</code> with zero scale is converted to or from the smallest appropriate integral type. A
+      <code class="ph codeph">DECIMAL</code> with a fractional part is automatically converted to or from the smallest
+      appropriate floating-point type. If the destination type does not have sufficient precision or scale to hold
+      all possible values of the source type, Impala raises an error and does not convert the value.
+    </p>
+
+    <p class="p">
+      For example, these statements show how expressions of <code class="ph codeph">DECIMAL</code> and other types are reconciled
+      to the same type in the context of <code class="ph codeph">UNION</code> queries and <code class="ph codeph">INSERT</code> statements:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
++----------------+
+| x              |
++----------------+
+| 1.5000         |
+| 1.0000         |
++----------------+
+[localhost:21000] &gt; create table int_vs_decimal as select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 2 row(s) |
++-------------------+
+[localhost:21000] &gt; desc int_vs_decimal;
++------+---------------+---------+
+| name | type          | comment |
++------+---------------+---------+
+| x    | decimal(14,4) |         |
++------+---------------+---------+
+
+</code></pre>
+
+    <p class="p">
+      To avoid potential conversion errors, you can use <code class="ph codeph">CAST()</code> to convert <code class="ph codeph">DECIMAL</code>
+      values to <code class="ph codeph">FLOAT</code>, <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>,
+      <code class="ph codeph">BIGINT</code>, <code class="ph codeph">STRING</code>, <code class="ph codeph">TIMESTAMP</code>, or <code class="ph codeph">BOOLEAN</code>.
+      You can use exponential notation in <code class="ph codeph">DECIMAL</code> literals or when casting from
+      <code class="ph codeph">STRING</code>, for example <code class="ph codeph">1.0e6</code> to represent one million.
+    </p>
+
+    <p class="p">
+      If you cast a value with more fractional digits than the scale of the destination type, any extra fractional
+      digits are truncated (not rounded). Casting a value to a target type with not enough precision produces a
+      result of <code class="ph codeph">NULL</code> and displays a runtime warning.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select cast(1.239 as decimal(3,2));
++-----------------------------+
+| cast(1.239 as decimal(3,2)) |
++-----------------------------+
+| 1.23                        |
++-----------------------------+
+[localhost:21000] &gt; select cast(1234 as decimal(3));
++----------------------------+
+| cast(1234 as decimal(3,0)) |
++----------------------------+
+| NULL                       |
++----------------------------+
+WARNINGS: Expression overflowed, returning NULL
+
+</code></pre>
+
+    <p class="p">
+      When you specify integer literals, for example in <code class="ph codeph">INSERT ... VALUES</code> statements or arithmetic
+      expressions, those numbers are interpreted as the smallest applicable integer type. You must use
+      <code class="ph codeph">CAST()</code> calls for some combinations of integer literals and <code class="ph codeph">DECIMAL</code>
+      precision. For example, <code class="ph codeph">INT</code> has a maximum value that is 10 digits long,
+      <code class="ph codeph">TINYINT</code> has a maximum value that is 3 digits long, and so on. If you specify a value such as
+      123456 to go into a <code class="ph codeph">DECIMAL</code> column, Impala checks if the column has enough precision to
+      represent the largest value of that integer type, and raises an error if not. Therefore, use an expression
+      like <code class="ph codeph">CAST(123456 AS DECIMAL(9,0))</code> for <code class="ph codeph">DECIMAL</code> columns with precision 9 or
+      less, <code class="ph codeph">CAST(50 AS DECIMAL(2,0))</code> for <code class="ph codeph">DECIMAL</code> columns with precision 2 or
+      less, and so on. For <code class="ph codeph">DECIMAL</code> columns with precision 10 or greater, Impala automatically
+      interprets the value as the correct <code class="ph codeph">DECIMAL</code> type; however, because
+      <code class="ph codeph">DECIMAL(10)</code> requires 8 bytes of storage while <code class="ph codeph">DECIMAL(9)</code> requires only 4
+      bytes, use a precision of 10 or higher only when actually needed.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table decimals_9_0 (x decimal);
+[localhost:21000] &gt; insert into decimals_9_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
+ERROR: AnalysisException: Possible loss of precision for target table 'decimal_testing.decimals_9_0'.
+Expression '1' (type: INT) would need to be cast to DECIMAL(9,0) for column 'x'
+[localhost:21000] &gt; insert into decimals_9_0 values (cast(1 as decimal)), (cast(2 as decimal)), (cast(4 as decimal)), (cast(8 as decimal)), (cast(16 as decimal)), (cast(1024 as decimal)), (cast(32768 as decimal)), (cast(65536 as decimal)), (cast(1000000 as decimal));
+
+[localhost:21000] &gt; create table decimals_10_0 (x decimal(10,0));
+[localhost:21000] &gt; insert into decimals_10_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
+
+</code></pre>
+
+    <p class="p">
+      Be aware that in memory and for binary file formats such as Parquet or Avro, <code class="ph codeph">DECIMAL(10)</code> or
+      higher consumes 8 bytes while <code class="ph codeph">DECIMAL(9)</code> (the default for <code class="ph codeph">DECIMAL</code>) or lower
+      consumes 4 bytes. Therefore, to conserve space in large tables, use the smallest-precision
+      <code class="ph codeph">DECIMAL</code> type that is appropriate and <code class="ph codeph">CAST()</code> literal values where necessary,
+      rather than declaring <code class="ph codeph">DECIMAL</code> columns with high precision for convenience.
+    </p>
+
+    <p class="p">
+      To represent a very large or precise <code class="ph codeph">DECIMAL</code> value as a literal, for example one that
+      contains more digits than can be represented by a <code class="ph codeph">BIGINT</code> literal, use a quoted string or a
+      floating-point value for the number, and <code class="ph codeph">CAST()</code> to the desired <code class="ph codeph">DECIMAL</code>
+      type:
+    </p>
+
+<pre class="pre codeblock"><code>insert into decimals_38_5 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000),
+  (cast("999999999999999999999999999999" as decimal(38,5))),
+  (cast(999999999999999999999999999999. as decimal(38,5)));
+</code></pre>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p"> The result of the <code class="ph codeph">SUM()</code> aggregate function on
+            <code class="ph codeph">DECIMAL</code> values is promoted to a precision of 38,
+          with the same scale as the underlying column. Thus, the result can
+          represent the largest possible value at that particular scale. </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          <code class="ph codeph">STRING</code> columns, literals, or expressions can be converted to <code class="ph codeph">DECIMAL</code> as
+          long as the overall number of digits and digits to the right of the decimal point fit within the
+          specified precision and scale for the declared <code class="ph codeph">DECIMAL</code> type. By default, a
+          <code class="ph codeph">DECIMAL</code> value with no specified scale or precision can hold a maximum of 9 digits of an
+          integer value. If there are more digits in the string value than are allowed by the
+          <code class="ph codeph">DECIMAL</code> scale and precision, the result is <code class="ph codeph">NULL</code>.
+        </p>
+        <p class="p">
+          The following examples demonstrate how <code class="ph codeph">STRING</code> values with integer and fractional parts
+          are represented when converted to <code class="ph codeph">DECIMAL</code>. If the scale is 0, the number is treated
+          as an integer value with a maximum of <var class="keyword varname">precision</var> digits. If the scale is greater than
+          0, the precision must be increased to account for the digits both to the left and right of the decimal point.
+          As the precision increases, output values are printed with additional trailing zeros after the decimal
+          point if needed. Any trailing zeros after the decimal point in the <code class="ph codeph">STRING</code> value must fit
+          within the number of digits specified by the precision.
+        </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select cast('100' as decimal); -- Small integer value fits within 9 digits of scale.
++-----------------------------+
+| cast('100' as decimal(9,0)) |
++-----------------------------+
+| 100                         |
++-----------------------------+
+[localhost:21000] &gt; select cast('100' as decimal(3,0)); -- Small integer value fits within 3 digits of precision.
++-----------------------------+
+| cast('100' as decimal(3,0)) |
++-----------------------------+
+| 100                         |
++-----------------------------+
+[localhost:21000] &gt; select cast('100' as decimal(2,0)); -- 2 digits of precision is not enough!
++-----------------------------+
+| cast('100' as decimal(2,0)) |
++-----------------------------+
+| NULL                        |
++-----------------------------+
+[localhost:21000] &gt; select cast('100' as decimal(3,1)); -- (3,1) = 2 digits left of the decimal point, 1 to the right. Not enough.
++-----------------------------+
+| cast('100' as decimal(3,1)) |
++-----------------------------+
+| NULL                        |
++-----------------------------+
+[localhost:21000] &gt; select cast('100' as decimal(4,1)); -- 4 digits total, 1 to the right of the decimal point.
++-----------------------------+
+| cast('100' as decimal(4,1)) |
++-----------------------------+
+| 100.0                       |
++-----------------------------+
+[localhost:21000] &gt; select cast('98.6' as decimal(3,1)); -- (3,1) can hold a 3 digit number with 1 fractional digit.
++------------------------------+
+| cast('98.6' as decimal(3,1)) |
++------------------------------+
+| 98.6                         |
++------------------------------+
+[localhost:21000] &gt; select cast('98.6' as decimal(15,1)); -- Larger precision allows bigger numbers but still only 1 fractional digit.
++-------------------------------+
+| cast('98.6' as decimal(15,1)) |
++-------------------------------+
+| 98.6                          |
++-------------------------------+
+[localhost:21000] &gt; select cast('98.6' as decimal(15,5)); -- Larger scale allows more fractional digits, outputs trailing zeros.
++-------------------------------+
+| cast('98.6' as decimal(15,5)) |
++-------------------------------+
+| 98.60000                      |
++-------------------------------+
+[localhost:21000] &gt; select cast('98.60000' as decimal(15,1)); -- Trailing zeros in the string must fit within 'scale' digits (1 in this case).
++-----------------------------------+
+| cast('98.60000' as decimal(15,1)) |
++-----------------------------------+
+| NULL                              |
++-----------------------------------+
+
+</code></pre>
+      </li>
+
+      <li class="li">
+        Most built-in arithmetic functions such as <code class="ph codeph">SIN()</code> and <code class="ph codeph">COS()</code> continue to
+        accept only <code class="ph codeph">DOUBLE</code> values because they are so commonly used in scientific contexts for
+        calculations on IEEE 754-compliant values. The built-in functions that accept and return
+        <code class="ph codeph">DECIMAL</code> are:
+
+
+        <ul class="ul">
+          <li class="li">
+            <code class="ph codeph">ABS()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">CEIL()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">COALESCE()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">FLOOR()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">FNV_HASH()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">GREATEST()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">IF()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">ISNULL()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">LEAST()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">NEGATIVE()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">NULLIF()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">POSITIVE()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">PRECISION()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">ROUND()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">SCALE()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">TRUNCATE()</code>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">ZEROIFNULL()</code>
+          </li>
+        </ul>
+        See <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a> for details.
+      </li>
+
+      <li class="li">
+        <p class="p">
+          <code class="ph codeph">BIGINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>, and <code class="ph codeph">TINYINT</code>
+          values can all be cast to <code class="ph codeph">DECIMAL</code>. The number of digits to the left of the decimal point
+          in the <code class="ph codeph">DECIMAL</code> type must be sufficient to hold the largest value of the corresponding
+          integer type. Note that integer literals are treated as the smallest appropriate integer type, meaning
+          there is sometimes a range of values that require one more digit of <code class="ph codeph">DECIMAL</code> precision than
+          you might expect. For integer values, the scale of the <code class="ph codeph">DECIMAL</code> type can be zero; if
+          the scale is greater than zero, remember to increase the precision by an equivalent amount to hold
+          the required number of digits to the left of the decimal point.
+        </p>
+        <p class="p">
+          The following examples show how different integer types are converted to <code class="ph codeph">DECIMAL</code>.
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select cast(1 as decimal(1,0));
++-------------------------+
+| cast(1 as decimal(1,0)) |
++-------------------------+
+| 1                       |
++-------------------------+
+[localhost:21000] &gt; select cast(9 as decimal(1,0));
++-------------------------+
+| cast(9 as decimal(1,0)) |
++-------------------------+
+| 9                       |
++-------------------------+
+[localhost:21000] &gt; select cast(10 as decimal(1,0));
++--------------------------+
+| cast(10 as decimal(1,0)) |
++--------------------------+
+| 10                       |
++--------------------------+
+[localhost:21000] &gt; select cast(10 as decimal(1,1));
++--------------------------+
+| cast(10 as decimal(1,1)) |
++--------------------------+
+| 10.0                     |
++--------------------------+
+[localhost:21000] &gt; select cast(100 as decimal(1,1));
++---------------------------+
+| cast(100 as decimal(1,1)) |
++---------------------------+
+| 100.0                     |
++---------------------------+
+[localhost:21000] &gt; select cast(1000 as decimal(1,1));
++----------------------------+
+| cast(1000 as decimal(1,1)) |
++----------------------------+
+| 1000.0                     |
++----------------------------+
+
+</code></pre>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          When a <code class="ph codeph">DECIMAL</code> value is converted to any of the integer types, any fractional part is
+          truncated (that is, rounded towards zero):
+        </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table num_dec_days (x decimal(4,1));
+[localhost:21000] &gt; insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1)));
+[localhost:21000] &gt; insert into num_dec_days values (cast(0.1 as decimal(4,1))), (cast(.9 as decimal(4,1))), (cast(9.1 as decimal(4,1))), (cast(9.9 as decimal(4,1)));
+[localhost:21000] &gt; select cast(x as int) from num_dec_days;
++----------------+
+| cast(x as int) |
++----------------+
+| 1              |
+| 2              |
+| 4              |
+| 0              |
+| 0              |
+| 9              |
+| 9              |
++----------------+
+
+</code></pre>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          You cannot directly cast <code class="ph codeph">TIMESTAMP</code> or <code class="ph codeph">BOOLEAN</code> values to or from
+          <code class="ph codeph">DECIMAL</code> values. You can turn a <code class="ph codeph">DECIMAL</code> value into a time-related
+          representation using a two-step process, by converting it to an integer value and then using that result
+          in a call to a date and time function such as <code class="ph codeph">from_unixtime()</code>.
+        </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select from_unixtime(cast(cast(1000.0 as decimal) as bigint));
++-------------------------------------------------------------+
+| from_unixtime(cast(cast(1000.0 as decimal(9,0)) as bigint)) |
++-------------------------------------------------------------+
+| 1970-01-01 00:16:40                                         |
++-------------------------------------------------------------+
+[localhost:21000] &gt; select now() + interval cast(x as int) days from num_dec_days; -- x is a DECIMAL column.
+
+[localhost:21000] &gt; create table num_dec_days (x decimal(4,1));
+[localhost:21000] &gt; insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1)));
+[localhost:21000] &gt; select now() + interval cast(x as int) days from num_dec_days; -- The 4.5 value is truncated to 4 and becomes '4 days'.
++--------------------------------------+
+| now() + interval cast(x as int) days |
++--------------------------------------+
+| 2014-05-13 23:11:55.163284000        |
+| 2014-05-14 23:11:55.163284000        |
+| 2014-05-16 23:11:55.163284000        |
++--------------------------------------+
+
+</code></pre>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          Because values in <code class="ph codeph">INSERT</code> statements are checked rigorously for type compatibility, be
+          prepared to use <code class="ph codeph">CAST()</code> function calls around literals, column references, or other
+          expressions that you are inserting into a <code class="ph codeph">DECIMAL</code> column.
+        </p>
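+        <p class="p">
+          The following sketch (table and column names are illustrative, not from a real schema)
+          shows the kind of explicit <code class="ph codeph">CAST()</code> calls to expect:
+        </p>
+<pre class="pre codeblock"><code>-- Hypothetical example table.
+CREATE TABLE dec_insert_demo (x DECIMAL(5,2));
+-- Casting the literal avoids a type-compatibility error for the DECIMAL column.
+INSERT INTO dec_insert_demo VALUES (CAST(99.44 AS DECIMAL(5,2)));
+-- The same applies to column references copied from another (hypothetical) table.
+INSERT INTO dec_insert_demo SELECT CAST(y AS DECIMAL(5,2)) FROM some_other_table;
+</code></pre>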
+      </li>
+    </ul>
+
+    <p class="p">
+        <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+        value.
+      </p>
+
+    <p class="p">
+      <strong class="ph b">DECIMAL differences from integer and floating-point types:</strong>
+    </p>
+
+    <p class="p">
+      With the <code class="ph codeph">DECIMAL</code> type, you are concerned with the number of overall digits of a number
+      rather than powers of 2 (as in <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, and so on). Therefore,
+      the limits with integral values of <code class="ph codeph">DECIMAL</code> types fall around 99, 999, 9999, and so on rather
+      than 32767, 65535, 2<sup class="ph sup">32</sup>-1, and so on. For fractional values, you do not need to
+      account for imprecise representation of the fractional part according to the IEEE-754 standard (as in
+      <code class="ph codeph">FLOAT</code> and
+      <code class="ph codeph">DOUBLE</code>). Therefore, when you insert a fractional value into a <code class="ph codeph">DECIMAL</code>
+      column, you can compare, sum, query, <code class="ph codeph">GROUP BY</code>, and so on that column and get back the
+      original values rather than some <span class="q">"close but not identical"</span> value.
+    </p>
+
+    <p class="p">
+      <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> can cause problems or unexpected behavior due to inability
+      to precisely represent certain fractional values, for example dollar and cents values for currency. You might
+      find output values slightly different than you inserted, equality tests that do not match precisely, or
+      unexpected values for <code class="ph codeph">GROUP BY</code> columns. <code class="ph codeph">DECIMAL</code> can help reduce unexpected
+      behavior and rounding errors, at the expense of some performance overhead for assignments and comparisons.
+    </p>
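+    <p class="p">
+      The following queries sketch the difference (the exact <code class="ph codeph">DOUBLE</code> result can differ in its
+      least significant digits, which is the point of the comparison):
+    </p>
+<pre class="pre codeblock"><code>-- With DOUBLE, the familiar binary floating-point surprise: the comparison is false.
+SELECT CAST(0.1 AS DOUBLE) + CAST(0.2 AS DOUBLE) = CAST(0.3 AS DOUBLE);
+-- With DECIMAL, the values are represented exactly and the comparison is true.
+SELECT CAST(0.1 AS DECIMAL(2,1)) + CAST(0.2 AS DECIMAL(2,1)) = CAST(0.3 AS DECIMAL(2,1));
+</code></pre>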
+
+    <div class="p">
+      <strong class="ph b">Literals and expressions:</strong>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            When you use an integer literal such as <code class="ph codeph">1</code> or <code class="ph codeph">999</code> in a SQL statement,
+            depending on the context, Impala will treat it as either the smallest appropriate
+            <code class="ph codeph">DECIMAL</code> type, or the smallest integer type (<code class="ph codeph">TINYINT</code>,
+            <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, or <code class="ph codeph">BIGINT</code>). To minimize memory usage,
+            Impala prefers to treat the literal as the smallest appropriate integer type.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            When you use a floating-point literal such as <code class="ph codeph">1.1</code> or <code class="ph codeph">999.44</code> in a SQL
+            statement, depending on the context, Impala will treat it as either the smallest appropriate
+            <code class="ph codeph">DECIMAL</code> type, or the smallest floating-point type (<code class="ph codeph">FLOAT</code> or
+            <code class="ph codeph">DOUBLE</code>). To avoid loss of accuracy, Impala prefers to treat the literal as a
+            <code class="ph codeph">DECIMAL</code>.
+          </p>
+        </li>
+      </ul>
+    </div>
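+    <p class="p">
+      One way to observe how Impala types a literal is the <code class="ph codeph">typeof()</code> built-in function
+      (available in Impala 2.3 and higher); the results shown as comments reflect this smallest-type behavior:
+    </p>
+<pre class="pre codeblock"><code>SELECT typeof(1);    -- TINYINT: the smallest integer type that holds the value.
+SELECT typeof(1000); -- SMALLINT
+SELECT typeof(1.1);  -- DECIMAL(2,1): fractional literals default to DECIMAL, not FLOAT or DOUBLE.
+</code></pre>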
+
+    <p class="p">
+      <strong class="ph b">Storage considerations:</strong>
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Only the precision determines the storage size for <code class="ph codeph">DECIMAL</code> values; the scale setting has
+        no effect on the storage size.
+      </li>
+
+      <li class="li">
+        Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these text-based file formats,
+        leading zeros are not stored, but trailing zeros are stored. In these tables, each <code class="ph codeph">DECIMAL</code>
+        value takes up as many bytes as there are digits in the value, plus an extra byte if the decimal point is
+        present and an extra byte for negative values. Once the values are loaded into memory, they are represented
+        in 4, 8, or 16 bytes as described in the following list items. The on-disk representation varies depending
+        on the file format of the table.
+      </li>
+
+
+
+      <li class="li">
+        Parquet and Avro tables use binary formats. In these tables, Impala stores each value in as few bytes
+        as possible, depending on the precision specified for the <code class="ph codeph">DECIMAL</code> column.
+        <ul class="ul">
+          <li class="li">
+            In memory, <code class="ph codeph">DECIMAL</code> values with precision of 9 or less are stored in 4 bytes.
+          </li>
+
+          <li class="li">
+            In memory, <code class="ph codeph">DECIMAL</code> values with precision of 10 through 18 are stored in 8 bytes.
+          </li>
+
+          <li class="li">
+            In memory, <code class="ph codeph">DECIMAL</code> values with precision greater than 18 are stored in 16 bytes.
+          </li>
+        </ul>
+      </li>
+    </ul>
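+    <p class="p">
+      As an illustrative sketch, the precision you declare determines which of the 4-, 8-, or 16-byte
+      in-memory representations each column uses (column names here are hypothetical):
+    </p>
+<pre class="pre codeblock"><code>CREATE TABLE storage_demo (
+  small_val  DECIMAL(9,2),   -- precision &lt;= 9: 4 bytes in memory
+  medium_val DECIMAL(18,4),  -- precision 10-18: 8 bytes in memory
+  large_val  DECIMAL(38,10)  -- precision &gt; 18: 16 bytes in memory
+);
+</code></pre>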
+
+    <p class="p">
+        <strong class="ph b">File format considerations:</strong>
+      </p>
+
+    <ul class="ul">
+      <li class="li">
+        The <code class="ph codeph">DECIMAL</code> data type can be stored in any of the file formats supported by Impala, as
+        described in <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>. Impala only writes to tables that use the
+        Parquet and text formats, so those formats are the focus for file format compatibility.
+      </li>
+
+      <li class="li">
+        Impala can query Avro, RCFile, or SequenceFile tables containing <code class="ph codeph">DECIMAL</code> columns, created
+        by other Hadoop components.
+      </li>
+
+      <li class="li">
+        You can use <code class="ph codeph">DECIMAL</code> columns in Impala tables that are mapped to HBase tables. Impala can
+        query and insert into such tables.
+      </li>
+
+      <li class="li">
+        Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these tables, each
+        <code class="ph codeph">DECIMAL</code> value takes up as many bytes as there are digits in the value, plus an extra byte
+        if the decimal point is present. The binary format of Parquet or Avro files offers more compact storage for
+        <code class="ph codeph">DECIMAL</code> columns.
+      </li>
+
+      <li class="li">
+        Parquet and Avro tables use binary formats. In these tables, Impala stores each value in 4, 8, or 16 bytes
+        depending on the precision specified for the <code class="ph codeph">DECIMAL</code> column.
+      </li>
+
+    </ul>
+
+    <p class="p">
+      <strong class="ph b">UDF considerations:</strong> When writing a C++ UDF, use the <code class="ph codeph">DecimalVal</code> data type defined in
+      <span class="ph filepath">/usr/include/impala_udf/udf.h</span>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Partitioning:</strong>
+      </p>
+
+    <p class="p">
+      You can use a <code class="ph codeph">DECIMAL</code> column as a partition key. Doing so provides a better match between
+      the partition key values and the HDFS directory names than using a <code class="ph codeph">DOUBLE</code> or
+      <code class="ph codeph">FLOAT</code> partitioning column.
+    </p>
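+    <p class="p">
+      For example, the following sketch (with hypothetical names) creates a table partitioned by a
+      <code class="ph codeph">DECIMAL</code> key; the resulting HDFS directory name contains the exact value, such as
+      a path segment <code class="ph codeph">pkey=3.14</code>, with no floating-point rounding in the path:
+    </p>
+<pre class="pre codeblock"><code>CREATE TABLE dec_partitioned (x INT) PARTITIONED BY (pkey DECIMAL(5,2));
+INSERT INTO dec_partitioned PARTITION (pkey=3.14) VALUES (1);
+SHOW PARTITIONS dec_partitioned;
+</code></pre>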
+
+    <p class="p">
+        <strong class="ph b">Schema evolution considerations:</strong>
+      </p>
+
+    <ul class="ul">
+      <li class="li">
+        For text-based formats (text, RCFile, and SequenceFile tables), you can issue an <code class="ph codeph">ALTER TABLE ...
+        REPLACE COLUMNS</code> statement to change the precision and scale of an existing
+        <code class="ph codeph">DECIMAL</code> column. As long as the values in the column fit within the new precision and
+        scale, they are returned correctly by a query. Any values that do not fit within the new precision and
+        scale are returned as <code class="ph codeph">NULL</code>, and Impala reports the conversion error. Leading zeros do not
+        count against the precision value, but trailing zeros after the decimal point do.
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table text_decimals (x string);
+[localhost:21000] &gt; insert into text_decimals values ("1"), ("2"), ("99.99"), ("1.234"), ("000001"), ("1.000000000");
+[localhost:21000] &gt; select * from text_decimals;
++-------------+
+| x           |
++-------------+
+| 1           |
+| 2           |
+| 99.99       |
+| 1.234       |
+| 000001      |
+| 1.000000000 |
++-------------+
+[localhost:21000] &gt; alter table text_decimals replace columns (x decimal(4,2));
+[localhost:21000] &gt; select * from text_decimals;
++-------+
+| x     |
++-------+
+| 1.00  |
+| 2.00  |
+| 99.99 |
+| NULL  |
+| 1.00  |
+| NULL  |
++-------+
+ERRORS:
+Backend 0:Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.234)
+file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/634d4bd3aa0
+e8420-b4b13bab7f1be787_56794587_data.0
+record: 1.234
+Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.000000000)
+file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/cd40dc68e20
+c565a-cc4bd86c724c96ba_311873428_data.0
+record: 1.000000000
+
+</code></pre>
+      </li>
+
+      <li class="li">
+        For binary formats (Parquet and Avro tables), although an <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code>
+        statement that changes the precision or scale of a <code class="ph codeph">DECIMAL</code> column succeeds, any subsequent
+        attempt to query the changed column results in a fatal error. (The other columns can still be queried
+        successfully.) This is because the metadata about the columns is stored in the data files themselves, and
+        <code class="ph codeph">ALTER TABLE</code> does not actually make any updates to the data files. If the metadata in the
+        data files disagrees with the metadata in the metastore database, Impala cancels the query.
+      </li>
+    </ul>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x DECIMAL, y DECIMAL(5,2), z DECIMAL(25,0));
+INSERT INTO t1 VALUES (5, 99.44, 123456), (300, 6.7, 999999999);
+SELECT x+y, ROUND(y,1), z/98.6 FROM t1;
+SELECT CAST(1000.5 AS DECIMAL);
+</code></pre>
+
+
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+        using Parquet or other binary formats.
+      </p>
+
+
+
+
+
+
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+        fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+        statement.
+      </p>
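+    <p class="p">
+      For example, using the <code class="ph codeph">t1</code> table from the earlier example, the size fields are
+      populated immediately (the output description in the comments is illustrative):
+    </p>
+<pre class="pre codeblock"><code>SHOW COLUMN STATS t1;
+-- Max Size and Avg Size are filled in for the DECIMAL columns,
+-- while #Distinct Values remains -1 until COMPUTE STATS t1 is run.
+</code></pre>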
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+        <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+      <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+      <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>,
+      <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a> (especially <code class="ph codeph">PRECISION()</code> and
+      <code class="ph codeph">SCALE()</code>)
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_default_order_by_limit.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_default_order_by_limit.html b/docs/build/html/topics/impala_default_order_by_limit.html
new file mode 100644
index 0000000..334253f
--- /dev/null
+++ b/docs/build/html/topics/impala_default_order_by_limit.html
@@ -0,0 +1,33 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="default_order_by_limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DEFAULT_ORDER_BY_LIMIT Query Option</title></head><body id="default_order_by_limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DEFAULT_ORDER_BY_LIMIT Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+        Now that the <code class="ph codeph">ORDER BY</code> clause no longer requires an accompanying <code class="ph codeph">LIMIT</code>
+        clause in Impala 1.4.0 and higher, this query option is deprecated and has no effect.
+      </p>
+
+    <p class="p">
+      Prior to Impala 1.4.0, Impala queries that used the <code class="ph codeph"><a class="xref" href="impala_order_by.html#order_by">ORDER
+      BY</a></code> clause were required to also include a
+      <code class="ph codeph"><a class="xref" href="impala_limit.html#limit">LIMIT</a></code> clause, to avoid accidentally producing
+      huge result sets that must be sorted. Sorting a huge result set is a memory-intensive operation. In Impala
+      1.4.0 and higher, Impala uses a temporary disk work area to perform the sort if that operation would
+      otherwise exceed the Impala memory limit on a particular host.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> -1 (no default limit)
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_delegation.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_delegation.html b/docs/build/html/topics/impala_delegation.html
new file mode 100644
index 0000000..98fb545
--- /dev/null
+++ b/docs/build/html/topics/impala_delegation.html
@@ -0,0 +1,70 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="delegation"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring Impala Delegation for Hue and BI Tools</title></head><body id="delegation"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Configuring Impala Delegation for Hue and BI Tools</h1>
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+
+      When users submit Impala queries through a separate application, such as Hue or a business intelligence tool,
+      typically all requests are treated as coming from the same user. In Impala 1.2 and higher, authentication is
+      extended by a new feature that allows applications to pass along credentials for the users that connect to
+      them (known as <span class="q">"delegation"</span>), and issue Impala queries with the privileges for those users. Currently,
+      the delegation feature is available only for Impala queries submitted through application interfaces such as
+      Hue and BI tools; for example, Impala cannot issue queries using the privileges of the HDFS user.
+    </p>
+
+    <p class="p">
+      The delegation feature is enabled by a startup option for <span class="keyword cmdname">impalad</span>:
+      <code class="ph codeph">--authorized_proxy_user_config</code>. When you specify this option, users whose names you specify
+      (such as <code class="ph codeph">hue</code>) can delegate the execution of a query to another user. The query runs with the
+      privileges of the delegated user, not the original user such as <code class="ph codeph">hue</code>. The name of the
+      delegated user is passed using the HiveServer2 configuration property <code class="ph codeph">impala.doas.user</code>.
+    </p>
+
+    <p class="p">
+      You can specify a list of users that the application user can delegate to, or <code class="ph codeph">*</code> to allow a
+      superuser to delegate to any other user. For example:
+    </p>
+
+<pre class="pre codeblock"><code>impalad --authorized_proxy_user_config 'hue=user1,user2;admin=*' ...</code></pre>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Make sure to use single quotes or escape characters to ensure that any <code class="ph codeph">*</code> characters do not
+      undergo wildcard expansion when specified in command-line arguments.
+    </div>
+
+    <p class="p">
+      See <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details about adding or changing
+      <span class="keyword cmdname">impalad</span> startup options. See
+      <a class="xref" href="http://blog.cloudera.com/blog/2013/07/how-hiveserver2-brings-security-and-concurrency-to-apache-hive/" target="_blank">this
+      blog post</a> for background information about the delegation capability in HiveServer2.
+    </p>
+    <p class="p">
+      To set up authentication for the delegated users:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          On the server side, configure either user/password authentication through LDAP, or Kerberos
+          authentication, for all the delegated users. See <a class="xref" href="impala_ldap.html#ldap">Enabling LDAP Authentication for Impala</a> or
+          <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for details.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          On the client side, to learn how to enable delegation, consult the documentation
+          for the ODBC driver you are using.
+        </p>
+      </li>
+    </ul>
+
+  </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_delete.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_delete.html b/docs/build/html/topics/impala_delete.html
new file mode 100644
index 0000000..3f95fa5
--- /dev/null
+++ b/docs/build/html/topics/impala_delete.html
@@ -0,0 +1,177 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="delete"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DELETE Statement (Impala 2.8 or higher only)</title></head><body id="delete"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DELETE Statement (<span class="keyword">Impala 2.8</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Deletes an arbitrary number of rows from a Kudu table.
+      This statement only works for Impala tables that use the Kudu storage engine.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>
+DELETE [FROM] [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var> [ WHERE <var class="keyword varname">where_conditions</var> ]
+
+DELETE <var class="keyword varname">table_ref</var> FROM [<var class="keyword varname">joined_table_refs</var>] [ WHERE <var class="keyword varname">where_conditions</var> ]
+</code></pre>
+
+    <p class="p">
+      The first form evaluates rows from one table against an optional
+      <code class="ph codeph">WHERE</code> clause, and deletes all the rows that
+      match the <code class="ph codeph">WHERE</code> conditions, or all rows if
+      <code class="ph codeph">WHERE</code> is omitted.
+    </p>
+
+    <p class="p">
+      The second form evaluates one or more join clauses, and deletes
+      all matching rows from one of the tables. The join clauses can
+      include non-Kudu tables, but the table from which the rows
+      are deleted must be a Kudu table. The <code class="ph codeph">FROM</code>
+      keyword is required in this case, to separate the name of
+      the table whose rows are being deleted from the table names
+      of the join clauses.
+    </p>
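+    <p class="p">
+      A minimal sketch of the second form (table and column names are hypothetical;
+      <code class="ph codeph">kudu_tbl</code> must be a Kudu table, while <code class="ph codeph">staging</code>
+      can use any supported format):
+    </p>
+<pre class="pre codeblock"><code>DELETE kudu_tbl FROM kudu_tbl JOIN staging ON kudu_tbl.id = staging.id;
+</code></pre>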
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      The conditions in the <code class="ph codeph">WHERE</code> clause are the same ones allowed
+      for the <code class="ph codeph">SELECT</code> statement. See <a class="xref" href="impala_select.html#select">SELECT Statement</a>
+      for details.
+    </p>
+
+    <p class="p">
+      The conditions in the <code class="ph codeph">WHERE</code> clause can refer to
+      any combination of primary key columns or other columns. Referring to
+      primary key columns in the <code class="ph codeph">WHERE</code> clause is more efficient
+      than referring to non-primary key columns.
+    </p>
+
+    <p class="p">
+      If the <code class="ph codeph">WHERE</code> clause is omitted, all rows are removed from the table.
+    </p>
+
+    <p class="p">
+      Because Kudu currently does not enforce strong consistency during concurrent DML operations,
+      be aware that the results after this statement finishes might be different than you
+      intuitively expect:
+    </p>
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          If some rows cannot be deleted because their
+          primary key values are not found (for example, because those rows
+          were already removed by a concurrent <code class="ph codeph">DELETE</code> operation),
+          the statement succeeds but returns a warning.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          A <code class="ph codeph">DELETE</code> statement might also overlap with
+          <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>,
+          or <code class="ph codeph">UPSERT</code> statements running concurrently on the same table.
+          After the statement finishes, there might be more or fewer rows than expected in the table
+          because it is undefined whether the <code class="ph codeph">DELETE</code> applies to rows that are
+          inserted or updated while the <code class="ph codeph">DELETE</code> is in progress.
+        </p>
+      </li>
+    </ul>
+
+    <p class="p">
+      The number of affected rows is reported in an <span class="keyword cmdname">impala-shell</span> message
+      and in the query profile.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DML
+      </p>
+
+    <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+        After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+        STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+        table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+        SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+        <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+        are very large, used in join queries, or both.
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following examples show how to delete rows from a specified
+      table, either all rows or rows that match a <code class="ph codeph">WHERE</code>
+      clause:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Deletes all rows. The FROM keyword is optional.
+DELETE FROM kudu_table;
+DELETE kudu_table;
+
+-- Deletes 0, 1, or more rows.
+-- (If c1 is a single-column primary key, the statement could only
+-- delete 0 or 1 rows.)
+DELETE FROM kudu_table WHERE c1 = 100;
+
+-- Deletes all rows that match all the WHERE conditions.
+DELETE FROM kudu_table WHERE
+  (c1 &gt; c2 OR c3 IN ('hello','world')) AND c4 IS NOT NULL;
+DELETE FROM t1 WHERE
+  (c1 IN (1,2,3) AND c2 &gt; c3) OR c4 IS NOT NULL;
+DELETE FROM time_series WHERE
+  year = 2016 AND month IN (11,12) AND day &gt; 15;
+
+-- WHERE condition with a subquery.
+DELETE FROM t1 WHERE
+  c5 IN (SELECT DISTINCT other_col FROM other_table);
+
+-- Does not delete any rows, because the WHERE condition is always false.
+DELETE FROM kudu_table WHERE 1 = 0;
+</code></pre>
+
+    <p class="p">
+      The following examples show how to delete rows that are part
+      of the result set from a join:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Remove _all_ rows from t1 that have a matching X value in t2.
+DELETE t1 FROM t1 JOIN t2 ON t1.x = t2.x;
+
+-- Remove _some_ rows from t1 that have a matching X value in t2.
+DELETE t1 FROM t1 JOIN t2 ON t1.x = t2.x
+  WHERE t1.y = FALSE and t2.z &gt; 100;
+
+-- Delete from a Kudu table based on a join with a non-Kudu table.
+DELETE t1 FROM kudu_table t1 JOIN non_kudu_table t2 ON t1.x = t2.x;
+
+-- The tables can be joined in any order as long as the Kudu table
+-- is specified as the deletion target.
+DELETE t2 FROM non_kudu_table t1 JOIN kudu_table t2 ON t1.x = t2.x;
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_kudu.html#impala_kudu">Using Impala to Query Kudu Tables</a>, <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>,
+      <a class="xref" href="impala_update.html#update">UPDATE Statement (Impala 2.8 or higher only)</a>, <a class="xref" href="impala_upsert.html#upsert">UPSERT Statement (Impala 2.8 or higher only)</a>
+    </p>
+
+  </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[32/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_float.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_float.html b/docs/build/html/topics/impala_float.html
new file mode 100644
index 0000000..97125af
--- /dev/null
+++ b/docs/build/html/topics/impala_float.html
@@ -0,0 +1,136 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="float"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>FLOAT Data Type</title></head><body id="float"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">FLOAT Data Type</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A single precision floating-point data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER
+      TABLE</code> statements.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> FLOAT</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Range:</strong> 1.40129846432481707e-45 .. 3.40282346638528860e+38, positive or negative
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Precision:</strong> 6 to 9 significant digits, depending on usage. The number of significant digits does
+      not depend on the position of the decimal point.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Representation:</strong> The values are stored in 4 bytes, using
+      <a class="xref" href="https://en.wikipedia.org/wiki/Single-precision_floating-point_format" target="_blank">IEEE 754 Single Precision Binary Floating Point</a> format.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Conversions:</strong> Impala automatically converts <code class="ph codeph">FLOAT</code> to more precise
+      <code class="ph codeph">DOUBLE</code> values, but not the other way around. You can use <code class="ph codeph">CAST()</code> to convert
+      <code class="ph codeph">FLOAT</code> values to <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>,
+      <code class="ph codeph">BIGINT</code>, <code class="ph codeph">STRING</code>, <code class="ph codeph">TIMESTAMP</code>, or <code class="ph codeph">BOOLEAN</code>.
+      You can use exponential notation in <code class="ph codeph">FLOAT</code> literals or when casting from
+      <code class="ph codeph">STRING</code>, for example <code class="ph codeph">1.0e6</code> to represent one million.
+      <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to
+        <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+        date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+        If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+        the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span>
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x FLOAT);
+SELECT CAST(1000.5 AS FLOAT);
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Partitioning:</strong> Because fractional values of this type are not always represented precisely, when this
+        type is used for a partition key column, the underlying HDFS directories might not be named exactly as you
+        expect. Prefer to partition on a <code class="ph codeph">DECIMAL</code> column instead.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+        using Parquet or other binary formats.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong> Represented in memory as a 4-byte value.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+        fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+        statement.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+
+
+    <p class="p">
+        Due to the way arithmetic on <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns uses
+        high-performance hardware instructions, and distributed queries can perform these operations in different
+        order for each query, results can vary slightly for aggregate function calls such as <code class="ph codeph">SUM()</code>
+        and <code class="ph codeph">AVG()</code> for <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns, particularly on
+        large data sets where millions or billions of values are summed or averaged. For perfect consistency and
+        repeatability, use the <code class="ph codeph">DECIMAL</code> data type for such operations instead of
+        <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+      </p>
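+    <p class="p">
+      For example (a sketch using a hypothetical <code class="ph codeph">sales</code> table; the column
+      names are illustrative), the <code class="ph codeph">DECIMAL</code> aggregate is exact and repeatable,
+      while repeated runs of the <code class="ph codeph">FLOAT</code> aggregate might differ slightly in the
+      last digits on a large table:
+    </p>
+
+<pre class="pre codeblock"><code>-- amount_fp is a FLOAT column; amount_dec is a DECIMAL(12,2) column.
+SELECT SUM(amount_fp), SUM(amount_dec) FROM sales;
+</code></pre>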
+
+    <p class="p">
+        The inability to exactly represent certain floating-point values means that
+        <code class="ph codeph">DECIMAL</code> is sometimes a better choice than <code class="ph codeph">DOUBLE</code>
+        or <code class="ph codeph">FLOAT</code> when precision is critical, particularly when
+        transferring data from other database systems that use different representations
+        or file formats.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Currently, the data types <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">FLOAT</code>,
+        and <code class="ph codeph">DOUBLE</code> cannot be used for primary key columns in Kudu tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>,
+      <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_functions.html b/docs/build/html/topics/impala_functions.html
new file mode 100644
index 0000000..dbe1016
--- /dev/null
+++ b/docs/build/html/topics/impala_functions.html
@@ -0,0 +1,162 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_math_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_bit_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_conversion_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datetime_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_conditional_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_string_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_misc_functions.html"><meta name="DC.Relation" scheme="URI" content=
 "../topics/impala_aggregate_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_analytic_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_udf.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="builtins"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Built-In Functions</title></head><body id="builtins"><main role="main"><article role="article" aria-labelledby="builtins__title_functions">
+
+  <h1 class="title topictitle1" id="builtins__title_functions">Impala Built-In Functions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    
+
+    <p class="p">
+      Impala supports several categories of built-in functions. These functions let you perform mathematical
+      calculations, string manipulation, date calculations, and other kinds of data transformations directly in
+      <code class="ph codeph">SELECT</code> statements. The built-in functions let a SQL query return results with all
+      formatting, calculating, and type conversions applied, rather than performing time-consuming postprocessing
+      in another application. By applying function calls where practical, you can make a SQL query that is as
+      convenient as an expression in a procedural programming language or a formula in a spreadsheet.
+    </p>
+
+    <p class="p">
+      The categories of functions supported by Impala are:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_conversion_functions.html#conversion_functions">Impala Type Conversion Functions</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a>
+      </li>
+
+      <li class="li">
+        Aggregation functions, explained in <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>.
+      </li>
+    </ul>
+
+    <p class="p">
+      You call any of these functions through the <code class="ph codeph">SELECT</code> statement. For most functions, you can
+      omit the <code class="ph codeph">FROM</code> clause and supply literal values for any required arguments:
+    </p>
+
+<pre class="pre codeblock"><code>select abs(-1);
++---------+
+| abs(-1) |
++---------+
+| 1       |
++---------+
+
+select concat('The rain ', 'in Spain');
++---------------------------------+
+| concat('the rain ', 'in spain') |
++---------------------------------+
+| The rain in Spain               |
++---------------------------------+
+
+select power(2,5);
++-------------+
+| power(2, 5) |
++-------------+
+| 32          |
++-------------+
+</code></pre>
+
+    <p class="p">
+      When you use a <code class="ph codeph">FROM</code> clause and specify a column name as a function argument, the function is
+      applied for each item in the result set:
+    </p>
+
+
+
+<pre class="pre codeblock"><code>select concat('Country = ',country_code) from all_countries where population &gt; 100000000;
+select round(price) as dollar_value from product_catalog where price between 0.0 and 100.0;
+</code></pre>
+
+    <p class="p">
+      Typically, if any argument to a built-in function is <code class="ph codeph">NULL</code>, the result value is also
+      <code class="ph codeph">NULL</code>:
+    </p>
+
+<pre class="pre codeblock"><code>select cos(null);
++-----------+
+| cos(null) |
++-----------+
+| NULL      |
++-----------+
+
+select power(2,null);
++----------------+
+| power(2, null) |
++----------------+
+| NULL           |
++----------------+
+
+select concat('a',null,'b');
++------------------------+
+| concat('a', null, 'b') |
++------------------------+
+| NULL                   |
++------------------------+
+</code></pre>
+
+    <p class="p">
+        Aggregate functions are a special category with different rules. These functions calculate a return value
+        across all the items in a result set, so they require a <code class="ph codeph">FROM</code> clause in the query:
+      </p>
+
+<pre class="pre codeblock"><code>select count(product_id) from product_catalog;
+select max(height), avg(height) from census_data where age &gt; 20;
+</code></pre>
+
+    <p class="p">
+        Aggregate functions also ignore <code class="ph codeph">NULL</code> values rather than returning a <code class="ph codeph">NULL</code>
+        result. For example, if some rows have <code class="ph codeph">NULL</code> for a particular column, those rows are
+        ignored when computing the <code class="ph codeph">AVG()</code> for that column. Likewise, specifying
+        <code class="ph codeph">COUNT(<var class="keyword varname">col_name</var>)</code> in a query counts only those rows where
+        <var class="keyword varname">col_name</var> contains a non-<code class="ph codeph">NULL</code> value.
+      </p>
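+    <p class="p">
+      For example (hypothetical table and column names), <code class="ph codeph">COUNT(*)</code> counts
+      every row, while <code class="ph codeph">COUNT(c1)</code> counts only the rows where
+      <code class="ph codeph">c1</code> is non-<code class="ph codeph">NULL</code>:
+    </p>
+
+<pre class="pre codeblock"><code>-- If t1 has 10 rows and c1 is NULL in 3 of them, this returns
+-- counts of 10 and 7, and AVG(c1) computed over the 7 non-NULL values.
+SELECT COUNT(*), COUNT(c1), AVG(c1) FROM t1;
+</code></pre>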
+
+
+    <p class="p">
+      Analytic functions are a variation on aggregate functions. Instead of returning a single value, or an
+      identical value for each group of rows, they can compute values that vary based on a <span class="q">"window"</span> consisting
+      of other rows around them in the result set.
+    </p>
+
+    <p class="p toc"></p>
+
+  </div>
+
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_math_functions.html">Impala Mathematical Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_bit_functions.html">Impala Bit Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_conversion_functions.html">Impala Type Conversion Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_datetime_functions.html">Impala Date and Time Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_conditional_functions.html">Impala Conditional Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_string_functions.html">Impala String Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_misc_functions.html">Impala Miscellaneous Functions</
 a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_analytic_functions.html">Impala Analytic Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_udf.html">Impala User-Defined Functions (UDFs)</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_functions_overview.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_functions_overview.html b/docs/build/html/topics/impala_functions_overview.html
new file mode 100644
index 0000000..d202600
--- /dev/null
+++ b/docs/build/html/topics/impala_functions_overview.html
@@ -0,0 +1,109 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Functions</title></head><body id="functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Functions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Functions let you apply arithmetic, string, or other computations and transformations to Impala data. You
+      typically use them in <code class="ph codeph">SELECT</code> lists and <code class="ph codeph">WHERE</code> clauses to filter and format
+      query results so that the result set is exactly what you want, with no further processing needed on the
+      application side.
+    </p>
+
+    <p class="p">
+      Scalar functions return a single result for each input row. See <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select name, population from country where continent = 'North America' order by population desc limit 4;
+[localhost:21000] &gt; select upper(name), population from country where continent = 'North America' order by population desc limit 4;
++-------------+------------+
+| upper(name) | population |
++-------------+------------+
+| USA         | 320000000  |
+| MEXICO      | 122000000  |
+| CANADA      | 25000000   |
+| GUATEMALA   | 16000000   |
++-------------+------------+
+</code></pre>
+    <p class="p">
+      Aggregate functions combine the results from multiple rows:
+      either a single result for the entire table, or a separate result for each group of rows.
+      Aggregate functions are frequently used in combination with <code class="ph codeph">GROUP BY</code>
+      and <code class="ph codeph">HAVING</code> clauses in the <code class="ph codeph">SELECT</code> statement.
+      See <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select continent, <strong class="ph b">sum(population)</strong> as howmany from country <strong class="ph b">group by continent</strong> order by howmany desc;
++---------------+------------+
+| continent     | howmany    |
++---------------+------------+
+| Asia          | 4298723000 |
+| Africa        | 1110635000 |
+| Europe        | 742452000  |
+| North America | 565265000  |
+| South America | 406740000  |
+| Oceania       | 38304000   |
++---------------+------------+
+</code></pre>
+
+    <p class="p">
+      User-defined functions (UDFs) let you code your own logic. They can be either scalar or aggregate functions.
+      UDFs let you implement important business or scientific logic in high-performance code that Impala automatically parallelizes.
+      You can also use UDFs to implement convenience functions to simplify reporting or porting SQL from other database systems.
+      See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select <strong class="ph b">rot13('Hello world!')</strong> as 'Weak obfuscation';
++------------------+
+| weak obfuscation |
++------------------+
+| Uryyb jbeyq!     |
++------------------+
+[localhost:21000] &gt; select <strong class="ph b">likelihood_of_new_subatomic_particle(sensor1, sensor2, sensor3)</strong> as probability
+                  &gt; from experimental_results group by experiment;
+</code></pre>
+
+    <p class="p">
+      Each function is associated with a specific database. For example, if you issue a <code class="ph codeph">USE somedb</code>
+      statement followed by <code class="ph codeph">CREATE FUNCTION somefunc</code>, the new function is created in the
+      <code class="ph codeph">somedb</code> database, and you could refer to it through the fully qualified name
+      <code class="ph codeph">somedb.somefunc</code>. You could then issue another <code class="ph codeph">USE</code> statement
+      and create a function with the same name in a different database.
+    </p>
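+    <p class="p">
+      For example (a sketch; the databases, table, and UDF name are hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>-- Assume a UDF named somefunc was created while somedb was the current database.
+USE somedb;
+SELECT somefunc(x) FROM t1;          -- resolved within the current database
+USE otherdb;
+SELECT somedb.somefunc(x) FROM t1;   -- the fully qualified name works from any database
+</code></pre>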
+
+    <p class="p">
+      Impala built-in functions are associated with a special database named <code class="ph codeph">_impala_builtins</code>,
+      which lets you refer to them from any database without qualifying the name.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; show databases;
++-------------------------+
+| name                    |
++-------------------------+
+| <strong class="ph b">_impala_builtins</strong>        |
+| analytic_functions      |
+| avro_testing            |
+| data_file_size          |
+...
+[localhost:21000] &gt; show functions in _impala_builtins like '*subs*';
++-------------+-----------------------------------+
+| return type | signature                         |
++-------------+-----------------------------------+
+| STRING      | substr(STRING, BIGINT)            |
+| STRING      | substr(STRING, BIGINT, BIGINT)    |
+| STRING      | substring(STRING, BIGINT)         |
+| STRING      | substring(STRING, BIGINT, BIGINT) |
++-------------+-----------------------------------+
+</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Related statements:</strong> <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a>,
+      <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_grant.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_grant.html b/docs/build/html/topics/impala_grant.html
new file mode 100644
index 0000000..dafb4f3
--- /dev/null
+++ b/docs/build/html/topics/impala_grant.html
@@ -0,0 +1,137 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="grant"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>GRANT Statement (Impala 2.0 or higher only)</title></head><body id="grant"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">GRANT Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+
+      The <code class="ph codeph">GRANT</code> statement grants roles or privileges on specified objects to groups. Only Sentry
+      administrative users can grant roles to a group.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>GRANT ROLE <var class="keyword varname">role_name</var> TO GROUP <var class="keyword varname">group_name</var>
+
+GRANT <var class="keyword varname">privilege</var> ON <var class="keyword varname">object_type</var> <var class="keyword varname">object_name</var>
+   TO [ROLE] <var class="keyword varname">role_name</var>
+   [WITH GRANT OPTION]
+
+<span class="ph">privilege ::= SELECT | SELECT(<var class="keyword varname">column_name</var>) | INSERT | ALL</span>
+object_type ::= TABLE | DATABASE | SERVER | URI
+</code></pre>
+
+    <p class="p">
+      Typically, the object name is an identifier. For URIs, it is a string literal.
+    </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Required privileges:</strong>
+      </p>
+
+    <p class="p">
+
+      Only administrative users (initially, a predefined set of users specified in the Sentry service configuration
+      file) can use this statement.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">WITH GRANT OPTION</code> clause allows members of the specified role to issue
+      <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements for those same privileges.
+
+      Hence, if a role has the <code class="ph codeph">ALL</code> privilege on a database and the <code class="ph codeph">WITH GRANT
+      OPTION</code> set, users granted that role can execute <code class="ph codeph">GRANT</code>/<code class="ph codeph">REVOKE</code>
+      statements only for that database or child tables of the database. This means a user could revoke the
+      privileges of the user that provided them the <code class="ph codeph">GRANT OPTION</code>.
+    </p>
+
+    <p class="p">
+
+      Impala does not currently support revoking only the <code class="ph codeph">WITH GRANT OPTION</code> from a privilege
+      previously granted to a role. To remove the <code class="ph codeph">WITH GRANT OPTION</code>, revoke the privilege and
+      grant it again without the <code class="ph codeph">WITH GRANT OPTION</code> flag.
+    </p>
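+    <p class="p">
+      For example, the following sequence shows how you might remove the <code class="ph codeph">WITH GRANT
+      OPTION</code> from a previously granted privilege (the role and database names are illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>REVOKE ALL ON DATABASE sales_db FROM ROLE analyst_role;
+GRANT ALL ON DATABASE sales_db TO ROLE analyst_role;</code></pre>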
+
+    <p class="p">
+      The ability to grant or revoke <code class="ph codeph">SELECT</code> privilege on specific columns is available
+      in <span class="keyword">Impala 2.3</span> and higher. See <span class="xref">the documentation for Apache Sentry</span> for details.
+    </p>
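+    <p class="p">
+      For example, a statement such as the following (with illustrative names) grants
+      <code class="ph codeph">SELECT</code> access to a single column:
+    </p>
+
+<pre class="pre codeblock"><code>GRANT SELECT(customer_name) ON TABLE sales_db.customers TO ROLE analyst_role;</code></pre>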
+
+
+
+
+
+    <p class="p">
+        <strong class="ph b">Compatibility:</strong>
+      </p>
+
+    <div class="p">
+      <ul class="ul">
+        <li class="li">
+          The Impala <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements are available in
+          <span class="keyword">Impala 2.0</span> and later.
+        </li>
+
+        <li class="li">
+          In <span class="keyword">Impala 1.4</span> and later, Impala can make use of any roles and privileges specified by the
+          <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Hive, when your system is configured to
+          use the Sentry service instead of the file-based policy mechanism.
+        </li>
+
+        <li class="li">
+          The Impala <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements for privileges do not require
+          the <code class="ph codeph">ROLE</code> keyword to be repeated before each role name, unlike the equivalent Hive
+          statements.
+        </li>
+
+        <li class="li">
+          Currently, each Impala <code class="ph codeph">GRANT</code> or <code class="ph codeph">REVOKE</code> statement can only grant or
+          revoke a single privilege to or from a single role.
+        </li>
+      </ul>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Access to Kudu tables must be granted to and revoked from roles as usual.
+        Only users with <code class="ph codeph">ALL</code> privileges on <code class="ph codeph">SERVER</code> can create external Kudu tables.
+        Currently, access to a Kudu table is <span class="q">"all or nothing"</span>:
+        enforced at the table level rather than the column level, and applying to all
+        SQL operations rather than individual statements such as <code class="ph codeph">INSERT</code>.
+        Because non-SQL APIs can access Kudu data without going through Sentry
+        authorization, currently the Sentry support is considered preliminary
+        and subject to change.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a>,
+      <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>,
+      <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_group_by.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_group_by.html b/docs/build/html/topics/impala_group_by.html
new file mode 100644
index 0000000..585dd3a
--- /dev/null
+++ b/docs/build/html/topics/impala_group_by.html
@@ -0,0 +1,140 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="group_by"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>GROUP BY Clause</title></head><body id="group_by"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">GROUP BY Clause</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Specify the <code class="ph codeph">GROUP BY</code> clause in queries that use aggregation functions, such as
+      <code class="ph codeph"><a class="xref" href="impala_count.html#count">COUNT()</a></code>,
+      <code class="ph codeph"><a class="xref" href="impala_sum.html#sum">SUM()</a></code>,
+      <code class="ph codeph"><a class="xref" href="impala_avg.html#avg">AVG()</a></code>,
+      <code class="ph codeph"><a class="xref" href="impala_min.html#min">MIN()</a></code>, and
+      <code class="ph codeph"><a class="xref" href="impala_max.html#max">MAX()</a></code>. Specify in the
+      <code class="ph codeph"><a class="xref" href="impala_group_by.html#group_by">GROUP BY</a></code> clause the names of all the
+      columns that do not participate in the aggregation operation.
+    </p>
+
+    
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.3</span> and higher, the complex data types <code class="ph codeph">STRUCT</code>,
+      <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code> are available. These columns cannot
+      be referenced directly in the <code class="ph codeph">GROUP BY</code> clause.
+      When you query a complex type column, you use join notation to <span class="q">"unpack"</span> the elements
+      of the complex type, and within the join query you can include a <code class="ph codeph">GROUP BY</code>
+      clause to aggregate on the scalar elements from the complex type.
+      See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types.
+    </p>
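+    <p class="p">
+      For example, the following sketch (using a hypothetical <code class="ph codeph">CUSTOMERS</code> table
+      with an <code class="ph codeph">ARRAY</code> column named <code class="ph codeph">PHONE_NUMBERS</code>)
+      unpacks the array elements with join notation and groups on a scalar column:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT c.country, count(*) AS num_phone_numbers
+  FROM customers c, c.phone_numbers
+  GROUP BY c.country;</code></pre>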
+
+    <p class="p">
+        <strong class="ph b">Zero-length strings:</strong> For purposes of clauses such as <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">GROUP
+        BY</code>, Impala considers zero-length strings (<code class="ph codeph">""</code>), <code class="ph codeph">NULL</code>, and space
+        to all be different values.
+      </p>
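+    <p class="p">
+      A minimal illustration of this behavior, using a small throwaway table:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t_str (s STRING);
+INSERT INTO t_str VALUES (''), (NULL), (' ');
+-- Produces three separate groups: the empty string, NULL, and the single space.
+SELECT s, count(*) AS cnt FROM t_str GROUP BY s;</code></pre>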
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      For example, the following query finds the 5 items that sold the highest total quantity (using the
+      <code class="ph codeph">SUM()</code> function), and also counts the number of sales transactions for those
+      items (using the <code class="ph codeph">COUNT()</code> function). Because the column representing the item
+      IDs is not used in any aggregation functions, we specify that column in the <code class="ph codeph">GROUP BY</code> clause.
+    </p>
+
+<pre class="pre codeblock"><code>select
+  <strong class="ph b">ss_item_sk</strong> as Item,
+  <strong class="ph b">count</strong>(ss_item_sk) as Times_Purchased,
+  <strong class="ph b">sum</strong>(ss_quantity) as Total_Quantity_Purchased
+from store_sales
+  <strong class="ph b">group by ss_item_sk</strong>
+  order by sum(ss_quantity) desc
+  limit 5;
++-------+-----------------+--------------------------+
+| item  | times_purchased | total_quantity_purchased |
++-------+-----------------+--------------------------+
+| 9325  | 372             | 19072                    |
+| 4279  | 357             | 18501                    |
+| 7507  | 371             | 18475                    |
+| 5953  | 369             | 18451                    |
+| 16753 | 375             | 18446                    |
++-------+-----------------+--------------------------+</code></pre>
+
+    <p class="p">
+      The <code class="ph codeph">HAVING</code> clause lets you filter the results of aggregate functions, because you cannot
+      refer to those expressions in the <code class="ph codeph">WHERE</code> clause. For example, to find the 5 lowest-selling
+      items that were included in at least 100 sales transactions, we could use this query:
+    </p>
+
+<pre class="pre codeblock"><code>select
+  <strong class="ph b">ss_item_sk</strong> as Item,
+  <strong class="ph b">count</strong>(ss_item_sk) as Times_Purchased,
+  <strong class="ph b">sum</strong>(ss_quantity) as Total_Quantity_Purchased
+from store_sales
+  <strong class="ph b">group by ss_item_sk</strong>
+  <strong class="ph b">having times_purchased &gt;= 100</strong>
+  order by sum(ss_quantity)
+  limit 5;
++-------+-----------------+--------------------------+
+| item  | times_purchased | total_quantity_purchased |
++-------+-----------------+--------------------------+
+| 13943 | 105             | 4087                     |
+| 2992  | 101             | 4176                     |
+| 4773  | 107             | 4204                     |
+| 14350 | 103             | 4260                     |
+| 11956 | 102             | 4275                     |
++-------+-----------------+--------------------------+</code></pre>
+
+    <p class="p">
+      When performing calculations involving scientific or financial data, remember that columns with type
+      <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code> are stored as true floating-point numbers, which cannot
+      precisely represent every possible fractional value. Thus, if you include a <code class="ph codeph">FLOAT</code> or
+      <code class="ph codeph">DOUBLE</code> column in a <code class="ph codeph">GROUP BY</code> clause, the results might not precisely match
+      literal values in your query or from an original Text data file. Use rounding operations, the
+      <code class="ph codeph">BETWEEN</code> operator, or another arithmetic technique to match floating-point values that are
+      <span class="q">"near"</span> literal values you expect. For example, this query on the <code class="ph codeph">ss_wholesale_cost</code>
+      column returns cost values that are close but not identical to the original figures that were entered as
+      decimal fractions.
+    </p>
+
+<pre class="pre codeblock"><code>select ss_wholesale_cost, avg(ss_quantity * ss_sales_price) as avg_revenue_per_sale
+  from sales
+  group by ss_wholesale_cost
+  order by avg_revenue_per_sale desc
+  limit 5;
++-------------------+----------------------+
+| ss_wholesale_cost | avg_revenue_per_sale |
++-------------------+----------------------+
+| 96.94000244140625 | 4454.351539300434    |
+| 95.93000030517578 | 4423.119941283189    |
+| 98.37999725341797 | 4332.516490316291    |
+| 97.97000122070312 | 4330.480601655014    |
+| 98.52999877929688 | 4291.316953108634    |
++-------------------+----------------------+</code></pre>
+
+    <p class="p">
+      Notice how wholesale cost values originally entered as decimal fractions such as <code class="ph codeph">96.94</code> and
+      <code class="ph codeph">98.38</code> are slightly larger or smaller in the result set, due to precision limitations in the
+      hardware floating-point types. The imprecise representation of <code class="ph codeph">FLOAT</code> and
+      <code class="ph codeph">DOUBLE</code> values is why financial data processing systems often store currency using data types
+      that are less space-efficient but avoid these types of rounding errors.
+    </p>
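+    <p class="p">
+      For example, one way to apply the rounding technique mentioned above is to group on a
+      <code class="ph codeph">ROUND()</code> expression instead of the raw floating-point column. (The exact
+      result values depend on your data.)
+    </p>
+
+<pre class="pre codeblock"><code>SELECT round(ss_wholesale_cost, 2) AS rounded_cost,
+    avg(ss_quantity * ss_sales_price) AS avg_revenue_per_sale
+  FROM sales
+  GROUP BY round(ss_wholesale_cost, 2)
+  ORDER BY avg_revenue_per_sale DESC
+  LIMIT 5;</code></pre>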
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_select.html#select">SELECT Statement</a>,
+      <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_group_concat.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_group_concat.html b/docs/build/html/topics/impala_group_concat.html
new file mode 100644
index 0000000..dcc0439
--- /dev/null
+++ b/docs/build/html/topics/impala_group_concat.html
@@ -0,0 +1,137 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="group_concat"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>GROUP_CONCAT Function</title></head><body id="group_concat"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">GROUP_CONCAT Function</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      An aggregate function that returns a single string built by concatenating the argument value from
+      each row of the result set. If the optional separator string is specified, the separator is added between
+      each pair of concatenated values. The default separator is a comma followed by a space.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+
+
+<pre class="pre codeblock"><code>GROUP_CONCAT([ALL] <var class="keyword varname">expression</var> [, <var class="keyword varname">separator</var>])</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+        concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+        joins together values from different rows.
+      </p>
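+    <p class="p">
+      The difference can be sketched with the <code class="ph codeph">t1</code> table (columns
+      <code class="ph codeph">x INT</code> and <code class="ph codeph">s STRING</code>) used in the examples in this topic:
+    </p>
+
+<pre class="pre codeblock"><code>-- concat_ws() combines values from columns within each row:
+SELECT concat_ws('-', s, cast(x AS STRING)) FROM t1;
+
+-- group_concat() combines values from the same column across rows:
+SELECT group_concat(s) FROM t1;</code></pre>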
+
+    <p class="p">
+      By default, returns a single string covering the whole result set. To include other columns or values in the
+      result set, or to produce multiple concatenated strings for subsets of rows, include a <code class="ph codeph">GROUP
+      BY</code> clause in the query.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Return type:</strong> <code class="ph codeph">STRING</code>
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+      You cannot apply the <code class="ph codeph">DISTINCT</code> operator to the argument of this function.
+    </p>
+
+    <p class="p">
+        This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+      </p>
+
+    <p class="p">
+      Currently, Impala returns an error if the result value grows larger than 1 GiB.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following examples illustrate various aspects of the <code class="ph codeph">GROUP_CONCAT()</code> function.
+    </p>
+
+    <p class="p">
+      You can call the function directly on a <code class="ph codeph">STRING</code> column. To use it with a numeric column, cast
+      the value to <code class="ph codeph">STRING</code>.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table t1 (x int, s string);
+[localhost:21000] &gt; insert into t1 values (1, "one"), (3, "three"), (2, "two"), (1, "one");
+[localhost:21000] &gt; select group_concat(s) from t1;
++----------------------+
+| group_concat(s)      |
++----------------------+
+| one, three, two, one |
++----------------------+
+[localhost:21000] &gt; select group_concat(cast(x as string)) from t1;
++---------------------------------+
+| group_concat(cast(x as string)) |
++---------------------------------+
+| 1, 3, 2, 1                      |
++---------------------------------+
+</code></pre>
+
+    <p class="p">
+      The optional separator lets you format the result in flexible ways. The separator can be an arbitrary string
+      expression, not just a single character.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select group_concat(s,"|") from t1;
++----------------------+
+| group_concat(s, '|') |
++----------------------+
+| one|three|two|one    |
++----------------------+
+[localhost:21000] &gt; select group_concat(s,'---') from t1;
++-------------------------+
+| group_concat(s, '---')  |
++-------------------------+
+| one---three---two---one |
++-------------------------+
+</code></pre>
+
+    <p class="p">
+      The default separator is a comma followed by a space. To get a comma-delimited result without extra spaces,
+      specify a delimiter character that is only a comma.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select group_concat(s,',') from t1;
++----------------------+
+| group_concat(s, ',') |
++----------------------+
+| one,three,two,one    |
++----------------------+
+</code></pre>
+
+    <p class="p">
+      Including a <code class="ph codeph">GROUP BY</code> clause lets you produce a different concatenated result for each group
+      in the result set. In this example, the only <code class="ph codeph">X</code> value that occurs more than once is
+      <code class="ph codeph">1</code>, so that is the only row in the result set where <code class="ph codeph">GROUP_CONCAT()</code> returns a
+      delimited value. For groups containing a single value, <code class="ph codeph">GROUP_CONCAT()</code> returns the original
+      value of its <code class="ph codeph">STRING</code> argument.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select x, group_concat(s) from t1 group by x;
++---+-----------------+
+| x | group_concat(s) |
++---+-----------------+
+| 2 | two             |
+| 3 | three           |
+| 1 | one, one        |
++---+-----------------+
+</code></pre>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_hadoop.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_hadoop.html b/docs/build/html/topics/impala_hadoop.html
new file mode 100644
index 0000000..dc5f7d6
--- /dev/null
+++ b/docs/build/html/topics/impala_hadoop.html
@@ -0,0 +1,138 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_hadoop"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>How Impala Fits Into the Hadoop Ecosystem</title></head><body id="intro_hadoop"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">How Impala Fits Into the Hadoop Ecosystem</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Impala makes use of many familiar components within the Hadoop ecosystem. Impala can interchange data with
+      other Hadoop components, as both a consumer and a producer, so it can fit in flexible ways into your ETL and
+      ELT pipelines.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_hadoop__intro_hive">
+
+    <h2 class="title topictitle2" id="ariaid-title2">How Impala Works with Hive</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        A major Impala goal is to make SQL-on-Hadoop operations fast and efficient enough to appeal to new
+        categories of users and open up Hadoop to new types of use cases. Where practical, it makes use of existing
+        Apache Hive infrastructure that many Hadoop users already have in place to perform long-running,
+        batch-oriented SQL queries.
+      </p>
+
+      <p class="p">
+        In particular, Impala keeps its table definitions in a traditional MySQL or PostgreSQL database known as
+        the <strong class="ph b">metastore</strong>, the same database where Hive keeps this type of data. Thus, Impala can access tables
+        defined or loaded by Hive, as long as all columns use Impala-supported data types, file formats, and
+        compression codecs.
+      </p>
+
+      <p class="p">
+        The initial focus on query features and performance means that Impala can read more types of data with the
+        <code class="ph codeph">SELECT</code> statement than it can write with the <code class="ph codeph">INSERT</code> statement. To query
+        data using the Avro, RCFile, or SequenceFile <a class="xref" href="impala_file_formats.html#file_formats">file
+        formats</a>, you load the data using Hive.
+      </p>
+
+      <p class="p">
+        The Impala query optimizer can also make use of <a class="xref" href="impala_perf_stats.html#perf_table_stats">table
+        statistics</a> and <a class="xref" href="impala_perf_stats.html#perf_column_stats">column statistics</a>.
+        Originally, you gathered this information with the <code class="ph codeph">ANALYZE TABLE</code> statement in Hive; in
+        Impala 1.2.2 and higher, use the Impala <code class="ph codeph"><a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE
+        STATS</a></code> statement instead. <code class="ph codeph">COMPUTE STATS</code> requires less setup, is more
+        reliable, and does not require switching back and forth between <span class="keyword cmdname">impala-shell</span>
+        and the Hive shell.
+      </p>
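+      <p class="p">
+        For example, after loading data into a table, you might issue statements such as the following
+        (the table name is illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>COMPUTE STATS store_sales;
+SHOW TABLE STATS store_sales;
+SHOW COLUMN STATS store_sales;</code></pre>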
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_hadoop__intro_metastore">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala Metadata and the Metastore</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        As discussed in <a class="xref" href="impala_hadoop.html#intro_hive">How Impala Works with Hive</a>, Impala maintains information about table
+        definitions in a central database known as the <strong class="ph b">metastore</strong>. Impala also tracks other metadata for the
+        low-level characteristics of data files:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The physical locations of blocks within HDFS.
+        </li>
+      </ul>
+
+      <p class="p">
+        For tables with a large volume of data and/or many partitions, retrieving all the metadata for a table can
+        be time-consuming, taking minutes in some cases. Thus, each Impala node caches all of this metadata to
+        reuse for future queries against the same table.
+      </p>
+
+      <p class="p">
+        If the table definition or the data in the table is updated, all other Impala daemons in the cluster must
+        receive the latest metadata, replacing the obsolete cached metadata, before issuing a query against that
+        table. In Impala 1.2 and higher, the metadata update is automatic, coordinated through the
+        <span class="keyword cmdname">catalogd</span> daemon, for all DDL and DML statements issued through Impala. See
+        <a class="xref" href="impala_components.html#intro_catalogd">The Impala Catalog Service</a> for details.
+      </p>
+
+      <p class="p">
+        For DDL and DML issued through Hive, or changes made manually to files in HDFS, you still use the
+        <code class="ph codeph">REFRESH</code> statement (when new data files are added to existing tables) or the
+        <code class="ph codeph">INVALIDATE METADATA</code> statement (for entirely new tables, or after dropping a table,
+        performing an HDFS rebalance operation, or deleting data files). Issuing <code class="ph codeph">INVALIDATE
+        METADATA</code> by itself retrieves metadata for all the tables tracked by the metastore. If you know
+        that only specific tables have been changed outside of Impala, you can issue <code class="ph codeph">REFRESH
+        <var class="keyword varname">table_name</var></code> for each affected table to only retrieve the latest metadata for
+        those tables.
+      </p>
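+      <p class="p">
+        For example (with illustrative table names):
+      </p>
+
+<pre class="pre codeblock"><code>-- After new data files are added to an existing table outside of Impala:
+REFRESH sales_db.store_sales;
+
+-- After tables are created or dropped through Hive:
+INVALIDATE METADATA;</code></pre>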
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro_hadoop__intro_hdfs">
+
+    <h2 class="title topictitle2" id="ariaid-title4">How Impala Uses HDFS</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Impala uses the distributed filesystem HDFS as its primary data storage medium. Impala relies on the
+        redundancy provided by HDFS to guard against hardware or network outages on individual nodes. Impala table
+        data is physically represented as data files in HDFS, using familiar HDFS file formats and compression
+        codecs. When data files are present in the directory for a new table, Impala reads them all, regardless of
+        file name. New data is added in files with names controlled by Impala.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="intro_hadoop__intro_hbase">
+
+    <h2 class="title topictitle2" id="ariaid-title5">How Impala Uses HBase</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        HBase is an alternative to HDFS as a storage medium for Impala data. It is a database storage system built
+        on top of HDFS, without built-in SQL support. Many Hadoop users already have it configured and store large
+        (often sparse) data sets in it. By defining tables in Impala and mapping them to equivalent tables in
+        HBase, you can query the contents of the HBase tables through Impala, and even perform join queries
+        including both Impala and HBase tables. See <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a> for details.
+      </p>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_having.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_having.html b/docs/build/html/topics/impala_having.html
new file mode 100644
index 0000000..378b07c
--- /dev/null
+++ b/docs/build/html/topics/impala_having.html
@@ -0,0 +1,39 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="having"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>HAVING Clause</title></head><body id="having"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">HAVING Clause</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Performs a filter operation on a <code class="ph codeph">SELECT</code> query, by examining the results of aggregation
+      functions rather than testing each individual table row. Therefore, it is always used in conjunction with a
+      function such as <code class="ph codeph"><a class="xref" href="impala_count.html#count">COUNT()</a></code>,
+      <code class="ph codeph"><a class="xref" href="impala_sum.html#sum">SUM()</a></code>,
+      <code class="ph codeph"><a class="xref" href="impala_avg.html#avg">AVG()</a></code>,
+      <code class="ph codeph"><a class="xref" href="impala_min.html#min">MIN()</a></code>, or
+      <code class="ph codeph"><a class="xref" href="impala_max.html#max">MAX()</a></code>, and typically with the
+      <code class="ph codeph"><a class="xref" href="impala_group_by.html#group_by">GROUP BY</a></code> clause also.
+    </p>
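+
+    <p class="p">
+      For example, this query (the table and column names are hypothetical) returns only the years whose
+      average score exceeds a threshold. The <code class="ph codeph">HAVING</code> condition is applied to the
+      result of <code class="ph codeph">AVG()</code> after grouping, not to individual rows:
+    </p>
+
+<pre class="pre codeblock"><code>select year, avg(score) as avg_score
+  from exam_results
+  group by year
+  having avg(score) &gt; 85;
+</code></pre>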
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+      The filter expression in the <code class="ph codeph">HAVING</code> clause cannot include a scalar subquery.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_select.html#select">SELECT Statement</a>,
+      <a class="xref" href="impala_group_by.html#group_by">GROUP BY Clause</a>,
+      <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_partitioning.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_partitioning.html b/docs/build/html/topics/impala_partitioning.html
new file mode 100644
index 0000000..b361083
--- /dev/null
+++ b/docs/build/html/topics/impala_partitioning.html
@@ -0,0 +1,653 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="partitioning"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Partitioning for Impala Tables</title></head><body id="partitioning"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Partitioning for Impala Tables</h1>
+
+  
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      By default, all the data files for a table are located in a single directory. Partitioning is a technique for physically dividing the
+      data during loading, based on values from one or more columns, to speed up queries that test those columns. For example, with a
+      <code class="ph codeph">school_records</code> table partitioned on a <code class="ph codeph">year</code> column, there is a separate data directory for each
+      different year value, and all the data for that year is stored in a data file in that directory. A query that includes a
+      <code class="ph codeph">WHERE</code> condition such as <code class="ph codeph">YEAR=1966</code>, <code class="ph codeph">YEAR IN (1989,1999)</code>, or <code class="ph codeph">YEAR BETWEEN
+      1984 AND 1989</code> can examine only the data files from the appropriate directory or directories, greatly reducing the amount of
+      data to read and test.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+    <p class="p">
+      See <a class="xref" href="impala_tutorial.html#tut_external_partition_data">Attaching an External Partitioned Table to an HDFS Directory Structure</a> for an example that illustrates the syntax for creating partitioned
+      tables, the underlying directory structure in HDFS, and how to attach a partitioned Impala external table to data files stored
+      elsewhere in HDFS.
+    </p>
+
+    <p class="p">
+      Parquet is a popular format for partitioned Impala tables because it is well suited to handle huge data volumes. See
+      <a class="xref" href="impala_parquet.html#parquet_performance">Query Performance for Impala Parquet Tables</a> for performance considerations for partitioned Parquet tables.
+    </p>
+
+    <p class="p">
+      See <a class="xref" href="impala_literals.html#null">NULL</a> for details about how <code class="ph codeph">NULL</code> values are represented in partitioned tables.
+    </p>
+
+    <p class="p">
+      See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about setting up tables where some or all partitions reside on the Amazon Simple
+      Storage Service (S3).
+    </p>
+
+  </div>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="partitioning__partitioning_choosing">
+
+    <h2 class="title topictitle2" id="ariaid-title2">When to Use Partitioned Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Partitioning is typically appropriate for:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Tables that are very large, where reading the entire data set takes an impractical amount of time.
+        </li>
+
+        <li class="li">
+          Tables that are always or almost always queried with conditions on the partitioning columns. In our example of a table partitioned
+          by year, <code class="ph codeph">SELECT COUNT(*) FROM school_records WHERE year = 1985</code> is efficient, only examining a small fraction of
+          the data; but <code class="ph codeph">SELECT COUNT(*) FROM school_records</code> has to process a separate data file for each year, resulting in
+          more overall work than in an unpartitioned table. You would probably not partition this way if you frequently queried the table
+          based on last name, student ID, and so on without testing the year.
+        </li>
+
+        <li class="li">
+          Columns that have reasonable cardinality (number of different values). If a column only has a small number of values, for example
+          <code class="ph codeph">Male</code> or <code class="ph codeph">Female</code>, you do not gain much efficiency by eliminating only about 50% of the data to
+          read for each query. If a column has only a few rows matching each value, the number of directories to process can become a
+          limiting factor, and the data file in each directory could be too small to take advantage of the Hadoop mechanism for transmitting
+          data in multi-megabyte blocks. For example, you might partition census data by year, store sales data by year and month, and web
+          traffic data by year, month, and day. (Some users with high volumes of incoming data might even partition down to the individual
+          hour and minute.)
+        </li>
+
+        <li class="li">
+          Data that already passes through an extract, transform, and load (ETL) pipeline. The values of the partitioning columns are
+          stripped from the original data files and represented by directory names, so loading data into a partitioned table involves some
+          sort of transformation or preprocessing.
+        </li>
+      </ul>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="partitioning__partition_sql">
+
+    <h2 class="title topictitle2" id="ariaid-title3">SQL Statements for Partitioned Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        In terms of Impala SQL syntax, partitioning affects these statements:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <code class="ph codeph"><a class="xref" href="impala_create_table.html#create_table">CREATE TABLE</a></code>: you specify a <code class="ph codeph">PARTITIONED
+          BY</code> clause when creating the table to identify names and data types of the partitioning columns. These columns are not
+          included in the main list of columns for the table.
+        </li>
+
+        <li class="li">
+          In <span class="keyword">Impala 2.5</span> and higher, you can also use the <code class="ph codeph">PARTITIONED BY</code> clause in a <code class="ph codeph">CREATE TABLE AS
+          SELECT</code> statement. This syntax lets you use a single statement to create a partitioned table, copy data into it, and
+          create new partitions based on the values in the inserted data.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE</a></code>: you can add or drop partitions, to work with
+          different portions of a huge data set. You can designate the HDFS directory that holds the data files for a specific partition.
+          With data partitioned by date values, you might <span class="q">"age out"</span> data that is no longer relevant.
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        If you are creating a partition for the first time and specifying its location, for maximum efficiency, use
+        a single <code class="ph codeph">ALTER TABLE</code> statement including both the <code class="ph codeph">ADD PARTITION</code> and
+        <code class="ph codeph">LOCATION</code> clauses, rather than separate statements with <code class="ph codeph">ADD PARTITION</code> and
+        <code class="ph codeph">SET LOCATION</code> clauses.
+      </div>
+        </li>
+
+        <li class="li">
+          <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code>: When you insert data into a partitioned table, you identify
+          the partitioning columns. One or more values from each inserted row are not stored in data files, but instead determine the
+          directory where that row value is stored. You can also specify which partition to load a set of data into, with <code class="ph codeph">INSERT
+          OVERWRITE</code> statements; you can replace the contents of a specific partition but you cannot append data to a specific
+          partition.
+          <p class="p">
+        By default, if an <code class="ph codeph">INSERT</code> statement creates any new subdirectories underneath a partitioned
+        table, those subdirectories are assigned default HDFS permissions for the <code class="ph codeph">impala</code> user. To
+        make each subdirectory have the same permissions as its parent directory in HDFS, specify the
+        <code class="ph codeph">--insert_inherit_permissions</code> startup option for the <span class="keyword cmdname">impalad</span> daemon.
+      </p>
+        </li>
+
+        <li class="li">
+          Although the syntax of the <code class="ph codeph"><a class="xref" href="impala_select.html#select">SELECT</a></code> statement is the same whether or
+          not the table is partitioned, the way queries interact with partitioned tables can have a dramatic impact on performance and
+          scalability. The mechanism that lets queries skip certain partitions during a query is known as partition pruning; see
+          <a class="xref" href="impala_partitioning.html#partition_pruning">Partition Pruning for Queries</a> for details.
+        </li>
+
+        <li class="li">
+          In Impala 1.4 and later, there is a <code class="ph codeph">SHOW PARTITIONS</code> statement that displays information about each partition in a
+          table. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+        </li>
+      </ul>
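+
+      <p class="p">
+        The following sketch combines several of these statements; the table, column, and HDFS path names
+        are hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>-- Partition key columns go in the PARTITIONED BY clause, not the main column list.
+create table school_records (name string, id int)
+  partitioned by (year int);
+
+-- Add a partition and point it at an existing HDFS directory in a single statement.
+alter table school_records add partition (year=1966)
+  location '/user/impala/data/school_records/1966';
+
+-- Replace the contents of a specific partition.
+insert overwrite school_records partition (year=1972)
+  select name, id from staging_records where year_col = 1972;
+
+-- Display information about each partition in the table.
+show partitions school_records;
+</code></pre>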
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="partitioning__partition_static_dynamic">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Static and Dynamic Partitioning Clauses</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Specifying all the partition columns in a SQL statement is called <dfn class="term">static partitioning</dfn>, because the statement affects a
+        single predictable partition. For example, you use static partitioning with an <code class="ph codeph">ALTER TABLE</code> statement that affects
+        only one partition, or with an <code class="ph codeph">INSERT</code> statement that inserts all values into the same partition:
+      </p>
+
+<pre class="pre codeblock"><code>insert into t1 <strong class="ph b">partition(x=10, y='a')</strong> select c1 from some_other_table;
+</code></pre>
+
+      <p class="p">
+        When you specify some partition key columns in an <code class="ph codeph">INSERT</code> statement, but leave out the values, Impala determines
+        which partition to insert. This technique is called <dfn class="term">dynamic partitioning</dfn>:
+      </p>
+
+<pre class="pre codeblock"><code>insert into t1 <strong class="ph b">partition(x, y='b')</strong> select c1, c2 from some_other_table;
+-- Create new partition if necessary based on variable year, month, and day; insert a single value.
+insert into weather <strong class="ph b">partition (year, month, day)</strong> select 'cloudy',2014,4,21;
+-- Create new partition if necessary for specified year and month but variable day; insert a single value.
+insert into weather <strong class="ph b">partition (year=2014, month=04, day)</strong> select 'sunny',22;
+</code></pre>
+
+      <p class="p">
+        The more key columns you specify in the <code class="ph codeph">PARTITION</code> clause, the fewer columns you need in the <code class="ph codeph">SELECT</code>
+        list. The trailing columns in the <code class="ph codeph">SELECT</code> list are substituted in order for the partition key columns with no
+        specified value.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="partitioning__partition_refresh">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Refreshing a Single Partition</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph">REFRESH</code> statement is typically used with partitioned tables when new data files are loaded into a partition by
+        some non-Impala mechanism, such as a Hive or Spark job. The <code class="ph codeph">REFRESH</code> statement makes Impala aware of the new data
+        files so that they can be used in Impala queries. Because partitioned tables typically contain a high volume of data, the
+        <code class="ph codeph">REFRESH</code> operation for a full partitioned table can take significant time.
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.7</span> and higher, you can include a <code class="ph codeph">PARTITION (<var class="keyword varname">partition_spec</var>)</code> clause in the
+        <code class="ph codeph">REFRESH</code> statement so that only a single partition is refreshed. For example, <code class="ph codeph">REFRESH big_table PARTITION
+        (year=2017, month=9, day=30)</code>. The partition spec must include all the partition key columns. See
+        <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for more details and examples of <code class="ph codeph">REFRESH</code> syntax and usage.
+      </p>
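+
+      <p class="p">
+        Expressed as statements, using the table name from the example above:
+      </p>
+
+<pre class="pre codeblock"><code>-- Refreshes metadata for every partition; can be slow for a large table.
+refresh big_table;
+
+-- In Impala 2.7 and higher, refreshes a single partition. The partition
+-- spec must include all the partition key columns.
+refresh big_table partition (year=2017, month=9, day=30);
+</code></pre>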
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="partitioning__partition_permissions">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Permissions for Partition Subdirectories</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        By default, if an <code class="ph codeph">INSERT</code> statement creates any new subdirectories underneath a partitioned
+        table, those subdirectories are assigned default HDFS permissions for the <code class="ph codeph">impala</code> user. To
+        make each subdirectory have the same permissions as its parent directory in HDFS, specify the
+        <code class="ph codeph">--insert_inherit_permissions</code> startup option for the <span class="keyword cmdname">impalad</span> daemon.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="partitioning__partition_pruning">
+
+    <h2 class="title topictitle2" id="ariaid-title7">Partition Pruning for Queries</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Partition pruning refers to the mechanism where a query can skip reading the data files corresponding to one or more partitions. If
+        you can arrange for queries to prune large numbers of unnecessary partitions from the query execution plan, the queries use fewer
+        resources and are thus proportionally faster and more scalable.
+      </p>
+
+      <p class="p">
+        For example, if a table is partitioned by columns <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, and <code class="ph codeph">DAY</code>, then
+        <code class="ph codeph">WHERE</code> clauses such as <code class="ph codeph">WHERE year = 2013</code>, <code class="ph codeph">WHERE year &lt; 2010</code>, or <code class="ph codeph">WHERE
+        year BETWEEN 1995 AND 1998</code> allow Impala to skip the data files in all partitions outside the specified range. Likewise,
+        <code class="ph codeph">WHERE year = 2013 AND month BETWEEN 1 AND 3</code> could prune even more partitions, reading the data files for only a
+        portion of one year.
+      </p>
+
+      <p class="p toc inpage"></p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="partition_pruning__partition_pruning_checking">
+
+      <h3 class="title topictitle3" id="ariaid-title8">Checking if Partition Pruning Happens for a Query</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          To check the effectiveness of partition pruning for a query, check the <code class="ph codeph">EXPLAIN</code> output for the query before
+          running it. For example, the following session shows a table with 3 partitions, where the query reads only 1 of them. The notation
+          <code class="ph codeph">#partitions=1/3</code> in the <code class="ph codeph">EXPLAIN</code> plan confirms that Impala can do the appropriate partition
+          pruning.
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; insert into census partition (year=2010) values ('Smith'),('Jones');
+[localhost:21000] &gt; insert into census partition (year=2011) values ('Smith'),('Jones'),('Doe');
+[localhost:21000] &gt; insert into census partition (year=2012) values ('Smith'),('Doe');
+[localhost:21000] &gt; select name from census where year=2010;
++-------+
+| name  |
++-------+
+| Smith |
+| Jones |
++-------+
+[localhost:21000] &gt; explain select name from census <strong class="ph b">where year=2010</strong>;
++------------------------------------------------------------------+
+| Explain String                                                   |
++------------------------------------------------------------------+
+| PLAN FRAGMENT 0                                                  |
+|   PARTITION: UNPARTITIONED                                       |
+|                                                                  |
+|   1:EXCHANGE                                                     |
+|                                                                  |
+| PLAN FRAGMENT 1                                                  |
+|   PARTITION: RANDOM                                              |
+|                                                                  |
+|   STREAM DATA SINK                                               |
+|     EXCHANGE ID: 1                                               |
+|     UNPARTITIONED                                                |
+|                                                                  |
+|   0:SCAN HDFS                                                    |
+|      table=predicate_propagation.census <strong class="ph b">#partitions=1/3</strong> size=12B |
++------------------------------------------------------------------+</code></pre>
+
+        <p class="p">
+          For a report of the volume of data that was actually read and processed at each stage of the query, check the output of the
+          <code class="ph codeph">SUMMARY</code> command immediately after running the query. For a more detailed analysis, look at the output of the
+          <code class="ph codeph">PROFILE</code> command; it includes this same summary report near the start of the profile output.
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="partition_pruning__partition_pruning_sql">
+
+      <h3 class="title topictitle3" id="ariaid-title9">What SQL Constructs Work with Partition Pruning</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          
+          Impala can even do partition pruning in cases where the partition key column is not directly compared to a constant, by applying
+          the transitive property to other parts of the <code class="ph codeph">WHERE</code> clause. This technique is known as predicate propagation, and
+          is available in Impala 1.2.2 and later. In this example, the census table includes another column indicating when the data was
+          collected, which happens in 10-year intervals. Even though the query does not compare the partition key column
+          (<code class="ph codeph">YEAR</code>) to a constant value, Impala can deduce that only the partition <code class="ph codeph">YEAR=2010</code> is required, and
+          again only reads 1 out of 3 partitions.
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; drop table census;
+[localhost:21000] &gt; create table census (name string, census_year int) partitioned by (year int);
+[localhost:21000] &gt; insert into census partition (year=2010) values ('Smith',2010),('Jones',2010);
+[localhost:21000] &gt; insert into census partition (year=2011) values ('Smith',2020),('Jones',2020),('Doe',2020);
+[localhost:21000] &gt; insert into census partition (year=2012) values ('Smith',2020),('Doe',2020);
+[localhost:21000] &gt; select name from census where year = census_year and census_year=2010;
++-------+
+| name  |
++-------+
+| Smith |
+| Jones |
++-------+
+[localhost:21000] &gt; explain select name from census <strong class="ph b">where year = census_year and census_year=2010</strong>;
++------------------------------------------------------------------+
+| Explain String                                                   |
++------------------------------------------------------------------+
+| PLAN FRAGMENT 0                                                  |
+|   PARTITION: UNPARTITIONED                                       |
+|                                                                  |
+|   1:EXCHANGE                                                     |
+|                                                                  |
+| PLAN FRAGMENT 1                                                  |
+|   PARTITION: RANDOM                                              |
+|                                                                  |
+|   STREAM DATA SINK                                               |
+|     EXCHANGE ID: 1                                               |
+|     UNPARTITIONED                                                |
+|                                                                  |
+|   0:SCAN HDFS                                                    |
+|      table=predicate_propagation.census <strong class="ph b">#partitions=1/3</strong> size=22B |
+|      predicates: census_year = 2010, year = census_year          |
++------------------------------------------------------------------+
+</code></pre>
+
+        <p class="p">
+        If a view applies to a partitioned table, any partition pruning considers the clauses on both
+        the original query and any additional <code class="ph codeph">WHERE</code> predicates in the query that refers to the view.
+        Prior to Impala 1.4, only the <code class="ph codeph">WHERE</code> clauses on the original query from the
+        <code class="ph codeph">CREATE VIEW</code> statement were used for partition pruning.
+      </p>
+
+        <p class="p">
+        In queries involving both analytic functions and partitioned tables, partition pruning only occurs for columns named in the <code class="ph codeph">PARTITION BY</code>
+        clause of the analytic function call. For example, if an analytic function query has a clause such as <code class="ph codeph">WHERE year=2016</code>,
+        the way to make the query prune all other <code class="ph codeph">YEAR</code> partitions is to include <code class="ph codeph">PARTITION BY year</code> in the analytic function call;
+        for example, <code class="ph codeph">OVER (PARTITION BY year,<var class="keyword varname">other_columns</var> <var class="keyword varname">other_analytic_clauses</var>)</code>.
+
+      </p>
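+
+        <p class="p">
+          For example, with the <code class="ph codeph">census</code> table partitioned by <code class="ph codeph">year</code>,
+          a query along these lines (a sketch, not part of the session above) prunes the other
+          <code class="ph codeph">YEAR</code> partitions because the analytic clause also partitions by
+          <code class="ph codeph">year</code>:
+        </p>
+
+<pre class="pre codeblock"><code>select name, year,
+  rank() over (partition by year order by name) as name_rank
+from census
+where year = 2016;
+</code></pre>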
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="partition_pruning__dynamic_partition_pruning">
+
+      <h3 class="title topictitle3" id="ariaid-title10">Dynamic Partition Pruning</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          The original mechanism Impala uses to prune partitions is <dfn class="term">static partition pruning</dfn>, in which the conditions in the
+          <code class="ph codeph">WHERE</code> clause are analyzed to determine in advance which partitions can be safely skipped. In <span class="keyword">Impala 2.5</span>
+          and higher, Impala can perform <dfn class="term">dynamic partition pruning</dfn>, where information about the partitions is collected during
+          the query, and Impala prunes unnecessary partitions in ways that were impractical to predict in advance.
+        </p>
+
+        <p class="p">
+          For example, if partition key columns are compared to literal values in a <code class="ph codeph">WHERE</code> clause, Impala can perform static
+          partition pruning during the planning phase to only read the relevant partitions:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- The query only needs to read 3 partitions whose key values are known ahead of time.
+-- That's static partition pruning.
+SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
+</code></pre>
+
+        <p class="p">
+          Dynamic partition pruning involves using information only available at run time, such as the result of a subquery:
+        </p>
+
+<pre class="pre codeblock"><code>
+create table yy (s string) partitioned by (year int) stored as parquet;
+insert into yy partition (year) values ('1999', 1999), ('2000', 2000),
+  ('2001', 2001), ('2010',2010);
+compute stats yy;
+
+create table yy2 (s string) partitioned by (year int) stored as parquet;
+insert into yy2 partition (year) values ('1999', 1999), ('2000', 2000),
+  ('2001', 2001);
+compute stats yy2;
+
+-- The query reads an unknown number of partitions, whose key values are only
+-- known at run time. The 'runtime filters' lines show how the information about
+-- the partitions is calculated in query fragment 02, and then used in query
+-- fragment 00 to decide which partitions to skip.
+explain select s from yy2 where year in (select year from yy where year between 2000 and 2005);
++----------------------------------------------------------+
+| Explain String                                           |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=16.00MB VCores=2 |
+|                                                          |
+| 04:EXCHANGE [UNPARTITIONED]                              |
+| |                                                        |
+| 02:HASH JOIN [LEFT SEMI JOIN, BROADCAST]                 |
+| |  hash predicates: year = year                          |
+| |  <strong class="ph b">runtime filters: RF000 &lt;- year</strong>                        |
+| |                                                        |
+| |--03:EXCHANGE [BROADCAST]                               |
+| |  |                                                     |
+| |  01:SCAN HDFS [dpp.yy]                                 |
+| |     partitions=2/4 files=2 size=468B                   |
+| |                                                        |
+| 00:SCAN HDFS [dpp.yy2]                                   |
+|    partitions=2/3 files=2 size=468B                      |
+|    <strong class="ph b">runtime filters: RF000 -&gt; year</strong>                        |
++----------------------------------------------------------+
+</code></pre>
+
+
+
+        <p class="p">
+          In this case, Impala evaluates the subquery, sends the subquery results to all Impala nodes participating in the query, and then
+          each <span class="keyword cmdname">impalad</span> daemon uses the dynamic partition pruning optimization to read only the partitions with the
+          relevant key values.
+        </p>
+
+        <p class="p">
+          Dynamic partition pruning is especially effective for queries involving joins of several large partitioned tables. Evaluating the
+          <code class="ph codeph">ON</code> clauses of the join predicates might normally require reading data from all partitions of certain tables. If
+          the <code class="ph codeph">WHERE</code> clauses of the query refer to the partition key columns, Impala can now often skip reading many of the
+          partitions while evaluating the <code class="ph codeph">ON</code> clauses. The dynamic partition pruning optimization reduces the amount of I/O
+          and the amount of intermediate data stored and transmitted across the network during the query.
+        </p>
+
+        <p class="p">
+        When the spill-to-disk feature is activated for a join node within a query, Impala does not
+        produce any runtime filters for that join operation on that host. Other join nodes within
+        the query are not affected.
+      </p>
+
+        <p class="p">
+          Dynamic partition pruning is part of the runtime filtering feature, which applies to other kinds of queries in addition to queries
+          against partitioned tables. See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a> for full details about this feature.
+        </p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="partitioning__partition_key_columns">
+
+    <h2 class="title topictitle2" id="ariaid-title11">Partition Key Columns</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The columns you choose as the partition keys should be ones that are frequently used to filter query results in important,
+        large-scale queries. Popular examples are some combination of year, month, and day when the data has associated time values, and
+        geographic region when the data is associated with some place.
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            For time-based data, split out the separate parts into their own columns, because Impala cannot partition based on a
+            <code class="ph codeph">TIMESTAMP</code> column.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The data type of the partition columns does not have a significant effect on the storage required, because the values from those
+            columns are not stored in the data files; rather, they are represented as strings inside HDFS directory names.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            In <span class="keyword">Impala 2.5</span> and higher, you can enable the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code> query option to speed up
+            queries that only refer to partition key columns, such as <code class="ph codeph">SELECT MAX(year)</code>. This setting is not enabled by
+            default because the query behavior is slightly different if the table contains partition directories without actual data inside.
+            See <a class="xref" href="impala_optimize_partition_key_scans.html#optimize_partition_key_scans">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a> for details.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+        Partitioned tables can contain complex type columns; however, all the partition key columns must be scalar types.
+      </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Remember that when Impala queries data stored in HDFS, it is most efficient to use multi-megabyte files to take advantage of the
+            HDFS block size. For Parquet tables, the block size (and ideal size of the data files) is <span class="ph">256 MB in
+            Impala 2.0 and later</span>. Therefore, avoid specifying too many partition key columns, which could result in individual
+            partitions containing only small amounts of data. For example, if you receive 1 GB of data per day, you might partition by year,
+            month, and day; while if you receive 5 GB of data per minute, you might partition by year, month, day, hour, and minute. If you
+            have data with a geographic component, you might partition based on postal code if you have many megabytes of data for each
+            postal code, but if not, you might partition by some larger region such as city, state, or country.
+          </p>
+        </li>
+      </ul>
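+      <p class="p">
+        For example, the advice above about splitting out time values into separate columns might lead to a table
+        definition like the following sketch (the table and column names are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>-- Impala cannot partition on a TIMESTAMP column, so declare
+-- separate small integer columns as the partition keys.
+create table events (id bigint, details string)
+  partitioned by (year smallint, month tinyint, day tinyint);</code></pre>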
+
+      <p class="p">
+        If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+        <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+        query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+        See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+        for the kinds of queries that this option applies to, and slight differences in how partitions are
+        evaluated when this query option is enabled.
+      </p>
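+      <p class="p">
+        For example, a session might enable the option before running such a query against a partitioned table, such as
+        the <code class="ph codeph">census</code> table from the examples in this topic:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set optimize_partition_key_scans=1;
+[localhost:21000] &gt; select min(year), max(year) from census;</code></pre>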
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="partitioning__mixed_format_partitions">
+
+    <h2 class="title topictitle2" id="ariaid-title12">Setting Different File Formats for Partitions</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Partitioned tables have the flexibility to use different file formats for different partitions. (For background information about
+        the different file formats Impala supports, see <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>.) For example, if you originally
+        received data in text format, then received new data in RCFile format, and eventually began receiving data in Parquet format, all
+        that data could reside in the same table for queries. You just need to ensure that the table is structured so that the data files
+        that use different file formats reside in separate partitions.
+      </p>
+
+      <p class="p">
+        For example, here is how you might switch from text to Parquet data as you receive data for different years:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table census (name string) partitioned by (year smallint);
+[localhost:21000] &gt; alter table census add partition (year=2012); -- Text format;
+
+[localhost:21000] &gt; alter table census add partition (year=2013); -- Text format switches to Parquet before data loaded;
+[localhost:21000] &gt; alter table census partition (year=2013) set fileformat parquet;
+
+[localhost:21000] &gt; insert into census partition (year=2012) values ('Smith'),('Jones'),('Lee'),('Singh');
+[localhost:21000] &gt; insert into census partition (year=2013) values ('Flores'),('Bogomolov'),('Cooper'),('Appiah');</code></pre>
+
+      <p class="p">
+        At this point, the HDFS directory for <code class="ph codeph">year=2012</code> contains a text-format data file, while the HDFS directory for
+        <code class="ph codeph">year=2013</code> contains a Parquet data file. As always, when loading non-trivial data, you would use <code class="ph codeph">INSERT ...
+        SELECT</code> or <code class="ph codeph">LOAD DATA</code> to import data in large batches, rather than <code class="ph codeph">INSERT ... VALUES</code> which
+        produces small files that are inefficient for real-world queries.
+      </p>
+
+      <p class="p">
+        For other file types that Impala cannot create natively, you can switch into Hive and issue the <code class="ph codeph">ALTER TABLE ... SET
+        FILEFORMAT</code> statements and <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> statements there. After switching back to
+        Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> statement so that Impala recognizes any partitions or new
+        data added through Hive.
+      </p>
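+      <p class="p">
+        For example, the round trip through Hive might look like the following sketch, where the Avro file format and the
+        partition value are hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>-- In Hive:
+alter table census add partition (year=2014);
+alter table census partition (year=2014) set fileformat avro;
+-- ...load or insert Avro data for that partition through Hive...
+
+-- Back in impala-shell:
+refresh census;</code></pre>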
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="partitioning__partition_management">
+
+    <h2 class="title topictitle2" id="ariaid-title13">Managing Partitions</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can add, drop, set the expected file format, or set the HDFS location of the data files for individual partitions within an
+        Impala table. See <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax details, and
+        <a class="xref" href="impala_partitioning.html#mixed_format_partitions">Setting Different File Formats for Partitions</a> for tips on managing tables containing partitions with different file
+        formats.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        If you are creating a partition for the first time and specifying its location, for maximum efficiency, use
+        a single <code class="ph codeph">ALTER TABLE</code> statement including both the <code class="ph codeph">ADD PARTITION</code> and
+        <code class="ph codeph">LOCATION</code> clauses, rather than separate statements with <code class="ph codeph">ADD PARTITION</code> and
+        <code class="ph codeph">SET LOCATION</code> clauses.
+      </div>
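+      <p class="p">
+        For example, following the guideline in the note above, a new partition and its location can be declared in a
+        single statement (the HDFS path is hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>alter table census add partition (year=2015)
+  location '/user/impala/data/census/year=2015';</code></pre>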
+
+      <p class="p">
+        What happens to the data files when a partition is dropped depends on whether the partitioned table is designated as internal or
+        external. For an internal (managed) table, the data files are deleted. For example, if data in the partitioned table is a copy of
+        raw data files stored elsewhere, you might save disk space by dropping older partitions that are no longer required for reporting,
+        knowing that the original data is still available if needed later. For an external table, the data files are left alone. For
+        example, dropping a partition without deleting the associated files lets Impala consider a smaller set of partitions, improving
+        query efficiency and reducing overhead for DDL operations on the table; if the data is needed again later, you can add the partition
+        again. See <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> for details and examples.
+      </p>
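+      <p class="p">
+        For example, to drop a partition that is no longer needed for reporting, reusing the
+        <code class="ph codeph">census</code> table from the earlier examples:
+      </p>
+
+<pre class="pre codeblock"><code>alter table census drop partition (year=2012);</code></pre>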
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="partitioning__partition_kudu">
+
+    <h2 class="title topictitle2" id="ariaid-title14">Using Partitioning with Kudu Tables</h2>
+
+    
+
+    <div class="body conbody">
+
+      <p class="p">
+        Kudu tables use a more fine-grained partitioning scheme than tables containing HDFS data files. You specify a <code class="ph codeph">PARTITION
+        BY</code> clause with the <code class="ph codeph">CREATE TABLE</code> statement to identify how to divide the values from the partition key
+        columns.
+      </p>
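+      <p class="p">
+        For example, here is a minimal sketch of a hash-partitioned Kudu table (the table name and the number of
+        partitions are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>create table kudu_example (id bigint primary key, s string)
+  partition by hash (id) partitions 16
+  stored as kudu;</code></pre>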
+
+      <p class="p">
+        See <a class="xref" href="impala_kudu.html#kudu_partitioning">Partitioning for Kudu Tables</a> for
+        details and examples of the partitioning techniques
+        for Kudu tables.
+      </p>
+
+    </div>
+
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_perf_benchmarking.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_perf_benchmarking.html b/docs/build/html/topics/impala_perf_benchmarking.html
new file mode 100644
index 0000000..ce9d995
--- /dev/null
+++ b/docs/build/html/topics/impala_perf_benchmarking.html
@@ -0,0 +1,27 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_benchmarks"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Benchmarking Impala Queries</title></head><body id="perf_benchmarks"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Benchmarking Impala Queries</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Because Impala, like other Hadoop components, is designed to handle large data volumes in a distributed
+      environment, conduct any performance tests using realistic data and cluster configurations. Use a multi-node
+      cluster rather than a single node; run queries against tables containing terabytes of data rather than tens
+      of gigabytes. The parallel processing techniques used by Impala are most appropriate for workloads that are
+      beyond the capacity of a single server.
+    </p>
+
+    <p class="p">
+      When you run queries returning large numbers of rows, the CPU time to pretty-print the output can be
+      substantial, giving an inaccurate measurement of the actual query time. Consider using the
+      <code class="ph codeph">-B</code> option on the <code class="ph codeph">impala-shell</code> command to turn off the pretty-printing, and
+      optionally the <code class="ph codeph">-o</code> option to store query results in a file rather than printing to the
+      screen. See <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a> for details.
+    </p>
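+    <p class="p">
+      For example, a benchmark run might invoke <span class="keyword cmdname">impala-shell</span> as follows (the query and
+      output file name are hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -B -o /tmp/query_results.txt -q 'select count(*) from big_table'</code></pre>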
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_perf_cookbook.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_perf_cookbook.html b/docs/build/html/topics/impala_perf_cookbook.html
new file mode 100644
index 0000000..fc68b45
--- /dev/null
+++ b/docs/build/html/topics/impala_perf_cookbook.html
@@ -0,0 +1,256 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_cookbook"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Performance Guidelines and Best Practices</title></head><body id="perf_cookbook"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Performance Guidelines and Best Practices</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Here are performance guidelines and best practices that you can use during planning, experimentation, and
+      performance tuning for an Impala-enabled cluster. All of this information is also available in more
+      detail elsewhere in the Impala documentation; it is gathered together here to serve as a cookbook and
+      emphasize which performance techniques typically provide the highest return on investment.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+    <section class="section" id="perf_cookbook__perf_cookbook_file_format"><h2 class="title sectiontitle">Choose the appropriate file format for the data.</h2>
+
+      
+
+      <p class="p">
+        Typically, for large volumes of data (multiple gigabytes per table or partition), the Parquet file format
+        performs best because of its combination of columnar storage layout, large I/O request size, and
+        compression and encoding. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for comparisons of all
+        file formats supported by Impala, and <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for details about the
+        Parquet file format.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        For smaller volumes of data, a few gigabytes or less for each table or partition, you might not see
+        significant performance differences between file formats. At small data volumes, reduced I/O from an
+        efficient compressed file format can be counterbalanced by reduced opportunity for parallel execution. When
+        planning for a production deployment or conducting benchmarks, always use realistic data volumes to get a
+        true picture of performance and scalability.
+      </div>
+    </section>
+
+    <section class="section" id="perf_cookbook__perf_cookbook_small_files"><h2 class="title sectiontitle">Avoid data ingestion processes that produce many small files.</h2>
+
+      
+
+      <p class="p">
+        When producing data files outside of Impala, prefer either text format or Avro, where you can build up the
+        files row by row. Once the data is in Impala, you can convert it to the more efficient Parquet format and
+        split into multiple data files using a single <code class="ph codeph">INSERT ... SELECT</code> statement. Or, if you have
+        the infrastructure to produce multi-megabyte Parquet files as part of your data preparation process, do
+        that and skip the conversion step inside Impala.
+      </p>
+
+      <p class="p">
+        Always use <code class="ph codeph">INSERT ... SELECT</code> to copy significant volumes of data from table to table
+        within Impala. Avoid <code class="ph codeph">INSERT ... VALUES</code> for any substantial volume of data or
+        performance-critical tables, because each such statement produces a separate tiny data file. See
+        <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for examples of the <code class="ph codeph">INSERT ... SELECT</code> syntax.
+      </p>
+
+      <p class="p">
+        For example, if you have thousands of partitions in a Parquet table, each with less than
+        <span class="ph">256 MB</span> of data, consider partitioning in a less granular way, such as by
+        year / month rather than year / month / day. If an inefficient data ingestion process produces thousands of
+        data files in the same table or partition, consider compacting the data by performing an <code class="ph codeph">INSERT ...
+        SELECT</code> to copy all the data to a different table; the data will be reorganized into a smaller
+        number of larger files by this process.
+      </p>
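+      <p class="p">
+        A compaction pass of this kind might look like the following sketch, where both table names are hypothetical and
+        the source table is assumed to be unpartitioned:
+      </p>
+
+<pre class="pre codeblock"><code>-- Rewrite many small files as a smaller number of large Parquet files.
+create table sales_parquet like sales stored as parquet;
+insert into sales_parquet select * from sales;</code></pre>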
+    </section>
+
+    <section class="section" id="perf_cookbook__perf_cookbook_partitioning"><h2 class="title sectiontitle">Choose partitioning granularity based on actual data volume.</h2>
+
+      
+
+      <p class="p">
+        Partitioning is a technique that physically divides the data based on values of one or more columns, such
+        as by year, month, day, region, city, section of a web site, and so on. When you issue queries that request
+        a specific value or range of values for the partition key columns, Impala can avoid reading the irrelevant
+        data, potentially yielding a huge savings in disk I/O.
+      </p>
+
+      <p class="p">
+        When deciding which column(s) to use for partitioning, choose the right level of granularity. For example,
+        should you partition by year, month, and day, or only by year and month? Choose a partitioning strategy
+        that puts at least <span class="ph">256 MB</span> of data in each partition, to take advantage of
+        HDFS bulk I/O and Impala distributed queries.
+      </p>
+
+      <p class="p">
+        Over-partitioning can also cause query planning to take longer than necessary, as Impala prunes the
+        unnecessary partitions. Ideally, keep the number of partitions in the table under 30 thousand.
+      </p>
+
+      <p class="p">
+        When preparing data files to go in a partition directory, create several large files rather than many small
+        ones. If you receive data in the form of many small files and have no control over the input format,
+        consider using the <code class="ph codeph">INSERT ... SELECT</code> syntax to copy data from one table or partition to
+        another, which compacts the files into a relatively small number (based on the number of nodes in the
+        cluster).
+      </p>
+
+      <p class="p">
+        If you need to reduce the overall number of partitions and increase the amount of data in each partition,
+        first look for partition key columns that are rarely referenced or are referenced in non-critical queries
+        (not subject to an SLA). For example, your web site log data might be partitioned by year, month, day, and
+        hour, but if most queries roll up the results by day, perhaps you only need to partition by year, month,
+        and day.
+      </p>
+
+      <p class="p">
+        If you need to reduce the granularity even more, consider creating <span class="q">"buckets"</span>, computed values
+        corresponding to different sets of partition key values. For example, you can use the
+        <code class="ph codeph">TRUNC()</code> function with a <code class="ph codeph">TIMESTAMP</code> column to group date and time values
+        based on intervals such as week or quarter. See
+        <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details.
+      </p>
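+      <p class="p">
+        For example, a rollup by quarter might apply <code class="ph codeph">TRUNC()</code> to a hypothetical
+        <code class="ph codeph">TIMESTAMP</code> column:
+      </p>
+
+<pre class="pre codeblock"><code>select trunc(event_time, 'Q') as quarter, count(*)
+  from events group by quarter;</code></pre>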
+
+      <p class="p">
+        See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for full details and performance considerations for
+        partitioning.
+      </p>
+    </section>
+
+    <section class="section" id="perf_cookbook__perf_cookbook_partition_keys"><h2 class="title sectiontitle">Use smallest appropriate integer types for partition key columns.</h2>
+
+      
+
+      <p class="p">
+        Although it is tempting to use strings for partition key columns, since those values are turned into HDFS
+        directory names anyway, you can minimize memory usage by using numeric values for common partition key
+        fields such as <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, and <code class="ph codeph">DAY</code>. Use the smallest
+        integer type that holds the appropriate range of values, typically <code class="ph codeph">TINYINT</code> for
+        <code class="ph codeph">MONTH</code> and <code class="ph codeph">DAY</code>, and <code class="ph codeph">SMALLINT</code> for <code class="ph codeph">YEAR</code>.
+        Use the <code class="ph codeph">EXTRACT()</code> function to pull out individual date and time fields from a
+        <code class="ph codeph">TIMESTAMP</code> value, and <code class="ph codeph">CAST()</code> the return value to the appropriate integer
+        type.
+      </p>
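+      <p class="p">
+        For example, a load from a staging table might extract and cast the partition key values as follows (both table
+        names and the column layout are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>insert into partitioned_events partition (year, month, day)
+  select id, details,
+    cast(extract(ts, 'year') as smallint),
+    cast(extract(ts, 'month') as tinyint),
+    cast(extract(ts, 'day') as tinyint)
+  from raw_events;</code></pre>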
+    </section>
+
+    <section class="section" id="perf_cookbook__perf_cookbook_parquet_block_size"><h2 class="title sectiontitle">Choose an appropriate Parquet block size.</h2>
+
+      
+
+      <p class="p">
+        By default, the Impala <code class="ph codeph">INSERT ... SELECT</code> statement creates Parquet files with a 256 MB
+        block size. (This default was changed in Impala 2.0. Formerly, the limit was 1 GB, but Impala made
+        conservative estimates about compression, resulting in files that were smaller than 1 GB.)
+      </p>
+
+      <p class="p">
+        Each Parquet file written by Impala is a single block, allowing the whole file to be processed as a unit by a single host.
+        As you copy Parquet files into HDFS or between HDFS filesystems, use <code class="ph codeph">hdfs dfs -pb</code> to preserve the original
+        block size.
+      </p>
+
+      <p class="p">
+        If there are only one or a few data blocks in your Parquet table, or in a partition that is the only one
+        accessed by a query, then you might experience a slowdown for a different reason: not enough data to take
+        advantage of Impala's parallel distributed queries. Each data block is processed by a single core on one of
+        the DataNodes. In a 100-node cluster of 16-core machines, you could potentially process thousands of data
+        files simultaneously. You want to find a sweet spot between <span class="q">"many tiny files"</span> and <span class="q">"single giant
+        file"</span> that balances bulk I/O and parallel processing. You can set the <code class="ph codeph">PARQUET_FILE_SIZE</code>
+        query option before doing an <code class="ph codeph">INSERT ... SELECT</code> statement to reduce the size of each
+        generated Parquet file. <span class="ph">(Specify the file size as an absolute number of bytes, or in Impala
+        2.0 and later, in units ending with <code class="ph codeph">m</code> for megabytes or <code class="ph codeph">g</code> for
+        gigabytes.)</span> Run benchmarks with different file sizes to find the right balance point for your
+        particular data volume.
+      </p>
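+      <p class="p">
+        For example, in <span class="keyword cmdname">impala-shell</span> (the table names are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>set parquet_file_size=128m;
+insert overwrite parquet_table select * from text_table;</code></pre>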
+    </section>
+
+    <section class="section" id="perf_cookbook__perf_cookbook_stats"><h2 class="title sectiontitle">Gather statistics for all tables used in performance-critical or high-volume join queries.</h2>
+
+      
+
+      <p class="p">
+        Gather the statistics with the <code class="ph codeph">COMPUTE STATS</code> statement. See
+        <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a> for details.
+      </p>
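+      <p class="p">
+        For example (the table name is hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>compute stats sales;
+show table stats sales;
+show column stats sales;</code></pre>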
+    </section>
+
+    <section class="section" id="perf_cookbook__perf_cookbook_network"><h2 class="title sectiontitle">Minimize the overhead of transmitting results back to the client.</h2>
+
+      
+
+      <p class="p">
+        Use techniques such as:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Aggregation. If you need to know how many rows match a condition, the sum of values
+          from some column, the lowest or highest matching value, and so on, call aggregate functions such as
+          <code class="ph codeph">COUNT()</code>, <code class="ph codeph">SUM()</code>, and <code class="ph codeph">MAX()</code> in the query rather than
+          sending the result set to an application and doing those computations there. Remember that the size of an
+          unaggregated result set could be huge, requiring substantial time to transmit across the network.
+        </li>
+
+        <li class="li">
+          Filtering. Use all applicable tests in the <code class="ph codeph">WHERE</code> clause of a query to eliminate rows
+          that are not relevant, rather than producing a big result set and filtering it using application logic.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">LIMIT</code> clause. If you only need to see a few sample values from a result set, or the top
+          or bottom values from a query using <code class="ph codeph">ORDER BY</code>, include the <code class="ph codeph">LIMIT</code> clause
+          to reduce the size of the result set rather than asking for the full result set and then throwing most of
+          the rows away.
+        </li>
+
+        <li class="li">
+          Avoid overhead from pretty-printing the result set and displaying it on the screen. When you retrieve the
+          results through <span class="keyword cmdname">impala-shell</span>, use <span class="keyword cmdname">impala-shell</span> options such as
+          <code class="ph codeph">-B</code> and <code class="ph codeph">--output_delimiter</code> to produce results without special
+          formatting, and redirect output to a file rather than printing to the screen. Consider using
+          <code class="ph codeph">INSERT ... SELECT</code> to write the results directly to new files in HDFS. See
+          <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a> for details about the
+          <span class="keyword cmdname">impala-shell</span> command-line options.
+        </li>
+      </ul>
+    </section>
+
+    <section class="section" id="perf_cookbook__perf_cookbook_explain"><h2 class="title sectiontitle">Verify that your queries are planned in an efficient logical manner.</h2>
+
+      
+
+      <p class="p">
+        Examine the <code class="ph codeph">EXPLAIN</code> plan for a query before actually running it. See
+        <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> and <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for
+        details.
+      </p>
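+      <p class="p">
+        For example, prefix the query with the <code class="ph codeph">EXPLAIN</code> keyword (the query itself is hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>explain select c.name, sum(o.total)
+  from customers c join orders o on (c.id = o.cust_id)
+  group by c.name;</code></pre>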
+    </section>
+
+    <section class="section" id="perf_cookbook__perf_cookbook_profile"><h2 class="title sectiontitle">Verify performance characteristics of queries.</h2>
+
+      
+
+      <p class="p">
+        Verify that the low-level aspects of I/O, memory usage, network bandwidth, CPU utilization, and so on are
+        within expected ranges by examining the query profile for a query after running it. See
+        <a class="xref" href="impala_explain_plan.html#perf_profile">Using the Query Profile for Performance Tuning</a> for details.
+      </p>
+    </section>
+
+    <section class="section" id="perf_cookbook__perf_cookbook_os"><h2 class="title sectiontitle">Use appropriate operating system settings.</h2>
+
+      
+
+      <p class="p">
+        See <span class="xref">the documentation for your Apache Hadoop distribution</span> for recommendations about operating system
+        settings that you can change to influence Impala performance. In particular, you might find
+        that changing the <code class="ph codeph">vm.swappiness</code> Linux kernel setting to a non-zero value improves
+        overall performance.
+      </p>
+    </section>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[42/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_components.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_components.html b/docs/build/html/topics/impala_components.html
new file mode 100644
index 0000000..d3d210d
--- /dev/null
+++ b/docs/build/html/topics/impala_components.html
@@ -0,0 +1,192 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_components"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Components of the Impala Server</title></head><body id="intro_components"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Components of the Impala Server</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The Impala server is a distributed, massively parallel processing (MPP) database engine. It consists of
+      different daemon processes that run on specific hosts within your cluster.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_components__intro_impalad">
+
+    <h2 class="title topictitle2" id="ariaid-title2">The Impala Daemon</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The core Impala component is a daemon process that runs on each DataNode of the cluster, physically represented
+        by the <code class="ph codeph">impalad</code> process. It reads and writes to data files; accepts queries transmitted
+        from the <code class="ph codeph">impala-shell</code> command, Hue, JDBC, or ODBC; parallelizes the queries and
+        distributes work across the cluster; and transmits intermediate query results back to the
+        central coordinator node.
+      </p>
+
+      <p class="p">
+        You can submit a query to the Impala daemon running on any DataNode, and that instance of the daemon serves as the
+        <dfn class="term">coordinator node</dfn> for that query. The other nodes transmit partial results back to the
+        coordinator, which constructs the final result set for a query. When running experiments with functionality
+        through the <code class="ph codeph">impala-shell</code> command, you might always connect to the same Impala daemon for
+        convenience. For clusters running production workloads, you might load-balance by
+        submitting each query to a different Impala daemon in round-robin style, using the JDBC or ODBC interfaces.
+      </p>
+
+      <p class="p">
+        The Impala daemons are in constant communication with the <dfn class="term">statestore</dfn>, to confirm which nodes
+        are healthy and can accept new work.
+      </p>
+
+      <p class="p">
+        They also receive broadcast messages from the <span class="keyword cmdname">catalogd</span> daemon (introduced in Impala 1.2)
+        whenever any Impala node in the cluster creates, alters, or drops any type of object, or when an
+        <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> statement is processed through Impala. This
+        background communication minimizes the need for <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE
+        METADATA</code> statements that were needed to coordinate metadata across nodes prior to Impala 1.2.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong> <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>,
+        <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, <a class="xref" href="impala_timeouts.html#impalad_timeout">Setting the Idle Query and Idle Session Timeouts for impalad</a>,
+        <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a>, <a class="xref" href="impala_proxy.html#proxy">Using Impala through a Proxy for High Availability</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_components__intro_statestore">
+
+    <h2 class="title topictitle2" id="ariaid-title3">The Impala Statestore</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala component known as the <dfn class="term">statestore</dfn> checks on the health of Impala daemons on all the
+        DataNodes in a cluster, and continuously relays its findings to each of those daemons. It is physically
+        represented by a daemon process named <code class="ph codeph">statestored</code>; you only need such a process on one
+        host in the cluster. If an Impala daemon goes offline due to hardware failure, network error, software issue,
+        or other reason, the statestore informs all the other Impala daemons so that future queries can avoid making
+        requests to the unreachable node.
+      </p>
+
+      <p class="p">
+        Because the statestore's purpose is to help when things go wrong, it is not critical to the normal
+        operation of an Impala cluster. If the statestore is not running or becomes unreachable, the Impala daemons
+        continue running and distributing work among themselves as usual; the cluster just becomes less robust if
+        other Impala daemons fail while the statestore is offline. When the statestore comes back online, it re-establishes
+        communication with the Impala daemons and resumes its monitoring function.
+      </p>
+
+      <p class="p">
+        Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon.
+        The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special
+        requirements for high availability, because problems with those daemons do not result in data loss.
+        If those daemons become unavailable due to an outage on a particular
+        host, you can stop the Impala service, delete the <span class="ph uicontrol">Impala StateStore</span> and
+        <span class="ph uicontrol">Impala Catalog Server</span> roles, add the roles on a different host, and restart the
+        Impala service.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_scalability.html#statestore_scalability">Scalability Considerations for the Impala Statestore</a>,
+        <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>, <a class="xref" href="impala_processes.html#processes">Starting Impala</a>,
+        <a class="xref" href="impala_timeouts.html#statestore_timeout">Increasing the Statestore Timeout</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro_components__intro_catalogd">
+
+    <h2 class="title topictitle2" id="ariaid-title4">The Impala Catalog Service</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala component known as the <dfn class="term">catalog service</dfn> relays the metadata changes from Impala SQL
+        statements to all the DataNodes in a cluster. It is physically represented by a daemon process named
+        <code class="ph codeph">catalogd</code>; you only need such a process on one host in the cluster. Because the requests
+        are passed through the statestore daemon, it makes sense to run the <span class="keyword cmdname">statestored</span> and
+        <span class="keyword cmdname">catalogd</span> services on the same host.
+      </p>
+
+      <p class="p">
+        The catalog service avoids the need to issue
+        <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements when the metadata changes are
+        performed by statements issued through Impala. When you create a table, load data, and so on through Hive,
+        you do need to issue <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE METADATA</code> on an Impala node
+        before executing a query there.
+      </p>
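+
+      <p class="p">
+        For example, after loading new data files into an existing table through Hive, you might
+        run a sequence like the following (the table name here is hypothetical) on any one
+        Impala node before querying the new data:
+      </p>
+
+<pre class="pre codeblock"><code>-- In Hive: LOAD DATA INPATH '/staging/logs' INTO TABLE web_logs;
+-- Then, in impala-shell on a single node:
+REFRESH web_logs;
+SELECT COUNT(*) FROM web_logs;
+</code></pre>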
+
+      <p class="p">
+        This feature touches a number of aspects of Impala:
+      </p>
+
+
+
+      <ul class="ul" id="intro_catalogd__catalogd_xrefs">
+        <li class="li">
+          <p class="p">
+            See <a class="xref" href="impala_install.html#install">Installing Impala</a>, <a class="xref" href="impala_upgrading.html#upgrading">Upgrading Impala</a> and
+            <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, for usage information for the
+            <span class="keyword cmdname">catalogd</span> daemon.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements are not needed
+            when the <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-changing or
+            data-changing operation is performed through Impala. These statements are still needed if such
+            operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the
+            statements only need to be issued on one Impala node rather than on all nodes. See
+            <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> and
+            <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest usage information for
+            those statements.
+          </p>
+        </li>
+      </ul>
+
+      <p class="p">
+        By default, the metadata loading and caching on startup happens asynchronously, so Impala can begin
+        accepting requests promptly. To enable the original behavior, where Impala waited until all metadata was
+        loaded before accepting any requests, set the <span class="keyword cmdname">catalogd</span> configuration option
+        <code class="ph codeph">--load_catalog_in_background=false</code>.
+      </p>
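+
+      <p class="p">
+        For example, assuming a package-based installation where startup flags are read from
+        <code class="ph codeph">/etc/default/impala</code>, the option might be added along these lines
+        (the surrounding flags are illustrative):
+      </p>
+
+<pre class="pre codeblock"><code># Hypothetical excerpt from /etc/default/impala
+IMPALA_CATALOG_ARGS=" -log_dir=${IMPALA_LOG_DIR} --load_catalog_in_background=false"
+</code></pre>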
+
+      <p class="p">
+        Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon.
+        The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special
+        requirements for high availability, because problems with those daemons do not result in data loss.
+        If those daemons become unavailable due to an outage on a particular
+        host, you can stop the Impala service, delete the <span class="ph uicontrol">Impala StateStore</span> and
+        <span class="ph uicontrol">Impala Catalog Server</span> roles, add the roles on a different host, and restart the
+        Impala service.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+        In Impala 1.2.4 and higher, you can specify a table name with <code class="ph codeph">INVALIDATE METADATA</code> after
+        the table is created in Hive, allowing you to make individual tables visible to Impala without doing a full
+        reload of the catalog metadata. Impala 1.2.4 also includes other changes to make the metadata broadcast
+        mechanism faster and more responsive, especially during Impala startup. See
+        <a class="xref" href="../shared/../topics/impala_new_features.html#new_features_124">New Features in Impala 1.2.4</a> for details.
+      </p>
+      </div>
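+
+      <p class="p">
+        For example, to make a single table created through Hive visible to Impala without
+        a full reload of the catalog metadata (the table name is hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>INVALIDATE METADATA new_hive_table;
+DESCRIBE new_hive_table;
+</code></pre>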
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong> <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>,
+        <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a>
+      </p>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_compression_codec.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_compression_codec.html b/docs/build/html/topics/impala_compression_codec.html
new file mode 100644
index 0000000..2018299
--- /dev/null
+++ b/docs/build/html/topics/impala_compression_codec.html
@@ -0,0 +1,92 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="compression_codec"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</title></head><body id="compression_codec"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">COMPRESSION_CODEC Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+
+
+
+
+    <p class="p">
+      
+      When Impala writes Parquet data files using the <code class="ph codeph">INSERT</code> statement, the underlying compression
+      is controlled by the <code class="ph codeph">COMPRESSION_CODEC</code> query option.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Prior to Impala 2.0, this option was named <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code>. In Impala 2.0 and
+      later, the <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code> name is not recognized. Use the more general name
+      <code class="ph codeph">COMPRESSION_CODEC</code> for new code.
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>SET COMPRESSION_CODEC=<var class="keyword varname">codec_name</var>;</code></pre>
+
+    <p class="p">
+      The allowed values for this query option are <code class="ph codeph">SNAPPY</code> (the default), <code class="ph codeph">GZIP</code>,
+      and <code class="ph codeph">NONE</code>.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      A Parquet file created with <code class="ph codeph">COMPRESSION_CODEC=NONE</code> is still typically smaller than the
+      original data, due to encoding schemes such as run-length encoding and dictionary encoding that are applied
+      separately from compression.
+    </div>
+
+    <p class="p"></p>
+
+    <p class="p">
+      The option value is not case-sensitive.
+    </p>
+
+    <p class="p">
+      If the option is set to an unrecognized value, all kinds of queries will fail due to the invalid option
+      setting, not just queries involving Parquet tables. (The value <code class="ph codeph">BZIP2</code> is also recognized, but
+      is not compatible with Parquet tables.)
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code>
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> <code class="ph codeph">SNAPPY</code>
+    </p>
+
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>set compression_codec=gzip;
+insert into parquet_table_highly_compressed select * from t1;
+
+set compression_codec=snappy;
+insert into parquet_table_compression_plus_fast_queries select * from t1;
+
+set compression_codec=none;
+insert into parquet_table_no_compression select * from t1;
+
+set compression_codec=foo;
+select * from t1 limit 5;
+ERROR: Invalid compression codec: foo
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      For information about how compressing Parquet data files affects query performance, see
+      <a class="xref" href="impala_parquet.html#parquet_compression">Snappy and GZip Compression for Parquet Data Files</a>.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_compute_stats.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_compute_stats.html b/docs/build/html/topics/impala_compute_stats.html
new file mode 100644
index 0000000..fcba3d6
--- /dev/null
+++ b/docs/build/html/topics/impala_compute_stats.html
@@ -0,0 +1,558 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="compute_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COMPUTE STATS Statement</title></head><body id="compute_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">COMPUTE STATS Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Gathers information about volume and distribution of data in a table and all associated columns and
+      partitions. The information is stored in the metastore database, and used by Impala to help optimize queries.
+      For example, if Impala can determine that a table is large or small, or has many or few distinct values, it
+      can organize and parallelize the work appropriately for a join query or insert operation. For details about the
+      kinds of information gathered by this statement, see <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>COMPUTE STATS [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+COMPUTE INCREMENTAL STATS [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)]
+
+<var class="keyword varname">partition_spec</var> ::= <var class="keyword varname">simple_partition_spec</var> | <span class="ph"><var class="keyword varname">complex_partition_spec</var></span>
+
+<var class="keyword varname">simple_partition_spec</var> ::= <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var>
+
+<span class="ph"><var class="keyword varname">complex_partition_spec</var> ::= <var class="keyword varname">comparison_expression_on_partition_col</var></span>
+</code></pre>
+
+    <p class="p">
+        The <code class="ph codeph">PARTITION</code> clause is only allowed in combination with the <code class="ph codeph">INCREMENTAL</code>
+        clause. It is optional for <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, and required for <code class="ph codeph">DROP
+        INCREMENTAL STATS</code>. Whenever you specify partitions through the <code class="ph codeph">PARTITION
+        (<var class="keyword varname">partition_spec</var>)</code> clause in a <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or
+        <code class="ph codeph">DROP INCREMENTAL STATS</code> statement, you must include all the partitioning columns in the
+        specification, and specify constant values for all the partition key columns.
+      </p>
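+
+    <p class="p">
+      For example, for a hypothetical table partitioned by <code class="ph codeph">YEAR</code> and
+      <code class="ph codeph">MONTH</code> columns, both partition key columns must appear in the
+      specification, each with a constant value:
+    </p>
+
+<pre class="pre codeblock"><code>COMPUTE INCREMENTAL STATS sales PARTITION (year=2017, month=4);
+DROP INCREMENTAL STATS sales PARTITION (year=2016, month=12);
+</code></pre>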
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Originally, Impala relied on users to run the Hive <code class="ph codeph">ANALYZE TABLE</code> statement, but that method
+      of gathering statistics proved unreliable and difficult to use. The Impala <code class="ph codeph">COMPUTE STATS</code>
+      statement is built from the ground up to improve the reliability and user-friendliness of this operation.
+      <code class="ph codeph">COMPUTE STATS</code> does not require any setup steps or special configuration. You only run a
+      single Impala <code class="ph codeph">COMPUTE STATS</code> statement to gather both table and column statistics, rather
+      than separate Hive <code class="ph codeph">ANALYZE TABLE</code> statements for each kind of statistics.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> variation is a shortcut for partitioned tables that works on a
+      subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables
+      with many partitions, where a full <code class="ph codeph">COMPUTE STATS</code> operation takes too long to be practical
+      each time a partition is added or dropped. See <a class="xref" href="impala_perf_stats.html#perf_stats_incremental">Overview of Incremental Statistics</a>
+      for full usage details.
+    </p>
+
+    <p class="p">
+      <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> only applies to partitioned tables. If you use the
+      <code class="ph codeph">INCREMENTAL</code> clause for an unpartitioned table, Impala automatically uses the original
+      <code class="ph codeph">COMPUTE STATS</code> statement. Such tables display <code class="ph codeph">false</code> under the
+      <code class="ph codeph">Incremental stats</code> column of the <code class="ph codeph">SHOW TABLE STATS</code> output.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Because many of the most performance-critical and resource-intensive operations rely on table and column
+      statistics to construct accurate and efficient plans, <code class="ph codeph">COMPUTE STATS</code> is an important step at
+      the end of your ETL process. Run <code class="ph codeph">COMPUTE STATS</code> on all tables as your first step during
+      performance tuning for slow queries, or troubleshooting for out-of-memory conditions:
+      <ul class="ul">
+        <li class="li">
+          Accurate statistics help Impala construct an efficient query plan for join queries, improving performance
+          and reducing memory usage.
+        </li>
+
+        <li class="li">
+          Accurate statistics help Impala distribute the work effectively for insert operations into Parquet
+          tables, improving performance and reducing memory usage.
+        </li>
+
+        <li class="li">
+          Accurate statistics help Impala estimate the memory required for each query, which is important when you
+          use resource management features, such as admission control and the YARN resource management framework.
+          The statistics help Impala to achieve high concurrency, full utilization of available memory, and avoid
+          contention with workloads from other Hadoop components.
+        </li>
+        <li class="li">
+          In <span class="keyword">Impala 2.8</span> and higher, when you run the
+          <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+          statement against a Parquet table, Impala automatically applies the query
+          option setting <code class="ph codeph">MT_DOP=4</code> to increase the amount of intra-node
+          parallelism during this CPU-intensive operation. See <a class="xref" href="impala_mt_dop.html">MT_DOP Query Option</a>
+          for details about what this query option does and how to use it with
+          CPU-intensive <code class="ph codeph">SELECT</code> statements.
+        </li>
+      </ul>
+    </div>
+
+    <p class="p">
+      <strong class="ph b">Computing stats for groups of partitions:</strong>
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.8</span> and higher, you can run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+      on multiple partitions, instead of the entire table or one partition at a time. You include
+      comparison operators other than <code class="ph codeph">=</code> in the <code class="ph codeph">PARTITION</code> clause,
+      and the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement applies to all partitions that
+      match the comparison expression.
+    </p>
+
+    <p class="p">
+      For example, the <code class="ph codeph">INT_PARTITIONS</code> table contains 4 partitions.
+      The following <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statements affect some but not all
+      partitions, as indicated by the <code class="ph codeph">Updated <var class="keyword varname">n</var> partition(s)</code>
+      messages. The partitions that are affected depend on values in the partition key column <code class="ph codeph">X</code>
+      that match the comparison expression in the <code class="ph codeph">PARTITION</code> clause.
+    </p>
+
+<pre class="pre codeblock"><code>
+show partitions int_partitions;
++-------+-------+--------+------+--------------+-------------------+---------+...
+| x     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format  |...
++-------+-------+--------+------+--------------+-------------------+---------+...
+| 99    | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 120   | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT    |...
+| 150   | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT    |...
+| 200   | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT    |...
+| Total | -1    | 0      | 0B   | 0B           |                   |         |...
++-------+-------+--------+------+--------------+-------------------+---------+...
+
+compute incremental stats int_partitions partition (x &lt; 100);
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x in (100, 150, 200));
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 2 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x between 100 and 175);
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 2 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x in (100, 150, 200) or x &lt; 100);
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 3 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x != 150);
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 3 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      Currently, the statistics created by the <code class="ph codeph">COMPUTE STATS</code> statement do not include
+      information about complex type columns. The column stats metrics for complex columns are always shown
+      as -1. For queries involving complex type columns, Impala uses
+      heuristics to estimate the data distribution within such columns.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong>
+      </p>
+
+    <p class="p">
+      <code class="ph codeph">COMPUTE STATS</code> works for HBase tables also. The statistics gathered for HBase tables are
+      somewhat different than for HDFS-backed tables, but that metadata is still used for optimization when HBase
+      tables are involved in join queries.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+
+    <p class="p">
+      <code class="ph codeph">COMPUTE STATS</code> also works for tables where data resides in the Amazon Simple Storage Service (S3).
+      See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Performance considerations:</strong>
+      </p>
+
+    <p class="p">
+      The statistics collected by <code class="ph codeph">COMPUTE STATS</code> are used to optimize join queries,
+      <code class="ph codeph">INSERT</code> operations into Parquet tables, and other resource-intensive kinds of SQL statements.
+      See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details.
+    </p>
+
+    <p class="p">
+      For large tables, the <code class="ph codeph">COMPUTE STATS</code> statement itself might take a long time and you
+      might need to tune its performance. The <code class="ph codeph">COMPUTE STATS</code> statement does not work with the
+      <code class="ph codeph">EXPLAIN</code> statement, or the <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span>.
+      You can use the <code class="ph codeph">PROFILE</code> statement in <span class="keyword cmdname">impala-shell</span> to examine timing information
+      for the statement as a whole. If a basic <code class="ph codeph">COMPUTE STATS</code> statement takes a long time for a
+      partitioned table, consider switching to the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax so that only
+      newly added partitions are analyzed each time.
+    </p>
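+
+    <p class="p">
+      For example, the following <span class="keyword cmdname">impala-shell</span> sequence sketches how you
+      might check the timing of a stats operation on a hypothetical large table:
+    </p>
+
+<pre class="pre codeblock"><code>compute stats large_partitioned_table;
+profile;  -- displays the timing breakdown for the preceding statement
+</code></pre>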
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      This example shows two tables, <code class="ph codeph">T1</code> and <code class="ph codeph">T2</code>, with a small number of distinct
+      values linked by a parent-child relationship between <code class="ph codeph">T1.ID</code> and <code class="ph codeph">T2.PARENT</code>.
+      <code class="ph codeph">T1</code> is tiny, while <code class="ph codeph">T2</code> has approximately 100K rows. Initially, the statistics
+      include physical measurements such as the number of files, the total size, and size measurements for
+      fixed-length columns such as the <code class="ph codeph">INT</code> type. Unknown values are represented by -1. After
+      running <code class="ph codeph">COMPUTE STATS</code> for each table, much more information is available through the
+      <code class="ph codeph">SHOW STATS</code> statements. If you were running a join query involving both of these tables, you
+      would need statistics for both tables to get the most effective optimization for the query.
+    </p>
+
+
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; show table stats t1;
+Query: show table stats t1
++-------+--------+------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+------+--------+
+| -1    | 1      | 33B  | TEXT   |
++-------+--------+------+--------+
+Returned 1 row(s) in 0.02s
+[localhost:21000] &gt; show table stats t2;
+Query: show table stats t2
++-------+--------+----------+--------+
+| #Rows | #Files | Size     | Format |
++-------+--------+----------+--------+
+| -1    | 28     | 960.00KB | TEXT   |
++-------+--------+----------+--------+
+Returned 1 row(s) in 0.01s
+[localhost:21000] &gt; show column stats t1;
+Query: show column stats t1
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| id     | INT    | -1               | -1     | 4        | 4        |
+| s      | STRING | -1               | -1     | -1       | -1       |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 1.71s
+[localhost:21000] &gt; show column stats t2;
+Query: show column stats t2
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| parent | INT    | -1               | -1     | 4        | 4        |
+| s      | STRING | -1               | -1     | -1       | -1       |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.01s
+[localhost:21000] &gt; compute stats t1;
+Query: compute stats t1
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 5.30s
+[localhost:21000] &gt; show table stats t1;
+Query: show table stats t1
++-------+--------+------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+------+--------+
+| 3     | 1      | 33B  | TEXT   |
++-------+--------+------+--------+
+Returned 1 row(s) in 0.01s
+[localhost:21000] &gt; show column stats t1;
+Query: show column stats t1
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| id     | INT    | 3                | -1     | 4        | 4        |
+| s      | STRING | 3                | -1     | -1       | -1       |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.02s
+[localhost:21000] &gt; compute stats t2;
+Query: compute stats t2
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 5.70s
+[localhost:21000] &gt; show table stats t2;
+Query: show table stats t2
++-------+--------+----------+--------+
+| #Rows | #Files | Size     | Format |
++-------+--------+----------+--------+
+| 98304 | 1      | 960.00KB | TEXT   |
++-------+--------+----------+--------+
+Returned 1 row(s) in 0.03s
+[localhost:21000] &gt; show column stats t2;
+Query: show column stats t2
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| parent | INT    | 3                | -1     | 4        | 4        |
+| s      | STRING | 6                | -1     | 14       | 9.3      |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.01s</code></pre>
+
+    <p class="p">
+      The following example shows how to use the <code class="ph codeph">INCREMENTAL</code> clause, available in Impala 2.1.0 and
+      higher. The <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax lets you collect statistics for newly added or
+      changed partitions, without rescanning the entire table.
+    </p>
+
+<pre class="pre codeblock"><code>-- Initially the table has no incremental stats, as indicated
+-- by -1 under #Rows and false under Incremental stats.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | -1    | 1      | 223.74KB | NOT CACHED   | PARQUET | false
+| Children    | -1    | 1      | 230.05KB | NOT CACHED   | PARQUET | false
+| Electronics | -1    | 1      | 232.67KB | NOT CACHED   | PARQUET | false
+| Home        | -1    | 1      | 232.56KB | NOT CACHED   | PARQUET | false
+| Jewelry     | -1    | 1      | 223.72KB | NOT CACHED   | PARQUET | false
+| Men         | -1    | 1      | 231.25KB | NOT CACHED   | PARQUET | false
+| Music       | -1    | 1      | 237.90KB | NOT CACHED   | PARQUET | false
+| Shoes       | -1    | 1      | 234.90KB | NOT CACHED   | PARQUET | false
+| Sports      | -1    | 1      | 227.97KB | NOT CACHED   | PARQUET | false
+| Women       | -1    | 1      | 226.27KB | NOT CACHED   | PARQUET | false
+| Total       | -1    | 10     | 2.25MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- After the first COMPUTE INCREMENTAL STATS,
+-- all partitions have stats.
+compute incremental stats item_partitioned;
++-------------------------------------------+
+| summary                                   |
++-------------------------------------------+
+| Updated 10 partition(s) and 21 column(s). |
++-------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
+| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
+| Total       | 17957 | 10     | 2.25MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- Add a new partition...
+alter table item_partitioned add partition (i_category='Camping');
+-- Add or replace files in HDFS outside of Impala,
+-- rendering the stats for a partition obsolete.
+!import_data_into_sports_partition.sh
+refresh item_partitioned;
+drop incremental stats item_partitioned partition (i_category='Sports');
+-- Now some partitions have incremental stats
+-- and some do not.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
+| Camping     | -1    | 1      | 408.02KB | NOT CACHED   | PARQUET | false
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
+| Sports      | -1    | 1      | 227.97KB | NOT CACHED   | PARQUET | false
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
+| Total       | 17957 | 11     | 2.65MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- After another COMPUTE INCREMENTAL STATS,
+-- all partitions have incremental stats, and only the 2
+-- partitions without incremental stats were scanned.
+compute incremental stats item_partitioned;
++------------------------------------------+
+| summary                                  |
++------------------------------------------+
+| Updated 2 partition(s) and 21 column(s). |
++------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
+| Camping     | 5328  | 1      | 408.02KB | NOT CACHED   | PARQUET | true
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
+| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
+| Total       | 17957 | 11     | 2.65MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">File format considerations:</strong>
+      </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with tables created with any of the file formats supported
+      by Impala. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details about working with the
+      different file formats. The following considerations apply to <code class="ph codeph">COMPUTE STATS</code> depending on the
+      file format of the table.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with text tables with no restrictions. These tables can be
+      created through either Impala or Hive.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with Parquet tables. These tables can be created through
+      either Impala or Hive.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with Avro tables without restriction in <span class="keyword">Impala 2.2</span>
+      and higher. In earlier releases, <code class="ph codeph">COMPUTE STATS</code> worked only for Avro tables created through Hive,
+      and required the <code class="ph codeph">CREATE TABLE</code> statement to use SQL-style column names and types rather than an
+      Avro-style schema specification.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with RCFile tables with no restrictions. These tables can
+      be created through either Impala or Hive.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with SequenceFile tables with no restrictions. These
+      tables can be created through either Impala or Hive.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement works with partitioned tables, whether all the partitions use
+      the same file format, or some partitions are defined through <code class="ph codeph">ALTER TABLE</code> to use different
+      file formats.
+    </p>
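+    <p class="p">
+      For example, the following hypothetical sequence (the table and partition names are for illustration
+      only) converts one partition of a partitioned table to Parquet, then gathers statistics across the
+      mixed-format partitions with a single statement:
+    </p>
+
+<pre class="pre codeblock"><code>-- Assume SALES is partitioned by YEAR and originally uses the TEXT format.
+alter table sales partition (year=2017) set fileformat parquet;
+-- After reloading the 2017 data as Parquet, a single COMPUTE STATS
+-- covers all partitions, regardless of their file formats.
+compute stats sales;
+</code></pre>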
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Certain multi-stage statements (<code class="ph codeph">CREATE TABLE AS SELECT</code> and
+        <code class="ph codeph">COMPUTE STATS</code>) can be cancelled during some stages, when running <code class="ph codeph">INSERT</code>
+        or <code class="ph codeph">SELECT</code> operations internally. To cancel this statement, use Ctrl-C from the
+        <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
+        <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
+        in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
+        (port 25000).
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span>  Prior to Impala 1.4.0,
+          <code class="ph codeph">COMPUTE STATS</code> counted the number of
+          <code class="ph codeph">NULL</code> values in each column and recorded that figure
+        in the metastore database. Because Impala does not currently use the
+          <code class="ph codeph">NULL</code> count during query planning, Impala 1.4.0 and
+        higher speeds up the <code class="ph codeph">COMPUTE STATS</code> statement by
+        skipping this <code class="ph codeph">NULL</code> counting. </div>
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong>
+      </p>
+    <p class="p">
+      Behind the scenes, the <code class="ph codeph">COMPUTE STATS</code> statement
+      executes two statements: one to count the rows of each partition
+      in the table (or the entire table if unpartitioned) through the
+      <code class="ph codeph">COUNT(*)</code> function,
+      and another to count the approximate number of distinct values
+      in each column through the <code class="ph codeph">NDV()</code> function.
+      You might see these queries in your monitoring and diagnostic displays.
+      The same factors that affect the performance, scalability, and
+      execution of other queries (such as parallel execution, memory usage,
+      admission control, and timeouts) also apply to the queries run by the
+      <code class="ph codeph">COMPUTE STATS</code> statement.
+    </p>
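+    <p class="p">
+      For example, for a hypothetical unpartitioned table <code class="ph codeph">T1</code> with columns
+      <code class="ph codeph">ID</code> and <code class="ph codeph">S</code>, the internally generated queries are
+      similar to the following (the exact form can vary between releases):
+    </p>
+
+<pre class="pre codeblock"><code>-- One query counts the rows of the table.
+select count(*) from t1;
+-- Another estimates the number of distinct values in each column.
+select ndv(id), ndv(s) from t1;
+</code></pre>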
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, must have read
+      permission for all affected files in the source directory:
+      all files in the case of an unpartitioned table, or all files
+      across all partitions in the case of <code class="ph codeph">COMPUTE STATS</code>
+      on a partitioned table; or only the files in partitions without incremental stats in
+      the case of <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+      It must also have read and execute permissions for all
+      relevant directories holding the data files.
+      (Essentially, <code class="ph codeph">COMPUTE STATS</code> requires the
+      same permissions as the underlying <code class="ph codeph">SELECT</code> queries it runs
+      against the table.)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+
+    <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement applies to Kudu tables.
+      Impala does not compute the number of rows for each partition for
+      Kudu tables. Therefore, you do not need to re-run the operation when
+      you see -1 in the <code class="ph codeph"># Rows</code> column of the output from
+      <code class="ph codeph">SHOW TABLE STATS</code>. That column always shows -1 for
+      all Kudu tables. 
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_drop_stats.html#drop_stats">DROP STATS Statement</a>, <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>,
+      <a class="xref" href="impala_show.html#show_column_stats">SHOW COLUMN STATS Statement</a>, <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_concepts.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_concepts.html b/docs/build/html/topics/impala_concepts.html
new file mode 100644
index 0000000..e644e93
--- /dev/null
+++ b/docs/build/html/topics/impala_concepts.html
@@ -0,0 +1,48 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_components.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_development.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hadoop.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="concepts"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Concepts and Architecture</title></head><body id="concepts"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Concepts and Architecture</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The following sections provide background information to help you become productive using Impala and
+      its features. Where appropriate, the explanations include context to help you understand how aspects of Impala
+      relate to other technologies you might already be familiar with, such as relational database management
+      systems and data warehouses, or other Hadoop components such as Hive, HDFS, and HBase.
+    </p>
+
+    <p class="p toc"></p>
+  </div>
+
+
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+
+
+  
+
+
+
+  
+
+
+
+
+
+  
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_components.html">Components of the Impala Server</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_development.html">Developing Impala Applications</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hadoop.html">How Impala Fits Into the Hadoop Ecosystem</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_conditional_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_conditional_functions.html b/docs/build/html/topics/impala_conditional_functions.html
new file mode 100644
index 0000000..713946b
--- /dev/null
+++ b/docs/build/html/topics/impala_conditional_functions.html
@@ -0,0 +1,517 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="conditional_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Conditional Functions</title></head><body id="conditional_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Conditional Functions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Impala supports the following conditional functions for testing equality, comparison operators, and nullity:
+    </p>
+
+    <dl class="dl">
+      
+
+        <dt class="dt dlterm" id="conditional_functions__case">
+          <code class="ph codeph">CASE a WHEN b THEN c [WHEN d THEN e]... [ELSE f] END</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Compares an expression to one or more possible values, and returns a corresponding result
+          when a match is found.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+        <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+        <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            In this form of the <code class="ph codeph">CASE</code> expression, the initial value <code class="ph codeph">A</code>
+            being evaluated for each row is typically a column reference, or an expression involving
+            a column. This form can only compare against a set of specified values, not ranges,
+            multi-value comparisons such as <code class="ph codeph">BETWEEN</code> or <code class="ph codeph">IN</code>,
+            regular expressions, or <code class="ph codeph">NULL</code>.
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            Although this example is split across multiple lines, you can put any or all parts of a <code class="ph codeph">CASE</code> expression
+            on a single line, with no punctuation or other separators between the <code class="ph codeph">WHEN</code>,
+            <code class="ph codeph">ELSE</code>, and <code class="ph codeph">END</code> clauses.
+          </p>
+<pre class="pre codeblock"><code>select case x
+    when 1 then 'one'
+    when 2 then 'two'
+    when 0 then 'zero'
+    else 'out of range'
+  end
+    from t1;
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__case2">
+          <code class="ph codeph">CASE WHEN a THEN b [WHEN c THEN d]... [ELSE e] END</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Tests whether any of a sequence of expressions is true, and returns a corresponding
+          result for the first true expression.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+        <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+        <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            <code class="ph codeph">CASE</code> expressions without an initial test value have more flexibility.
+            For example, they can test different columns in different <code class="ph codeph">WHEN</code> clauses,
+            or use comparison operators such as <code class="ph codeph">BETWEEN</code>, <code class="ph codeph">IN</code>, and <code class="ph codeph">IS NULL</code>
+            rather than comparing against discrete values.
+          </p>
+          <p class="p">
+            <code class="ph codeph">CASE</code> expressions are often the foundation of long queries that
+            summarize and format results for easy-to-read reports. For example, you might
+            use a <code class="ph codeph">CASE</code> function call to turn values from a numeric column
+            into category strings corresponding to integer values, or labels such as <span class="q">"Small"</span>,
+            <span class="q">"Medium"</span> and <span class="q">"Large"</span> based on ranges. Then subsequent parts of the
+            query might aggregate based on the transformed values, such as how many
+            values are classified as small, medium, or large. You can also use <code class="ph codeph">CASE</code>
+            to signal problems with out-of-bounds values, <code class="ph codeph">NULL</code> values,
+            and so on.
+          </p>
+          <p class="p">
+            By using operators such as <code class="ph codeph">OR</code>, <code class="ph codeph">IN</code>,
+            <code class="ph codeph">REGEXP</code>, and so on in <code class="ph codeph">CASE</code> expressions,
+            you can build extensive tests and transformations into a single query.
+            Therefore, applications that construct SQL statements often rely heavily on <code class="ph codeph">CASE</code>
+            calls in the generated SQL code.
+          </p>
+          <p class="p">
+            Because this flexible form of the <code class="ph codeph">CASE</code> expression allows you to perform
+            many comparisons and call multiple functions when evaluating each row, be careful applying
+            elaborate <code class="ph codeph">CASE</code> expressions to queries that process large amounts of data.
+            For example, when practical, evaluate and transform values through <code class="ph codeph">CASE</code>
+            after applying operations such as aggregations that reduce the size of the result set;
+            transform numbers to strings after performing joins with the original numeric values.
+          </p>
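+          <p class="p">
+            For example, the following hypothetical query (the table and column names are for illustration
+            only) aggregates the raw numeric values first, so the <code class="ph codeph">CASE</code> labels are
+            applied only to the small result set:
+          </p>
+<pre class="pre codeblock"><code>-- Aggregate first, then label only the few resulting rows.
+select case
+    when total &lt; 1000 then 'Small'
+    when total &lt; 100000 then 'Medium'
+    else 'Large'
+  end as size_category,
+  total
+from (select region, sum(amount) as total from sales_figures group by region) v;
+</code></pre>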
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            Although this example is split across multiple lines, you can put any or all parts of a <code class="ph codeph">CASE</code> expression
+            on a single line, with no punctuation or other separators between the <code class="ph codeph">WHEN</code>,
+            <code class="ph codeph">ELSE</code>, and <code class="ph codeph">END</code> clauses.
+          </p>
+<pre class="pre codeblock"><code>select case
+    when dayname(now()) in ('Saturday','Sunday') then 'result undefined on weekends'
+    when x &gt; y then 'x greater than y'
+    when x = y then 'x and y are equal'
+    when x is null or y is null then 'one of the columns is null'
+    else null
+  end
+    from t1;
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__coalesce">
+          <code class="ph codeph">coalesce(type v1, type v2, ...)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the first specified argument that is not <code class="ph codeph">NULL</code>, or
+          <code class="ph codeph">NULL</code> if all arguments are <code class="ph codeph">NULL</code>.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+        <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+        <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+      </p>
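+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following hypothetical example substitutes a placeholder string when both the
+            mobile and home numbers are <code class="ph codeph">NULL</code>:
+          </p>
+<pre class="pre codeblock"><code>select name, coalesce(mobile_phone, home_phone, 'no phone on file')
+  from contacts;
+</code></pre>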
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__decode">
+          <code class="ph codeph">decode(type expression, type search1, type result1 [, type search2, type result2 ...] [, type
+          default] )</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Compares an expression to one or more possible values, and returns a corresponding result
+          when a match is found.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+        <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+        <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Can be used as shorthand for a <code class="ph codeph">CASE</code> expression.
+          </p>
+          <p class="p">
+            The original expression and the search expressions must be of the same type or convertible types. The
+            result expression can be a different type, but all result expressions must be of the same type.
+          </p>
+          <p class="p">
+            Returns a successful match if the original expression is <code class="ph codeph">NULL</code> and a search expression
+            is also <code class="ph codeph">NULL</code>.
+          </p>
+          <p class="p">
+            Returns <code class="ph codeph">NULL</code> if the final <code class="ph codeph">default</code> value is omitted and none of the
+            search expressions match the original expression.
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following example translates numeric day values into descriptive names:
+          </p>
+<pre class="pre codeblock"><code>SELECT event, decode(day_of_week, 1, "Monday", 2, "Tuesday", 3, "Wednesday",
+  4, "Thursday", 5, "Friday", 6, "Saturday", 7, "Sunday", "Unknown day")
+  FROM calendar;
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__if">
+          <code class="ph codeph">if(boolean condition, type ifTrue, type ifFalseOrNull)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Tests an expression and returns a corresponding result depending on whether the result is
+          true, false, or <code class="ph codeph">NULL</code>.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> Same as the <code class="ph codeph">ifTrue</code> argument value
+          </p>
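+          <p class="p">
+            For example, the following illustrative query (using a hypothetical table <code class="ph codeph">t1</code>)
+            labels each value of column <code class="ph codeph">c1</code>; a <code class="ph codeph">NULL</code> value of
+            <code class="ph codeph">c1</code> takes the <code class="ph codeph">ifFalseOrNull</code> branch:
+          </p>
+<pre class="pre codeblock"><code>SELECT c1, if(c1 &gt; 0, 'positive', 'zero, negative, or NULL') FROM t1;</code></pre>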
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__ifnull">
+          <code class="ph codeph">ifnull(type a, type ifNull)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Alias for the <code class="ph codeph">isnull()</code> function, with the same behavior. Provided to
+          simplify porting SQL with vendor extensions to Impala.
+          <p class="p">
+        <strong class="ph b">Added in:</strong> Impala 1.3.0
+      </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__isfalse">
+          <code class="ph codeph">isfalse(<var class="keyword varname">boolean</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Tests if a Boolean expression is <code class="ph codeph">false</code> or not.
+          Returns <code class="ph codeph">true</code> if so.
+          If the argument is <code class="ph codeph">NULL</code>, returns <code class="ph codeph">false</code>.
+          Identical to <code class="ph codeph">isnottrue()</code>, except it returns the opposite value for a <code class="ph codeph">NULL</code> argument.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+      </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__isnotfalse">
+          <code class="ph codeph">isnotfalse(<var class="keyword varname">boolean</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Tests if a Boolean expression is not <code class="ph codeph">false</code> (that is, either <code class="ph codeph">true</code> or <code class="ph codeph">NULL</code>).
+          Returns <code class="ph codeph">true</code> if so.
+          If the argument is <code class="ph codeph">NULL</code>, returns <code class="ph codeph">true</code>.
+          Identical to <code class="ph codeph">istrue()</code>, except it returns the opposite value for a <code class="ph codeph">NULL</code> argument.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong> Primarily for compatibility with code containing industry extensions to SQL.
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+      </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__isnottrue">
+          <code class="ph codeph">isnottrue(<var class="keyword varname">boolean</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Tests if a Boolean expression is not <code class="ph codeph">true</code> (that is, either <code class="ph codeph">false</code> or <code class="ph codeph">NULL</code>).
+          Returns <code class="ph codeph">true</code> if so.
+          If the argument is <code class="ph codeph">NULL</code>, returns <code class="ph codeph">true</code>.
+          Identical to <code class="ph codeph">isfalse()</code>, except it returns the opposite value for a <code class="ph codeph">NULL</code> argument.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+      </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__isnull">
+          <code class="ph codeph">isnull(type a, type ifNull)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Tests if an expression is <code class="ph codeph">NULL</code>, and returns the expression result value
+          if not. If the first argument is <code class="ph codeph">NULL</code>, returns the second argument.
+          <p class="p">
+            <strong class="ph b">Compatibility notes:</strong> Equivalent to the <code class="ph codeph">nvl()</code> function from Oracle Database or
+            <code class="ph codeph">ifnull()</code> from MySQL. The <code class="ph codeph">nvl()</code> and <code class="ph codeph">ifnull()</code>
+            functions are also available in Impala.
+          </p>
+          <p class="p">
+            <strong class="ph b">Return type:</strong> Same as the first argument value
+          </p>
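+          <p class="p">
+            For example, this illustrative query (assuming a hypothetical table <code class="ph codeph">t1</code>)
+            substitutes 0 for any <code class="ph codeph">NULL</code> values of column <code class="ph codeph">c1</code>:
+          </p>
+<pre class="pre codeblock"><code>SELECT isnull(c1, 0) FROM t1;</code></pre>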
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__istrue">
+          <code class="ph codeph">istrue(<var class="keyword varname">boolean</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Tests if a Boolean expression is <code class="ph codeph">true</code> or not.
+          Returns <code class="ph codeph">true</code> if so.
+          If the argument is <code class="ph codeph">NULL</code>, returns <code class="ph codeph">false</code>.
+          Identical to <code class="ph codeph">isnotfalse()</code>, except it returns the opposite value for a <code class="ph codeph">NULL</code> argument.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong> Primarily for compatibility with code containing industry extensions to SQL.
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+      </p>
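+          <p class="p">
+            For example, these expressions show how <code class="ph codeph">istrue()</code> treats a
+            <code class="ph codeph">NULL</code> argument as not true:
+          </p>
+<pre class="pre codeblock"><code>SELECT istrue(1 = 1);  -- true
+SELECT istrue(1 = 0);  -- false
+SELECT istrue(NULL);   -- false
+</code></pre>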
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__nonnullvalue">
+          <code class="ph codeph">nonnullvalue(<var class="keyword varname">expression</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Tests if an expression (of any type) is <code class="ph codeph">NULL</code> or not.
+          Returns <code class="ph codeph">false</code> if so.
+          The converse of <code class="ph codeph">nullvalue()</code>.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong> Primarily for compatibility with code containing industry extensions to SQL.
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+      </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__nullif">
+          <code class="ph codeph">nullif(<var class="keyword varname">expr1</var>,<var class="keyword varname">expr2</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns <code class="ph codeph">NULL</code> if the two specified arguments are equal. If the specified
+          arguments are not equal, returns the value of <var class="keyword varname">expr1</var>. The data types of the expressions
+          must be compatible, according to the conversion rules from <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a>.
+          You cannot use an expression that evaluates to <code class="ph codeph">NULL</code> for <var class="keyword varname">expr1</var>; that
+          way, you can distinguish a return value of <code class="ph codeph">NULL</code> from an argument value of
+          <code class="ph codeph">NULL</code>, which would never match <var class="keyword varname">expr2</var>.
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> This function is effectively shorthand for a <code class="ph codeph">CASE</code> expression of
+            the form:
+          </p>
+<pre class="pre codeblock"><code>CASE
+  WHEN <var class="keyword varname">expr1</var> = <var class="keyword varname">expr2</var> THEN NULL
+  ELSE <var class="keyword varname">expr1</var>
+END</code></pre>
+          <p class="p">
+            It is commonly used in division expressions, to produce a <code class="ph codeph">NULL</code> result instead of a
+            divide-by-zero error when the divisor is equal to zero:
+          </p>
+<pre class="pre codeblock"><code>select 1.0 / nullif(c1,0) as reciprocal from t1;</code></pre>
+          <p class="p">
+            You might also use it for compatibility with other database systems that support the same
+            <code class="ph codeph">NULLIF()</code> function.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+        <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+        <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> Impala 1.3.0
+      </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__nullifzero">
+          <code class="ph codeph">nullifzero(<var class="keyword varname">numeric_expr</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns <code class="ph codeph">NULL</code> if the numeric expression evaluates to 0, otherwise returns
+          the result of the expression.
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> Used to avoid error conditions such as divide-by-zero in numeric calculations.
+            Serves as shorthand for a more elaborate <code class="ph codeph">CASE</code> expression, to simplify porting SQL with
+            vendor extensions to Impala.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+        <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+        <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> Impala 1.3.0
+      </p>
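+          <p class="p">
+            For example, this illustrative query (assuming a hypothetical table <code class="ph codeph">t1</code>)
+            produces a <code class="ph codeph">NULL</code> result rather than a divide-by-zero error for rows where
+            <code class="ph codeph">c2</code> is 0:
+          </p>
+<pre class="pre codeblock"><code>SELECT c1 / nullifzero(c2) FROM t1;</code></pre>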
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__nullvalue">
+          <code class="ph codeph">nullvalue(<var class="keyword varname">expression</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Tests if an expression (of any type) is <code class="ph codeph">NULL</code> or not.
+          Returns <code class="ph codeph">true</code> if so.
+          The converse of <code class="ph codeph">nonnullvalue()</code>.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong> Primarily for compatibility with code containing industry extensions to SQL.
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+      </p>
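+          <p class="p">
+            For example:
+          </p>
+<pre class="pre codeblock"><code>SELECT nullvalue(NULL);    -- true
+SELECT nullvalue('text');  -- false
+</code></pre>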
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__nvl">
+          <code class="ph codeph">nvl(type a, type ifNull)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Alias for the <code class="ph codeph">isnull()</code> function. Tests if an expression is
+          <code class="ph codeph">NULL</code>, and returns the expression result value if not. If the first argument is
+          <code class="ph codeph">NULL</code>, returns the second argument. Equivalent to the <code class="ph codeph">nvl()</code> function
+          from Oracle Database or <code class="ph codeph">ifnull()</code> from MySQL.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> Same as the first argument value
+          </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> Impala 1.1
+      </p>
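+          <p class="p">
+            For example, this illustrative query (assuming a hypothetical table <code class="ph codeph">t1</code>)
+            substitutes the string <code class="ph codeph">'unknown'</code> for any <code class="ph codeph">NULL</code>
+            values of column <code class="ph codeph">c1</code>:
+          </p>
+<pre class="pre codeblock"><code>SELECT nvl(c1, 'unknown') FROM t1;</code></pre>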
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="conditional_functions__zeroifnull">
+          <code class="ph codeph">zeroifnull(<var class="keyword varname">numeric_expr</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns 0 if the numeric expression evaluates to <code class="ph codeph">NULL</code>, otherwise returns
+          the result of the expression.
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> Used to avoid unexpected results due to unexpected propagation of
+            <code class="ph codeph">NULL</code> values in numeric calculations. Serves as shorthand for a more elaborate
+            <code class="ph codeph">CASE</code> expression, to simplify porting SQL with vendor extensions to Impala.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+        <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+        <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> Impala 1.3.0
+      </p>
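+          <p class="p">
+            For example, this illustrative query (assuming a hypothetical table <code class="ph codeph">t1</code>)
+            treats <code class="ph codeph">NULL</code> values of <code class="ph codeph">c2</code> as 0, so the addition
+            does not propagate <code class="ph codeph">NULL</code> into the result:
+          </p>
+<pre class="pre codeblock"><code>SELECT c1 + zeroifnull(c2) FROM t1;</code></pre>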
+        </dd>
+
+      
+    </dl>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_config.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_config.html b/docs/build/html/topics/impala_config.html
new file mode 100644
index 0000000..6950647
--- /dev/null
+++ b/docs/build/html/topics/impala_config.html
@@ -0,0 +1,48 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config_performance.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_odbc.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_jdbc.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="config"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Managing Impala</title></head><body id="config"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Managing Impala</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      This section explains how to configure Impala to accept connections from applications that use popular
+      programming APIs:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <a class="xref" href="impala_config_performance.html#config_performance">Post-Installation Configuration for Impala</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_odbc.html#impala_odbc">Configuring Impala to Work with ODBC</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a>
+      </li>
+    </ul>
+
+    <p class="p">
+      This type of configuration is especially useful when using Impala in combination with Business Intelligence
+      tools, which use these standard interfaces to query different kinds of database and Big Data systems.
+    </p>
+
+    <p class="p">
+      You can also configure these other aspects of Impala:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <a class="xref" href="impala_security.html#security">Impala Security</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>
+      </li>
+    </ul>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_config_performance.html">Post-Installation Configuration for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_odbc.html">Configuring Impala to Work with ODBC</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_jdbc.html">Configuring Impala to Work with JDBC</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file


[21/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_new_features.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_new_features.html b/docs/build/html/topics/impala_new_features.html
new file mode 100644
index 0000000..4f8af12
--- /dev/null
+++ b/docs/build/html/topics/impala_new_features.html
@@ -0,0 +1,3712 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="new_features"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>New Features in Apache Impala (incubating)</title></head><body id="new_features"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">New Features in Apache Impala (incubating)</span></h1>
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      This release of Impala contains the following changes and enhancements from previous releases.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="new_features__new_features_280">
+
+    <h2 class="title topictitle2" id="ariaid-title2">New Features in <span class="keyword">Impala 2.8</span></h2>
+
+    <div class="body conbody">
+
+      <ul class="ul" id="new_features_280__feature_list">
+        <li class="li">
+          <p class="p">
+            Performance and scalability improvements:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">COMPUTE STATS</code> statement can
+                take advantage of multithreading.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Improved scalability for highly concurrent loads by reducing the possibility of TCP/IP timeouts.
+                A configuration setting, <code class="ph codeph">accepted_cnxn_queue_depth</code>, can be adjusted upwards to
+                avoid this type of timeout on large clusters.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Several performance improvements were made to the mechanism for generating native code:
+              </p>
+              <ul class="ul">
+                <li class="li">
+                  <p class="p">
+                    Some queries involving analytic functions can take better advantage of native code generation.
+                  </p>
+                </li>
+                <li class="li">
+                  <p class="p">
+                    Modules produced during intermediate code generation are organized
+                    to be easier to cache and reuse during the lifetime of a long-running or complicated query.
+                  </p>
+                </li>
+                <li class="li">
+                  <p class="p">
+                    The <code class="ph codeph">COMPUTE STATS</code> statement is more efficient
+                    (less time for the codegen phase) for tables with a large number
+                    of columns, especially for tables containing <code class="ph codeph">TIMESTAMP</code>
+                    columns.
+                  </p>
+                </li>
+                <li class="li">
+                  <p class="p">
+                    The logic for determining whether or not to use a runtime filter is more reliable, and the
+                    evaluation process itself is faster because of native code generation.
+                  </p>
+                </li>
+              </ul>
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">MT_DOP</code> query option enables
+                multithreading for a number of Impala operations.
+                <code class="ph codeph">COMPUTE STATS</code> statements for Parquet tables
+                use a default of <code class="ph codeph">MT_DOP=4</code> to improve the
+                intra-node parallelism and CPU efficiency of this data-intensive
+                operation.
+              </p>
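+              <p class="p">
+                For example, you might enable multithreaded execution for a session like this
+                (the table name is hypothetical):
+              </p>
+<pre class="pre codeblock"><code>SET MT_DOP=4;
+COMPUTE STATS parquet_table;
+</code></pre>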
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">COMPUTE STATS</code> statement is more efficient
+                (less time for the codegen phase) for tables with a large number
+                of columns.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                A new hint, <code class="ph codeph">CLUSTERED</code>,
+                allows Impala <code class="ph codeph">INSERT</code> operations on a Parquet table
+                that use dynamic partitioning to process a high number of
+                partitions in a single statement. The data is ordered based on the
+                partition key columns, and each partition is only written
+                by a single host, reducing the amount of memory needed to buffer
+                Parquet data while the data blocks are being constructed.
+              </p>
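+              <p class="p">
+                For example, an <code class="ph codeph">INSERT</code> using the hint might look like the
+                following (the table and column names are hypothetical):
+              </p>
+<pre class="pre codeblock"><code>INSERT INTO partitioned_parquet_table PARTITION (year, month)
+  /* +CLUSTERED */ SELECT c1, c2, year, month FROM source_table;
+</code></pre>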
+            </li>
+            <li class="li">
+              <p class="p">
+                A new hint, <code class="ph codeph">SORTBY(<var class="keyword varname">cols</var>)</code>,
+                allows Impala <code class="ph codeph">INSERT</code> operations on a Parquet table
+                to produce optimized output files with better compressibility
+                and a more compact range of min/max values within each data file.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                The new configuration setting <code class="ph codeph">inc_stats_size_limit_bytes</code>
+                lets you reduce the load on the catalog server when running the
+                <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement for very large tables.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Impala folds many constant expressions within query statements,
+                rather than evaluating them for each row. This optimization
+                is especially useful when using functions to manipulate and
+                format <code class="ph codeph">TIMESTAMP</code> values, such as the result
+                of an expression such as <code class="ph codeph">to_date(now() - interval 1 day)</code>.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Parsing of complicated expressions is faster. This speedup is
+                especially useful for queries containing large <code class="ph codeph">CASE</code>
+                expressions.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Evaluation is faster for <code class="ph codeph">IN</code> operators with many constant
+                arguments. The same performance improvement applies to other functions
+                with many constant arguments.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Impala optimizes identical comparison operators within multiple <code class="ph codeph">OR</code>
+                blocks.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                The reporting for wall-clock times and total CPU time in profile output is more accurate.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                A new query option, <code class="ph codeph">SCRATCH_LIMIT</code>, lets you restrict the amount of
+                space used when a query exceeds the memory limit and activates the <span class="q">"spill to disk"</span> mechanism.
+                This option helps to avoid runaway queries or make queries <span class="q">"fail fast"</span> if they require more
+                memory than anticipated. You can prevent runaway queries from using excessive amounts of spill space,
+                without restarting the cluster to turn the spilling feature off entirely.
+                See <a class="xref" href="impala_scratch_limit.html#scratch_limit">SCRATCH_LIMIT Query Option</a> for details.
+              </p>
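+              <p class="p">
+                For example, to cap the spill space for queries in a session at approximately 25 GB:
+              </p>
+<pre class="pre codeblock"><code>SET SCRATCH_LIMIT=25000000000;
+</code></pre>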
+            </li>
+          </ul>
+        </li>
+        <li class="li">
+          <p class="p">
+            Integration with Apache Kudu:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                The experimental Impala support for the Kudu storage layer has been folded
+                into the main Impala development branch. Impala can now directly access Kudu tables,
+                opening up new capabilities such as enhanced DML operations and continuous ingestion.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">DELETE</code> statement is a flexible way to remove data from a Kudu table. Previously,
+                removing data from an Impala table involved removing or rewriting the underlying data files, dropping entire partitions,
+                or rewriting the entire table. This Impala statement only works for Kudu tables.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">UPDATE</code> statement is a flexible way to modify data within a Kudu table. Previously,
+                updating data in an Impala table involved replacing the underlying data files, dropping entire partitions,
+                or rewriting the entire table. This Impala statement only works for Kudu tables.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">UPSERT</code> statement is a flexible way to ingest new data, modify existing data, or both, within a Kudu table. Previously,
+                ingesting data that might contain duplicates involved an inefficient multi-stage operation, and there was no
+                built-in protection against duplicate data. The <code class="ph codeph">UPSERT</code> statement, in combination with
+                the primary key designation for Kudu tables, lets you add or replace rows in a single operation, and
+                automatically avoids creating any duplicate data.
+              </p>
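+              <p class="p">
+                For example, assuming a hypothetical Kudu table <code class="ph codeph">kudu_t1</code> with primary key
+                column <code class="ph codeph">id</code>, the following statement adds a new row if the key value 3 is
+                absent, or replaces the non-key column values if it is present:
+              </p>
+<pre class="pre codeblock"><code>UPSERT INTO kudu_t1 (id, s) VALUES (3, 'updated value');
+</code></pre>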
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">CREATE TABLE</code> statement gains some new clauses that are specific to Kudu tables:
+                <code class="ph codeph">PARTITION BY</code>, <code class="ph codeph">PARTITIONS</code>, <code class="ph codeph">STORED AS KUDU</code>, and column
+                attributes <code class="ph codeph">PRIMARY KEY</code>, <code class="ph codeph">NULL</code> and <code class="ph codeph">NOT NULL</code>,
+                <code class="ph codeph">ENCODING</code>, <code class="ph codeph">COMPRESSION</code>, <code class="ph codeph">DEFAULT</code>, and <code class="ph codeph">BLOCK_SIZE</code>.
+                These clauses replace the explicit <code class="ph codeph">TBLPROPERTIES</code> settings that were required in the
+                early experimental phases of integration between Impala and Kudu.
+              </p>
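+              <p class="p">
+                For example, a simple Kudu table definition using the new clauses might look like the
+                following (the names and partition count are illustrative):
+              </p>
+<pre class="pre codeblock"><code>CREATE TABLE kudu_t1 (id BIGINT PRIMARY KEY, s STRING, b BOOLEAN)
+  PARTITION BY HASH (id) PARTITIONS 2
+  STORED AS KUDU;
+</code></pre>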
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">ALTER TABLE</code> statement can change certain attributes of Kudu tables.
+                You can add, drop, or rename columns.
+                You can add or drop range partitions.
+                You can change the <code class="ph codeph">TBLPROPERTIES</code> value to rename or point to a different underlying Kudu table,
+                independently from the Impala table name in the metastore database.
+                You cannot change the data type of an existing column in a Kudu table.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">SHOW PARTITIONS</code> statement displays information about the distribution of data
+                between partitions in Kudu tables. A new variation, <code class="ph codeph">SHOW RANGE PARTITIONS</code>,
+                displays information about the Kudu-specific partitions that apply across ranges of key values.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Not all Impala data types are supported in Kudu tables. In particular, currently the Impala
+                <code class="ph codeph">TIMESTAMP</code> type is not allowed in a Kudu table. Impala does not recognize the
+                <code class="ph codeph">UNIXTIME_MICROS</code> Kudu type when it is present in a Kudu table. (These two
+                representations of date/time data use different units and are not directly compatible.)
+                You cannot create columns of type <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">DECIMAL</code>,
+                <code class="ph codeph">VARCHAR</code>, or <code class="ph codeph">CHAR</code> within a Kudu table. Within a query, you can
+                cast values in a result set to these types. Certain types, such as <code class="ph codeph">BOOLEAN</code>,
+                cannot be used as primary key columns.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Currently, Kudu tables are not interchangeable between Impala and Hive the way other kinds of Impala tables are.
+                Although the metadata for Kudu tables is stored in the metastore database, currently Hive cannot access Kudu tables.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">INSERT</code> statement works for Kudu tables. Because of the way
+                Kudu organizes its data, inserting data in small batches, such as with the
+                <code class="ph codeph">INSERT ... VALUES</code> syntax, is more efficient than with HDFS-backed tables.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Some audit data is recorded for data governance purposes.
+                All <code class="ph codeph">UPDATE</code>, <code class="ph codeph">DELETE</code>, and <code class="ph codeph">UPSERT</code> statements are characterized
+                as <code class="ph codeph">INSERT</code> operations in the audit log. Currently, lineage metadata is not generated for
+                <code class="ph codeph">UPDATE</code> and <code class="ph codeph">DELETE</code> operations on Kudu tables.
+              </p>
+            </li>
+            <li class="li">
+              <div class="p">
+                Currently, Kudu tables have limited support for Sentry:
+                <ul class="ul">
+                  <li class="li">
+                    <p class="p">
+                      Access to Kudu tables must be granted to roles as usual.
+                    </p>
+                  </li>
+                  <li class="li">
+                    <p class="p">
+                      Currently, access to a Kudu table through Sentry is <span class="q">"all or nothing"</span>.
+                      You cannot enforce finer-grained permissions such as at the column level,
+                      or permissions on certain operations such as <code class="ph codeph">INSERT</code>.
+                    </p>
+                  </li>
+                  <li class="li">
+                    <p class="p">
+                      Only users with <code class="ph codeph">ALL</code> privileges on <code class="ph codeph">SERVER</code> can create external Kudu tables.
+                    </p>
+                  </li>
+                </ul>
+                Because non-SQL APIs can access Kudu data without going through Sentry
+                authorization, currently the Sentry support is considered preliminary.
+              </div>
+            </li>
+            <li class="li">
+              <p class="p">
+                Equality and <code class="ph codeph">IN</code> predicates in Impala queries are pushed to
+                Kudu and evaluated efficiently by the Kudu storage layer.
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li class="li">
+          <p class="p">
+            <strong class="ph b">Security:</strong>
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                Impala can take advantage of the S3 encrypted credential
+                store, to avoid exposing the secret key when accessing
+                data stored on S3.
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">REFRESH</code> statement now updates information about HDFS block locations.
+            Therefore, you can perform a fast and efficient <code class="ph codeph">REFRESH</code> after doing an HDFS
+            rebalancing operation instead of the more expensive <code class="ph codeph">INVALIDATE METADATA</code> statement.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1654" target="_blank">IMPALA-1654</a>]
+            Several kinds of DDL operations
+            can now work on a range of partitions. The partitions can be specified
+            using operators such as <code class="ph codeph">&lt;</code>, <code class="ph codeph">&gt;=</code>, and
+            <code class="ph codeph">!=</code> rather than just an equality predicate applying to a single
+            partition.
+            This new feature extends the syntax of several clauses
+            of the <code class="ph codeph">ALTER TABLE</code> statement
+            (<code class="ph codeph">DROP PARTITION</code>, <code class="ph codeph">SET [UN]CACHED</code>,
+            <code class="ph codeph">SET FILEFORMAT | SERDEPROPERTIES | TBLPROPERTIES</code>),
+            the <code class="ph codeph">SHOW FILES</code> statement, and the
+            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement.
+            It does not apply to statements that are defined to only apply to a single
+            partition, such as <code class="ph codeph">LOAD DATA</code>, <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code>,
+            <code class="ph codeph">SET LOCATION</code>, and <code class="ph codeph">INSERT</code> with a static
+            partitioning clause.
+          </p>
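+          <p class="p">
+            For example, the following statements (with hypothetical table, partition key,
+            and pool names) each operate on all partitions matching the comparison:
+          </p>
+<pre class="pre codeblock"><code>ALTER TABLE sales DROP PARTITION (year &lt; 2010);
+ALTER TABLE sales PARTITION (year &gt;= 2016) SET CACHED IN 'default_pool';
+COMPUTE INCREMENTAL STATS sales PARTITION (year != 2017);</code></pre>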
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">instr()</code> function now has optional third and fourth arguments, representing
+            the character position at which to begin searching for the substring, and the Nth occurrence
+            of the substring to find.
+          </p>
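+          <p class="p">
+            For example (character positions are 1-based):
+          </p>
+<pre class="pre codeblock"><code>-- Start searching at position 1; find the 2nd occurrence of 'b'.
+SELECT instr('abcabc', 'b', 1, 2);  -- Returns 5.
+-- Start searching at position 3; find the 1st occurrence from there.
+SELECT instr('abcabc', 'b', 3);     -- Returns 5.</code></pre>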
+        </li>
+        <li class="li">
+          <p class="p">
+            Improved error handling for malformed Avro data. In particular, incorrect
+            precision or scale for <code class="ph codeph">DECIMAL</code> types is now handled.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Impala debug web UI:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                In addition to <span class="q">"inflight"</span> and <span class="q">"finished"</span> queries, the web UI
+                now also includes a section for <span class="q">"queued"</span> queries.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                The <span class="ph uicontrol">/sessions</span> tab now clarifies how many of the displayed
+                sessions are active, and lets you sort by <span class="ph uicontrol">Expired</span> status
+                to distinguish active sessions from expired ones.
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li class="li">
+          <p class="p">
+            Improved stability when DDL operations such as <code class="ph codeph">CREATE DATABASE</code>
+            or <code class="ph codeph">DROP DATABASE</code> are run in Hive at the same time as an Impala
+            <code class="ph codeph">INVALIDATE METADATA</code> statement.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <span class="q">"out of memory"</span> error report was made more user-friendly, with additional
+            diagnostic information to help identify the spot where the memory limit was exceeded.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Improved disk space usage for Java-based UDFs. Temporary copies of the associated JAR
+            files are removed when no longer needed, so that they do not accumulate across restarts
+            of the <span class="keyword cmdname">catalogd</span> daemon and potentially cause an out-of-space condition.
+            These temporary files are also created in the directory specified by the <code class="ph codeph">local_library_dir</code>
+            configuration setting, so that the storage for these temporary files can be independent
+            from any capacity limits on the <span class="ph filepath">/tmp</span> filesystem.
+          </p>
+        </li>
+      </ul>
+
+    </div>
+  </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="new_features__new_features_270">
+
+    <h2 class="title topictitle2" id="ariaid-title3">New Features in <span class="keyword">Impala 2.7</span></h2>
+
+    <div class="body conbody">
+
+      <ul class="ul" id="new_features_270__feature_list">
+        <li class="li">
+          <p class="p">
+            Performance improvements:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3206" target="_blank">IMPALA-3206</a>]
+                Speedup for queries against <code class="ph codeph">DECIMAL</code> columns in Avro tables.
+                The code that parses <code class="ph codeph">DECIMAL</code> values from Avro now uses
+                native code generation.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3674" target="_blank">IMPALA-3674</a>]
+                Improved efficiency in LLVM code generation can reduce codegen time, especially
+                for short queries.
+              </p>
+            </li>
+            
+            <li class="li">
+              <p class="p">
+                [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2979" target="_blank">IMPALA-2979</a>]
+                Improvements to scheduling on worker nodes,
+                enabled by the <code class="ph codeph">REPLICA_PREFERENCE</code> query option.
+                See <a class="xref" href="impala_replica_preference.html#replica_preference">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a> for details.
+              </p>
+            </li>
+          </ul>
+        </li>
+        
+        <li class="li">
+          <p class="p">
+            [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1683" target="_blank">IMPALA-1683</a>]
+            The <code class="ph codeph">REFRESH</code> statement can be applied to a single partition,
+            rather than the entire table. See <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a>
+            and <a class="xref" href="impala_partitioning.html#partition_refresh">Refreshing a Single Partition</a> for details.
+          </p>
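+          <p class="p">
+            For example, with a hypothetical partitioned table:
+          </p>
+<pre class="pre codeblock"><code>-- Refresh metadata for a single partition instead of the whole table.
+REFRESH sales PARTITION (year = 2016, month = 12);</code></pre>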
+        </li>
+        <li class="li">
+          <p class="p">
+            Improvements to the Impala web user interface:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2767" target="_blank">IMPALA-2767</a>]
+                You can now force a session to expire by clicking a link in the web UI,
+                on the <span class="ph uicontrol">/sessions</span> tab.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3715" target="_blank">IMPALA-3715</a>]
+                The <span class="ph uicontrol">/memz</span> tab includes more information about
+                Impala memory usage.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3716" target="_blank">IMPALA-3716</a>]
+                The <span class="ph uicontrol">Details</span> page for a query now includes
+                a <span class="ph uicontrol">Memory</span> tab.
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li class="li">
+          <p class="p">
+            [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3499" target="_blank">IMPALA-3499</a>]
+            Scalability improvements to the catalog server. Impala handles internal communication
+            more efficiently for tables with large numbers of columns and partitions, where the
+            size of the metadata exceeds 2 GiB.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3677" target="_blank">IMPALA-3677</a>]
+            You can send a <code class="ph codeph">SIGUSR1</code> signal to any Impala-related daemon to write a
+            Breakpad minidump. For advanced troubleshooting, you can now produce a minidump
+            without triggering a crash. See <a class="xref" href="impala_breakpad.html#breakpad">Breakpad Minidumps for Impala (Impala 2.6 or higher only)</a> for
+            details about the Breakpad minidump feature.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3687" target="_blank">IMPALA-3687</a>]
+            The schema reconciliation rules for Avro tables have changed slightly
+            for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> columns. Now, if
+            the definition of such a column is changed in the Avro schema file,
+            the column retains its <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code>
+            type as specified in the SQL definition, but the column name and comment
+            from the Avro schema file take precedence.
+            See <a class="xref" href="impala_avro.html#avro_create_table">Creating Avro Tables</a> for details about
+            column definitions in Avro tables.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3575" target="_blank">IMPALA-3575</a>]
+            Some network
+            operations now have additional timeout and retry settings. The extra
+            configuration helps to avoid failed queries during transient network
+            problems, to avoid hangs when a sender or receiver fails in the
+            middle of a network transmission, and to make cancellation requests
+            more reliable despite network issues. </p>
+        </li>
+      </ul>
+
+    </div>
+  </article>
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="new_features__new_features_260">
+
+    <h2 class="title topictitle2" id="ariaid-title4">New Features in <span class="keyword">Impala 2.6</span></h2>
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Improvements to Impala support for the Amazon S3 filesystem:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                Impala can now write to S3 tables through the <code class="ph codeph">INSERT</code>
+                or <code class="ph codeph">LOAD DATA</code> statements.
+                See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for general information about
+                using Impala with S3.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                A new query option, <code class="ph codeph">S3_SKIP_INSERT_STAGING</code>, lets you
+                trade off between fast <code class="ph codeph">INSERT</code> performance and
+                slower <code class="ph codeph">INSERT</code>s that are more consistent if a
+                problem occurs during the statement. The new behavior is enabled by default.
+                See <a class="xref" href="impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details
+                about this option.
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li class="li">
+          <p class="p">
+            Performance improvements for the runtime filtering feature:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                The default for the <code class="ph codeph">RUNTIME_FILTER_MODE</code>
+                query option is changed to <code class="ph codeph">GLOBAL</code> (the highest setting).
+                See <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a> for
+                details about this option.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> setting is now only used
+                as a fallback if statistics are not available; otherwise, Impala
+                uses the statistics to estimate the appropriate size to use for each filter.
+                See <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a> for
+                details about this option.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                New query options <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code> and
+                <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code> let you fine-tune
+                the sizes of the Bloom filter structures used for runtime filtering.
+                If the filter size derived from Impala internal estimates or from
+                the <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> query option falls outside the size
+                range specified by these options, any too-small filter size is adjusted
+                to the minimum, and any too-large filter size is adjusted to the maximum.
+                See <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>
+                and <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>
+                for details about these options.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Runtime filter propagation now applies to all the
+                operands of <code class="ph codeph">UNION</code> and <code class="ph codeph">UNION ALL</code>
+                operators.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Runtime filters can now be produced during join queries even
+                when the join processing activates the spill-to-disk mechanism.
+              </p>
+            </li>
+          </ul>
+            See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a> for
+            general information about the runtime filtering feature.
+        </li>
+        
+        <li class="li">
+          <p class="p">
+            Admission control and dynamic resource pools are enabled by default.
+            See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details
+            about admission control.
+          </p>
+        </li>
+        
+        <li class="li">
+          <p class="p">
+            You can now set column statistics manually in Impala,
+            using the <code class="ph codeph">ALTER TABLE</code> statement with a
+            <code class="ph codeph">SET COLUMN STATS</code> clause.
+            See <a class="xref" href="impala_perf_stats.html#perf_column_stats_manual">Setting Column Stats Manually through ALTER TABLE</a> for details.
+          </p>
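+          <p class="p">
+            For example (table, column, and statistics values are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>ALTER TABLE sales SET COLUMN STATS customer_id ('numDVs' = '50000');
+ALTER TABLE sales SET COLUMN STATS notes ('numNulls' = '12', 'maxSize' = '2000');</code></pre>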
+        </li>
+        <li class="li">
+          <p class="p">
+            Impala can now write lightweight <span class="q">"minidump"</span> files, rather
+            than large core files, to save diagnostic information when
+            any of the Impala-related daemons crash. This feature uses the
+            open source <code class="ph codeph">breakpad</code> framework.
+            See <a class="xref" href="impala_breakpad.html#breakpad">Breakpad Minidumps for Impala (Impala 2.6 or higher only)</a> for details.
+          </p>
+        </li>
+        <li class="li">
+          <div class="p">
+            New query options improve interoperability with Parquet files:
+            <ul class="ul">
+              <li class="li">
+                <p class="p">
+                  The <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION</code> query option
+                  lets Impala locate columns within Parquet files based on
+                  column name rather than ordinal position.
+                  This enhancement improves interoperability with applications
+                  that write Parquet files with a different order or subset of
+                  columns than are used in the Impala table.
+                  See <a class="xref" href="impala_parquet_fallback_schema_resolution.html#parquet_fallback_schema_resolution">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</a>
+                  for details.
+                </p>
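+                <p class="p">
+                  For example:
+                </p>
+<pre class="pre codeblock"><code>-- Match Parquet columns by name instead of ordinal position.
+SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME;</code></pre>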
+              </li>
+              <li class="li">
+                <p class="p">
+                  The <code class="ph codeph">PARQUET_ANNOTATE_STRINGS_UTF8</code> query option
+                  makes Impala include the <code class="ph codeph">UTF-8</code> annotation
+                  metadata for <code class="ph codeph">STRING</code>, <code class="ph codeph">CHAR</code>,
+                  and <code class="ph codeph">VARCHAR</code> columns in Parquet files created
+                  by <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code>
+                  statements.
+                  See <a class="xref" href="impala_parquet_annotate_strings_utf8.html#parquet_annotate_strings_utf8">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</a>
+                  for details.
+                </p>
+              </li>
+            </ul>
+            See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for general information about working
+            with Parquet files.
+          </div>
+        </li>
+        <li class="li">
+          <p class="p">
+            Improvements to security and reduction in overhead for secure clusters:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                Overall performance improvements for secure clusters.
+                (TPC-H queries on a secure cluster were benchmarked
+                at roughly 3x as fast as the previous release.)
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Impala now recognizes the <code class="ph codeph">auth_to_local</code> setting,
+                specified through the HDFS configuration setting
+                <code class="ph codeph">hadoop.security.auth_to_local</code>.
+                This feature is disabled by default; to enable it,
+                specify <code class="ph codeph">--load_auth_to_local_rules=true</code>
+                in the <span class="keyword cmdname">impalad</span> configuration settings.
+                See <a class="xref" href="impala_kerberos.html#auth_to_local">Mapping Kerberos Principals to Short Names for Impala</a> for details.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Timing improvements in the mechanism for the <span class="keyword cmdname">impalad</span>
+                daemon to acquire Kerberos tickets. This feature spreads out the overhead
+                on the KDC during Impala startup, especially for large clusters.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                For Kerberized clusters, the Catalog service now uses
+                the Kerberos principal instead of the operating system user that runs
+                the <span class="keyword cmdname">catalogd</span> daemon.
+                This eliminates the requirement to configure a <code class="ph codeph">hadoop.user.group.static.mapping.overrides</code>
+                setting to put the OS user into the Sentry administrative group, on clusters where the principal
+                and the OS user name for this user are different.
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li class="li">
+          <p class="p">
+            Overall performance improvements for join queries, by using a prefetching mechanism
+            while building the in-memory hash table to evaluate join predicates.
+            See <a class="xref" href="impala_prefetch_mode.html#prefetch_mode">PREFETCH_MODE Query Option (Impala 2.6 or higher only)</a> for the query option
+            to control this optimization.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <span class="keyword cmdname">impala-shell</span> interpreter has a new command,
+            <code class="ph codeph">SOURCE</code>, that lets you run a set of SQL statements
+            or other <span class="keyword cmdname">impala-shell</span> commands stored in a file.
+            You can run additional <code class="ph codeph">SOURCE</code> commands from inside
+            a file, to set up flexible sequences of statements for use cases
+            such as schema setup, ETL, or reporting.
+            See <a class="xref" href="impala_shell_commands.html#shell_commands">impala-shell Command Reference</a> for details
+            and <a class="xref" href="impala_shell_running_commands.html#shell_running_commands">Running Commands and SQL Statements in impala-shell</a>
+            for examples.
+          </p>
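+          <p class="p">
+            For example, assuming a file of SQL statements named <span class="ph filepath">setup.sql</span>:
+          </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; source setup.sql;</code></pre>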
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">millisecond()</code> built-in function lets you extract
+            the fractional seconds part of a <code class="ph codeph">TIMESTAMP</code> value.
+            See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details.
+          </p>
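+          <p class="p">
+            For example:
+          </p>
+<pre class="pre codeblock"><code>SELECT millisecond(CAST('2016-01-01 12:00:00.123' AS TIMESTAMP));  -- Returns 123.</code></pre>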
+        </li>
+        <li class="li">
+          <p class="p">
+            If an Avro table is created without column definitions in the
+            <code class="ph codeph">CREATE TABLE</code> statement, and columns are later
+            added through <code class="ph codeph">ALTER TABLE</code>, the resulting
+            table is now queryable. Missing values from the newly added
+            columns now default to <code class="ph codeph">NULL</code>.
+            See <a class="xref" href="impala_avro.html#avro">Using the Avro File Format with Impala Tables</a> for general details about
+            working with Avro files.
+          </p>
+        </li>
+        <li class="li">
+          <div class="p">
+            The mechanism for interpreting <code class="ph codeph">DECIMAL</code> literals is
+            improved, no longer going through an intermediate conversion step
+            to <code class="ph codeph">DOUBLE</code>:
+            <ul class="ul">
+              <li class="li">
+                <p class="p">
+                  Casting a <code class="ph codeph">DECIMAL</code> value to <code class="ph codeph">TIMESTAMP</code>
+                  now produces a more precise
+                  value for the <code class="ph codeph">TIMESTAMP</code> than formerly.
+                </p>
+              </li>
+              <li class="li">
+                <p class="p">
+                  Certain function calls involving <code class="ph codeph">DECIMAL</code> literals
+                  now succeed, when formerly they failed due to lack of a function
+                  signature with a <code class="ph codeph">DOUBLE</code> argument.
+                </p>
+              </li>
+              <li class="li">
+                <p class="p">
+                  Faster runtime performance for <code class="ph codeph">DECIMAL</code> constant
+                  values, through improved native code generation for all combinations
+                  of precision and scale.
+                </p>
+              </li>
+            </ul>
+            See <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a> for details about the <code class="ph codeph">DECIMAL</code> type.
+          </div>
+        </li>
+        <li class="li">
+          <p class="p">
+            Improved type accuracy for <code class="ph codeph">CASE</code> return values.
+            If all <code class="ph codeph">WHEN</code> clauses of the <code class="ph codeph">CASE</code>
+            expression are of <code class="ph codeph">CHAR</code> type, the final result
+            is also <code class="ph codeph">CHAR</code> instead of being converted to
+            <code class="ph codeph">STRING</code>.
+            See <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a>
+            for details about the <code class="ph codeph">CASE</code> function.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Uncorrelated queries using the <code class="ph codeph">NOT EXISTS</code> operator
+            are now supported. Formerly, the <code class="ph codeph">NOT EXISTS</code>
+            operator was only available for correlated subqueries.
+          </p>
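+          <p class="p">
+            For example, an uncorrelated subquery such as the following now works
+            (table and column names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- The subquery does not refer to any column of the outer query.
+SELECT c1 FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE x &gt; 100);</code></pre>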
+        </li>
+        <li class="li">
+          <p class="p">
+            Improved performance for reading Parquet files.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Improved performance for <dfn class="term">top-N</dfn> queries, that is,
+            those including both <code class="ph codeph">ORDER BY</code> and
+            <code class="ph codeph">LIMIT</code> clauses.
+          </p>
+        </li>
+        
+        <li class="li">
+          <p class="p">
+            Impala optionally skips an arbitrary number of header lines from text input
+            files on HDFS based on the <code class="ph codeph">skip.header.line.count</code> value
+            in the <code class="ph codeph">TBLPROPERTIES</code> field of the table metadata.
+            See <a class="xref" href="impala_txtfile.html#text_data_files">Data Files for Text Tables</a> for details.
+          </p>
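+          <p class="p">
+            For example:
+          </p>
+<pre class="pre codeblock"><code>-- Skip the first line (a header row) of each text data file.
+ALTER TABLE header_csv SET TBLPROPERTIES ('skip.header.line.count' = '1');</code></pre>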
+        </li>
+        <li class="li">
+          <p class="p">
+            Trailing comments are now allowed in queries processed by
+            the <span class="keyword cmdname">impala-shell</span> options <code class="ph codeph">-q</code>
+            and <code class="ph codeph">-f</code>.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Impala can run <code class="ph codeph">COUNT</code> queries for RCFile tables
+            that include complex type columns.
+            See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for
+            general information about working with complex types,
+            and <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>,
+            <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a>, and <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>
+            for syntax details of each type.
+          </p>
+        </li>
+      </ul>
+
+    </div>
+  </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="new_features__new_features_250">
+
+    <h2 class="title topictitle2" id="ariaid-title5">New Features in <span class="keyword">Impala 2.5</span></h2>
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Dynamic partition pruning. When a query refers to a partition key column in a <code class="ph codeph">WHERE</code>
+            clause, and the exact set of column values is not known until the query is executed,
+            Impala evaluates the predicate and skips the I/O for entire partitions that are not needed.
+            For example, if a table was partitioned by year, Impala would apply this technique to a query
+            such as <code class="ph codeph">SELECT c1 FROM partitioned_table WHERE year = (SELECT MAX(year) FROM other_table)</code>.
+            <span class="ph">See <a class="xref" href="impala_partitioning.html#dynamic_partition_pruning">Dynamic Partition Pruning</a> for details.</span>
+          </p>
+          <p class="p">
+            The dynamic partition pruning optimization technique lets Impala avoid reading
+            data files from partitions that are not part of the result set, even when
+            that determination cannot be made in advance. This technique is especially valuable
+            when performing join queries involving partitioned tables. For example, if a join
+            query includes an <code class="ph codeph">ON</code> clause and a <code class="ph codeph">WHERE</code> clause
+            that refer to the same columns, the query can find the set of column values that
+            match the <code class="ph codeph">WHERE</code> clause, and only scan the associated partitions
+            when evaluating the <code class="ph codeph">ON</code> clause.
+          </p>
+          <p class="p">
+            Dynamic partition pruning is controlled by the same settings as the runtime filtering feature.
+            By default, this feature is enabled at a medium level, because the maximum setting can use
+            slightly more memory for queries than in previous releases.
+            To fully enable this feature, set the query option <code class="ph codeph">RUNTIME_FILTER_MODE=GLOBAL</code>.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Runtime filtering. This is a wide-ranging set of optimizations that are especially valuable for join queries.
+            Using the same technique as with dynamic partition pruning,
+            Impala uses the predicates from <code class="ph codeph">WHERE</code> and <code class="ph codeph">ON</code> clauses
+            to determine which subset of column values from one of the joined tables could possibly be part of the
+            result set. Impala sends a compact representation of the filter condition to the hosts in the cluster,
+            instead of the full set of values or the entire table.
+            <span class="ph">See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a> for details.</span>
+          </p>
+          <p class="p">
+            By default, this feature is enabled at a medium level, because the maximum setting can use
+            slightly more memory for queries than in previous releases.
+            To fully enable this feature, set the query option <code class="ph codeph">RUNTIME_FILTER_MODE=GLOBAL</code>.
+            <span class="ph">See <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a> for details.</span>
+          </p>
+          <p class="p">
+            This feature involves some new query options:
+            <a class="xref" href="impala_runtime_filter_mode.html">RUNTIME_FILTER_MODE</a>,
+            <a class="xref" href="impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS</a>,
+            <a class="xref" href="impala_runtime_bloom_filter_size.html">RUNTIME_BLOOM_FILTER_SIZE</a>,
+            <a class="xref" href="impala_runtime_filter_wait_time_ms.html">RUNTIME_FILTER_WAIT_TIME_MS</a>,
+            and <a class="xref" href="impala_disable_row_runtime_filtering.html">DISABLE_ROW_RUNTIME_FILTERING</a>.
+            <span class="ph">See
+            <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE</a>,
+            <a class="xref" href="impala_max_num_runtime_filters.html#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS</a>,
+            <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE</a>,
+            <a class="xref" href="impala_runtime_filter_wait_time_ms.html#runtime_filter_wait_time_ms">RUNTIME_FILTER_WAIT_TIME_MS</a>, and
+            <a class="xref" href="impala_disable_row_runtime_filtering.html#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING</a>
+            for details.
+            </span>
+          </p>
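+          <p class="p">
+            For example, a minimal sketch of fully enabling the feature for a session:
+          </p>
+<pre class="pre codeblock"><code>-- Within impala-shell; GLOBAL applies the filters across all hosts in the cluster.
+SET RUNTIME_FILTER_MODE=GLOBAL;</code></pre>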
+        </li>
+        <li class="li">
+          <p class="p">
+            More efficient use of the HDFS caching feature, to avoid
+            hotspots and bottlenecks that could occur if heavily used
+            cached data blocks were always processed by the same host.
+            By default, Impala now randomizes which host processes each cached
+            HDFS data block, when cached replicas are available on multiple hosts.
+            (Remember to use the <code class="ph codeph">WITH REPLICATION</code> clause with the
+            <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statement
+            when enabling HDFS caching for a table or partition, to cache the same
+            data blocks across multiple hosts.)
+            The new query option <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code>
+            
+            lets you fine-tune the interaction with HDFS caching even more.
+            <span class="ph">See <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a> for details.</span>
+          </p>
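+          <p class="p">
+            For example, a sketch of caching a table's data blocks on several hosts
+            (the table and cache pool names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Cache each data block on 3 hosts so reads can be spread across the cluster.
+ALTER TABLE heavily_read_table SET CACHED IN 'cache_pool' WITH REPLICATION = 3;</code></pre>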
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">TRUNCATE TABLE</code> statement now accepts an <code class="ph codeph">IF EXISTS</code>
+            clause, making <code class="ph codeph">TRUNCATE TABLE</code> easier to use in setup or ETL scripts where the table might or
+            might not exist.
+            <span class="ph">See <a class="xref" href="impala_truncate_table.html#truncate_table">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a> for details.</span>
+          </p>
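+          <p class="p">
+            For example, in a setup or ETL script (the table name is hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Succeeds whether or not STAGING_EVENTS currently exists.
+TRUNCATE TABLE IF EXISTS staging_events;</code></pre>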
+        </li>
+        <li class="li">
+          <div class="p">
+            Improved performance and reliability for the <code class="ph codeph">DECIMAL</code> data type:
+            <ul class="ul">
+            <li class="li">
+              <p class="p">
+                Using <code class="ph codeph">DECIMAL</code> values in a <code class="ph codeph">GROUP BY</code> clause now
+                triggers the native code generation optimization, speeding up queries that
+                group by values such as prices.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Checking for overflow in <code class="ph codeph">DECIMAL</code>
+                multiplication is now substantially faster, making <code class="ph codeph">DECIMAL</code>
+                a more practical data type in some use cases where formerly <code class="ph codeph">DECIMAL</code>
+                was much slower than <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Multiplying a mixture of <code class="ph codeph">DECIMAL</code>
+            and <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code> values now returns
+                <code class="ph codeph">DOUBLE</code> rather than <code class="ph codeph">DECIMAL</code>. This change avoids
+                some cases where an intermediate value would underflow or overflow and become
+                <code class="ph codeph">NULL</code> unexpectedly.
+              </p>
+            </li>
+            </ul>
+            <span class="ph">See <a class="xref" href="impala_decimal.html">DECIMAL Data Type (Impala 1.4 or higher only)</a> for details.</span>
+          </div>
+        </li>
+        <li class="li">
+          <p class="p">
+            For UDFs written in Java, or Hive UDFs reused for Impala,
+            Impala now allows parameters and return values to be primitive types.
+            Formerly, parameters and return values were required to be one of the <span class="q">"Writable"</span>
+            object types.
+            <span class="ph">See <a class="xref" href="impala_udf.html#udfs_hive">Using Hive UDFs with Impala</a> for details.</span>
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Performance improvements for HDFS I/O. Impala now caches HDFS file handles to avoid the
+            overhead of repeatedly opening the same file.
+          </p>
+        </li>
+
+        
+        <li class="li">
+          <p class="p">
+            Performance improvements for queries involving nested complex types.
+            Certain basic query types, such as counting the elements of a complex column,
+            now use an optimized code path.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Improvements to the memory reservation mechanism for the Impala
+            admission control feature. You can specify more settings, such
+            as the timeout period and maximum aggregate memory used, for each
+            resource pool instead of globally for the Impala instance. The
+            default limit for concurrent queries (the <span class="ph uicontrol">max requests</span>
+            setting) is now unlimited instead of 200.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Performance improvements related to code generation.
+            Even in queries where code generation is not performed
+            for some phases of execution (such as reading data from
+            Parquet tables), Impala can still use code generation in
+            other parts of the query, such as evaluating
+            functions in the <code class="ph codeph">WHERE</code> clause.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Performance improvements for queries using aggregation functions
+            on high-cardinality columns.
+            Formerly, Impala could do unnecessary extra work to produce intermediate
+            results for operations such as <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">GROUP BY</code>
+            on columns that were unique or had few duplicate values.
+            Now, Impala decides at run time whether it is more efficient to
+            do an initial aggregation phase and pass along a smaller set of intermediate data,
+            or to pass raw intermediate data back to the next phase of query processing to be aggregated there.
+            This feature is known as <dfn class="term">streaming pre-aggregation</dfn>.
+            In case of performance regression, this feature can be turned off
+            using the <code class="ph codeph">DISABLE_STREAMING_PREAGGREGATIONS</code> query option.
+            <span class="ph">See <a class="xref" href="impala_disable_streaming_preaggregations.html#disable_streaming_preaggregations">DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</a> for details.</span>
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Spill-to-disk feature now always recommended. In earlier releases, the spill-to-disk feature
+            could be turned off using a pair of configuration settings,
+            <code class="ph codeph">enable_partitioned_aggregation=false</code> and
+            <code class="ph codeph">enable_partitioned_hash_join=false</code>.
+            The latest improvements in the spill-to-disk mechanism, and related features that
+            interact with it, make this feature robust enough that disabling it is
+            no longer needed or supported. In particular, some new features in <span class="keyword">Impala 2.5</span>
+            and higher do not work when the spill-to-disk feature is disabled.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Improvements to scripting capability for the <span class="keyword cmdname">impala-shell</span> command,
+            through user-specified substitution variables that can appear in statements processed
+            by <span class="keyword cmdname">impala-shell</span>:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">--var</code> command-line option lets you pass key-value pairs to
+                <span class="keyword cmdname">impala-shell</span>. The shell can substitute the values
+                into queries before executing them, where the query text contains the notation
+                <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>. For example, you might prepare a SQL file
+                containing a set of DDL statements and queries containing variables for
+                database and table names, and then pass the applicable names as part of the
+                <code class="ph codeph">impala-shell -f <var class="keyword varname">filename</var></code> command.
+                <span class="ph">See <a class="xref" href="impala_shell_running_commands.html#shell_running_commands">Running Commands and SQL Statements in impala-shell</a> for details.</span>
+              </p>
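+              <p class="p">
+                For example, a sketch (the database name, table name, and SQL file shown here
+                are hypothetical):
+              </p>
+<pre class="pre codeblock"><code>$ cat query.sql
+SELECT COUNT(*) FROM ${var:db}.${var:tbl};
+$ impala-shell --var=db=staging --var=tbl=events -f query.sql</code></pre>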
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">SET</code> and <code class="ph codeph">UNSET</code> commands within the
+                <span class="keyword cmdname">impala-shell</span> interpreter now work with user-specified
+                substitution variables, as well as the built-in query options.
+                The two kinds of variables are listed in separate groups in the <code class="ph codeph">SET</code> output.
+                As with variables defined by the <code class="ph codeph">--var</code> command-line option,
+                you refer to the user-specified substitution variables in queries by using
+                the notation <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>
+                in the query text. Because the substitution variables are processed by
+                <span class="keyword cmdname">impala-shell</span> instead of the <span class="keyword cmdname">impalad</span>
+                backend, you cannot define your own substitution variables through the
+                <code class="ph codeph">SET</code> statement in a JDBC or ODBC application.
+                <span class="ph">See <a class="xref" href="impala_set.html#set">SET Statement</a> for details.</span>
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li class="li">
+          <p class="p">
+            Performance improvements for query startup. Impala better parallelizes certain work
+            when coordinating plan distribution between <span class="keyword cmdname">impalad</span> instances, which improves
+            startup time for queries involving tables with many partitions on large clusters,
+            or complicated queries with many plan fragments.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Performance and scalability improvements for tables with many partitions.
+            The memory requirements on the coordinator node are reduced, making it substantially
+            faster and less resource-intensive
+            to do joins involving several tables with thousands of partitions each.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Whitelisting for access to internal APIs. For applications that need direct access
+            to Impala APIs, without going through the HiveServer2 or Beeswax interfaces, you can
+            specify a list of Kerberos users who are allowed to call those APIs. By default, the
+            <code class="ph codeph">impala</code> and <code class="ph codeph">hdfs</code> users are the only ones authorized
+            for this kind of access.
+            Any users not explicitly authorized through the <code class="ph codeph">internal_principals_whitelist</code>
+            configuration setting are blocked from accessing the APIs. This setting applies to all the
+            Impala-related daemons, although currently it is primarily used for HDFS to control the
+            behavior of the catalog server.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Improvements to Impala integration and usability for Hue. (The code changes
+            are actually on the Hue side.)
+          </p>
+          <ul class="ul">
+          <li class="li">
+            <p class="p">
+              The list of tables now refreshes dynamically.
+            </p>
+          </li>
+          </ul>
+        </li>
+        <li class="li">
+          <p class="p">
+            Usability improvements for case-insensitive queries.
+            You can now use the operators <code class="ph codeph">ILIKE</code> and <code class="ph codeph">IREGEXP</code>
+            to perform case-insensitive wildcard matches or regular expression matches,
+            rather than explicitly converting column values with <code class="ph codeph">UPPER</code>
+            or <code class="ph codeph">LOWER</code>.
+            <span class="ph">See <a class="xref" href="impala_operators.html#ilike">ILIKE Operator</a> and <a class="xref" href="impala_operators.html#iregexp">IREGEXP Operator</a> for details.</span>
+          </p>
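+          <p class="p">
+            For example (the table and column names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Matches 'Berlin', 'BERLIN', 'berlin', and so on.
+SELECT name FROM cities WHERE name ILIKE 'ber%';
+-- Case-insensitive regular expression match.
+SELECT name FROM cities WHERE name IREGEXP '^ber';</code></pre>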
+        </li>
+        <li class="li">
+          <p class="p">
+            Performance and reliability improvements for DDL and insert operations on partitioned tables with a large
+            number of partitions. Impala only re-evaluates metadata for partitions that are affected by
+            a DDL operation, not all partitions in the table. While a DDL or insert statement is in progress,
+            other Impala statements that attempt to modify metadata for the same table wait until the first one
+            finishes.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Reliability improvements for the <code class="ph codeph">LOAD DATA</code> statement.
+            Previously, this statement would fail if the source HDFS directory
+            contained any subdirectories at all. Now, the statement ignores
+            any hidden subdirectories, for example <span class="ph filepath">_impala_insert_staging</span>.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            A new operator, <code class="ph codeph">IS [NOT] DISTINCT FROM</code>, lets you compare values
+            and always get a <code class="ph codeph">true</code> or <code class="ph codeph">false</code> result,
+            even if one or both of the values are <code class="ph codeph">NULL</code>.
+            The <code class="ph codeph">IS NOT DISTINCT FROM</code> operator, or its equivalent
+            <code class="ph codeph">&lt;=&gt;</code> notation, improves the efficiency of join queries that
+            treat key values that are <code class="ph codeph">NULL</code> in both tables as equal.
+            <span class="ph">See <a class="xref" href="impala_operators.html#is_distinct_from">IS DISTINCT FROM Operator</a> for details.</span>
+          </p>
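+          <p class="p">
+            For example, a join that treats <code class="ph codeph">NULL</code> keys in both
+            tables as equal (the table and column names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Equivalent forms; both pair up rows whose keys are NULL on both sides.
+SELECT t1.c1, t2.c2 FROM t1 JOIN t2 ON t1.k IS NOT DISTINCT FROM t2.k;
+SELECT t1.c1, t2.c2 FROM t1 JOIN t2 ON t1.k &lt;=&gt; t2.k;</code></pre>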
+        </li>
+        <li class="li">
+          <p class="p">
+            Security enhancements for the <span class="keyword cmdname">impala-shell</span> command.
+            A new option, <code class="ph codeph">--ldap_password_cmd</code>, lets you specify
+            a command to retrieve the LDAP password. The resulting password is
+            then used to authenticate the <span class="keyword cmdname">impala-shell</span> command
+            with the LDAP server.
+            <span class="ph">See <a class="xref" href="impala_shell_options.html">impala-shell Configuration Options</a> for details.</span>
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">CREATE TABLE AS SELECT</code> statement now accepts a
+            <code class="ph codeph">PARTITIONED BY</code> clause, which lets you create a
+            partitioned table and insert data into it with a single statement.
+            <span class="ph">See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for details.</span>
+          </p>
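+          <p class="p">
+            For example, a sketch (the table and column names are hypothetical; the
+            partition key columns go last in the select list):
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE sales_by_year PARTITIONED BY (year) AS
+  SELECT customer_id, amount, year FROM raw_sales;</code></pre>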
+        </li>
+        <li class="li">
+          <p class="p">
+            User-defined functions (UDFs and UDAFs) written in C++ now persist automatically
+            when the <span class="keyword cmdname">catalogd</span> daemon is restarted. You no longer
+            have to run the <code class="ph codeph">CREATE FUNCTION</code> statements again after a restart.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            User-defined functions (UDFs) written in Java can now persist
+            when the <span class="keyword cmdname">catalogd</span> daemon is restarted, and can be shared
+            transparently between Impala and Hive. You must do a one-time operation to recreate these
+            UDFs using new <code class="ph codeph">CREATE FUNCTION</code> syntax, without a signature for arguments
+            or the return value. Afterwards, you no longer have to run the <code class="ph codeph">CREATE FUNCTION</code>
+            statements again after a restart.
+            Although Impala does not have visibility into the UDFs that implement the
+            Hive built-in functions, user-created Hive UDFs are now automatically available
+            for calling through Impala.
+            <span class="ph">See <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a> for details.</span>
+          </p>
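+          <p class="p">
+            For example, a sketch of the new signatureless syntax (the JAR path and Java
+            class name are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- No argument or return types; the overloads in the class become available.
+CREATE FUNCTION my_func
+  LOCATION '/user/impala/udfs/my-udfs.jar'
+  SYMBOL='com.example.MyUdf';</code></pre>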
+        </li>
+        <li class="li">
+          
+          <p class="p">
+            Reliability enhancements for memory management. Some aggregation and join queries
+            that formerly might have failed with an out-of-memory error due to memory contention,
+            now can succeed using the spill-to-disk mechanism.
+          </p>
+        </li>
+        <li class="li">
+          
+          <p class="p">
+            The <code class="ph codeph">SHOW DATABASES</code> statement now returns two columns rather than one.
+            The second column includes the associated comment string, if any, for each database.
+            Adjust any application code that examines the list of databases and assumes the
+            result set contains only a single column.
+            <span class="ph">See <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a> for details.</span>
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            A new optimization speeds up aggregation operations that involve only the partition key
+            columns of partitioned tables. For example, a query such as <code class="ph codeph">SELECT COUNT(DISTINCT k), MIN(k), MAX(k) FROM t1</code>
+            can avoid reading any data files if <code class="ph codeph">T1</code> is a partitioned table and <code class="ph codeph">K</code>
+            is one of the partition key columns. Because this technique can produce different results in cases
+            where HDFS files in a partition are manually deleted or are empty, you must enable the optimization
+            by setting the query option <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>.
+            <span class="ph">See <a class="xref" href="impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a> for details.</span>
+          </p>
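+          <p class="p">
+            For example, within <span class="keyword cmdname">impala-shell</span>:
+          </p>
+<pre class="pre codeblock"><code>SET OPTIMIZE_PARTITION_KEY_SCANS=1;
+-- Can now be answered from partition metadata, without reading data files.
+SELECT COUNT(DISTINCT k), MIN(k), MAX(k) FROM t1;</code></pre>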
+        </li>
+        
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">DESCRIBE</code> statement can now display metadata about a database, using the
+            syntax <code class="ph codeph">DESCRIBE DATABASE <var class="keyword varname">db_name</var></code>.
+            <span class="ph">See <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a> for details.</span>
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">uuid()</code> built-in function generates an
+            alphanumeric value that you can use as a guaranteed unique identifier.
+            The uniqueness applies even across tables, for cases where an ascending
+            numeric sequence is not suitable.
+            <span class="ph">See <a class="xref" href="impala_misc_functions.html#misc_functions">Impala Miscellaneous Functions</a> for details.</span>
+          </p>
+        </li>
+      </ul>
+
+    </div>
+  </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="new_features__new_features_240">
+
+    <h2 class="title topictitle2" id="ariaid-title6">New Features in <span class="keyword">Impala 2.4</span></h2>
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Impala can be used on the DSSD D5 Storage Appliance.
+            From a user perspective, the Impala features are the same as in <span class="keyword">Impala 2.3</span>.
+          </p>
+        </li>
+      </ul>
+
+    </div>
+  </article>
+
+
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="new_features__new_features_230">
+
+    <h2 class="title topictitle2" id="ariaid-title7">New Features in <span class="keyword">Impala 2.3</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following are the major new features in Impala 2.3.x. This major release
+        contains improvements to SQL syntax (particularly new support for complex types), performance,
+            manageability, and security.
+      </p>
+
+      <ul class="ul">
+
+        <li class="li">
+          <p class="p">
+            Complex data types: <code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code>. These
+            types can encode multiple named fields, positional items, or key-value pairs within a single column.
+            You can combine these types to produce nested types with arbitrarily deep nesting,
+            such as an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> values,
+            a <code class="ph codeph">MAP</code> where each key-value pair is an <code class="ph codeph">ARRAY</code> of other <code class="ph codeph">MAP</code> values,
+            and so on. Currently, complex data types are only supported for the Parquet file format.
+            <span class="ph">See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for usage details and <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>, <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>, and <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a> for syntax.</span>
+          </p>
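+          <p class="p">
+            For example, a sketch of a Parquet table combining all three types
+            (the table and column names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE contacts
+(
+  id BIGINT,
+  name STRUCT&lt;first_name: STRING, last_name: STRING&gt;,
+  phones ARRAY&lt;STRING&gt;,
+  attributes MAP&lt;STRING, STRING&gt;
+)
+STORED AS PARQUET;</code></pre>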
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Column-level authorization lets you define access to particular columns within a table,
+            rather than the entire table. This feature lets you reduce the reliance on creating views to
+            set up authorization schemes for subsets of information.
+            See <span class="xref">the documentation for Apache Sentry</span> for background details, and
+            <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a> and <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a> for Impala-specific syntax.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">TRUNCATE TABLE</code> statement removes all the data from a table without removing the table itself.
+            <span class="ph">See <a class="xref" href="impala_truncate_table.html#truncate_table">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a> for details.</span>
+          </p>
+        </li>
+
+        <li class="li" id="new_features_230__IMPALA-2015">
+          <p class="p">
+            Nested loop join queries. Some join queries that formerly required equality comparisons can now use
+            operators such as <code class="ph codeph">&lt;</code> or <code class="ph codeph">&gt;=</code>. This same join mechanism is used
+            internally to optimize queries that retrieve values from complex type columns.
+            <span class="ph">See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details about Impala join queries.</span>
+          </p>
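+          <p class="p">
+            For example, a non-equijoin sketch using range comparisons
+            (the table and column names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Assign each salary to a band through range comparisons in the join condition.
+SELECT e.name, b.band FROM employees e JOIN salary_bands b
+  ON e.salary &gt;= b.min_salary AND e.salary &lt; b.max_salary;</code></pre>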
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Reduced memory usage and improved performance and robustness for spill-to-disk feature.
+            <span class="ph">See <a class="xref" href="impala_scalability.html#spill_to_disk">SQL Operations that Spill to Disk</a> for details about this feature.</span>
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Performance improvements for querying Parquet data files containing multiple row groups
+            and multiple data blocks:
+          </p>
+          <ul class="ul">
+          <li class="li">
+          <p class="p"> For files written by Hive, SparkSQL, and other Parquet MR writers
+                and spanning multiple HDFS blocks, Impala now scans the extra
+                data blocks locally when possible, rather than using remote
+                reads. </p>
+          </li>
+          <li class="li">
+          <p class="p">
+            Impala queries benefit from the improved alignment of row groups with HDFS blocks for Parquet
+            files written by Hive, MapReduce, and other components. (Impala itself never writes
+            multiblock Parquet files, so the alignment change does not apply to Parquet files produced by Impala.)
+            These Parquet writers now add padding to Parquet files that they write to align row groups with HDFS blocks.
+            The <code class="ph codeph">parquet.writer.max-padding</code> setting specifies the maximum number of bytes, by default
+            8 megabytes, that can be added to the file between row groups to fill the gap at the end of one block
+            so that the next row group starts at the beginning of the next block.
+            If the gap is larger than this size, the writer attempts to fit another entire row group in the remaining space.
+            Include this setting in the <span class="ph filepath">hive-site</span> configuration file to influence Parquet files written by Hive,
+            or the <span class="ph filepath">hdfs-site</span> configuration file to influence Parquet files written by all non-Impala components.
+          </p>
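+          <p class="p">
+            For example, a <span class="ph filepath">hive-site.xml</span> entry along these lines would set
+            the padding limit explicitly to its default of 8 megabytes, expressed in bytes:
+          </p>
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;parquet.writer.max-padding&lt;/name&gt;
+  &lt;value&gt;8388608&lt;/value&gt; &lt;!-- 8 MB --&gt;
+&lt;/property&gt;
+</code></pre>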
+          </li>
+          </ul>
+          <p class="p">
+            See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for instructions about using Parquet data files
+            with Impala.
+          </p>
+        </li>
+
+        <li class="li" id="new_features_230__IMPALA-1660">
+          <p class="p">
+            Many new built-in scalar functions, for convenience and enhanced portability of SQL that uses common industry extensions.
+          </p>
+
+          <p class="p">
+            Math functions<span class="ph"> (see <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a> for details)</span>:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <code class="ph codeph">ATAN2</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">COSH</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">COT</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">DCEIL</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">DEXP</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">DFLOOR</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">DLOG10</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">DPOW</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">DROUND</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">DSQRT</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">DTRUNC</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">FACTORIAL</code>, and corresponding <code class="ph codeph">!</code> operator
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">FPOW</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">RADIANS</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">RANDOM</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">SINH</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">TANH</code>
+            </li>
+          </ul>
+
+          <p class="p">
+            String functions<span class="ph"> (see <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a> for details)</span>:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <code class="ph codeph">BTRIM</code>
+            </li>
+            <li class="li">
+              <code class="ph codeph">CHR</code>
+            </li>
+            <li class="li">
+              <code class="ph codeph">REGEXP_LIKE</code>
+            </li>
+            <li class="li">
+              <code class="ph codeph">SPLIT_PART</code>
+            </li>
+          </ul>
+
+          <p class="p">
+            Date and time functions<span class="ph"> (see <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details)</span>:
+          </p>
+          <ul class="ul">
+              <li class="li">
+                <code class="ph codeph">INT_MONTHS_BETWEEN</code>
+              </li>
+              <li class="li">
+                <code class="ph codeph">MONTHS_BETWEEN</code>
+              </li>
+              <li class="li">
+                <code class="ph codeph">TIMEOFDAY</code>
+              </li>
+              <li class="li">
+                <code class="ph codeph">TIMESTAMP_CMP</code>
+              </li>
+          </ul>
+
+          <p class="p">
+            Bit manipulation functions<span class="ph"> (see <a class="xref" href="impala_bit_functions.html#bit_functions">Impala Bit Functions</a> for details)</span>:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <code class="ph codeph">BITAND</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">BITNOT</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">BITOR</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">BITXOR</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">COUNTSET</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">GETBIT</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">ROTATELEFT</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">ROTATERIGHT</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">SETBIT</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">SHIFTLEFT</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">SHIFTRIGHT</code>
+            </li>
+          </ul>
+          <p class="p">
+            Type conversion functions<span class="ph"> (see <a class="xref" href="impala_conversion_functions.html#conversion_functions">Impala Type Conversion Functions</a> for details)</span>:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <code class="ph codeph">TYPEOF</code>
+            </li>
+          </ul>
+          <p class="p">
+            The <code class="ph codeph">effective_user()</code> function<span class="ph"> (see <a class="xref" href="impala_misc_functions.html#misc_functions">Impala Miscellaneous Functions</a> for details)</span>.
+          </p>
+        </li>
+
+        <li class="li" id="new_features_230__IMPALA-2081">
+          <p class="p">
+            New built-in analytic functions: <code class="ph codeph">PERCENT_RANK</code>, <code class="ph codeph">NTILE</code>,
+            <code class="ph codeph">CUME_DIST</code>.
+            <span class="ph">See <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a> for details.</span>
+          </p>
+        </li>
+
+        <li class="li" id="new_features_230__IMPALA-595">
+          <p class="p">
+            The <code class="ph codeph">DROP DATABASE</code> statement now works for a non-empty database.
+            When you specify the optional <code class="ph codeph">CASCADE</code> clause, any tables in the
+            database are dropped before the database itself is removed.
+            <span class="ph">See <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a> for details.</span>
+          </p>
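+          <p class="p">
+            For example, with a hypothetical database <code class="ph codeph">temp_db</code> that still contains tables:
+          </p>
+<pre class="pre codeblock"><code>-- Drops any tables in temp_db first, then removes the database itself.
+drop database temp_db cascade;
+</code></pre>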
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">DROP TABLE</code> and <code class="ph codeph">ALTER TABLE DROP PARTITION</code> statements have a new optional keyword, <code class="ph codeph">PURGE</code>.
+            This keyword causes Impala to immediately remove the relevant HDFS data files rather than sending them to the HDFS trashcan.
+            This feature can help to avoid out-of-space errors on storage devices, and to avoid files being left behind in case of
+            a problem with the HDFS trashcan, such as the trashcan not being configured or being in a different HDFS encryption zone
+            than the data files.
+            <span class="ph">See <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> and <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax.</span>
+          </p>
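+          <p class="p">
+            For example (table and partition names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Remove the data files immediately instead of moving them to the HDFS trashcan.
+drop table t1 purge;
+alter table t2 drop partition (year=2016) purge;
+</code></pre>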
+        </li>
+
+        <li class="li" id="new_features_230__IMPALA-80">
+          <p class="p">
+            The <span class="keyword cmdname">impala-shell</span> command has a new feature for live progress reporting. This feature
+            is enabled through the <code class="ph codeph">--live_progress</code> and <code class="ph codeph">--live_summary</code>
+            command-line options, or during a session through the <code class="ph codeph">LIVE_SUMMARY</code> and
+            <code class="ph codeph">LIVE_PROGRESS</code> query options.
+            <span class="ph">See <a class="xref" href="impala_live_progress.html#live_progress">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a> and <a class="xref" href="impala_live_summary.html#live_summary">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a> for details.</span>
+          </p>
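+          <p class="p">
+            For example, you might enable the feature when starting the shell, or toggle it
+            during a session through the query options:
+          </p>
+<pre class="pre codeblock"><code>$ impala-shell --live_progress --live_summary
+
+[localhost:21000] &gt; set live_progress=true;
+[localhost:21000] &gt; set live_summary=true;
+</code></pre>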
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <span class="keyword cmdname">impala-shell</span> command also now displays a random <span class="q">"tip of the day"</span> when it starts.
+          </p>
+        </li>
+
+        <li class="li" id="new_features_230__IMPALA-1413">
+          <p class="p">
+            The <span class="keyword cmdname">impala-shell</span> option <code class="ph codeph">-f</code> now recognizes a special filename
+            <code class="ph codeph">-</code> to accept input from stdin.
+            <span class="ph">See <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a> for details about the options for running <span class="keyword cmdname">impala-shell</span> in non-interactive mode.</span>
+          </p>
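+          <p class="p">
+            For example, a query can be piped in from another command:
+          </p>
+<pre class="pre codeblock"><code>$ echo 'select version();' | impala-shell -f -
+</code></pre>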
+        </li>
+
+        <li class="li" id="new_features_230__IMPALA-1963">
+          <p class="p">
+            Format strings for the <code class="ph codeph">unix_timestamp()</code> function can now include numeric timezone offsets.
+            <span class="ph">See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details.</span>
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Impala can now run a specified command to obtain the password to decrypt a p

<TRUNCATED>


[19/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_operators.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_operators.html b/docs/build/html/topics/impala_operators.html
new file mode 100644
index 0000000..09a5e7d
--- /dev/null
+++ b/docs/build/html/topics/impala_operators.html
@@ -0,0 +1,1937 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="
 Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="operators"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SQL Operators</title></head><body id="operators"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">SQL Operators</h1>
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      SQL operators are a class of comparison functions that are widely used within the <code class="ph codeph">WHERE</code> clauses of
+      <code class="ph codeph">SELECT</code> statements.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="operators__arithmetic_operators">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Arithmetic Operators</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        The arithmetic operators use expressions with a left-hand argument, the operator, and then (in most cases) a right-hand argument.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">left_hand_arg</var> <var class="keyword varname">binary_operator</var> <var class="keyword varname">right_hand_arg</var>
+<var class="keyword varname">unary_operator</var> <var class="keyword varname">single_arg</var>
+</code></pre>
+
+      <ul class="ul">
+        <li class="li">
+          <code class="ph codeph">+</code> and <code class="ph codeph">-</code>: Can be used either as unary or binary operators.
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                With unary notation, such as <code class="ph codeph">+5</code>, <code class="ph codeph">-2.5</code>, or <code class="ph codeph">-<var class="keyword varname">col_name</var></code>,
+                they multiply their single numeric argument by <code class="ph codeph">+1</code> or <code class="ph codeph">-1</code>. Therefore, unary
+                <code class="ph codeph">+</code> returns its argument unchanged, while unary <code class="ph codeph">-</code> flips the sign of its argument. Although
+                you can double up these operators in expressions such as <code class="ph codeph">++5</code> (always positive) or <code class="ph codeph">-+2</code> or
+                <code class="ph codeph">+-2</code> (both always negative), you cannot double the unary minus operator because <code class="ph codeph">--</code> is
+                interpreted as the start of a comment. (You can use a double unary minus operator if you separate the <code class="ph codeph">-</code>
+                characters, for example with a space or parentheses.)
+              </p>
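+              <p class="p">
+                For example:
+              </p>
+<pre class="pre codeblock"><code>select +5;     -- 5
+select -5;     -- -5
+select - -5;   -- 5; the space keeps the two minus signs from forming a comment
+select -(-5);  -- 5; parentheses work too
+</code></pre>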
+            </li>
+
+            <li class="li">
+              <p class="p">
+                With binary notation, such as <code class="ph codeph">2+2</code>, <code class="ph codeph">5-2.5</code>, or <code class="ph codeph"><var class="keyword varname">col1</var> +
+                <var class="keyword varname">col2</var></code>, they add or subtract respectively the right-hand argument to (or from) the left-hand
+                argument. Both arguments must be of numeric types.
+              </p>
+            </li>
+          </ul>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">*</code> and <code class="ph codeph">/</code>: Multiplication and division respectively. Both arguments must be of numeric types.
+          </p>
+
+          <p class="p">
+            When multiplying, the shorter argument is promoted if necessary (such as <code class="ph codeph">SMALLINT</code> to <code class="ph codeph">INT</code> or
+            <code class="ph codeph">BIGINT</code>, or <code class="ph codeph">FLOAT</code> to <code class="ph codeph">DOUBLE</code>), and then the result is promoted again to the
+            next larger type. Thus, multiplying a <code class="ph codeph">TINYINT</code> and an <code class="ph codeph">INT</code> produces a <code class="ph codeph">BIGINT</code>
+            result. Multiplying a <code class="ph codeph">FLOAT</code> and a <code class="ph codeph">FLOAT</code> produces a <code class="ph codeph">DOUBLE</code> result. Multiplying
+            a <code class="ph codeph">FLOAT</code> and a <code class="ph codeph">DOUBLE</code> or a <code class="ph codeph">DOUBLE</code> and a <code class="ph codeph">DOUBLE</code> produces a
+            <code class="ph codeph">DECIMAL(38,17)</code>, because <code class="ph codeph">DECIMAL</code> values can represent much larger and more precise values than
+            <code class="ph codeph">DOUBLE</code>.
+          </p>
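+          <p class="p">
+            The promotions described above can be checked with the <code class="ph codeph">typeof()</code>
+            function, for example:
+          </p>
+<pre class="pre codeblock"><code>select typeof(cast(1 as tinyint) * cast(1 as int));      -- bigint
+select typeof(cast(1.5 as float) * cast(1.5 as float));  -- double
+</code></pre>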
+
+          <p class="p">
+            When dividing, Impala always treats the arguments and result as <code class="ph codeph">DOUBLE</code> values to avoid losing precision. If you
+            need to insert the results of a division operation into a <code class="ph codeph">FLOAT</code> column, use the <code class="ph codeph">CAST()</code>
+            function to convert the result to the correct type.
+          </p>
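+          <p class="p">
+            For example (the table and column names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>select 10 / 4;  -- 2.5, returned as DOUBLE
+-- Convert the DOUBLE result before storing it in a FLOAT column.
+insert into t1 (float_col) select cast(col1 / col2 as float) from t2;
+</code></pre>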
+        </li>
+
+        <li class="li" id="arithmetic_operators__div">
+          <p class="p">
+            <code class="ph codeph">DIV</code>: Integer division. Arguments are not promoted to a floating-point type, and any fractional result
+            is discarded. For example, <code class="ph codeph">13 DIV 7</code> returns 1, <code class="ph codeph">14 DIV 7</code> returns 2, and
+            <code class="ph codeph">15 DIV 7</code> returns 2. This operator is the same as the <code class="ph codeph">QUOTIENT()</code> function.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">%</code>: Modulo operator. Returns the remainder of the left-hand argument divided by the right-hand argument. Both
+            arguments must be of one of the integer types.
+          </p>
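+          <p class="p">
+            For example:
+          </p>
+<pre class="pre codeblock"><code>select 13 % 7;  -- 6
+select 14 % 7;  -- 0
+</code></pre>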
+        </li>
+
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">&amp;</code>, <code class="ph codeph">|</code>, <code class="ph codeph">~</code>, and <code class="ph codeph">^</code>: Bitwise operators that return the
+            bitwise AND, bitwise OR, bitwise <code class="ph codeph">NOT</code>, or bitwise XOR (exclusive OR) of their argument values. Both arguments must be of
+            one of the integer types. If the arguments are of different types, the argument with the smaller type is implicitly extended to
+            match the argument with the larger type.
+          </p>
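+          <p class="p">
+            For example:
+          </p>
+<pre class="pre codeblock"><code>select 5 &amp; 3;  -- 1 (0101 AND 0011 = 0001)
+select 5 | 3;  -- 7 (0101 OR  0011 = 0111)
+select 5 ^ 3;  -- 6 (0101 XOR 0011 = 0110)
+select ~5;     -- -6 (flips every bit of the value)
+</code></pre>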
+        </li>
+      </ul>
+
+      <p class="p">
+        You can chain a sequence of arithmetic expressions, optionally grouping them with parentheses.
+      </p>
+
+      <p class="p">
+        The arithmetic operators generally do not have equivalent calling conventions using functional notation. For example, prior to
+        <span class="keyword">Impala 2.2</span>, there is no <code class="ph codeph">MOD()</code> function equivalent to the <code class="ph codeph">%</code> modulo operator.
+        Conversely, there are some arithmetic functions that do not have a corresponding operator. For example, for exponentiation you use
+        the <code class="ph codeph">POW()</code> function, but there is no <code class="ph codeph">**</code> exponentiation operator. See
+        <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a> for the arithmetic functions you can use.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+      <p class="p">
+        To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        in an aggregation function, you unpack the individual elements using join notation in the query,
+        and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+        See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+      </p>
+
+      <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+  from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name      | item.n_nationkey |
++-------------+------------------+
+| AFRICA      | 0                |
+| AFRICA      | 5                |
+| AFRICA      | 14               |
+| AFRICA      | 15               |
+| AFRICA      | 16               |
+| AMERICA     | 1                |
+| AMERICA     | 2                |
+| AMERICA     | 3                |
+| AMERICA     | 17               |
+| AMERICA     | 24               |
+| ASIA        | 8                |
+| ASIA        | 9                |
+| ASIA        | 12               |
+| ASIA        | 18               |
+| ASIA        | 21               |
+| EUROPE      | 6                |
+| EUROPE      | 7                |
+| EUROPE      | 19               |
+| EUROPE      | 22               |
+| EUROPE      | 23               |
+| MIDDLE EAST | 4                |
+| MIDDLE EAST | 10               |
+| MIDDLE EAST | 11               |
+| MIDDLE EAST | 13               |
+| MIDDLE EAST | 20               |
++-------------+------------------+
+
+select
+  r_name,
+  count(r_nations.item.n_nationkey) as count,
+  sum(r_nations.item.n_nationkey) as sum,
+  avg(r_nations.item.n_nationkey) as avg,
+  min(r_nations.item.n_name) as minimum,
+  max(r_nations.item.n_name) as maximum,
+  ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+  region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name      | count | sum | avg  | minimum   | maximum        | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA      | 5     | 50  | 10   | ALGERIA   | MOZAMBIQUE     | 5             |
+| AMERICA     | 5     | 47  | 9.4  | ARGENTINA | UNITED STATES  | 5             |
+| ASIA        | 5     | 68  | 13.6 | CHINA     | VIETNAM        | 5             |
+| EUROPE      | 5     | 77  | 15.4 | FRANCE    | UNITED KINGDOM | 5             |
+| MIDDLE EAST | 5     | 58  | 11.6 | EGYPT     | SAUDI ARABIA   | 5             |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+      <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        directly in an operator. You can apply operators only to scalar values that make up a complex type
+        (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+        or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+        the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+        pseudocolumn names.
+      </p>
+
+      <p class="p">
+        The following example shows how to do an arithmetic operation using a numeric field of a <code class="ph codeph">STRUCT</code> type that is an
+        item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it can be
+        used in an arithmetic expression, such as multiplying by 10:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey * 10
+  from region, region.r_nations as nation
+where nation.item.n_nationkey &lt; 5;
++-------------+-------------+------------------------------+
+| r_name      | item.n_name | nation.item.n_nationkey * 10 |
++-------------+-------------+------------------------------+
+| AMERICA     | CANADA      | 30                           |
+| AMERICA     | BRAZIL      | 20                           |
+| AMERICA     | ARGENTINA   | 10                           |
+| MIDDLE EAST | EGYPT       | 40                           |
+| AFRICA      | ALGERIA     | 0                            |
++-------------+-------------+------------------------------+
+</code></pre>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="operators__between">
+
+    <h2 class="title topictitle2" id="ariaid-title3">BETWEEN Operator</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        In a <code class="ph codeph">WHERE</code> clause, compares an expression to both a lower and upper bound. The comparison is successful if the
+        expression is greater than or equal to the lower bound, and less than or equal to the upper bound. If the bound values are switched,
+        so that the lower bound is greater than the upper bound, the comparison does not match any values.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression</var> BETWEEN <var class="keyword varname">lower_bound</var> AND <var class="keyword varname">upper_bound</var></code></pre>
+
+      <p class="p">
+        <strong class="ph b">Data types:</strong> Typically used with numeric data types. Works with any data type, although not very practical for
+        <code class="ph codeph">BOOLEAN</code> values. (<code class="ph codeph">BETWEEN false AND true</code> will match all <code class="ph codeph">BOOLEAN</code> values.) Use
+        <code class="ph codeph">CAST()</code> if necessary to ensure the lower and upper bound values are compatible types. Call string or date/time
+        functions if necessary to extract or transform the relevant portion to compare, especially if the value can be transformed into a
+        number.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        Be careful when using short string operands. A longer string that starts with the upper bound value will not be included, because it
+        is considered greater than the upper bound. For example, <code class="ph codeph">BETWEEN 'A' and 'M'</code> would not match the string value
+        <code class="ph codeph">'Midway'</code>. Use functions such as <code class="ph codeph">upper()</code>, <code class="ph codeph">lower()</code>, <code class="ph codeph">substr()</code>,
+        <code class="ph codeph">trim()</code>, and so on if necessary to ensure the comparison works as expected.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+      <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        directly in an operator. You can apply operators only to scalar values that make up a complex type
+        (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+        or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+        the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+        pseudocolumn names.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>-- Retrieve data for January through June, inclusive.
+select c1 from t1 where month <strong class="ph b">between 1 and 6</strong>;
+
+-- Retrieve data for names beginning with 'A' through 'M' inclusive.
+-- Only test the first letter to ensure all the values starting with 'M' are matched.
+-- Do a case-insensitive comparison to match names with various capitalization conventions.
+select last_name from customers where upper(substr(last_name,1,1)) <strong class="ph b">between 'A' and 'M'</strong>;
+
+-- Retrieve data for only the first week of each month.
+select count(distinct visitor_id) from web_traffic where dayofmonth(when_viewed) <strong class="ph b">between 1 and 7</strong>;</code></pre>
+
+      <p class="p">
+        The following example shows how to do a <code class="ph codeph">BETWEEN</code> comparison using a numeric field of a <code class="ph codeph">STRUCT</code> type
+        that is an item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it
+        can be used in a comparison operator:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+from region, region.r_nations as nation
+where nation.item.n_nationkey between 3 and 5;
++-------------+-------------+------------------+
+| r_name      | item.n_name | item.n_nationkey |
++-------------+-------------+------------------+
+| AMERICA     | CANADA      | 3                |
+| MIDDLE EAST | EGYPT       | 4                |
+| AFRICA      | ETHIOPIA    | 5                |
++-------------+-------------+------------------+
+</code></pre>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="operators__comparison_operators">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Comparison Operators</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        Impala supports the familiar comparison operators for checking equality and sort order for the column data types:
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">left_hand_expression</var> <var class="keyword varname">comparison_operator</var> <var class="keyword varname">right_hand_expression</var></code></pre>
+
+      <ul class="ul">
+        <li class="li">
+          <code class="ph codeph">=</code>, <code class="ph codeph">!=</code>, <code class="ph codeph">&lt;&gt;</code>: apply to all types.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">&lt;</code>, <code class="ph codeph">&lt;=</code>, <code class="ph codeph">&gt;</code>, <code class="ph codeph">&gt;=</code>: apply to all types; for
+          <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">TRUE</code> is considered greater than <code class="ph codeph">FALSE</code>.
+        </li>
+      </ul>
+
+      <p class="p">
+        <strong class="ph b">Alternatives:</strong>
+      </p>
+
+      <p class="p">
+        The <code class="ph codeph">IN</code> and <code class="ph codeph">BETWEEN</code> operators provide shorthand notation for expressing combinations of equality,
+        less than, and greater than comparisons with a single operator.
+      </p>
+
+      <p class="p">
+        Because comparing any value to <code class="ph codeph">NULL</code> produces <code class="ph codeph">NULL</code> rather than <code class="ph codeph">TRUE</code> or
+        <code class="ph codeph">FALSE</code>, use the <code class="ph codeph">IS NULL</code> and <code class="ph codeph">IS NOT NULL</code> operators to check if a value is
+        <code class="ph codeph">NULL</code> or not.
+      </p>
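+
+      <p class="p">
+        For example, the following illustrative queries (against a hypothetical table <code class="ph codeph">T1</code>) never match any rows,
+        because comparing <code class="ph codeph">C1</code> to <code class="ph codeph">NULL</code> with <code class="ph codeph">=</code> or <code class="ph codeph">!=</code>
+        yields <code class="ph codeph">NULL</code>, which is not <code class="ph codeph">TRUE</code>; the <code class="ph codeph">IS NULL</code> forms must be used instead:
+      </p>
+
+<pre class="pre codeblock"><code>-- These WHERE clauses evaluate to NULL for every row,
+-- so neither query returns any results.
+select c1 from t1 where c1 = null;
+select c1 from t1 where c1 != null;
+
+-- Use IS NULL / IS NOT NULL to test for NULL values.
+select c1 from t1 where c1 is null;
+select c1 from t1 where c1 is not null;</code></pre>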
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+      <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        directly in an operator. You can apply operators only to scalar values that make up a complex type
+        (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+        or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+        the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+        pseudocolumn names.
+      </p>
+
+      <p class="p">
+        The following example shows how to do an arithmetic operation using a numeric field of a <code class="ph codeph">STRUCT</code> type that is an
+        item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it can be
+        used with a comparison operator such as <code class="ph codeph">&lt;</code>:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+from region, region.r_nations as nation
+where nation.item.n_nationkey &lt; 5;
++-------------+-------------+------------------+
+| r_name      | item.n_name | item.n_nationkey |
++-------------+-------------+------------------+
+| AMERICA     | CANADA      | 3                |
+| AMERICA     | BRAZIL      | 2                |
+| AMERICA     | ARGENTINA   | 1                |
+| MIDDLE EAST | EGYPT       | 4                |
+| AFRICA      | ALGERIA     | 0                |
++-------------+-------------+------------------+
+</code></pre>
+
+    </div>
+
+  </article>
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="operators__exists">
+
+    <h2 class="title topictitle2" id="ariaid-title5">EXISTS Operator</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+
+        
+        The <code class="ph codeph">EXISTS</code> operator tests whether a subquery returns any results. You typically use it to find values from one
+        table that have corresponding values in another table.
+      </p>
+
+      <p class="p">
+        The converse, <code class="ph codeph">NOT EXISTS</code>, helps to find all the values from one table that do not have any corresponding values in
+        another table.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>EXISTS (<var class="keyword varname">subquery</var>)
+NOT EXISTS (<var class="keyword varname">subquery</var>)
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        The subquery can refer to a different table than the outer query block, or the same table. For example, you might use
+        <code class="ph codeph">EXISTS</code> or <code class="ph codeph">NOT EXISTS</code> to check the existence of parent/child relationships between two columns of
+        the same table.
+      </p>
+
+      <p class="p">
+        You can also use operators and function calls within the subquery to test for relationships other than strict
+        equality. For example, you might use a call to <code class="ph codeph">COUNT()</code> in the subquery to check whether the number of matching
+        values is higher or lower than some limit. You might call a UDF in the subquery to check whether values in one table match a
+        hashed representation of those same values in a different table.
+      </p>
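+
+      <p class="p">
+        As an illustration of the <code class="ph codeph">COUNT()</code> technique, this sketch (with hypothetical <code class="ph codeph">CUSTOMERS</code>
+        and <code class="ph codeph">ORDERS</code> tables) returns customer names only when the <code class="ph codeph">ORDERS</code> table holds more than
+        1000 rows overall:
+      </p>
+
+<pre class="pre codeblock"><code>-- HAVING with no GROUP BY treats the whole table as a single group,
+-- so the subquery returns a row only when the count exceeds the limit.
+select c_name from customers
+where exists (select count(*) from orders having count(*) &gt; 1000);</code></pre>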
+
+      <p class="p">
+        <strong class="ph b">NULL considerations:</strong>
+      </p>
+
+      <p class="p">
+        If the subquery returns any value at all (even <code class="ph codeph">NULL</code>), <code class="ph codeph">EXISTS</code> returns <code class="ph codeph">TRUE</code> and
+        <code class="ph codeph">NOT EXISTS</code> returns <code class="ph codeph">FALSE</code>.
+      </p>
+
+      <p class="p">
+        The following example shows how even when the subquery returns only <code class="ph codeph">NULL</code> values, <code class="ph codeph">EXISTS</code> still
+        returns <code class="ph codeph">TRUE</code> and thus matches all the rows from the table in the outer query block.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table all_nulls (x int);
+[localhost:21000] &gt; insert into all_nulls values (null), (null), (null);
+[localhost:21000] &gt; select y from t2 where exists (select x from all_nulls);
++---+
+| y |
++---+
+| 2 |
+| 4 |
+| 6 |
++---+
+</code></pre>
+
+      <p class="p">
+        However, if the table in the subquery is empty and so the subquery returns an empty result set, <code class="ph codeph">EXISTS</code> returns
+        <code class="ph codeph">FALSE</code>:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table empty (x int);
+[localhost:21000] &gt; select y from t2 where exists (select x from empty);
+[localhost:21000] &gt;
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+      <p class="p">
+        Correlated subqueries used in <code class="ph codeph">EXISTS</code> and <code class="ph codeph">IN</code> operators cannot include a
+        <code class="ph codeph">LIMIT</code> clause.
+      </p>
+
+      <p class="p">
+        Prior to <span class="keyword">Impala 2.6</span>,
+        the <code class="ph codeph">NOT EXISTS</code> operator required a correlated subquery.
+        In <span class="keyword">Impala 2.6</span> and higher, <code class="ph codeph">NOT EXISTS</code> works with
+        uncorrelated queries also.
+      </p>
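+
+      <p class="p">
+        For example, in <span class="keyword">Impala 2.6</span> and higher an uncorrelated subquery such as the following sketch is allowed
+        (<code class="ph codeph">T1</code> and <code class="ph codeph">T2</code> are the small integer tables shown under <strong class="ph b">Examples</strong> below):
+      </p>
+
+<pre class="pre codeblock"><code>-- The subquery does not refer to T1, so it is uncorrelated.
+-- No value in T2 exceeds 100, so every row of T1 is returned.
+select x from t1 where not exists (select y from t2 where y &gt; 100);</code></pre>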
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+      <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        directly in an operator. You can apply operators only to scalar values that make up a complex type
+        (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+        or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+        the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+        pseudocolumn names.
+      </p>
+
+
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <div class="p">
+
+
+        The following examples refer to these simple tables containing small sets of integers or strings:
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table t1 (x int);
+[localhost:21000] &gt; insert into t1 values (1), (2), (3), (4), (5), (6);
+
+[localhost:21000] &gt; create table t2 (y int);
+[localhost:21000] &gt; insert into t2 values (2), (4), (6);
+
+[localhost:21000] &gt; create table t3 (z int);
+[localhost:21000] &gt; insert into t3 values (1), (3), (5);
+
+[localhost:21000] &gt; create table month_names (m string);
+[localhost:21000] &gt; insert into month_names values
+                  &gt; ('January'), ('February'), ('March'),
+                  &gt; ('April'), ('May'), ('June'), ('July'),
+                  &gt; ('August'), ('September'), ('October'),
+                  &gt; ('November'), ('December');
+</code></pre>
+      </div>
+
+      <p class="p">
+        The following example shows a correlated subquery that finds all the values in one table that exist in another table. For each value
+        <code class="ph codeph">X</code> from <code class="ph codeph">T1</code>, the query checks if the <code class="ph codeph">Y</code> column of <code class="ph codeph">T2</code> contains an
+        identical value, and the <code class="ph codeph">EXISTS</code> operator returns <code class="ph codeph">TRUE</code> or <code class="ph codeph">FALSE</code> as appropriate in
+        each case.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select x from t1 where exists (select y from t2 where t1.x = y);
++---+
+| x |
++---+
+| 2 |
+| 4 |
+| 6 |
++---+
+</code></pre>
+
+      <p class="p">
+        An uncorrelated query is less interesting in this case. Because the subquery always returns <code class="ph codeph">TRUE</code>, all rows from
+        <code class="ph codeph">T1</code> are returned. If the table contents were changed so that the subquery did not match any rows, none of the rows
+        from <code class="ph codeph">T1</code> would be returned.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select x from t1 where exists (select y from t2 where y &gt; 5);
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
+| 6 |
++---+
+</code></pre>
+
+      <p class="p">
+        The following example shows how an uncorrelated subquery can test for the existence of some condition within a table. By using
+        <code class="ph codeph">LIMIT 1</code> or an aggregate function, the query returns a single result or no result based on whether the subquery
+        matches any rows. Here, we know that <code class="ph codeph">T1</code> and <code class="ph codeph">T2</code> contain some even numbers, but <code class="ph codeph">T3</code>
+        does not.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select "contains an even number" from t1 where exists (select x from t1 where x % 2 = 0) limit 1;
++---------------------------+
+| 'contains an even number' |
++---------------------------+
+| contains an even number   |
++---------------------------+
+[localhost:21000] &gt; select "contains an even number" as assertion from t1 where exists (select x from t1 where x % 2 = 0) limit 1;
++-------------------------+
+| assertion               |
++-------------------------+
+| contains an even number |
++-------------------------+
+[localhost:21000] &gt; select "contains an even number" as assertion from t2 where exists (select x from t2 where y % 2 = 0) limit 1;
+ERROR: AnalysisException: couldn't resolve column reference: 'x'
+[localhost:21000] &gt; select "contains an even number" as assertion from t2 where exists (select y from t2 where y % 2 = 0) limit 1;
++-------------------------+
+| assertion               |
++-------------------------+
+| contains an even number |
++-------------------------+
+[localhost:21000] &gt; select "contains an even number" as assertion from t3 where exists (select z from t3 where z % 2 = 0) limit 1;
+[localhost:21000] &gt;
+</code></pre>
+
+      <p class="p">
+        The following example finds numbers in one table that are 1 greater than numbers from another table. The <code class="ph codeph">EXISTS</code>
+        notation is simpler than an equivalent <code class="ph codeph">CROSS JOIN</code> between the tables. (The example then also illustrates how the
+        same test could be performed using an <code class="ph codeph">IN</code> operator.)
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select x from t1 where exists (select y from t2 where x = y + 1);
++---+
+| x |
++---+
+| 3 |
+| 5 |
++---+
+[localhost:21000] &gt; select x from t1 where x in (select y + 1 from t2);
++---+
+| x |
++---+
+| 3 |
+| 5 |
++---+
+</code></pre>
+
+      <p class="p">
+        The following example finds values from one table that do not exist in another table.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select x from t1 where not exists (select y from t2 where x = y);
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 5 |
++---+
+</code></pre>
+
+      <p class="p">
+        The following example uses the <code class="ph codeph">NOT EXISTS</code> operator to find all the leaf nodes in tree-structured data. This
+        simplified <span class="q">"tree of life"</span> has multiple levels (class, order, family, and so on), with each item pointing upward through a
+        <code class="ph codeph">PARENT</code> pointer. The example runs an outer query and a subquery on the same table, returning only those items whose
+        <code class="ph codeph">ID</code> value is <em class="ph i">not</em> referenced by the <code class="ph codeph">PARENT</code> of any other item.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table tree (id int, parent int, name string);
+[localhost:21000] &gt; insert overwrite tree values
+                  &gt; (0, null, "animals"),
+                  &gt; (1, 0, "placentals"),
+                  &gt; (2, 0, "marsupials"),
+                  &gt; (3, 1, "bats"),
+                  &gt; (4, 1, "cats"),
+                  &gt; (5, 2, "kangaroos"),
+                  &gt; (6, 4, "lions"),
+                  &gt; (7, 4, "tigers"),
+                  &gt; (8, 5, "red kangaroo"),
+                  &gt; (9, 2, "wallabies");
+[localhost:21000] &gt; select name as "leaf node" from tree one
+                  &gt; where not exists (select parent from tree two where one.id = two.parent);
++--------------+
+| leaf node    |
++--------------+
+| bats         |
+| lions        |
+| tigers       |
+| red kangaroo |
+| wallabies    |
++--------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_subqueries.html#subqueries">Subqueries in Impala SELECT Statements</a>
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="operators__ilike">
+
+    <h2 class="title topictitle2" id="ariaid-title6">ILIKE Operator</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        A case-insensitive comparison operator for <code class="ph codeph">STRING</code> data, with basic wildcard capability using <code class="ph codeph">_</code> to match a single
+        character and <code class="ph codeph">%</code> to match any sequence of characters, including an empty one. The wildcard expression must match the entire string value.
+        Typically, it is more efficient to put any <code class="ph codeph">%</code> wildcard match at the end of the string.
+      </p>
+
+      <p class="p">
+        This operator, available in <span class="keyword">Impala 2.5</span> and higher, is the equivalent of the <code class="ph codeph">LIKE</code> operator,
+        but with case-insensitive comparisons.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">string_expression</var> ILIKE <var class="keyword varname">wildcard_expression</var>
+<var class="keyword varname">string_expression</var> NOT ILIKE <var class="keyword varname">wildcard_expression</var>
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+      <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        directly in an operator. You can apply operators only to scalar values that make up a complex type
+        (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+        or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+        the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+        pseudocolumn names.
+      </p>
+
+
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        In the following examples, strings that are the same except for differences in uppercase
+        and lowercase match successfully with <code class="ph codeph">ILIKE</code>, but do not match
+        with <code class="ph codeph">LIKE</code>:
+      </p>
+
+<pre class="pre codeblock"><code>select 'fooBar' ilike 'FOOBAR';
++-------------------------+
+| 'foobar' ilike 'foobar' |
++-------------------------+
+| true                    |
++-------------------------+
+
+select 'fooBar' like 'FOOBAR';
++------------------------+
+| 'foobar' like 'foobar' |
++------------------------+
+| false                  |
++------------------------+
+
+select 'FOOBAR' ilike 'f%';
++---------------------+
+| 'foobar' ilike 'f%' |
++---------------------+
+| true                |
++---------------------+
+
+select 'FOOBAR' like 'f%';
++--------------------+
+| 'foobar' like 'f%' |
++--------------------+
+| false              |
++--------------------+
+
+select 'ABCXYZ' not ilike 'ab_xyz';
++-----------------------------+
+| not 'abcxyz' ilike 'ab_xyz' |
++-----------------------------+
+| false                       |
++-----------------------------+
+
+select 'ABCXYZ' not like 'ab_xyz';
++----------------------------+
+| not 'abcxyz' like 'ab_xyz' |
++----------------------------+
+| true                       |
++----------------------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        For case-sensitive comparisons, see <a class="xref" href="impala_operators.html#like">LIKE Operator</a>.
+        For a more general kind of search operator using regular expressions, see <a class="xref" href="impala_operators.html#regexp">REGEXP Operator</a>
+        or its case-insensitive counterpart <a class="xref" href="impala_operators.html#iregexp">IREGEXP Operator</a>.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="operators__in">
+
+    <h2 class="title topictitle2" id="ariaid-title7">IN Operator</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+
+        
+        The <code class="ph codeph">IN</code> operator compares an argument value to a set of values, and returns <code class="ph codeph">TRUE</code> if the argument
+        matches any value in the set. The <code class="ph codeph">NOT IN</code> operator reverses the comparison, and checks if the argument value is not
+        part of a set of values.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression</var> IN (<var class="keyword varname">expression</var> [, <var class="keyword varname">expression</var>])
+<var class="keyword varname">expression</var> IN (<var class="keyword varname">subquery</var>)
+
+<var class="keyword varname">expression</var> NOT IN (<var class="keyword varname">expression</var> [, <var class="keyword varname">expression</var>])
+<var class="keyword varname">expression</var> NOT IN (<var class="keyword varname">subquery</var>)
+</code></pre>
+
+      <p class="p">
+        The left-hand expression and the set of comparison values must be of compatible types.
+      </p>
+
+      <p class="p">
+        The left-hand expression must consist only of a single value, not a tuple. Although the left-hand expression is typically a column
+        name, it could also be some other value. For example, the <code class="ph codeph">WHERE</code> clauses <code class="ph codeph">WHERE id IN (5)</code> and
+        <code class="ph codeph">WHERE 5 IN (id)</code> produce the same results.
+      </p>
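+
+      <p class="p">
+        For example, these two sketches (against a hypothetical table <code class="ph codeph">T1</code> with an <code class="ph codeph">ID</code> column)
+        are equivalent; both match rows where <code class="ph codeph">ID</code> is 5:
+      </p>
+
+<pre class="pre codeblock"><code>select * from t1 where id in (5);
+select * from t1 where 5 in (id);</code></pre>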
+
+      <p class="p">
+        The set of values to check against can be specified as constants, function calls, column names, or other expressions in the query
+        text. The maximum number of expressions in the <code class="ph codeph">IN</code> list is 9999. (The maximum number of elements of
+        a single expression is 10,000 items, and the <code class="ph codeph">IN</code> operator itself counts as one.)
+      </p>
+
+      <p class="p">
+        In Impala 2.0 and higher, the set of values can also be generated by a subquery. <code class="ph codeph">IN</code> can evaluate an unlimited
+        number of results using a subquery.
+      </p>
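+
+      <p class="p">
+        For example, a sketch of the subquery form (with hypothetical <code class="ph codeph">CUSTOMERS</code> and <code class="ph codeph">ORDERS</code> tables):
+      </p>
+
+<pre class="pre codeblock"><code>-- Find customers that placed at least one order in 2017.
+select c_name from customers
+where c_id in (select cust_id from orders where order_year = 2017);</code></pre>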
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        Any expression using the <code class="ph codeph">IN</code> operator could be rewritten as a series of equality tests connected with
+        <code class="ph codeph">OR</code>, but the <code class="ph codeph">IN</code> syntax is often clearer, more concise, and easier for Impala to optimize. For
+        example, with partitioned tables, queries frequently use <code class="ph codeph">IN</code> clauses to filter data by comparing the partition key
+        columns to specific values.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">NULL considerations:</strong>
+      </p>
+
+      <p class="p">
+        If there really is a matching non-null value, <code class="ph codeph">IN</code> returns <code class="ph codeph">TRUE</code>:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select 1 in (1,null,2,3);
++----------------------+
+| 1 in (1, null, 2, 3) |
++----------------------+
+| true                 |
++----------------------+
+[localhost:21000] &gt; select 1 not in (1,null,2,3);
++--------------------------+
+| 1 not in (1, null, 2, 3) |
++--------------------------+
+| false                    |
++--------------------------+
+</code></pre>
+
+      <p class="p">
+        If the searched value is not found in the comparison values, and the comparison values include <code class="ph codeph">NULL</code>, the result is
+        <code class="ph codeph">NULL</code>:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select 5 in (1,null,2,3);
++----------------------+
+| 5 in (1, null, 2, 3) |
++----------------------+
+| NULL                 |
++----------------------+
+[localhost:21000] &gt; select 5 not in (1,null,2,3);
++--------------------------+
+| 5 not in (1, null, 2, 3) |
++--------------------------+
+| NULL                     |
++--------------------------+
+[localhost:21000] &gt; select 1 in (null);
++-------------+
+| 1 in (null) |
++-------------+
+| NULL        |
++-------------+
+[localhost:21000] &gt; select 1 not in (null);
++-----------------+
+| 1 not in (null) |
++-----------------+
+| NULL            |
++-----------------+
+</code></pre>
+
+      <p class="p">
+        If the left-hand argument is <code class="ph codeph">NULL</code>, <code class="ph codeph">IN</code> always returns <code class="ph codeph">NULL</code>. This rule applies even
+        if the comparison values include <code class="ph codeph">NULL</code>.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select null in (1,2,3);
++-------------------+
+| null in (1, 2, 3) |
++-------------------+
+| NULL              |
++-------------------+
+[localhost:21000] &gt; select null not in (1,2,3);
++-----------------------+
+| null not in (1, 2, 3) |
++-----------------------+
+| NULL                  |
++-----------------------+
+[localhost:21000] &gt; select null in (null);
++----------------+
+| null in (null) |
++----------------+
+| NULL           |
++----------------+
+[localhost:21000] &gt; select null not in (null);
++--------------------+
+| null not in (null) |
++--------------------+
+| NULL               |
++--------------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> Available in earlier Impala releases, but new capabilities were added in
+        <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+      <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        directly in an operator. You can apply operators only to scalar values that make up a complex type
+        (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+        or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+        the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+        pseudocolumn names.
+      </p>
+
+      <p class="p">
+        The following example shows how to do an arithmetic operation using a numeric field of a <code class="ph codeph">STRUCT</code> type that is an
+        item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it can be
+        used in an arithmetic expression, such as multiplying by 10:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+from region, region.r_nations as nation
+where nation.item.n_nationkey in (1,3,5);
++---------+-------------+------------------+
+| r_name  | item.n_name | item.n_nationkey |
++---------+-------------+------------------+
+| AMERICA | CANADA      | 3                |
+| AMERICA | ARGENTINA   | 1                |
+| AFRICA  | ETHIOPIA    | 5                |
++---------+-------------+------------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+      <p class="p">
+        Correlated subqueries used in <code class="ph codeph">EXISTS</code> and <code class="ph codeph">IN</code> operators cannot include a
+        <code class="ph codeph">LIMIT</code> clause.
+      </p>
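+      <p class="p">
+        For example, a correlated subquery such as the following can be used with <code class="ph codeph">IN</code>,
+        but adding a <code class="ph codeph">LIMIT</code> clause inside the subquery would cause an error.
+        (The table and column names in this sketch are hypothetical.)
+      </p>
+
+<pre class="pre codeblock"><code>-- Allowed: correlated IN subquery without a LIMIT clause.
+SELECT * FROM orders o WHERE o.item_id IN (SELECT i.id FROM items i WHERE i.category = o.category);
+
+-- Not allowed: the LIMIT clause inside the correlated subquery causes an error.
+-- SELECT * FROM orders o WHERE o.item_id IN (SELECT i.id FROM items i WHERE i.category = o.category LIMIT 10);
+</code></pre>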
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>-- Using IN is concise and self-documenting.
+SELECT * FROM t1 WHERE c1 IN (1,2,10);
+-- Equivalent to series of = comparisons ORed together.
+SELECT * FROM t1 WHERE c1 = 1 OR c1 = 2 OR c1 = 10;
+
+SELECT c1 AS "starts with vowel" FROM t2 WHERE upper(substr(c1,1,1)) IN ('A','E','I','O','U');
+
+SELECT COUNT(DISTINCT(visitor_id)) FROM web_traffic WHERE month IN ('January','June','July');</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_subqueries.html#subqueries">Subqueries in Impala SELECT Statements</a>
+      </p>
+
+    </div>
+
+  </article>
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="operators__iregexp">
+
+    <h2 class="title topictitle2" id="ariaid-title8">IREGEXP Operator</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        Tests whether a value matches a regular expression, using case-insensitive string comparisons.
+        Uses the POSIX regular expression syntax where <code class="ph codeph">^</code> and
+        <code class="ph codeph">$</code> match the beginning and end of the string, <code class="ph codeph">.</code> represents any single character, <code class="ph codeph">*</code>
+        matches zero or more occurrences of the preceding item, <code class="ph codeph">+</code> matches one or more occurrences of the preceding item, <code class="ph codeph">?</code>
+        produces a non-greedy match, and so on.
+      </p>
+
+      <p class="p">
+        This operator, available in <span class="keyword">Impala 2.5</span> and higher, is the equivalent of the <code class="ph codeph">REGEXP</code> operator,
+        but with case-insensitive comparisons.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">string_expression</var> IREGEXP <var class="keyword varname">regular_expression</var>
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        The regular expression must match the entire value, not just occur somewhere inside it. Use <code class="ph codeph">.*</code> at the beginning,
+        the end, or both if you only need to match characters anywhere in the middle. Thus, the <code class="ph codeph">^</code> and <code class="ph codeph">$</code>
+        atoms are often redundant, although you might already have them in your expression strings that you reuse from elsewhere.
+      </p>
+
+
+
+      <p class="p">
+        The <code class="ph codeph">|</code> symbol is the alternation operator, typically used within <code class="ph codeph">()</code> to match different sequences.
+        The <code class="ph codeph">()</code> groups do not allow backreferences. To retrieve the part of a value matched within a <code class="ph codeph">()</code>
+        section, use the <code class="ph codeph"><a class="xref" href="impala_string_functions.html#string_functions__regexp_extract">regexp_extract()</a></code>
+        built-in function. (Currently, there is no case-insensitive equivalent for the <code class="ph codeph">regexp_extract()</code> function.)
+      </p>
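+      <p class="p">
+        As a possible workaround for the lack of a case-insensitive <code class="ph codeph">regexp_extract()</code>,
+        you can normalize the case of the subject string with <code class="ph codeph">lower()</code> before extracting.
+        (This sketch uses a literal value for illustration.)
+      </p>
+
+<pre class="pre codeblock"><code>-- Lowercase the subject string, then extract the second () group.
+select regexp_extract(lower('AbcD123xyz'), '([a-z]+)([0-9]+)', 2);
+</code></pre>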
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+        In Impala 1.3.1 and higher, the <code class="ph codeph">REGEXP</code> and <code class="ph codeph">RLIKE</code> operators now match a
+        regular expression string that occurs anywhere inside the target string, the same as if the regular
+        expression was enclosed on each side by <code class="ph codeph">.*</code>. See
+        <a class="xref" href="../shared/../topics/impala_operators.html#regexp">REGEXP Operator</a> for examples. Previously, these operators only
+        succeeded when the regular expression matched the entire target string. This change improves compatibility
+        with the regular expression support for popular database systems. There is no change to the behavior of the
+        <code class="ph codeph">regexp_extract()</code> and <code class="ph codeph">regexp_replace()</code> built-in functions.
+      </p>
+      </div>
+
+      <p class="p">
+        In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+        Expression syntax used by the Google RE2 library. For details, see
+        <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+        has most idioms familiar from regular expressions in Perl, Python, and so on, including
+        <code class="ph codeph">.*?</code> for non-greedy matches.
+      </p>
+
+      <p class="p">
+        In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+        way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+        adjust the expression patterns if necessary. See
+        <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+      <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        directly in an operator. You can apply operators only to scalar values that make up a complex type
+        (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+        or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+        the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+        pseudocolumn names.
+      </p>
+
+
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following examples demonstrate the syntax for the <code class="ph codeph">IREGEXP</code> operator.
+      </p>
+
+<pre class="pre codeblock"><code>select 'abcABCaabbcc' iregexp '^[a-c]+$';
++-----------------------------------+
+| 'abcabcaabbcc' iregexp '^[a-c]+$' |
++-----------------------------------+
+| true                              |
++-----------------------------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_operators.html#regexp">REGEXP Operator</a>
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="is_distinct_from__is_distinct" id="operators__is_distinct_from">
+
+    <h2 class="title topictitle2" id="is_distinct_from__is_distinct">IS DISTINCT FROM Operator</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+
+        
+        The <code class="ph codeph">IS DISTINCT FROM</code> operator, and its converse the <code class="ph codeph">IS NOT DISTINCT FROM</code> operator, test whether or
+        not values are identical. <code class="ph codeph">IS NOT DISTINCT FROM</code> is similar to the <code class="ph codeph">=</code> operator, and <code class="ph codeph">IS
+        DISTINCT FROM</code> is similar to the <code class="ph codeph">!=</code> operator, except that <code class="ph codeph">NULL</code> values are treated as
+        identical. Therefore, <code class="ph codeph">IS NOT DISTINCT FROM</code> returns <code class="ph codeph">true</code> rather than <code class="ph codeph">NULL</code>, and
+        <code class="ph codeph">IS DISTINCT FROM</code> returns <code class="ph codeph">false</code> rather than <code class="ph codeph">NULL</code>, when comparing two
+        <code class="ph codeph">NULL</code> values. If one of the values being compared is <code class="ph codeph">NULL</code> and the other is not, <code class="ph codeph">IS DISTINCT
+        FROM</code> returns <code class="ph codeph">true</code> and <code class="ph codeph">IS NOT DISTINCT FROM</code> returns <code class="ph codeph">false</code>, again instead
+        of returning <code class="ph codeph">NULL</code> in both cases.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression1</var> IS DISTINCT FROM <var class="keyword varname">expression2</var>
+
+<var class="keyword varname">expression1</var> IS NOT DISTINCT FROM <var class="keyword varname">expression2</var>
+<var class="keyword varname">expression1</var> &lt;=&gt; <var class="keyword varname">expression2</var>
+</code></pre>
+
+      <p class="p">
+        The operator <code class="ph codeph">&lt;=&gt;</code> is an alias for <code class="ph codeph">IS NOT DISTINCT FROM</code>.
+        It is typically used as a <code class="ph codeph">NULL</code>-safe equality operator in join queries.
+        That is, <code class="ph codeph">A &lt;=&gt; B</code> is true if <code class="ph codeph">A</code> equals <code class="ph codeph">B</code>
+        or if both <code class="ph codeph">A</code> and <code class="ph codeph">B</code> are <code class="ph codeph">NULL</code>.
+      </p>
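+      <p class="p">
+        For example, a <code class="ph codeph">NULL</code>-safe join on a column that might contain
+        <code class="ph codeph">NULL</code> values could be written as follows. (The table and column names
+        in this sketch are hypothetical.)
+      </p>
+
+<pre class="pre codeblock"><code>-- Rows match when the join keys are equal, or when both keys are NULL.
+SELECT t1.id, t2.id
+  FROM t1 JOIN t2 ON t1.join_key &lt;=&gt; t2.join_key;
+</code></pre>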
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        This operator provides concise notation for comparing two values and always producing a <code class="ph codeph">true</code> or
+        <code class="ph codeph">false</code> result, without treating <code class="ph codeph">NULL</code> as a special case. Otherwise, to unambiguously distinguish
+        between two values requires a compound expression involving <code class="ph codeph">IS [NOT] NULL</code> tests of both operands in addition to the
+        <code class="ph codeph">=</code> or <code class="ph codeph">!=</code> operator.
+      </p>
+
+      <p class="p">
+        The <code class="ph codeph">&lt;=&gt;</code> operator, used like an equality operator in a join query,
+        is more efficient than the equivalent clause: <code class="ph codeph">A = B OR (A IS NULL AND B IS NULL)</code>.
+        The <code class="ph codeph">&lt;=&gt;</code> operator can use a hash join, while the <code class="ph codeph">OR</code> expression
+        cannot.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        The following examples show how <code class="ph codeph">IS DISTINCT FROM</code> gives output similar to
+        the <code class="ph codeph">!=</code> operator, and <code class="ph codeph">IS NOT DISTINCT FROM</code> gives output
+        similar to the <code class="ph codeph">=</code> operator. The exception is when the expression involves
+        a <code class="ph codeph">NULL</code> value on one side or both sides, where <code class="ph codeph">!=</code> and
+        <code class="ph codeph">=</code> return <code class="ph codeph">NULL</code> but the <code class="ph codeph">IS [NOT] DISTINCT FROM</code>
+        operators still return <code class="ph codeph">true</code> or <code class="ph codeph">false</code>.
+      </p>
+
+<pre class="pre codeblock"><code>
+select 1 is distinct from 0, 1 != 0;
++----------------------+--------+
+| 1 is distinct from 0 | 1 != 0 |
++----------------------+--------+
+| true                 | true   |
++----------------------+--------+
+
+select 1 is distinct from 1, 1 != 1;
++----------------------+--------+
+| 1 is distinct from 1 | 1 != 1 |
++----------------------+--------+
+| false                | false  |
++----------------------+--------+
+
+select 1 is distinct from null, 1 != null;
++-------------------------+-----------+
+| 1 is distinct from null | 1 != null |
++-------------------------+-----------+
+| true                    | NULL      |
++-------------------------+-----------+
+
+select null is distinct from null, null != null;
++----------------------------+--------------+
+| null is distinct from null | null != null |
++----------------------------+--------------+
+| false                      | NULL         |
++----------------------------+--------------+
+
+select 1 is not distinct from 0, 1 = 0;
++--------------------------+-------+
+| 1 is not distinct from 0 | 1 = 0 |
++--------------------------+-------+
+| false                    | false |
++--------------------------+-------+
+
+select 1 is not distinct from 1, 1 = 1;
++--------------------------+-------+
+| 1 is not distinct from 1 | 1 = 1 |
++--------------------------+-------+
+| true                     | true  |
++--------------------------+-------+
+
+select 1 is not distinct from null, 1 = null;
++-----------------------------+----------+
+| 1 is not distinct from null | 1 = null |
++-----------------------------+----------+
+| false                       | NULL     |
++-----------------------------+----------+
+
+select null is not distinct from null, null = null;
++--------------------------------+-------------+
+| null is not distinct from null | null = null |
++--------------------------------+-------------+
+| true                           | NULL        |
++--------------------------------+-------------+
+</code></pre>
+
+      <p class="p">
+        The following example shows how <code class="ph codeph">IS DISTINCT FROM</code> considers
+        <code class="ph codeph">CHAR</code> values to be the same (not distinct from each other)
+        if they only differ in the number of trailing spaces. Therefore, sometimes
+        the result of an <code class="ph codeph">IS [NOT] DISTINCT FROM</code> operator differs
+        depending on whether the values are <code class="ph codeph">STRING</code>/<code class="ph codeph">VARCHAR</code>
+        or <code class="ph codeph">CHAR</code>.
+      </p>
+
+<pre class="pre codeblock"><code>
+select
+  'x' is distinct from 'x ' as string_with_trailing_spaces,
+  cast('x' as char(5)) is distinct from cast('x ' as char(5)) as char_with_trailing_spaces;
++-----------------------------+---------------------------+
+| string_with_trailing_spaces | char_with_trailing_spaces |
++-----------------------------+---------------------------+
+| true                        | false                     |
++-----------------------------+---------------------------+
+</code></pre>
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="operators__is_null">
+
+    <h2 class="title topictitle2" id="ariaid-title10">IS NULL Operator</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+
+        
+        The <code class="ph codeph">IS NULL</code> operator, and its converse the <code class="ph codeph">IS NOT NULL</code> operator, test whether a specified value is
+        <code class="ph codeph"><a class="xref" href="impala_literals.html#null">NULL</a></code>. Because using <code class="ph codeph">NULL</code> with any of the other
+        comparison operators such as <code class="ph codeph">=</code> or <code class="ph codeph">!=</code> also returns <code class="ph codeph">NULL</code> rather than
+        <code class="ph codeph">TRUE</code> or <code class="ph codeph">FALSE</code>, you use a special-purpose comparison operator to check for this special condition.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression</var> IS NULL
+<var class="keyword varname">expression</var> IS NOT NULL
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        In many cases, <code class="ph codeph">NULL</code> values indicate some incorrect or incomplete processing during data ingestion or conversion.
+        You might check whether any values in a column are <code class="ph codeph">NULL</code>, and if so take some followup action to fill them in.
+      </p>
+
+      <p class="p">
+        With sparse data, often represented in <span class="q">"wide"</span> tables, it is common for most values to be <code class="ph codeph">NULL</code> with only an
+        occasional non-<code class="ph codeph">NULL</code> value. In those cases, you can use the <code class="ph codeph">IS NOT NULL</code> operator to identify the
+        rows containing any data at all for a particular column, regardless of the actual value.
+      </p>
+
+      <p class="p">
+        With a well-designed database schema, effective use of <code class="ph codeph">NULL</code> values and <code class="ph codeph">IS NULL</code> and <code class="ph codeph">IS NOT
+        NULL</code> operators can save having to design custom logic around special values such as 0, -1, <code class="ph codeph">'N/A'</code>, empty
+        string, and so on. <code class="ph codeph">NULL</code> lets you distinguish between a value that is known to be 0, false, or empty, and a truly
+        unknown value.
+      </p>
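+      <p class="p">
+        For example, the following queries distinguish rows whose value is known to be zero
+        from rows whose value is truly unknown. (The table and column names in this sketch are hypothetical.)
+      </p>
+
+<pre class="pre codeblock"><code>-- A balance known to be zero.
+SELECT count(*) FROM accounts WHERE balance = 0;
+-- A balance that is truly unknown.
+SELECT count(*) FROM accounts WHERE balance IS NULL;
+</code></pre>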
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+      <p class="p">
+        This operator is not applicable to complex type columns (<code class="ph codeph">STRUCT</code>,
+        <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>). Using a complex type column
+        with <code class="ph codeph">IS NULL</code> or <code class="ph codeph">IS NOT NULL</code> causes a query error.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>-- If this value is non-zero, something is wrong.
+select count(*) from employees where employee_id is null;
+
+-- With data from disparate sources, some fields might be blank.
+-- Not necessarily an error condition.
+select count(*) from census where household_income is null;
+
+-- Sometimes we expect fields to be null, and followup action
+-- is needed when they are not.
+select count(*) from web_traffic where weird_http_code is not null;</code></pre>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="operators__like">
+
+    <h2 class="title topictitle2" id="ariaid-title11">LIKE Operator</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        A comparison operator for <code class="ph codeph">STRING</code> data, with basic wildcard capability using the underscore
+        (<code class="ph codeph">_</code>) to match a single character and the percent sign (<code class="ph codeph">%</code>) to match any sequence
+        of zero or more characters. The argument expression must match the entire string value.
+        Typically, it is more efficient to put any <code class="ph codeph">%</code> wildcard match at the end of the string.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">string_expression</var> LIKE <var class="keyword varname">wildcard_expression</var>
+<var class="keyword varname">string_expression</var> NOT LIKE <var class="keyword varname">wildcard_expression</var>
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+      <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        directly in an operator. You can apply operators only to scalar values that make up a complex type
+        (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+        or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+        the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+        pseudocolumn names.
+      </p>
+
+
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>select distinct c_last_name from customer where c_last_name like 'Mc%' or c_last_name like 'Mac%';
+select count(c_last_name) from customer where c_last_name like 'M%';
+select c_email_address from customer where c_email_address like '%.edu';
+
+-- We can find 4-letter names beginning with 'M' by calling functions...
+select distinct c_last_name from customer where length(c_last_name) = 4 and substr(c_last_name,1,1) = 'M';
+-- ...or in a more readable way by matching M followed by exactly 3 characters.
+select distinct c_last_name from customer where c_last_name like 'M___';</code></pre>
+
+      <p class="p">
+        For case-insensitive comparisons, see <a class="xref" href="impala_operators.html#ilike">ILIKE Operator</a>.
+        For a more general kind of search operator using regular expressions, see <a class="xref" href="impala_operators.html#regexp">REGEXP Operator</a>
+        or its case-insensitive counterpart <a class="xref" href="impala_operators.html#iregexp">IREGEXP Operator</a>.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="operators__logical_operators">
+
+    <h2 class="title topictitle2" id="ariaid-title12">Logical Operators</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        Logical operators return a <code class="ph codeph">BOOLEAN</code> value, based on a binary or unary logical operation between arguments that are
+        also Booleans. Typically, the argument expressions use <a class="xref" href="impala_operators.html#comparison_operators">comparison
+        operators</a>.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">boolean_expression</var> <var class="keyword varname">binary_logical_operator</var> <var class="keyword varname">boolean_expression</var>
+<var class="keyword varname">unary_logical_operator</var> <var class="keyword varname">boolean_expression</var>
+</code></pre>
+
+      <p class="p">
+        The Impala logical operators are:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <code class="ph codeph">AND</code>: A binary operator that returns <code class="ph codeph">true</code> if its left-hand and right-hand arguments both evaluate
+          to <code class="ph codeph">true</code>, <code class="ph codeph">NULL</code> if one argument is <code class="ph codeph">NULL</code> and the other is either
+          <code class="ph codeph">NULL</code> or <code class="ph codeph">true</code>, and <code class="ph codeph">false</code> otherwise.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">OR</code>: A binary operator that returns <code class="ph codeph">true</code> if either of its left-hand and right-hand arguments
+          evaluates to <code class="ph codeph">true</code>, <code class="ph codeph">NULL</code> if one argument is <code class="ph codeph">NULL</code> and the other is either
+          <code class="ph codeph">NULL</code> or <code class="ph codeph">false</code>, and <code class="ph codeph">false</code> otherwise.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">NOT</code>: A unary operator that flips the state of a Boolean expression from <code class="ph codeph">true</code> to
+          <code class="ph codeph">false</code>, or <code class="ph codeph">false</code> to <code class="ph codeph">true</code>. If the argument expression is <code class="ph codeph">NULL</code>,
+          the result remains <code class="ph codeph">NULL</code>. (When <code class="ph codeph">NOT</code> is used this way as a unary logical operator, it works
+          differently than the <code class="ph codeph">IS NOT NULL</code> comparison operator, which returns <code class="ph codeph">true</code> when applied to a
+          <code class="ph codeph">NULL</code>.)
+        </li>
+      </ul>
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+      <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        directly in an operator. You can apply operators only to scalar values that make up a complex type
+        (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+        or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+        the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+        pseudocolumn names.
+      </p>
+
+      <p class="p">
+        The following example shows how to use comparison and logical operators on a numeric field of a <code class="ph codeph">STRUCT</code> type that is an
+        item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">N_NATIONKEY</code> is extracted, it can be
+        used in comparison expressions combined with <code class="ph codeph">BETWEEN</code> and <code class="ph codeph">OR</code>:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+  from region, region.r_nations as nation
+where
+  nation.item.n_nationkey between 3 and 5
+  or nation.item.n_nationkey &gt; 15;
++-------------+----------------+------------------+
+| r_name      | item.n_name    | item.n_nationkey |
++-------------+----------------+------------------+
+| EUROPE      | UNITED KINGDOM | 23               |
+| EUROPE      | RUSSIA         | 22               |
+| EUROPE      | ROMANIA        | 19               |
+| ASIA        | VIETNAM        | 21               |
+| ASIA        | CHINA          | 18               |
+| AMERICA     | UNITED STATES  | 24               |
+| AMERICA     | PERU           | 17               |
+| AMERICA     | CANADA         | 3                |
+| MIDDLE EAST | SAUDI ARABIA   | 20               |
+| MIDDLE EAST | EGYPT          | 4                |
+| AFRICA      | MOZAMBIQUE     | 16               |
+| AFRICA      | ETHIOPIA       | 5                |
++-------------+----------------+------------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <p class="p">
+        These examples demonstrate the <code class="ph codeph">AND</code> operator:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select true and true;
++---------------+
+| true and true |
++---------------+
+| true          |
++---------------+
+[localhost:21000] &gt; select true and false;
++----------------+
+| true and false |
++----------------+
+| false          |
++----------------+
+[localhost:21000] &gt; select false and false;
++-----------------+
+| false and false |
++-----------------+
+| false           |
++-----------------+
+[localhost:21000] &gt; select true and null;
++---------------+
+| true and null |
++---------------+
+| NULL          |
++---------------+
+[localhost:21000] &gt; select (10 &gt; 2) and (6 != 9);
++-----------------------+
+| (10 &gt; 2) and (6 != 9) |
++-----------------------+
+| true                  |
++-----------------------+
+</code></pre>
+
+      <p class="p">
+        These examples demonstrate the <code class="ph codeph">OR</code> operator:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select true or true;
++--------------+
+| true or true |
++--------------+
+| true         |
++--------------+
+[localhost:21000] &gt; select true or false;
++---------------+
+| true or false |
++---------------+
+| true          |
++---------------+
+[localhost:21000] &gt; select false or false;
++----------------+
+| false or false |
++----------------+
+| false          |
++----------------+
+[localhost:21000] &gt; select true or null;
++--------------+
+| true or null |
++--------------+
+| true         |
++--------------+
+[localhost:21000] &gt; select null or true;
++--------------+
+| null or true |
++--------------+
+| true         |
++--------------+
+[localhost:21000] &gt; select false or null;
++---------------+
+| false or null |
++---------------+
+| NULL          |
++---------------+
+[localhost:21000] &gt; select (1 = 1) or ('hello' = 'world');
++--------------------------------+
+| (1 = 1) or ('hello' = 'world') |
++--------------------------------+
+| true                           |
++--------------------------------+
+[localhost:21000] &gt; select (2 + 2 != 4) or (-1 &gt; 0);
++--------------------------+
+| (2 + 2 != 4) or (-1 &gt; 0) |
++--------------------------+
+| false                    |
++--------------------------+
+</code></pre>
+
+      <p class="p">
+        These examples demonstrate the <code class="ph codeph">NOT</code> operator:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select not true;
++----------+
+| not true |
++----------+
+| false    |
++----------+
+[localhost:21000] &gt; select not false;
++-----------+
+| not false |
++-----------+
+| true      |
++-----------+
+[localhost:21000] &gt; select not null;
++----------+
+| not null |
++----------+
+| NULL     |
++----------+
+[localhost:21000] &gt; select not (1=1);
++-------------+
+| not (1 = 1) |
++-------------+
+| false       |
++-------------+
+</code></pre>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="operators__regexp">
+
+    <h2 class="title topictitle2" id="ariaid-title13">REGEXP Operator</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        Tests whether a value matches a regular expression. Uses the POSIX regular expression syntax where <code class="ph codeph">^</code> and
+        <code class="ph codeph">$</code> match the beginning and end of the string, <code class="ph codeph">.</code> represents any single character, <code class="ph codeph">*</code>
+        represents a sequence of zero or more items, <code class="ph codeph">+</code> represents a sequence of one or more items, <code class="ph codeph">?</code>
+        represents zero or one occurrence (appended to another quantifier, as in <code class="ph codeph">.*?</code>, it produces a non-greedy match), and so on.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">string_expression</var> REGEXP <var class="keyword varname">regular_expression</var>
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        The regular expression must match the entire value, not just occur somewhere inside it. Use <code class="ph codeph">.*</code> at the beginning,
+        the end, or both if you only need to match characters anywhere in the middle. Thus, the <code class="ph codeph">^</code> and <code class="ph codeph">$</code>
+        atoms are often redundant, although you might already have them in your expression strings that you reuse from elsewhere.
+      </p>
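+
+      <p class="p">
+        For example, the following illustrative queries use patterns that span the entire value, so the
+        expected results are the same under either matching behavior described in this section:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select 'abcde' regexp 'a.*e';
++-----------------------+
+| 'abcde' regexp 'a.*e' |
++-----------------------+
+| true                  |
++-----------------------+
+[localhost:21000] &gt; select 'abcde' regexp 'x.*';
++----------------------+
+| 'abcde' regexp 'x.*' |
++----------------------+
+| false                |
++----------------------+
+</code></pre>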
+
+      <p class="p">
+        The <code class="ph codeph">RLIKE</code> operator is a synonym for <code class="ph codeph">REGEXP</code>.
+      </p>
+
+      <p class="p">
+        The <code class="ph codeph">|</code> symbol is the alternation operator, typically used within <code class="ph codeph">()</code> to match different sequences.
+        The <code class="ph codeph">()</code> groups do not allow backreferences. To retrieve the part of a value matched within a <code class="ph codeph">()</code>
+        section, use the <code class="ph codeph"><a class="xref" href="impala_string_functions.html#string_functions__regexp_extract">regexp_extract()</a></code>
+        built-in function.
+      </p>
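+
+      <p class="p">
+        As an illustrative sketch, <code class="ph codeph">regexp_extract()</code> takes a group index,
+        where 0 refers to the entire match and 1 refers to the first <code class="ph codeph">()</code> group:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select regexp_extract('abcdef', '.*?(b.)', 1);
++----------------------------------------+
+| regexp_extract('abcdef', '.*?(b.)', 1) |
++----------------------------------------+
+| bc                                     |
++----------------------------------------+
+</code></pre>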
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          <p class="p">
+        In Impala 1.3.1 and higher, the <code class="ph codeph">REGEXP</code> and <code class="ph codeph">RLIKE</code> operators now match a
+        regular expression string that occurs anywhere inside the target string, the same as if the regular
+        expression was enclosed on each side by <code class="ph codeph">.*</code>. See
+        <a class="xref" href="../shared/../topics/impala_operators.html#regexp">REGEXP Operator</a> for examples. Previously, these operators only
+        succeeded when the regular expression matched the entire target string. This change improves compatibility
+        with the regular expression support for popular database systems. There is no change to the behavior of the
+        <code class="ph codeph">regexp_extract()</code> and <code class="ph codeph">regexp_replace()</code> built-in functions.
+      </p>
+      </div>
+
+      <p class="p">
+        In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+        Expression syntax used by the Google RE2 library. For details, see
+        <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+        has most idioms familiar from regular expressions in Perl, Python, and so on, including
+        <code class="ph codeph">.*?</code> for non-greedy matches.
+      </p>
+
+      <p class="p">
+        In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+        way regular expressions are interpreted by this operator. Test any queries that use regular expressions and
+        adjust the expression patterns if necessary. See
+        <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+      <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        directly in an operator. You can apply operators only to scalar values that make up a complex type
+        (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+        or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+        the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+        pseudocolumn names.
+      

<TRUNCATED>


[27/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_known_issues.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_known_issues.html b/docs/build/html/topics/impala_known_issues.html
new file mode 100644
index 0000000..e496bdc
--- /dev/null
+++ b/docs/build/html/topics/impala_known_issues.html
@@ -0,0 +1,1712 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="known_issues"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Known Issues and Workarounds in Impala</title></head><body id="known_issues"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Known Issues and Workarounds in Impala</span></h1>
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the
+      most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and
+      upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and
+      whether a fix is in the pipeline.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue
+      you are experiencing has already been reported, or which release an issue is fixed in, search on the
+      <a class="xref" href="https://issues.apache.org/jira/" target="_blank">issues.apache.org JIRA tracker</a>.
+    </div>
+
+    <p class="p toc inpage"></p>
+
+    <p class="p">
+      For issues fixed in various Impala releases, see <a class="xref" href="impala_fixed_issues.html#fixed_issues">Fixed Issues in Apache Impala (incubating)</a>.
+    </p>
+
+
+
+  </div>
+
+
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="known_issues__known_issues_crash">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Impala Known Issues: Crashes and Hangs</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These issues can cause Impala to quit or become unresponsive.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title3" id="known_issues_crash__IMPALA-4828">
+      <h3 class="title topictitle3" id="ariaid-title3">Altering Kudu table schema outside of Impala may result in crash on read</h3>
+      <div class="body conbody">
+        <p class="p">
+          Creating a table in Impala, changing the column schema outside of Impala,
+          and then reading the table again in Impala may result in a crash. Neither Impala nor
+          the Kudu client validates the schema immediately before reading, so Impala may attempt to
+          dereference invalid pointers. This happens if a string column is dropped
+          and then a new, non-string column is added with the old string column's name.
+        </p>
+        <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-4828" target="_blank">IMPALA-4828</a></p>
+        <p class="p"><strong class="ph b">Severity:</strong> High</p>
+        <p class="p"><strong class="ph b">Workaround:</strong> Run the statement <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+          after any occasion when the table structure, such as the number, names, and data types
+          of columns, is modified outside of Impala using the Kudu API.
+        </p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title4" id="known_issues_crash__IMPALA-1972">
+
+      <h3 class="title topictitle3" id="ariaid-title4">Queries that take a long time to plan can cause webserver to block other queries</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Trying to get the details of a query through the debug web page
+          while the query is planning will block new queries that had not
+          started when the web page was requested. The web UI becomes
+          unresponsive until the planning phase is finished.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1972" target="_blank">IMPALA-1972</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="known_issues_crash__IMPALA-3069">
+
+      <h3 class="title topictitle3" id="ariaid-title5">Setting BATCH_SIZE query option too large can cause a crash</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Using a value in the millions for the <code class="ph codeph">BATCH_SIZE</code> query option, together with wide rows or large string values in
+          columns, could cause a memory allocation of more than 2 GB resulting in a crash.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3069" target="_blank">IMPALA-3069</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.7.0</span>.</p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="known_issues_crash__IMPALA-3441">
+
+      <h3 class="title topictitle3" id="ariaid-title6">Impala should not crash for invalid Avro serialized data</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Malformed Avro data, such as out-of-bounds integers or values in the wrong format, could cause a crash when queried.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3441" target="_blank">IMPALA-3441</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.7.0</span> and <span class="keyword">Impala 2.6.2</span>.</p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="known_issues_crash__IMPALA-2592">
+
+      <h3 class="title topictitle3" id="ariaid-title7">Queries may hang on server-to-server exchange errors</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          The <code class="ph codeph">DataStreamSender::Channel::CloseInternal()</code> does not close the channel on an error. This causes the node on
+          the other side of the channel to wait indefinitely, causing a hang.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2592" target="_blank">IMPALA-2592</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.5.0</span>.
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="known_issues_crash__IMPALA-2365">
+
+      <h3 class="title topictitle3" id="ariaid-title8">impalad crashes if the UDF JAR is not available in HDFS</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala <code class="ph codeph">CREATE FUNCTION</code> statement is
+          issued, the <span class="keyword cmdname">impalad</span> daemon crashes.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2365" target="_blank">IMPALA-2365</a>
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.5.0</span>.</p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="known_issues_performance__ki_performance" id="known_issues__known_issues_performance">
+
+    <h2 class="title topictitle2" id="known_issues_performance__ki_performance">Impala Known Issues: Performance</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These issues involve the performance of operations such as queries or DDL statements.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="known_issues_performance__IMPALA-1480">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title10">Slow DDL statements for tables with large number of partitions</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          DDL statements for tables with a large number of partitions might be slow.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1480" target="_blank">IMPALA-1480</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Run the DDL statement in Hive if the slowness is an issue.
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.5.0</span>.</p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="known_issues_usability__ki_usability" id="known_issues__known_issues_usability">
+
+    <h2 class="title topictitle2" id="known_issues_usability__ki_usability">Impala Known Issues: Usability</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="known_issues_usability__IMPALA-3133">
+
+      <h3 class="title topictitle3" id="ariaid-title12">Unexpected privileges in show output</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Due to a timing condition in updating cached policy data from Sentry, the <code class="ph codeph">SHOW</code> statements for Sentry roles could
+          sometimes display out-of-date role settings. Because Impala rechecks authorization for each SQL statement, this discrepancy does
+          not represent a security issue for other statements.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3133" target="_blank">IMPALA-3133</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.6.0</span> and <span class="keyword">Impala 2.5.1</span>.</p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="known_issues_usability__IMPALA-1776">
+
+      <h3 class="title topictitle3" id="ariaid-title13">Less than 100% progress on completed simple SELECT queries</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Simple <code class="ph codeph">SELECT</code> queries show less than 100% progress even though they have already completed.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1776" target="_blank">IMPALA-1776</a>
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="known_issues_usability__concept_lmx_dk5_lx">
+
+      <h3 class="title topictitle3" id="ariaid-title14">Unexpected column overflow behavior with INT datatypes</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+        Impala does not return column overflows as <code class="ph codeph">NULL</code>, so that customers can distinguish
+        between <code class="ph codeph">NULL</code> data and overflow conditions similar to how they do so with traditional
+        database systems. Impala returns the largest or smallest value in the range for the type. For example,
+        valid values for a <code class="ph codeph">tinyint</code> range from -128 to 127. In Impala, a <code class="ph codeph">tinyint</code>
+        with a value of -200 returns -128 rather than <code class="ph codeph">NULL</code>. A <code class="ph codeph">tinyint</code> with a
+        value of 200 returns 127.
+      </p>
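+
+        <p class="p">
+          For example, this illustrative query shows the saturation behavior described above:
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select cast(-200 as tinyint), cast(200 as tinyint);
++-----------------------+----------------------+
+| cast(-200 as tinyint) | cast(200 as tinyint) |
++-----------------------+----------------------+
+| -128                  | 127                  |
++-----------------------+----------------------+
+</code></pre>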
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong>
+          <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3123" target="_blank">IMPALA-3123</a>
+        </p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="known_issues_drivers__ki_drivers" id="known_issues__known_issues_drivers">
+
+    <h2 class="title topictitle2" id="known_issues_drivers__ki_drivers">Impala Known Issues: JDBC and ODBC Drivers</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications
+        in languages such as Java or C++.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title16" id="known_issues_drivers__IMPALA-1792">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title16">ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          If the ODBC <code class="ph codeph">SQLGetData</code> is called on a series of columns, the function calls must follow the same order as the
+          columns. For example, if data is fetched from column 2 then column 1, the <code class="ph codeph">SQLGetData</code> call for column 1 returns
+          <code class="ph codeph">NULL</code>.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1792" target="_blank">IMPALA-1792</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Fetch columns in the same order they are defined in the table.
+        </p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="known_issues_security__ki_security" id="known_issues__known_issues_security">
+
+    <h2 class="title topictitle2" id="known_issues_security__ki_security">Impala Known Issues: Security</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and
+        redaction.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title18" id="known_issues_security__renewable_kerberos_tickets">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title18">Kerberos tickets must be renewable</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          In a Kerberos environment, the <span class="keyword cmdname">impalad</span> daemon might not start if Kerberos tickets are not renewable.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Configure your KDC to allow tickets to be renewed, and configure <span class="ph filepath">krb5.conf</span> to request
+          renewable tickets.
+        </p>
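+
+        <p class="p">
+          For example, a minimal <span class="ph filepath">krb5.conf</span> fragment that requests renewable
+          tickets might look like the following (the lifetime values are illustrative; the KDC must also
+          permit renewable tickets for the relevant principals):
+        </p>
+
+<pre class="pre codeblock"><code>[libdefaults]
+  ticket_lifetime = 24h
+  renew_lifetime = 7d
+</code></pre>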
+
+      </div>
+
+    </article>
+
+
+
+  </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="known_issues_resources__ki_resources" id="known_issues__known_issues_resources">
+
+    <h2 class="title topictitle2" id="known_issues_resources__ki_resources">Impala Known Issues: Resources</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management
+        features.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title20" id="known_issues_resources__catalogd_heap">
+
+      <h3 class="title topictitle3" id="ariaid-title20">Impala catalogd heap issues when upgrading to <span class="keyword">Impala 2.5</span></h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          The default heap size for Impala <span class="keyword cmdname">catalogd</span> has changed in <span class="keyword">Impala 2.5</span> and higher:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              Previously, by default <span class="keyword cmdname">catalogd</span> was using the JVM's default heap size, which is the smaller of 1/4th of the
+              physical memory or 32 GB.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              Starting with <span class="keyword">Impala 2.5.0</span>, the default <span class="keyword cmdname">catalogd</span> heap size is 4 GB.
+            </p>
+          </li>
+        </ul>
+
+        <p class="p">
+          For example, on a host with 128 GB of physical memory, this change decreases the catalogd heap
+          from 32 GB to 4 GB, which can cause out-of-memory errors in catalogd and lead to query failures.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Increase the <span class="keyword cmdname">catalogd</span> memory limit as follows.
+
+
+        </p>
+
+        <div class="p">
+        For schemas with large numbers of tables, partitions, and data files, the <span class="keyword cmdname">catalogd</span>
+        daemon might encounter an out-of-memory error. To increase the memory limit for the
+        <span class="keyword cmdname">catalogd</span> daemon:
+
+        <ol class="ol">
+          <li class="li">
+            <p class="p">
+              Check current memory usage for the <span class="keyword cmdname">catalogd</span> daemon by running the
+              following commands on the host where that daemon runs on your cluster:
+            </p>
+  <pre class="pre codeblock"><code>
+  jcmd <var class="keyword varname">catalogd_pid</var> VM.flags
+  jmap -heap <var class="keyword varname">catalogd_pid</var>
+  </code></pre>
+          </li>
+          <li class="li">
+            <p class="p">
+              Decide on a large enough value for the <span class="keyword cmdname">catalogd</span> heap.
+              You express it as an environment variable value as follows:
+            </p>
+  <pre class="pre codeblock"><code>
+  JAVA_TOOL_OPTIONS="-Xmx8g"
+  </code></pre>
+          </li>
+          <li class="li">
+            <p class="p">
+              On systems not using cluster management software, put this environment variable setting into the
+              startup script for the <span class="keyword cmdname">catalogd</span> daemon, then restart the <span class="keyword cmdname">catalogd</span>
+              daemon.
+            </p>
+          </li>
+          <li class="li">
+            <p class="p">
+              Use the same <span class="keyword cmdname">jcmd</span> and <span class="keyword cmdname">jmap</span> commands as earlier to
+              verify that the new settings are in effect.
+            </p>
+          </li>
+        </ol>
+      </div>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title21" id="known_issues_resources__IMPALA-3509">
+
+      <h3 class="title topictitle3" id="ariaid-title21">Breakpad minidumps can be very large when the thread count is high</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the
+          minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3509" target="_blank">IMPALA-3509</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Add <code class="ph codeph">--minidump_size_limit_hint_kb=<var class="keyword varname">size</var></code> to set a soft upper limit on the
+          size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread
+          from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump
+          file can still grow larger than the <span class="q">"hinted"</span> size. For example, if you have 10,000 threads, the minidump file can be more
+          than 20 MB.
+        </p>
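+
+        <p class="p">
+          For example, adding the following to the startup flags of a daemon sets an illustrative 2 MB
+          soft limit per minidump file:
+        </p>
+
+<pre class="pre codeblock"><code>--minidump_size_limit_hint_kb=2048
+</code></pre>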
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title22" id="known_issues_resources__IMPALA-3662">
+
+      <h3 class="title topictitle3" id="ariaid-title22">Parquet scanner memory increase after IMPALA-2736</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          The initial release of <span class="keyword">Impala 2.6</span> sometimes has a higher peak memory usage than in previous releases while reading
+          Parquet files.
+        </p>
+
+        <div class="p">
+          <span class="keyword">Impala 2.6</span> addresses the issue IMPALA-2736, which improves the efficiency of Parquet scans by up to 2x. The faster scans
+          may result in a higher peak memory consumption compared to earlier versions of Impala due to the new column-wise row
+          materialization strategy. You are likely to experience higher memory consumption in any of the following scenarios:
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                Very wide rows due to projecting many columns in a scan.
+              </p>
+            </li>
+
+            <li class="li">
+              <p class="p">
+                Very large rows due to big column values, for example, long strings or nested collections with many items.
+              </p>
+            </li>
+
+            <li class="li">
+              <p class="p">
+                Producer/consumer speed imbalances, leading to more rows being buffered between a scan (producer) and downstream (consumer)
+                plan nodes.
+              </p>
+            </li>
+          </ul>
+        </div>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3662" target="_blank">IMPALA-3662</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High
+        </p>
+
+        <div class="p">
+          <strong class="ph b">Workaround:</strong> The following query options might help to reduce memory consumption in the Parquet scanner:
+          <ul class="ul">
+            <li class="li">
+              Reduce the number of scanner threads, for example: <code class="ph codeph">set num_scanner_threads=30</code>
+            </li>
+
+            <li class="li">
+              Reduce the batch size, for example: <code class="ph codeph">set batch_size=512</code>
+            </li>
+
+            <li class="li">
+              Increase the memory limit, for example: <code class="ph codeph">set mem_limit=64g</code>
+            </li>
+          </ul>
+        </div>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title23" id="known_issues_resources__IMPALA-691">
+
+      <h3 class="title topictitle3" id="ariaid-title23">Process mem limit does not account for the JVM's memory usage</h3>
+
+
+
+      <div class="body conbody">
+
+        <p class="p">
+          Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the
+          <span class="keyword cmdname">impalad</span> daemon.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-691" target="_blank">IMPALA-691</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> To monitor overall memory usage, use the <span class="keyword cmdname">top</span> command, or add the memory figures in the
+          Impala web UI <span class="ph uicontrol">/memz</span> tab to JVM memory usage shown on the <span class="ph uicontrol">/metrics</span> tab.
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title24" id="known_issues_resources__IMPALA-2375">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title24">Fix issues with the legacy join and agg nodes using --enable_partitioned_hash_join=false and --enable_partitioned_aggregation=false</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2375" target="_blank">IMPALA-2375</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Transition away from the <span class="q">"old-style"</span> join and aggregation mechanism if practical.
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.5.0</span>.</p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="known_issues_correctness__ki_correctness" id="known_issues__known_issues_correctness">
+
+    <h2 class="title topictitle2" id="known_issues_correctness__ki_correctness">Impala Known Issues: Correctness</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title26" id="known_issues_correctness__IMPALA-3084">
+
+      <h3 class="title topictitle3" id="ariaid-title26">Incorrect assignment of NULL checking predicate through an outer join of a nested collection.</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          A query could return wrong results (too many or too few <code class="ph codeph">NULL</code> values) if it referenced an outer-joined nested
+          collection and also contained a null-checking predicate (<code class="ph codeph">IS NULL</code>, <code class="ph codeph">IS NOT NULL</code>, or the
+          <code class="ph codeph">&lt;=&gt;</code> operator) in the <code class="ph codeph">WHERE</code> clause.
+        </p>
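+
+        <p class="p">
+          A hypothetical query shape that matches this description (table and column names are illustrative only) looks like:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- c.orders is a nested collection on the nullable side of the outer
+-- join; the IS NULL predicate could be assigned incorrectly
+SELECT count(*)
+FROM customers c LEFT OUTER JOIN c.orders o
+WHERE o.price IS NULL;
+</code></pre>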
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3084" target="_blank">IMPALA-3084</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.7.0</span>.</p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title27" id="known_issues_correctness__IMPALA-3094">
+
+      <h3 class="title topictitle3" id="ariaid-title27">Incorrect result due to constant evaluation in query with outer join</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          An <code class="ph codeph">OUTER JOIN</code> query could omit some expected result rows due to a constant such as <code class="ph codeph">FALSE</code> in
+          another join clause. For example:
+        </p>
+
+<pre class="pre codeblock"><code>
+explain SELECT 1 FROM alltypestiny a1
+  INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
+  RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
++---------------------------------------------------------+
+| Explain String                                          |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
+|                                                         |
+| 00:EMPTYSET                                             |
++---------------------------------------------------------+
+
+</code></pre>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3094" target="_blank">IMPALA-3094</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title28" id="known_issues_correctness__IMPALA-3126">
+
+      <h3 class="title topictitle3" id="ariaid-title28">Incorrect assignment of an inner join On-clause predicate through an outer join.</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Impala may return incorrect results for queries that have the following properties:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              There is an INNER JOIN following a series of OUTER JOINs.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              The INNER JOIN has an On-clause with a predicate that references at least two tables that are on the nullable side of the
+              preceding OUTER JOINs.
+            </p>
+          </li>
+        </ul>
+
+        <p class="p">
+          The following query demonstrates the issue:
+        </p>
+
+<pre class="pre codeblock"><code>
+select 1 from functional.alltypes a left outer join
+  functional.alltypes b on a.id = b.id left outer join
+  functional.alltypes c on b.id = c.id right outer join
+  functional.alltypes d on c.id = d.id inner join functional.alltypes e
+on b.int_col = c.int_col;
+</code></pre>
+
+        <p class="p">
+          The following listing shows the incorrect <code class="ph codeph">EXPLAIN</code> plan:
+        </p>
+
+<pre class="pre codeblock"><code>
++-----------------------------------------------------------+
+| Explain String                                            |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 |
+|                                                           |
+| 14:EXCHANGE [UNPARTITIONED]                               |
+| |                                                         |
+| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]               |
+| |                                                         |
+| |--13:EXCHANGE [BROADCAST]                                |
+| |  |                                                      |
+| |  04:SCAN HDFS [functional.alltypes e]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |                                                         |
+| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]              |
+| |  hash predicates: c.id = d.id                           |
+| |  runtime filters: RF000 &lt;- d.id                         |
+| |                                                         |
+| |--12:EXCHANGE [HASH(d.id)]                               |
+| |  |                                                      |
+| |  03:SCAN HDFS [functional.alltypes d]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |                                                         |
+| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]               |
+| |  hash predicates: b.id = c.id                           |
+| |  other predicates: b.int_col = c.int_col     &lt;--- incorrect placement; should be at node 07 or 08
+| |  runtime filters: RF001 &lt;- c.int_col                    |
+| |                                                         |
+| |--11:EXCHANGE [HASH(c.id)]                               |
+| |  |                                                      |
+| |  02:SCAN HDFS [functional.alltypes c]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |     runtime filters: RF000 -&gt; c.id                      |
+| |                                                         |
+| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]              |
+| |  hash predicates: b.id = a.id                           |
+| |  runtime filters: RF002 &lt;- a.id                         |
+| |                                                         |
+| |--10:EXCHANGE [HASH(a.id)]                               |
+| |  |                                                      |
+| |  00:SCAN HDFS [functional.alltypes a]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |                                                         |
+| 09:EXCHANGE [HASH(b.id)]                                  |
+| |                                                         |
+| 01:SCAN HDFS [functional.alltypes b]                      |
+|    partitions=24/24 files=24 size=478.45KB                |
+|    runtime filters: RF001 -&gt; b.int_col, RF002 -&gt; b.id     |
++-----------------------------------------------------------+
+
+</code></pre>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3126" target="_blank">IMPALA-3126</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> For some queries, this problem can be worked around by placing the problematic <code class="ph codeph">ON</code> clause predicate in the
+          <code class="ph codeph">WHERE</code> clause instead, or by changing the preceding <code class="ph codeph">OUTER JOIN</code>s to <code class="ph codeph">INNER JOIN</code>s (if
+          the <code class="ph codeph">ON</code> clause predicate would discard <code class="ph codeph">NULL</code>s). For example, to fix the problematic query above:
+        </p>
+
+<pre class="pre codeblock"><code>
+select 1 from functional.alltypes a
+  left outer join functional.alltypes b
+    on a.id = b.id
+  left outer join functional.alltypes c
+    on b.id = c.id
+  right outer join functional.alltypes d
+    on c.id = d.id
+  inner join functional.alltypes e
+where b.int_col = c.int_col
+
++-----------------------------------------------------------+
+| Explain String                                            |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 |
+|                                                           |
+| 14:EXCHANGE [UNPARTITIONED]                               |
+| |                                                         |
+| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]               |
+| |                                                         |
+| |--13:EXCHANGE [BROADCAST]                                |
+| |  |                                                      |
+| |  04:SCAN HDFS [functional.alltypes e]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |                                                         |
+| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]              |
+| |  hash predicates: c.id = d.id                           |
+| |  other predicates: b.int_col = c.int_col          &lt;-- correct assignment
+| |  runtime filters: RF000 &lt;- d.id                         |
+| |                                                         |
+| |--12:EXCHANGE [HASH(d.id)]                               |
+| |  |                                                      |
+| |  03:SCAN HDFS [functional.alltypes d]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |                                                         |
+| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]               |
+| |  hash predicates: b.id = c.id                           |
+| |                                                         |
+| |--11:EXCHANGE [HASH(c.id)]                               |
+| |  |                                                      |
+| |  02:SCAN HDFS [functional.alltypes c]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |     runtime filters: RF000 -&gt; c.id                      |
+| |                                                         |
+| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED]              |
+| |  hash predicates: b.id = a.id                           |
+| |  runtime filters: RF001 &lt;- a.id                         |
+| |                                                         |
+| |--10:EXCHANGE [HASH(a.id)]                               |
+| |  |                                                      |
+| |  00:SCAN HDFS [functional.alltypes a]                   |
+| |     partitions=24/24 files=24 size=478.45KB             |
+| |                                                         |
+| 09:EXCHANGE [HASH(b.id)]                                  |
+| |                                                         |
+| 01:SCAN HDFS [functional.alltypes b]                      |
+|    partitions=24/24 files=24 size=478.45KB                |
+|    runtime filters: RF001 -&gt; b.id                         |
++-----------------------------------------------------------+
+
+</code></pre>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title29" id="known_issues_correctness__IMPALA-3006">
+
+      <h3 class="title topictitle3" id="ariaid-title29">Impala may use incorrect bit order with BIT_PACKED encoding</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Parquet <code class="ph codeph">BIT_PACKED</code> encoding as implemented by Impala is LSB first. The parquet standard says it is MSB first.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3006" target="_blank">IMPALA-3006</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High, but rare in practice because BIT_PACKED is infrequently used, is not written by Impala, and is deprecated
+          in Parquet 2.0.
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title30" id="known_issues_correctness__IMPALA-3082">
+
+      <h3 class="title topictitle3" id="ariaid-title30">BST between 1972 and 1995</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995.
+          Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
+          third) and fourth Sunday in October. For example, both function calls should return 13, but actually return 12, in a query such
+          as:
+        </p>
+
+<pre class="pre codeblock"><code>
+select
+  extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start,
+  extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end;
+</code></pre>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3082" target="_blank">IMPALA-3082</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title31" id="known_issues_correctness__IMPALA-1170">
+
+      <h3 class="title topictitle3" id="ariaid-title31">parse_url() returns incorrect result if @ character in URL</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          If a URL contains an <code class="ph codeph">@</code> character, the <code class="ph codeph">parse_url()</code> function could return an incorrect value for
+          the hostname field.
+        </p>
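+
+        <p class="p">
+          A hypothetical example of the affected pattern (the URL is illustrative only):
+        </p>
+
+<pre class="pre codeblock"><code>
+-- Expected hostname: example.com. Because of the '@' in the URL,
+-- the returned value could be incorrect.
+SELECT parse_url('http://user@example.com/path', 'HOST');
+</code></pre>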
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1170" target="_blank">IMPALA-1170</a>
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.5.0</span> and <span class="keyword">Impala 2.3.4</span>.</p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title32" id="known_issues_correctness__IMPALA-2422">
+
+      <h3 class="title topictitle3" id="ariaid-title32">% escaping does not work correctly when it occurs at the end of a LIKE pattern</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          If the final character in the RHS argument of a <code class="ph codeph">LIKE</code> operator is an escaped <code class="ph codeph">\%</code> character, it
+          does not match a <code class="ph codeph">%</code> final character of the LHS argument.
+        </p>
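+
+        <p class="p">
+          A hypothetical illustration of the affected pattern (the literal values are illustrative only):
+        </p>
+
+<pre class="pre codeblock"><code>
+-- Expected to match the literal trailing '%', but the escaped \%
+-- at the end of the pattern fails to match it
+SELECT 'abc%' LIKE 'abc\%';
+</code></pre>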
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2422" target="_blank">IMPALA-2422</a>
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title33" id="known_issues_correctness__IMPALA-397">
+
+      <h3 class="title topictitle3" id="ariaid-title33">ORDER BY rand() does not work.</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Because the value for <code class="ph codeph">rand()</code> is computed early in a query, using an <code class="ph codeph">ORDER BY</code> expression
+          involving a call to <code class="ph codeph">rand()</code> does not actually randomize the results.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-397" target="_blank">IMPALA-397</a>
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title34" id="known_issues_correctness__IMPALA-2643">
+
+      <h3 class="title topictitle3" id="ariaid-title34">Duplicated column in inline view causes dropping null slots during scan</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          If the same column is queried twice within a view, <code class="ph codeph">NULL</code> values for that column are omitted. For example, the
+          result of <code class="ph codeph">COUNT(*)</code> on the view could be less than expected.
+        </p>
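+
+        <p class="p">
+          A hypothetical query shape that could hit the issue (table and column names are illustrative only):
+        </p>
+
+<pre class="pre codeblock"><code>
+-- int_col is referenced twice inside the inline view; rows where
+-- int_col is NULL could be dropped from the count
+SELECT count(*) FROM (SELECT int_col, int_col AS int_col2 FROM t1) v;
+</code></pre>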
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2643" target="_blank">IMPALA-2643</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Avoid selecting the same column twice within an inline view.
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.5.0</span>, <span class="keyword">Impala 2.3.2</span>, and <span class="keyword">Impala 2.2.10</span>.</p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title35" id="known_issues_correctness__IMPALA-1459">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title35">Incorrect assignment of predicates through an outer join in an inline view.</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          A query involving an <code class="ph codeph">OUTER JOIN</code> clause where one of the table references is an inline view might apply predicates
+          from the <code class="ph codeph">ON</code> clause incorrectly.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1459" target="_blank">IMPALA-1459</a>
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.5.0</span>, <span class="keyword">Impala 2.3.2</span>, and <span class="keyword">Impala 2.2.9</span>.</p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title36" id="known_issues_correctness__IMPALA-2603">
+
+      <h3 class="title topictitle3" id="ariaid-title36">Crash: impala::Coordinator::ValidateCollectionSlots</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          A query could encounter a serious error if it includes multiple nested levels of <code class="ph codeph">INNER JOIN</code> clauses involving
+          subqueries.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2603" target="_blank">IMPALA-2603</a>
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title37" id="known_issues_correctness__IMPALA-2665">
+
+      <h3 class="title topictitle3" id="ariaid-title37">Incorrect assignment of On-clause predicate inside inline view with an outer join.</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          A query might return incorrect results due to wrong predicate assignment in the following scenario:
+        </p>
+
+        <ol class="ol">
+          <li class="li">
+            There is an inline view that contains an outer join
+          </li>
+
+          <li class="li">
+            That inline view is joined with another table in the enclosing query block
+          </li>
+
+          <li class="li">
+            That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside
+            the inline view
+          </li>
+        </ol>
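+
+        <p class="p">
+          The three conditions above can be sketched as follows (table and column names are illustrative only):
+        </p>
+
+<pre class="pre codeblock"><code>
+-- v contains an outer join; the On-clause joining v with t3 has a
+-- predicate (v.x = 0) that references only v.x, which originates
+-- from the outer-joined table t2 inside the inline view
+SELECT 1
+FROM (SELECT t1.id, t2.x FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id) v
+  LEFT OUTER JOIN t3 ON v.x = 0;
+</code></pre>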
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2665" target="_blank">IMPALA-2665</a>
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.5.0</span>, <span class="keyword">Impala 2.3.2</span>, and <span class="keyword">Impala 2.2.9</span>.</p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title38" id="known_issues_correctness__IMPALA-2144">
+
+      <h3 class="title topictitle3" id="ariaid-title38">Wrong assignment of having clause predicate across outer join</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          In an <code class="ph codeph">OUTER JOIN</code> query with a <code class="ph codeph">HAVING</code> clause, the comparison from the <code class="ph codeph">HAVING</code>
+          clause might be applied at the wrong stage of query processing, leading to incorrect results.
+        </p>
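+
+        <p class="p">
+          A hypothetical query shape that could be affected (table and column names are illustrative only):
+        </p>
+
+<pre class="pre codeblock"><code>
+-- The HAVING comparison could be applied at the wrong stage of
+-- query processing
+SELECT t1.id, count(t2.x)
+FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
+GROUP BY t1.id
+HAVING count(t2.x) &gt; 0;
+</code></pre>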
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2144" target="_blank">IMPALA-2144</a>
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.5.0</span>.</p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title39" id="known_issues_correctness__IMPALA-2093">
+
+      <h3 class="title topictitle3" id="ariaid-title39">Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          A <code class="ph codeph">NOT IN</code> operator with a subquery that calls an aggregate function, such as <code class="ph codeph">NOT IN (SELECT
+          SUM(...))</code>, could return incorrect results.
+        </p>
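+
+        <p class="p">
+          A hypothetical query shape matching this description (the table names and the constant are illustrative only):
+        </p>
+
+<pre class="pre codeblock"><code>
+-- A constant compared via NOT IN against an aggregate subquery
+SELECT count(*) FROM t1
+WHERE 10 NOT IN (SELECT sum(x) FROM t2);
+</code></pre>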
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2093" target="_blank">IMPALA-2093</a>
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.5.0</span> and <span class="keyword">Impala 2.3.4</span>.</p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="known_issues_metadata__ki_metadata" id="known_issues__known_issues_metadata">
+
+    <h2 class="title topictitle2" id="known_issues_metadata__ki_metadata">Impala Known Issues: Metadata</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the <code class="ph codeph">COMPUTE
+        STATS</code> statement, and the Impala <span class="keyword cmdname">catalogd</span> daemon.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title41" id="known_issues_metadata__IMPALA-2648">
+
+      <h3 class="title topictitle3" id="ariaid-title41">Catalogd may crash when loading metadata for tables with many partitions, many columns, and incremental stats</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Incremental stats use up about 400 bytes per partition for each column. For example, for a table with 20K partitions and 100
+          columns, the memory overhead from incremental statistics is about 800 MB. When serialized for transmission across the network,
+          this metadata exceeds the 2 GB Java array size limit and leads to a <code class="ph codeph">catalogd</code> crash.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bugs:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2647" target="_blank">IMPALA-2647</a>,
+          <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2648" target="_blank">IMPALA-2648</a>,
+          <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2649" target="_blank">IMPALA-2649</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> If feasible, compute full stats periodically and avoid computing incremental stats for that table. The
+          scalability of incremental stats computation is a continuing work item.
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title42" id="known_issues_metadata__IMPALA-1420">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title42">Can't update stats manually via alter table after upgrading to <span class="keyword">Impala 2.0</span></h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1420" target="_blank">IMPALA-1420</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> On <span class="keyword">Impala 2.0</span>, when adjusting table statistics manually by setting the <code class="ph codeph">numRows</code>, you must also
+          enable the Boolean property <code class="ph codeph">STATS_GENERATED_VIA_STATS_TASK</code>. For example, use a statement like the following to
+          set both properties with a single <code class="ph codeph">ALTER TABLE</code> statement:
+        </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('numRows'='<var class="keyword varname">new_value</var>', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');</code></pre>
+
+        <p class="p">
+          <strong class="ph b">Resolution:</strong> The underlying cause is the issue
+          <a class="xref" href="https://issues.apache.org/jira/browse/HIVE-8648" target="_blank">HIVE-8648</a> that affects the
+          metastore in Hive 0.13. The workaround is only needed until the fix for this issue is incorporated into the Hive release being used.
+        </p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="known_issues_interop__ki_interop" id="known_issues__known_issues_interop">
+
+    <h2 class="title topictitle2" id="known_issues_interop__ki_interop">Impala Known Issues: Interoperability</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types
+        and file formats.
+      </p>
+
+    </div>
+
+
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title44" id="known_issues_interop__describe_formatted_avro">
+
+      <h3 class="title topictitle3" id="ariaid-title44">DESCRIBE FORMATTED gives error on Avro table</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          This issue can occur either on old Avro tables (created prior to Hive 1.1) or when changing the Avro schema file by
+          adding or removing columns. Columns added to the schema file will not show up in the output of the <code class="ph codeph">DESCRIBE
+          FORMATTED</code> command. Removing columns from the schema file will trigger a <code class="ph codeph">NullPointerException</code>.
+        </p>
+
+        <p class="p">
+          As a workaround, you can use the output of <code class="ph codeph">SHOW CREATE TABLE</code> to drop and recreate the table. This will populate
+          the Hive metastore database with the correct column definitions.
+        </p>
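+
+        <p class="p">
+          A hypothetical recovery sequence (the table name is illustrative only; see the warning below before dropping anything):
+        </p>
+
+<pre class="pre codeblock"><code>
+SHOW CREATE TABLE avro_tbl;   -- capture the full table definition
+DROP TABLE avro_tbl;          -- external table: data files are kept
+-- then re-run the captured CREATE TABLE statement
+</code></pre>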
+
+        <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span> 
+          Only use this for external tables, or Impala will remove the data files. In case of an internal table, set it to external first:
+<pre class="pre codeblock"><code>
+ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
+</code></pre>
+          (The part in parentheses is case sensitive.) Make sure to pick the right choice between internal and external when recreating the
+          table. See <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> for the differences between internal and external tables.
+        </div>
+
+        <p class="p">
+          <strong class="ph b">Severity:</strong> High
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title45" id="known_issues_interop__IMP-469">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title45">Deviation from Hive behavior: Impala does not do implicit casts between string and numeric and boolean types.</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          <strong class="ph b">Anticipated Resolution:</strong> None
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Use explicit casts.
+        </p>
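+
+        <p class="p">
+          For example, a hypothetical explicit cast from a string column (table and column names are illustrative only):
+        </p>
+
+<pre class="pre codeblock"><code>
+-- Impala does not implicitly convert the string; cast it explicitly
+SELECT CAST(string_col AS INT) + 1 FROM t1;
+</code></pre>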
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title46" id="known_issues_interop__IMP-175">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title46">Deviation from Hive behavior: Out-of-range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Impala behavior differs from Hive with respect to out-of-range float/double values: Impala returns the maximum allowed value of
+          the type, whereas Hive returns NULL.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> None
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title47" id="known_issues_interop__flume_writeformat_text">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title47">Configuration needed for Flume to be compatible with Impala</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          For compatibility with Impala, the value for the Flume HDFS Sink <code class="ph codeph">hdfs.writeFormat</code> must be set to
+          <code class="ph codeph">Text</code>, rather than its default value of <code class="ph codeph">Writable</code>. The <code class="ph codeph">hdfs.writeFormat</code> setting
+          must be changed to <code class="ph codeph">Text</code> before creating data files with Flume; otherwise, those files cannot be read by either
+          Impala or Hive.
+        </p>
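+
+        <p class="p">
+          For example, in a Flume agent configuration file the setting might look like the following
+          (the agent and sink names are hypothetical):
+        </p>
+
+<pre class="pre codeblock"><code>agent1.sinks.hdfs-sink1.hdfs.writeFormat = Text</code></pre>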
+
+        <p class="p">
+          <strong class="ph b">Resolution:</strong> This information has been requested to be added to the upstream Flume documentation.
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title48" id="known_issues_interop__IMPALA-635">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title48">Avro Scanner fails to parse some schemas</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Querying certain Avro tables could cause a crash or return no rows, even though Impala could <code class="ph codeph">DESCRIBE</code> the table.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-635" target="_blank">IMPALA-635</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Swap the order of the fields in the schema specification. For example, <code class="ph codeph">["null", "string"]</code>
+          instead of <code class="ph codeph">["string", "null"]</code>.
+        </p>
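+
+        <p class="p">
+          For example, a nullable string field in the Avro schema might be declared as follows
+          (the field name is hypothetical):
+        </p>
+
+<pre class="pre codeblock"><code>{"name": "c1", "type": ["null", "string"]}</code></pre>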
+
+        <p class="p">
+          <strong class="ph b">Resolution:</strong> Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the
+          crashing issue is resolved.
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title49" id="known_issues_interop__IMPALA-1024">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title49">Impala BE cannot parse Avro schema that contains a trailing semi-colon</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1024" target="_blank">IMPALA-1024</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Remove the trailing semicolon from the Avro schema.
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title50" id="known_issues_interop__IMPALA-2154">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title50">Fix decompressor to allow parsing gzips with multiple streams</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated
+          streams, the Impala query only processes the data from the first stream.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2154" target="_blank">IMPALA-2154</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Use a different gzip tool to compress the file to a single-stream file, for example by decompressing and recompressing it.
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.5.0</span>.</p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title51" id="known_issues_interop__IMPALA-1578">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title51">Impala incorrectly handles text data when the newline sequence \n\r is split between different HDFS blocks</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          If a carriage return / newline pair of characters in a text table is split between HDFS data blocks, Impala incorrectly processes
+          the row following the <code class="ph codeph">\n\r</code> pair twice.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1578" target="_blank">IMPALA-1578</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Use the Parquet format for large volumes of data where practical.
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.6.0</span>.</p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title52" id="known_issues_interop__IMPALA-1862">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title52">Invalid bool value not reported as a scanner error</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          In some cases, an invalid <code class="ph codeph">BOOLEAN</code> value read from a table does not produce a warning message about the bad value.
+          The result is still <code class="ph codeph">NULL</code> as expected. Therefore, this is not a query correctness issue, but it could lead to
+          overlooking the presence of invalid data.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1862" target="_blank">IMPALA-1862</a>
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title53" id="known_issues_interop__IMPALA-1652">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title53">Incorrect results with basic predicate on CHAR typed column.</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          When comparing a <code class="ph codeph">CHAR</code> column value to a string literal, the literal value is not blank-padded and so the
+          comparison might fail when it should match.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1652" target="_blank">IMPALA-1652</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Use the <code class="ph codeph">RPAD()</code> function to blank-pad literals compared with <code class="ph codeph">CHAR</code> columns to
+          the expected length.
+        </p>
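+
+        <p class="p">
+          For example, with a hypothetical <code class="ph codeph">CHAR(5)</code> column:
+        </p>
+
+<pre class="pre codeblock"><code>-- char_col is CHAR(5); blank-pad the literal to the same length:
+SELECT * FROM t1 WHERE char_col = RPAD('ab', 5, ' ');</code></pre>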
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title54" id="known_issues__known_issues_limitations">
+
+    <h2 class="title topictitle2" id="ariaid-title54">Impala Known Issues: Limitations</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These issues are current limitations of Impala that require evaluation as you plan how to integrate Impala into your data management
+        workflow.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title55" id="known_issues_limitations__IMPALA-77">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title55">Impala does not support running on clusters with federated namespaces</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Impala does not support running on clusters with federated namespaces. The <code class="ph codeph">impalad</code> process will not start on a
+          node running a filesystem based on the <code class="ph codeph">org.apache.hadoop.fs.viewfs.ViewFs</code> class.
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-77" target="_blank">IMPALA-77</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Anticipated Resolution:</strong> Limitation
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Use standard HDFS on all Impala nodes.
+        </p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title56" id="known_issues__known_issues_misc">
+
+    <h2 class="title topictitle2" id="ariaid-title56">Impala Known Issues: Miscellaneous / Older Issues</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These issues do not fall into one of the above categories or have not been categorized yet.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title57" id="known_issues_misc__IMPALA-2005">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title57">A failed CTAS does not drop the table if the insert fails.</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          If a <code class="ph codeph">CREATE TABLE AS SELECT</code> operation successfully creates the target table but an error occurs while querying
+          the source table or copying the data, the new table is left behind rather than being dropped.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2005" target="_blank">IMPALA-2005</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Drop the new table manually after a failed <code class="ph codeph">CREATE TABLE AS SELECT</code>.
+        </p>
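+
+        <p class="p">
+          For example, after a failed <code class="ph codeph">CREATE TABLE AS SELECT</code>
+          (the table name here is hypothetical):
+        </p>
+
+<pre class="pre codeblock"><code>DROP TABLE IF EXISTS new_table;</code></pre>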
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title58" id="known_issues_misc__IMPALA-1821">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title58">Casting scenarios with invalid/inconsistent results</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Using a <code class="ph codeph">CAST()</code> function to convert large literal values to smaller types, or to convert special values such as
+          <code class="ph codeph">NaN</code> or <code class="ph codeph">Inf</code>, produces values not consistent with other database systems. This could lead to
+          unexpected results from queries.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1821" target="_blank">IMPALA-1821</a>
+        </p>
+
+
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title59" id="known_issues_misc__IMPALA-1619">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title59">Support individual memory allocations larger than 1 GB</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          The largest single block of memory that Impala can allocate during a query is 1 GiB. Therefore, a query could fail or Impala could
+          crash if a compressed text file resulted in more than 1 GiB of data in uncompressed form, or if a string function such as
+          <code class="ph codeph">group_concat()</code> returned a value greater than 1 GiB.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1619" target="_blank">IMPALA-1619</a>
+        </p>
+
+        <p class="p"><strong class="ph b">Resolution:</strong> Fixed in <span class="keyword">Impala 2.7.0</span> and <span class="keyword">Impala 2.6.3</span>.</p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title60" id="known_issues_misc__IMPALA-941">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title60">Impala Parser issue when using fully qualified table names that start with a number.</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          A fully qualified table name starting with a number could cause a parsing error. In a name such as <code class="ph codeph">db.571_market</code>,
+          the period followed by digits is interpreted as the start of a floating-point number.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-941" target="_blank">IMPALA-941</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Surround each part of the fully qualified name with backticks (<code class="ph codeph">``</code>).
+        </p>
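+
+        <p class="p">
+          For example, with a hypothetical database and table name:
+        </p>
+
+<pre class="pre codeblock"><code>SELECT * FROM `db`.`571_market`;</code></pre>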
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title61" id="known_issues_misc__IMPALA-532">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title61">Impala should tolerate bad locale settings</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          If the <code class="ph codeph">LC_*</code> environment variables specify an unsupported locale, Impala does not start.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-532" target="_blank">IMPALA-532</a>
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Add <code class="ph codeph">LC_ALL="C"</code> to the environment settings for both the Impala daemon and the Statestore
+          daemon. See <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details about modifying these environment settings.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Resolution:</strong> Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution.
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title62" id="known_issues_misc__IMP-1203">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title62">Log Level 3 Not Recommended for Impala</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          The extensive logging produced by log level 3 can cause serious performance overhead and capacity issues.
+        </p>
+
+        <p class="p">
+          <strong class="ph b">Workaround:</strong> Reduce the log level to its default value of 1, that is, <code class="ph codeph">GLOG_v=1</code>. See
+          <a class="xref" href="impala_logging.html#log_levels">Setting Logging Levels</a> for details about the effects of setting different logging levels.
+        </p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+</article></main></body></html>
\ No newline at end of file


[50/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_admission.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_admission.html b/docs/build/html/topics/impala_admission.html
new file mode 100644
index 0000000..294f8ca
--- /dev/null
+++ b/docs/build/html/topics/impala_admission.html
@@ -0,0 +1,838 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="admission_control"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Admission Control and Query Queuing</title></head><body id="admission_control"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Admission Control and Query Queuing</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p" id="admission_control__admission_control_intro">
+      Admission control is an Impala feature that imposes limits on concurrent SQL queries, to avoid resource usage
+      spikes and out-of-memory conditions on busy clusters.
+      It is a form of <span class="q">"throttling"</span>.
+      New queries are accepted and executed until
+      certain conditions are met, such as too many queries or too much
+      total memory used across the cluster.
+      When one of these thresholds is reached,
+      incoming queries wait to begin execution. These queries are
+      queued and are admitted (that is, begin executing) when the resources become available.
+    </p>
+    <p class="p">
+      In addition to the threshold values for currently executing queries,
+      you can place limits on the maximum number of queries that are
+      queued (waiting) and a limit on the amount of time they might wait
+      before returning with an error. These queue settings let you ensure that queries do
+      not wait indefinitely, so that you can detect and correct <span class="q">"starvation"</span> scenarios.
+    </p>
+    <p class="p">
+      Enable this feature if your cluster is
+      underutilized at some times and overutilized at others. Overutilization is indicated by performance
+      bottlenecks and queries being cancelled due to out-of-memory conditions, when those same queries are
+      successful and perform well during times with less concurrent load. Admission control works as a safeguard to
+      avoid out-of-memory conditions during heavy concurrent usage.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          The use of the Llama component for integrated resource management within YARN
+          is no longer supported with <span class="keyword">Impala 2.3</span> and higher.
+          The Llama support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher.
+        </p>
+        <p class="p">
+          For clusters running Impala alongside
+          other data management components, you define static service pools to allocate the resources
+          available to Impala and other components. Then within the area allocated for Impala,
+          you can create dynamic service pools, each with its own settings for the Impala admission control feature.
+        </p>
+      </div>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="admission_control__admission_intro">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Overview of Impala Admission Control</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        On a busy cluster, you might find there is an optimal number of Impala queries that run concurrently.
+        For example, when the I/O capacity is fully utilized by I/O-intensive queries,
+        you might not find any throughput benefit in running more concurrent queries.
+        By allowing some queries to run at full speed while others wait, rather than having
+        all queries contend for resources and run slowly, admission control can result in higher overall throughput.
+      </p>
+
+      <p class="p">
+        For another example, consider a memory-bound workload such as many large joins or aggregation queries.
+        Each such query could briefly use many gigabytes of memory to process intermediate results.
+        Because Impala by default cancels queries that exceed the specified memory limit,
+        running multiple large-scale queries at once might require
+        re-running some queries that are cancelled. In this case, admission control improves the
+        reliability and stability of the overall workload by only allowing as many concurrent queries
+        as the overall memory of the cluster can accommodate.
+      </p>
+
+      <p class="p">
+        The admission control feature lets you set an upper limit on the number of concurrent Impala
+        queries and on the memory used by those queries. Any additional queries are queued until the earlier ones
+        finish, rather than being cancelled or running slowly and causing contention. As other queries finish, the
+        queued queries are allowed to proceed.
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.5</span> and higher, you can specify these limits and thresholds for each
+        pool rather than globally. That way, you can balance the resource usage and throughput
+        between steady well-defined workloads, rare resource-intensive queries, and ad hoc
+        exploratory queries.
+      </p>
+
+      <p class="p">
+        For details on the internal workings of admission control, see
+        <a class="xref" href="impala_admission.html#admission_architecture">How Impala Schedules and Enforces Limits on Concurrent Queries</a>.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="admission_control__admission_concurrency">
+    <h2 class="title topictitle2" id="ariaid-title3">Concurrent Queries and Admission Control</h2>
+    <div class="body conbody">
+      <p class="p">
+        One way to limit resource usage through admission control is to set an upper limit
+        on the number of concurrent queries. This is the initial technique you might use
+        when you do not have extensive information about memory usage for your workload.
+        This setting can be specified separately for each dynamic resource pool.
+      </p>
+      <p class="p">
+        You can combine this setting with the memory-based approach described in
+        <a class="xref" href="impala_admission.html#admission_memory">Memory Limits and Admission Control</a>. If either the maximum number of
+        or the expected memory usage of the concurrent queries is exceeded, subsequent queries
+        are queued until the concurrent workload falls below the threshold again.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="admission_control__admission_memory">
+    <h2 class="title topictitle2" id="ariaid-title4">Memory Limits and Admission Control</h2>
+    <div class="body conbody">
+      <p class="p">
+        Each dynamic resource pool can have an upper limit on the cluster-wide memory used by queries executing in that pool.
+        This is the technique to use once you have a stable workload with well-understood memory requirements.
+      </p>
+      <p class="p">
+        Always specify the <span class="ph uicontrol">Default Query Memory Limit</span> for the expected maximum amount of RAM
+        that a query might require on each host, which is equivalent to setting the <code class="ph codeph">MEM_LIMIT</code>
+        query option for every query run in that pool. That value affects the execution of each query, preventing it
+        from overallocating memory on each host, and potentially activating the spill-to-disk mechanism or cancelling
+        the query when necessary.
+      </p>
+      <p class="p">
+        Optionally, specify the <span class="ph uicontrol">Max Memory</span> setting, a cluster-wide limit that determines
+        how many queries can be safely run concurrently, based on the upper memory limit per host multiplied by the
+        number of Impala nodes in the cluster.
+      </p>
+      <div class="p">
+        For example, consider the following scenario:
+        <ul class="ul">
+          <li class="li"> The cluster is running <span class="keyword cmdname">impalad</span> daemons on five
+            DataNodes. </li>
+          <li class="li"> A dynamic resource pool has <span class="ph uicontrol">Max Memory</span> set
+            to 100 GB. </li>
+          <li class="li"> The <span class="ph uicontrol">Default Query Memory Limit</span> for the
+            pool is 10 GB. Therefore, any query running in this pool could use
+            up to 50 GB of memory (default query memory limit * number of Impala
+            nodes). </li>
+          <li class="li"> The maximum number of queries that Impala executes concurrently
+            within this dynamic resource pool is two, which is the most that
+            could be accommodated within the 100 GB <span class="ph uicontrol">Max
+              Memory</span> cluster-wide limit. </li>
+          <li class="li"> There is no memory penalty if queries use less memory than the
+              <span class="ph uicontrol">Default Query Memory Limit</span> per-host setting
+            or the <span class="ph uicontrol">Max Memory</span> cluster-wide limit. These
+            values are only used to estimate how many queries can be run
+            concurrently within the resource constraints for the pool. </li>
+        </ul>
+      </div>
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span>  If you specify <span class="ph uicontrol">Max
+          Memory</span> for an Impala dynamic resource pool, you must also
+        specify the <span class="ph uicontrol">Default Query Memory Limit</span>.
+          <span class="ph uicontrol">Max Memory</span> relies on the <span class="ph uicontrol">Default
+          Query Memory Limit</span> to produce a reliable estimate of
+        overall memory consumption for a query. </div>
+      <p class="p">
+        You can combine the memory-based settings with the upper limit on concurrent queries described in
+        <a class="xref" href="impala_admission.html#admission_concurrency">Concurrent Queries and Admission Control</a>. If either the maximum number of
+        or the expected memory usage of the concurrent queries is exceeded, subsequent queries
+        are queued until the concurrent workload falls below the threshold again.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="admission_control__admission_yarn">
+
+    <h2 class="title topictitle2" id="ariaid-title5">How Impala Admission Control Relates to Other Resource Management Tools</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The admission control feature is similar in some ways to the YARN resource management framework. These features
+        can be used separately or together. This section describes some similarities and differences, to help you
+        decide which combination of resource management features to use for Impala.
+      </p>
+
+      <p class="p">
+        Admission control is a lightweight, decentralized system that is suitable for workloads consisting
+        primarily of Impala queries and other SQL statements. It sets <span class="q">"soft"</span> limits that smooth out Impala
+        memory usage during times of heavy load, rather than taking an all-or-nothing approach that cancels jobs
+        that are too resource-intensive.
+      </p>
+
+      <p class="p">
+        Because the admission control system does not interact with other Hadoop workloads such as MapReduce jobs, you
+        might use YARN with static service pools on clusters where resources are shared between
+        Impala and other Hadoop components. This configuration is recommended when using Impala in a
+        <dfn class="term">multitenant</dfn> cluster. Devote a percentage of cluster resources to Impala, and allocate another
+        percentage for MapReduce and other batch-style workloads. Let admission control handle the concurrency and
+        memory usage for the Impala work within the cluster, and let YARN manage the work for other components within the
+        cluster. In this scenario, Impala's resources are not managed by YARN.
+      </p>
+
+      <p class="p">
+        The Impala admission control feature uses the same configuration mechanism as the YARN resource manager to map users to
+        pools and authenticate them.
+      </p>
+
+      <p class="p">
+        Although the Impala admission control feature uses a <code class="ph codeph">fair-scheduler.xml</code> configuration file
+        behind the scenes, this file does not depend on which scheduler is used for YARN. You still use this file
+        even when YARN is using the capacity scheduler.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="admission_control__admission_architecture">
+
+    <h2 class="title topictitle2" id="ariaid-title6">How Impala Schedules and Enforces Limits on Concurrent Queries</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The admission control system is decentralized, embedded in each Impala daemon and communicating through the
+        statestore mechanism. Although the limits you set for memory usage and number of concurrent queries apply
+        cluster-wide, each Impala daemon makes its own decisions about whether to allow each query to run
+        immediately or to queue it for a less-busy time. These decisions are fast, meaning the admission control
+        mechanism is low-overhead, but might be imprecise during times of heavy load across many coordinators. There could be times when
+        more queries were queued (in aggregate across the cluster) than the specified limit, or when the number of admitted queries
+        exceeded the expected number. Thus, you typically err on the
+        high side for the size of the queue, because there is not a big penalty for having a large number of queued
+        queries; and you typically err on the low side for configuring memory resources, to leave some headroom in case more
+        queries are admitted than expected, without running out of memory and being cancelled as a result.
+      </p>
+
+
+
+      <p class="p">
+        To avoid a large backlog of queued requests, you can set an upper limit on the size of the queue for
+        queries that are queued. When the number of queued queries exceeds this limit, further queries are
+        cancelled rather than being queued. You can also configure a timeout period per pool, after which queued queries are
+        cancelled, to avoid indefinite waits. If a cluster reaches this state where queries are cancelled due to
+        too many concurrent requests or long waits for query execution to begin, that is a signal for an
+        administrator to take action, either by provisioning more resources, scheduling work on the cluster to
+        smooth out the load, or by doing <a class="xref" href="impala_performance.html#performance">Impala performance
+        tuning</a> to enable higher throughput.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="admission_control__admission_jdbc_odbc">
+
+    <h2 class="title topictitle2" id="ariaid-title7">How Admission Control works with Impala Clients (JDBC, ODBC, HiveServer2)</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Most aspects of admission control work transparently with client interfaces such as JDBC and ODBC:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          If a SQL statement is put into a queue rather than running immediately, the API call blocks until the
+          statement is dequeued and begins execution. At that point, the client program can request to fetch
+          results, which might also block until results become available.
+        </li>
+
+        <li class="li">
+          If a SQL statement is cancelled because it has been queued for too long or because it exceeded the memory
+          limit during execution, the error is returned to the client program with a descriptive error message.
+        </li>
+
+      </ul>
+
+      <p class="p">
+        In Impala 2.0 and higher, you can submit
+        a SQL <code class="ph codeph">SET</code> statement from the client application
+        to change the <code class="ph codeph">REQUEST_POOL</code> query option.
+        This option lets you submit queries to different resource pools,
+        as described in <a class="xref" href="impala_request_pool.html#request_pool">REQUEST_POOL Query Option</a>.
+
+      </p>
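+      <p class="p">
+        For example, you might switch pools within a session as shown below. The pool name
+        <code class="ph codeph">root.development</code> and the table name are illustrative only; substitute
+        a pool and table defined in your own deployment:
+      </p>
+
+<pre class="pre codeblock"><code>-- Route subsequent queries in this session to a different resource pool.
+set request_pool=root.development;
+select count(*) from web_logs;
+</code></pre>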
+
+      <p class="p">
+        At any time, the set of queued queries could include queries submitted through multiple Impala
+        daemon hosts. All the queries submitted through a particular host will be executed in order, so a
+        <code class="ph codeph">CREATE TABLE</code> followed by an <code class="ph codeph">INSERT</code> on the same table would succeed.
+        Queries submitted through different hosts are not guaranteed to be executed in the order they were
+        received. Therefore, if you are using load-balancing or other round-robin scheduling where different
+        statements are submitted through different hosts, set up all table structures ahead of time so that the
+        statements controlled by the queuing system are primarily queries, where order is not significant. Or, if a
+        sequence of statements needs to happen in strict order (such as an <code class="ph codeph">INSERT</code> followed by a
+        <code class="ph codeph">SELECT</code>), submit all those statements through a single session, while connected to the same
+        Impala daemon host.
+      </p>
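+      <p class="p">
+        As an illustrative sketch (table and column names are hypothetical), a strictly ordered sequence
+        such as the following should be issued through a single <span class="keyword cmdname">impala-shell</span>
+        session, connected to a single Impala daemon:
+      </p>
+
+<pre class="pre codeblock"><code>-- All three statements go through one session, and thus one Impala daemon,
+-- so they execute in this order even if some of them are queued.
+create table staging_results (id bigint, total bigint);
+insert into staging_results select id, count(*) from raw_events group by id;
+select * from staging_results order by total desc limit 10;
+</code></pre>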
+
+      <p class="p">
+        Admission control has the following limitations or special behavior when used with JDBC or ODBC
+        applications:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The other resource-related query options,
+          <code class="ph codeph">RESERVATION_REQUEST_TIMEOUT</code> and <code class="ph codeph">V_CPU_CORES</code>, are no longer used. Those query options only
+          applied to using Impala with Llama, which is no longer supported.
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="admission_control__admission_schema_config">
+    <h2 class="title topictitle2" id="ariaid-title8">SQL and Schema Considerations for Admission Control</h2>
+    <div class="body conbody">
+      <p class="p">
+        When queries complete quickly and are tuned for optimal memory usage, there is less chance of
+        performance or capacity problems during times of heavy load. Before setting up admission control,
+        tune your Impala queries to ensure that the query plans are efficient and the memory estimates
+        are accurate. Understanding the nature of your workload, and which queries are the most
+        resource-intensive, helps you to plan how to divide the queries into different pools and
+        decide what limits to define for each pool.
+      </p>
+      <p class="p">
+        For large tables, especially those involved in join queries, keep their statistics up to date
+        after loading substantial amounts of new data or adding new partitions.
+        Use the <code class="ph codeph">COMPUTE STATS</code> statement for unpartitioned tables, and
+        <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> for partitioned tables.
+      </p>
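+      <p class="p">
+        For example (the table names are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>-- Unpartitioned table: gather full statistics.
+compute stats customer_dim;
+-- Partitioned table: gather statistics only for new or changed partitions.
+compute incremental stats sales_fact;
+</code></pre>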
+      <p class="p">
+        When you use dynamic resource pools with a <span class="ph uicontrol">Max Memory</span> setting enabled,
+        you typically override the memory estimates that Impala makes based on the statistics from the
+        <code class="ph codeph">COMPUTE STATS</code> statement.
+        You can set the <code class="ph codeph">MEM_LIMIT</code> query option within a particular session to
+        impose an upper memory limit on queries within that session, set a default <code class="ph codeph">MEM_LIMIT</code>
+        for all queries processed by the <span class="keyword cmdname">impalad</span> instance, or
+        set a default <code class="ph codeph">MEM_LIMIT</code> for all queries assigned to a particular
+        dynamic resource pool. By designating a consistent memory limit for a set of similar queries
+        that use the same resource pool, you avoid unnecessary query queuing or out-of-memory conditions
+        that can arise during high-concurrency workloads when memory estimates for some queries are inaccurate.
+      </p>
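+      <p class="p">
+        For example, within a session (the limit value and table names are hypothetical; base the actual
+        value on observed memory usage from query profiles):
+      </p>
+
+<pre class="pre codeblock"><code>-- Cap each subsequent query in this session at 2 GB of memory per node.
+set mem_limit=2g;
+select t1.id, t2.val from big_table t1 join other_table t2 using (id);
+</code></pre>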
+      <p class="p">
+        Follow other steps from <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> to tune your queries.
+      </p>
+    </div>
+  </article>
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="admission_control__admission_config">
+
+    <h2 class="title topictitle2" id="ariaid-title9">Configuring Admission Control</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The configuration options for admission control range from the simple (a single resource pool with a single
+        set of options) to the complex (multiple resource pools with different options, each pool handling queries
+        for a different set of users and groups).
+      </p>
+
+      <section class="section" id="admission_config__admission_flags"><h3 class="title sectiontitle">Impala Service Flags for Admission Control (Advanced)</h3>
+
+        
+
+        <p class="p">
+          The following Impala configuration options let you adjust the settings of the admission control feature. When supplying the
+          options on the <span class="keyword cmdname">impalad</span> command line, prepend the option name with <code class="ph codeph">--</code>.
+        </p>
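+        <p class="p">
+          For example, a simple single-pool configuration might be specified entirely on the
+          <span class="keyword cmdname">impalad</span> command line. The values shown here are
+          illustrative only; choose limits based on your cluster capacity and workload:
+        </p>
+
+<pre class="pre codeblock"><code>impalad --default_pool_max_requests=50 \
+    --default_pool_max_queued=100 \
+    --default_pool_mem_limit=100g \
+    --queue_wait_timeout_ms=90000</code></pre>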
+
+        <dl class="dl" id="admission_config__admission_control_option_list">
+          
+            <dt class="dt dlterm" id="admission_config__queue_wait_timeout_ms">
+              <code class="ph codeph">queue_wait_timeout_ms</code>
+            </dt>
+            <dd class="dd">
+              
+              <strong class="ph b">Purpose:</strong> Maximum amount of time (in milliseconds) that a
+              request waits to be admitted before timing out.
+              <p class="p">
+                <strong class="ph b">Type:</strong> <code class="ph codeph">int64</code>
+              </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong> <code class="ph codeph">60000</code>
+              </p>
+            </dd>
+          
+          
+            <dt class="dt dlterm" id="admission_config__default_pool_max_requests">
+              <code class="ph codeph">default_pool_max_requests</code>
+            </dt>
+            <dd class="dd">
+              
+              <strong class="ph b">Purpose:</strong> Maximum number of concurrent outstanding requests
+              allowed to run before incoming requests are queued. Because this
+              limit applies cluster-wide, but each Impala node makes independent
+              decisions to run queries immediately or queue them, it is a soft
+              limit; the overall number of concurrent queries might be slightly
+              higher during times of heavy load. A negative value indicates no
+              limit. Ignored if <code class="ph codeph">fair_scheduler_allocation_path</code> and
+                <code class="ph codeph">llama_site_path</code> are set. <p class="p">
+                <strong class="ph b">Type:</strong>
+                <code class="ph codeph">int64</code>
+              </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <span class="ph">-1, meaning unlimited (prior to <span class="keyword">Impala 2.5</span> the default was 200)</span>
+              </p>
+            </dd>
+          
+          
+            <dt class="dt dlterm" id="admission_config__default_pool_max_queued">
+              <code class="ph codeph">default_pool_max_queued</code>
+            </dt>
+            <dd class="dd">
+              
+              <strong class="ph b">Purpose:</strong> Maximum number of requests allowed to be queued
+              before rejecting requests. Because this limit applies
+              cluster-wide, but each Impala node makes independent decisions to
+              run queries immediately or queue them, it is a soft limit; the
+              overall number of queued queries might be slightly higher during
+              times of heavy load. A negative value or 0 indicates requests are
+              always rejected once the maximum concurrent requests are
+              executing. Ignored if <code class="ph codeph">fair_scheduler_allocation_path</code>
+              and <code class="ph codeph">llama_site_path</code> are set. <p class="p">
+                <strong class="ph b">Type:</strong>
+                <code class="ph codeph">int64</code>
+              </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <span class="ph">unlimited</span>
+              </p>
+            </dd>
+          
+          
+            <dt class="dt dlterm" id="admission_config__default_pool_mem_limit">
+              <code class="ph codeph">default_pool_mem_limit</code>
+            </dt>
+            <dd class="dd">
+              
+              <strong class="ph b">Purpose:</strong> Maximum amount of memory (across the entire
+              cluster) that all outstanding requests in this pool can use before
+              new requests to this pool are queued. Specified in bytes,
+              megabytes, or gigabytes by a number followed by the suffix
+                <code class="ph codeph">b</code> (optional), <code class="ph codeph">m</code>, or
+                <code class="ph codeph">g</code>, either uppercase or lowercase. You can
+              specify floating-point values for megabytes and gigabytes, to
+              represent fractional numbers such as <code class="ph codeph">1.5</code>. You can
+              also specify it as a percentage of the physical memory by
+              specifying the suffix <code class="ph codeph">%</code>. 0 or no setting
+              indicates no limit. Defaults to bytes if no unit is given. Because
+              this limit applies cluster-wide, but each Impala node makes
+              independent decisions to run queries immediately or queue them, it
+              is a soft limit; the overall memory used by concurrent queries
+              might be slightly higher during times of heavy load. Ignored if
+                <code class="ph codeph">fair_scheduler_allocation_path</code> and
+                <code class="ph codeph">llama_site_path</code> are set. <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        Impala relies on the statistics produced by the <code class="ph codeph">COMPUTE STATS</code> statement to estimate memory
+        usage for each query. See <a class="xref" href="../shared/../topics/impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for guidelines
+        about how and when to use this statement.
+      </div>
+              <p class="p">
+        <strong class="ph b">Type:</strong> string
+      </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <code class="ph codeph">""</code> (empty string, meaning unlimited) </p>
+            </dd>
+          
+          
+            <dt class="dt dlterm" id="admission_config__disable_admission_control">
+              <code class="ph codeph">disable_admission_control</code>
+            </dt>
+            <dd class="dd">
+              
+              <strong class="ph b">Purpose:</strong> Turns off the admission control feature entirely,
+              regardless of other configuration option settings.
+              <p class="p">
+                <strong class="ph b">Type:</strong> Boolean </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <code class="ph codeph">false</code>
+              </p>
+            </dd>
+          
+          
+            <dt class="dt dlterm" id="admission_config__disable_pool_max_requests">
+              <code class="ph codeph">disable_pool_max_requests</code>
+            </dt>
+            <dd class="dd">
+              
+              <strong class="ph b">Purpose:</strong> Disables all per-pool limits on the maximum number
+              of running requests. <p class="p">
+                <strong class="ph b">Type:</strong> Boolean </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <code class="ph codeph">false</code>
+              </p>
+            </dd>
+          
+          
+            <dt class="dt dlterm" id="admission_config__disable_pool_mem_limits">
+              <code class="ph codeph">disable_pool_mem_limits</code>
+            </dt>
+            <dd class="dd">
+              
+              <strong class="ph b">Purpose:</strong> Disables all per-pool memory limits. <p class="p">
+                <strong class="ph b">Type:</strong> Boolean </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <code class="ph codeph">false</code>
+              </p>
+            </dd>
+          
+          
+            <dt class="dt dlterm" id="admission_config__fair_scheduler_allocation_path">
+              <code class="ph codeph">fair_scheduler_allocation_path</code>
+            </dt>
+            <dd class="dd">
+              
+              <strong class="ph b">Purpose:</strong> Path to the fair scheduler allocation file
+                (<code class="ph codeph">fair-scheduler.xml</code>). <p class="p">
+        <strong class="ph b">Type:</strong> string
+      </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong>
+                <code class="ph codeph">""</code> (empty string) </p>
+              <p class="p">
+                <strong class="ph b">Usage notes:</strong> Admission control only uses a small subset
+                of the settings that can go in this file, as described below.
+                For details about all the Fair Scheduler configuration settings,
+                see the <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Apache wiki</a>. </p>
+            </dd>
+          
+          
+            <dt class="dt dlterm" id="admission_config__llama_site_path">
+              <code class="ph codeph">llama_site_path</code>
+            </dt>
+            <dd class="dd">
+              
+              <strong class="ph b">Purpose:</strong> Path to the configuration file used by admission control
+                (<code class="ph codeph">llama-site.xml</code>). If set,
+                <code class="ph codeph">fair_scheduler_allocation_path</code> must also be set.
+              <p class="p">
+        <strong class="ph b">Type:</strong> string
+      </p>
+              <p class="p">
+                <strong class="ph b">Default:</strong> <code class="ph codeph">""</code> (empty string) </p>
+              <p class="p">
+                <strong class="ph b">Usage notes:</strong> Admission control only uses a few
+                of the settings that can go in this file, as described below.
+              </p>
+            </dd>
+          
+        </dl>
+      </section>
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="admission_config__admission_config_manual">
+
+      <h3 class="title topictitle3" id="ariaid-title10">Configuring Admission Control Using the Command Line</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          To configure admission control, use a combination of startup options for the Impala daemon and edit
+          or create the configuration files <span class="ph filepath">fair-scheduler.xml</span> and
+            <span class="ph filepath">llama-site.xml</span>.
+        </p>
+
+        <p class="p">
+          For a straightforward configuration using a single resource pool named <code class="ph codeph">default</code>, you can
+          specify configuration options on the command line and skip the <span class="ph filepath">fair-scheduler.xml</span>
+          and <span class="ph filepath">llama-site.xml</span> configuration files.
+        </p>
+
+        <p class="p">
+          For an advanced configuration with multiple resource pools using different settings, set up the
+          <span class="ph filepath">fair-scheduler.xml</span> and <span class="ph filepath">llama-site.xml</span> configuration files
+          manually. Provide the paths to each one using the <span class="keyword cmdname">impalad</span> command-line options,
+          <code class="ph codeph">--fair_scheduler_allocation_path</code> and <code class="ph codeph">--llama_site_path</code> respectively.
+        </p>
+
+        <p class="p">
+          The Impala admission control feature only uses the Fair Scheduler configuration settings to determine how
+          to map users and groups to different resource pools. For example, you might set up different resource
+          pools with separate memory limits, and maximum number of concurrent and queued queries, for different
+          categories of users within your organization. For details about all the Fair Scheduler configuration
+          settings, see the
+          <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Apache
+          wiki</a>.
+        </p>
+
+        <p class="p">
+          The Impala admission control feature only uses a small subset of possible settings from the
+          <span class="ph filepath">llama-site.xml</span> configuration file:
+        </p>
+
+<pre class="pre codeblock"><code>llama.am.throttling.maximum.placed.reservations.<var class="keyword varname">queue_name</var>
+llama.am.throttling.maximum.queued.reservations.<var class="keyword varname">queue_name</var>
+<span class="ph">impala.admission-control.pool-default-query-options.<var class="keyword varname">queue_name</var>
+impala.admission-control.pool-queue-timeout-ms.<var class="keyword varname">queue_name</var></span>
+</code></pre>
+
+        <p class="p">
+          The <code class="ph codeph">impala.admission-control.pool-queue-timeout-ms</code>
+          setting specifies the timeout value for this pool, in milliseconds.
+          The <code class="ph codeph">impala.admission-control.pool-default-query-options</code>
+          setting designates the default query options for all queries that run
+          in this pool. Its argument value is a comma-delimited string of
+          'key=value' pairs, for example, <code class="ph codeph">'key1=val1,key2=val2'</code>.
+          This is where you might set a default memory limit
+          for all queries in the pool, using an argument such as <code class="ph codeph">MEM_LIMIT=5G</code>.
+        </p>
+
+        <p class="p">
+          The <code class="ph codeph">impala.admission-control.*</code> configuration settings are available in
+          <span class="keyword">Impala 2.5</span> and higher.
+        </p>
+
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="admission_config__admission_examples">
+
+      <h3 class="title topictitle3" id="ariaid-title11">Example of Admission Control Configuration</h3>
+
+      <div class="body conbody">
+
+        <p class="p"> Here are sample <span class="ph filepath">fair-scheduler.xml</span> and
+          <span class="ph filepath">llama-site.xml</span> files that define resource pools
+          <code class="ph codeph">root.default</code>, <code class="ph codeph">root.development</code>, and
+          <code class="ph codeph">root.production</code>. These sample files are stripped down: in a real
+          deployment they might contain other settings for use with various aspects of the YARN
+          component. The settings shown here are the significant ones for the Impala admission
+          control feature. </p>
+
+        <p class="p">
+          <strong class="ph b">fair-scheduler.xml:</strong>
+        </p>
+
+        <p class="p">
+          Although Impala does not use the <code class="ph codeph">vcores</code> value, you must still specify it to satisfy
+          YARN requirements for the file contents.
+        </p>
+
+        <p class="p">
+          Each <code class="ph codeph">&lt;aclSubmitApps&gt;</code> tag (other than the one for <code class="ph codeph">root</code>) contains
+          a comma-separated list of users, then a space, then a comma-separated list of groups; these are the
+          users and groups allowed to submit Impala statements to the corresponding resource pool.
+        </p>
+
+        <p class="p">
+          If you leave the <code class="ph codeph">&lt;aclSubmitApps&gt;</code> element empty for a pool, nobody can submit
+          directly to that pool; child pools can specify their own <code class="ph codeph">&lt;aclSubmitApps&gt;</code> values
+          to authorize users and groups to submit to those pools.
+        </p>
+
+        <pre class="pre codeblock"><code>&lt;allocations&gt;
+
+    &lt;queue name="root"&gt;
+        &lt;aclSubmitApps&gt; &lt;/aclSubmitApps&gt;
+        &lt;queue name="default"&gt;
+            &lt;maxResources&gt;50000 mb, 0 vcores&lt;/maxResources&gt;
+            &lt;aclSubmitApps&gt;*&lt;/aclSubmitApps&gt;
+        &lt;/queue&gt;
+        &lt;queue name="development"&gt;
+            &lt;maxResources&gt;200000 mb, 0 vcores&lt;/maxResources&gt;
+            &lt;aclSubmitApps&gt;user1,user2 dev,ops,admin&lt;/aclSubmitApps&gt;
+        &lt;/queue&gt;
+        &lt;queue name="production"&gt;
+            &lt;maxResources&gt;1000000 mb, 0 vcores&lt;/maxResources&gt;
+            &lt;aclSubmitApps&gt; ops,admin&lt;/aclSubmitApps&gt;
+        &lt;/queue&gt;
+    &lt;/queue&gt;
+    &lt;queuePlacementPolicy&gt;
+        &lt;rule name="specified" create="false"/&gt;
+        &lt;rule name="default" /&gt;
+    &lt;/queuePlacementPolicy&gt;
+&lt;/allocations&gt;
+
+</code></pre>
+
+        <p class="p">
+          <strong class="ph b">llama-site.xml:</strong>
+        </p>
+
+        <pre class="pre codeblock"><code>
+&lt;?xml version="1.0" encoding="UTF-8"?&gt;
+&lt;configuration&gt;
+  &lt;property&gt;
+    &lt;name&gt;llama.am.throttling.maximum.placed.reservations.root.default&lt;/name&gt;
+    &lt;value&gt;10&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;llama.am.throttling.maximum.queued.reservations.root.default&lt;/name&gt;
+    &lt;value&gt;50&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;impala.admission-control.pool-default-query-options.root.default&lt;/name&gt;
+    &lt;value&gt;mem_limit=128m,query_timeout_s=20,max_io_buffers=10&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;impala.admission-control.pool-queue-timeout-ms.root.default&lt;/name&gt;
+    &lt;value&gt;30000&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;llama.am.throttling.maximum.placed.reservations.root.development&lt;/name&gt;
+    &lt;value&gt;50&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;llama.am.throttling.maximum.queued.reservations.root.development&lt;/name&gt;
+    &lt;value&gt;100&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;impala.admission-control.pool-default-query-options.root.development&lt;/name&gt;
+    &lt;value&gt;mem_limit=256m,query_timeout_s=30,max_io_buffers=10&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;impala.admission-control.pool-queue-timeout-ms.root.development&lt;/name&gt;
+    &lt;value&gt;15000&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;llama.am.throttling.maximum.placed.reservations.root.production&lt;/name&gt;
+    &lt;value&gt;100&lt;/value&gt;
+  &lt;/property&gt;
+  &lt;property&gt;
+    &lt;name&gt;llama.am.throttling.maximum.queued.reservations.root.production&lt;/name&gt;
+    &lt;value&gt;200&lt;/value&gt;
+  &lt;/property&gt;
+&lt;!--
+       Default query options for the 'root.production' pool.
+       THIS IS A NEW PARAMETER in Impala 2.5.
+       Note that the MEM_LIMIT query option still shows up in here even though it is a
+       separate box in the UI. We do that because it is the most important query option
+       that people will need (everything else is somewhat advanced).
+
+       MEM_LIMIT takes a per-node memory limit which is specified using one of the following:
+        - '&lt;int&gt;[bB]?'  -&gt; bytes (default if no unit given)
+        - '&lt;float&gt;[mM(bB)]' -&gt; megabytes
+        - '&lt;float&gt;[gG(bB)]' -&gt; gigabytes
+        E.g. 'MEM_LIMIT=12345' (no unit) means 12345 bytes, and you can append m or g
+             to specify megabytes or gigabytes, though that is not required.
+--&gt;
+  &lt;property&gt;
+    &lt;name&gt;impala.admission-control.pool-default-query-options.root.production&lt;/name&gt;
+    &lt;value&gt;mem_limit=386m,query_timeout_s=30,max_io_buffers=10&lt;/value&gt;
+  &lt;/property&gt;
+&lt;!--
+  Default queue timeout (ms) for the pool 'root.production'.
+  If this isn't set, the process-wide flag is used.
+  THIS IS A NEW PARAMETER in Impala 2.5.
+--&gt;
+  &lt;property&gt;
+    &lt;name&gt;impala.admission-control.pool-queue-timeout-ms.root.production&lt;/name&gt;
+    &lt;value&gt;30000&lt;/value&gt;
+  &lt;/property&gt;
+&lt;/configuration&gt;
+
+</code></pre>
+
+      </div>
+    </article>
+
+
+
+  <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="admission_config__admission_guidelines">
+
+    <h3 class="title topictitle3" id="ariaid-title12">Guidelines for Using Admission Control</h3>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        To see how admission control works for particular queries, examine the profile output for the query. This
+        information is available through the <code class="ph codeph">PROFILE</code> statement in <span class="keyword cmdname">impala-shell</span>
+        immediately after running a query in the shell, on the <span class="ph uicontrol">queries</span> page of the Impala
+        debug web UI, or in the Impala log file (basic information at log level 1, more detailed information at log
+        level 2). The profile output contains details about the admission decision, such as whether the query was
+        queued or not and which resource pool it was assigned to. It also includes the estimated and actual memory
+        usage for the query, so you can fine-tune the configuration for the memory limits of the resource pools.
+      </p>
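+      <p class="p">
+        For example, in <span class="keyword cmdname">impala-shell</span> (the query itself is illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>select count(*) from store_sales;
+-- Immediately afterward, display the profile of the query that just ran,
+-- including the admission decision and the resource pool it was assigned to.
+profile;
+</code></pre>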
+
+      <p class="p">
+        Remember that the limits imposed by admission control are <span class="q">"soft"</span> limits.
+        The decentralized nature of this mechanism means that each Impala node makes its own decisions about whether
+        to allow queries to run immediately or to queue them. These decisions rely on information passed back and forth
+        between nodes by the statestore service. If a sudden surge in requests causes more queries than anticipated to run
+        concurrently, then throughput could decrease due to queries spilling to disk or contending for resources;
+        or queries could be cancelled if they exceed the <code class="ph codeph">MEM_LIMIT</code> setting while running.
+      </p>
+
+
+
+      <p class="p">
+        In <span class="keyword cmdname">impala-shell</span>, you can also specify which resource pool to direct queries to by
+        setting the <code class="ph codeph">REQUEST_POOL</code> query option.
+      </p>
+
+      <p class="p">
+        The statements affected by the admission control feature are primarily queries, but also include statements
+        that write data such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>. Most write
+        operations in Impala are not resource-intensive, but inserting into a Parquet table can require substantial
+        memory due to buffering intermediate data before writing out each Parquet data block. See
+        <a class="xref" href="impala_parquet.html#parquet_etl">Loading Data into Parquet Tables</a> for instructions about inserting data efficiently into
+        Parquet tables.
+      </p>
+
+      <p class="p">
+        Although admission control does not scrutinize memory usage for other kinds of DDL statements, if a query
+        is queued due to a limit on concurrent queries or memory usage, subsequent statements in the same session
+        are also queued so that they are processed in the correct order:
+      </p>
+
+<pre class="pre codeblock"><code>-- This query could be queued to avoid out-of-memory at times of heavy load.
+select * from huge_table join enormous_table using (id);
+-- If so, this subsequent statement in the same session is also queued
+-- until the previous statement completes.
+drop table huge_table;
+</code></pre>
+
+      <p class="p">
+        If you set up different resource pools for different users and groups, consider reusing any classifications
+        you developed for use with Sentry security. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details.
+      </p>
+
+      <p class="p">
+        For details about all the Fair Scheduler configuration settings, see
+        <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Fair Scheduler Configuration</a>, in particular the tags such as <code class="ph codeph">&lt;queue&gt;</code> and
+        <code class="ph codeph">&lt;aclSubmitApps&gt;</code> to map users and groups to particular resource pools (queues).
+      </p>
+
+
+    </div>
+  </article>
+</article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_aggregate_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_aggregate_functions.html b/docs/build/html/topics/impala_aggregate_functions.html
new file mode 100644
index 0000000..0b6ab31
--- /dev/null
+++ b/docs/build/html/topics/impala_aggregate_functions.html
@@ -0,0 +1,34 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_appx_median.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_avg.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_count.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_group_concat.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_min.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ndv.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_stddev.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_sum.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_variance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="aggregate_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Aggregate Functions</title></head><body id="aggregate_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Aggregate Functions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+        Aggregate functions are a special category with different rules. These functions calculate a return value
+        across all the items in a result set, so they require a <code class="ph codeph">FROM</code> clause in the query:
+      </p>
+
+<pre class="pre codeblock"><code>select count(product_id) from product_catalog;
+select max(height), avg(height) from census_data where age &gt; 20;
+</code></pre>
+
+    <p class="p">
+        Aggregate functions also ignore <code class="ph codeph">NULL</code> values rather than returning a <code class="ph codeph">NULL</code>
+        result. For example, if some rows have <code class="ph codeph">NULL</code> for a particular column, those rows are
+        ignored when computing the <code class="ph codeph">AVG()</code> for that column. Likewise, specifying
+        <code class="ph codeph">COUNT(<var class="keyword varname">col_name</var>)</code> in a query counts only those rows where
+        <var class="keyword varname">col_name</var> contains a non-<code class="ph codeph">NULL</code> value.
+      </p>
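+    <p class="p">
+      For example, the following queries show how the row counts differ,
+      assuming a hypothetical table <code class="ph codeph">t1</code> whose column
+      <code class="ph codeph">c1</code> contains the values 1, 2, and <code class="ph codeph">NULL</code>:
+    </p>
+
+<pre class="pre codeblock"><code>-- Counts all 3 rows, including the row where c1 is NULL.
+select count(*) from t1;
+-- Counts only the 2 rows where c1 is non-NULL.
+select count(c1) from t1;
+-- Averages only the non-NULL values: (1 + 2) / 2 = 1.5.
+select avg(c1) from t1;
+</code></pre>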
+
+    <p class="p">
+      
+    </p>
+
+    <p class="p toc"></p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_appx_median.html">APPX_MEDIAN Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_avg.html">AVG Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_count.html">COUNT Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_group_concat.html">GROUP_CONCAT Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max.html">MAX Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_min.html">MIN Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_ndv.html">NDV Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_stddev.html">STDDEV, STDDEV_SAMP, STDDEV_POP Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_sum.html">SUM Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_variance.html">VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_aliases.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_aliases.html b/docs/build/html/topics/impala_aliases.html
new file mode 100644
index 0000000..4322db3
--- /dev/null
+++ b/docs/build/html/topics/impala_aliases.html
@@ -0,0 +1,85 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="aliases"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Aliases</title></head><body id="aliases"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Aliases</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      When you write the names of tables, columns, or column expressions in a query, you can assign an alias at the
+      same time. Then you can specify the alias rather than the original name when making other references to the
+      table or column in the same statement. You typically specify aliases that are shorter or easier to
+      remember than the original names, or both. The aliases are printed in the query header, making them useful for
+      self-documenting output.
+    </p>
+
+    <p class="p">
+      To set up an alias, add the <code class="ph codeph">AS <var class="keyword varname">alias</var></code> clause immediately after any table,
+      column, or expression name in the <code class="ph codeph">SELECT</code> list or <code class="ph codeph">FROM</code> list of a query. The
+      <code class="ph codeph">AS</code> keyword is optional; you can also specify the alias immediately after the original name.
+    </p>
+
+<pre class="pre codeblock"><code>-- Make the column headers of the result set easier to understand.
+SELECT c1 AS name, c2 AS address, c3 AS phone FROM table_with_terse_columns;
+SELECT SUM(ss_xyz_dollars_net) AS total_sales FROM table_with_cryptic_columns;
+-- The alias can be a quoted string for extra readability.
+SELECT c1 AS "Employee ID", c2 AS "Date of hire" FROM t1;
+-- The AS keyword is optional.
+SELECT c1 "Employee ID", c2 "Date of hire" FROM t1;
+
+-- The table aliases assigned in the FROM clause can be used both earlier
+-- in the query (the SELECT list) and later (the WHERE clause).
+SELECT one.name, two.address, three.phone
+  FROM census one, building_directory two, phonebook three
+WHERE one.id = two.id and two.id = three.id;
+
+-- The aliases c1 and c2 let the query handle columns with the same names from 2 joined tables.
+-- The aliases t1 and t2 let the query abbreviate references to long or cryptically named tables.
+SELECT t1.column_n AS c1, t2.column_n AS c2 FROM long_name_table AS t1, very_long_name_table2 AS t2
+  WHERE c1 = c2;
+SELECT t1.column_n c1, t2.column_n c2 FROM table1 t1, table2 t2
+  WHERE c1 = c2;
+</code></pre>
+
+    <p class="p">
+      To use an alias name that matches one of the Impala reserved keywords (listed in
+      <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>), surround the identifier with either single or
+      double quotation marks, or <code class="ph codeph">``</code> characters (backticks).
+    </p>
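+    <p class="p">
+      For example, assuming that <code class="ph codeph">location</code> appears in the
+      reserved word list, an alias with that name must be quoted:
+    </p>
+
+<pre class="pre codeblock"><code>-- Quoting lets an alias reuse a name that is otherwise a reserved keyword.
+select c1 AS `location` from t1;
+select c1 AS "location" from t1;
+select c1 AS 'location' from t1;
+</code></pre>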
+
+    <p class="p">
+      <span class="ph"> Aliases follow the same rules as identifiers when it comes to case
+        insensitivity. Aliases can be longer than identifiers (up to the maximum length of a Java string) and can
+        include additional characters such as spaces and dashes when they are quoted using backtick characters.
+        </span>
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      Queries involving the complex types (<code class="ph codeph">ARRAY</code>,
+      <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>) typically make
+      extensive use of table aliases. These queries involve join clauses
+      where the complex type column is treated as a joined table.
+      To construct two-part or three-part qualified names for the
+      complex column elements in the <code class="ph codeph">FROM</code> list,
+      you must sometimes construct a table alias for the complex column
+      where it is referenced in the join clause.
+      See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details and examples.
+    </p>
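+    <p class="p">
+      For example, with a hypothetical table <code class="ph codeph">customers</code> that
+      includes an <code class="ph codeph">ARRAY</code> column named
+      <code class="ph codeph">phone_numbers</code>, a table alias makes it possible to
+      treat the array column as a joined table:
+    </p>
+
+<pre class="pre codeblock"><code>-- The alias C qualifies the reference to the ARRAY column in the FROM list;
+-- the alias P qualifies references to the elements of that array.
+select c.id, p.item
+  from customers c, c.phone_numbers p;
+</code></pre>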
+
+    <p class="p">
+      <strong class="ph b">Alternatives:</strong>
+    </p>
+
+    <p class="p">
+        Another way to define different names for the same tables or columns is to create views. See
+        <a class="xref" href="../shared/../topics/impala_views.html#views">Overview of Impala Views</a> for details.
+      </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_allow_unsupported_formats.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_allow_unsupported_formats.html b/docs/build/html/topics/impala_allow_unsupported_formats.html
new file mode 100644
index 0000000..824c555
--- /dev/null
+++ b/docs/build/html/topics/impala_allow_unsupported_formats.html
@@ -0,0 +1,24 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="allow_unsupported_formats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ALLOW_UNSUPPORTED_FORMATS Query Option</title></head><body id="allow_unsupported_formats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">ALLOW_UNSUPPORTED_FORMATS Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      An obsolete query option from early work on support for file formats. Do not use. Might be removed in the
+      future.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[14/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_perf_stats.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_perf_stats.html b/docs/build/html/topics/impala_perf_stats.html
new file mode 100644
index 0000000..7ad1fb0
--- /dev/null
+++ b/docs/build/html/topics/impala_perf_stats.html
@@ -0,0 +1,996 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Table and Column Statistics</title></head><body id="perf_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Table and Column Statistics</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Impala can do better optimization for complex or multi-table queries when it has access to statistics about
+      the volume of data and how the values are distributed. Impala uses this information to help parallelize and
+      distribute the work for a query. For example, optimizing join queries requires a way of determining if one
+      table is <span class="q">"bigger"</span> than another, which is a function of the number of rows and the average row size
+      for each table. The following sections describe the categories of statistics Impala can work
+      with, and how to produce them and keep them up to date.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        Originally, Impala relied on the Hive mechanism for collecting statistics, through the Hive <code class="ph codeph">ANALYZE
+        TABLE</code> statement which initiates a MapReduce job. For better user-friendliness and reliability,
+        Impala implements its own <code class="ph codeph">COMPUTE STATS</code> statement in Impala 1.2.2 and higher, along with the
+        <code class="ph codeph">DROP STATS</code>, <code class="ph codeph">SHOW TABLE STATS</code>, and <code class="ph codeph">SHOW COLUMN STATS</code>
+        statements.
+      </p>
+    </div>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="perf_table_stats__table_stats" id="perf_stats__perf_table_stats">
+
+    <h2 class="title topictitle2" id="perf_table_stats__table_stats">Overview of Table Statistics</h2>
+  
+
+    <div class="body conbody">
+
+
+
+      <p class="p">
+        The Impala query planner can make use of statistics about entire tables and partitions.
+        This information includes physical characteristics such as the number of rows, number of data files,
+        the total size of the data files, and the file format. For partitioned tables, the numbers
+        are calculated per partition, and as totals for the whole table.
+        This metadata is stored in the metastore database, and can be updated by either Impala or Hive.
+        If a number is not available, the value -1 is used as a placeholder.
+        Some numbers, such as the number and total size of data files, are always kept up to date because
+        they can be calculated cheaply, as part of gathering HDFS block metadata.
+      </p>
+
+      <p class="p">
+        The following example shows table stats for an unpartitioned Parquet table.
+        The values for the number and sizes of files are always available.
+        Initially, the number of rows is not known, because it requires a potentially expensive
+        scan through the entire table, and so that value is displayed as -1.
+        The <code class="ph codeph">COMPUTE STATS</code> statement fills in any unknown table stats values.
+      </p>
+
+<pre class="pre codeblock"><code>
+show table stats parquet_snappy;
++-------+--------+---------+--------------+-------------------+---------+-------------------+...
+| #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats |...
++-------+--------+---------+--------------+-------------------+---------+-------------------+...
+| -1    | 96     | 23.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             |...
++-------+--------+---------+--------------+-------------------+---------+-------------------+...
+
+compute stats parquet_snappy;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 6 column(s). |
++-----------------------------------------+
+
+
+show table stats parquet_snappy;
++------------+--------+---------+--------------+-------------------+---------+-------------------+...
+| #Rows      | #Files | Size    | Bytes Cached | Cache Replication | Format  | Incremental stats |...
++------------+--------+---------+--------------+-------------------+---------+-------------------+...
+| 1000000000 | 96     | 23.35GB | NOT CACHED   | NOT CACHED        | PARQUET | false             |...
++------------+--------+---------+--------------+-------------------+---------+-------------------+...
+</code></pre>
+
+      <p class="p">
+        Impala performs some optimizations using this metadata on its own, and other optimizations by
+        using a combination of table and column statistics.
+      </p>
+
+      <p class="p">
+        To check that table statistics are available for a table, and see the details of those statistics, use the
+        statement <code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code>. See
+        <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+      </p>
+
+      <p class="p">
+        If you use the Hive-based methods of gathering statistics, see
+        <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/StatsDev" target="_blank">the
+        Hive wiki</a> for information about the required configuration on the Hive side. Where practical,
+        use the Impala <code class="ph codeph">COMPUTE STATS</code> statement to avoid potential configuration and scalability
+        issues with the statistics-gathering process.
+      </p>
+
+      <p class="p">
+        If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>,
+        Impala can only use the resulting column statistics if the table is unpartitioned.
+        Impala cannot use Hive-generated column statistics for a partitioned table.
+      </p>
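+    <p class="p">
+      For illustration, the Hive form of the statement (issued through Hive, not Impala)
+      looks like the following; the table name is a placeholder:
+    </p>
+
+<pre class="pre codeblock"><code>-- Runs as a Hive job. Impala can use the resulting column
+-- statistics only if sample_table is unpartitioned.
+ANALYZE TABLE sample_table COMPUTE STATISTICS FOR COLUMNS;
+</code></pre>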
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="perf_column_stats__column_stats" id="perf_stats__perf_column_stats">
+
+    <h2 class="title topictitle2" id="perf_column_stats__column_stats">Overview of Column Statistics</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala query planner can make use of statistics about individual columns when that metadata is
+        available in the metastore database. This technique is most valuable for columns compared across tables in
+        <a class="xref" href="impala_perf_joins.html#perf_joins">join queries</a>, to help estimate how many rows the query
+        will retrieve from each table. <span class="ph"> These statistics are also important for correlated
+        subqueries using the <code class="ph codeph">EXISTS()</code> or <code class="ph codeph">IN()</code> operators, which are processed
+        internally the same way as join queries.</span>
+      </p>
+
+      <p class="p">
+        The following example shows column stats for an unpartitioned Parquet table.
+        The values for the maximum and average sizes of some types are always available,
+        because those figures are constant for numeric and other fixed-size types.
+        Initially, the number of distinct values is not known, because it requires a potentially expensive
+        scan through the entire table, and so that value is displayed as -1.
+        The same applies to maximum and average sizes of variable-sized types, such as <code class="ph codeph">STRING</code>.
+        The <code class="ph codeph">COMPUTE STATS</code> statement fills in most unknown column stats values.
+        (It does not record the number of <code class="ph codeph">NULL</code> values, because currently Impala
+        does not use that figure for query optimization.)
+      </p>
+
+<pre class="pre codeblock"><code>
+show column stats parquet_snappy;
++-------------+----------+------------------+--------+----------+----------+
+| Column      | Type     | #Distinct Values | #Nulls | Max Size | Avg Size |
++-------------+----------+------------------+--------+----------+----------+
+| id          | BIGINT   | -1               | -1     | 8        | 8        |
+| val         | INT      | -1               | -1     | 4        | 4        |
+| zerofill    | STRING   | -1               | -1     | -1       | -1       |
+| name        | STRING   | -1               | -1     | -1       | -1       |
+| assertion   | BOOLEAN  | -1               | -1     | 1        | 1        |
+| location_id | SMALLINT | -1               | -1     | 2        | 2        |
++-------------+----------+------------------+--------+----------+----------+
+
+compute stats parquet_snappy;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 6 column(s). |
++-----------------------------------------+
+
+show column stats parquet_snappy;
++-------------+----------+------------------+--------+----------+-------------------+
+| Column      | Type     | #Distinct Values | #Nulls | Max Size | Avg Size          |
++-------------+----------+------------------+--------+----------+-------------------+
+| id          | BIGINT   | 183861280        | -1     | 8        | 8                 |
+| val         | INT      | 139017           | -1     | 4        | 4                 |
+| zerofill    | STRING   | 101761           | -1     | 6        | 6                 |
+| name        | STRING   | 145636240        | -1     | 22       | 13.00020027160645 |
+| assertion   | BOOLEAN  | 2                | -1     | 1        | 1                 |
+| location_id | SMALLINT | 339              | -1     | 2        | 2                 |
++-------------+----------+------------------+--------+----------+-------------------+
+</code></pre>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          For column statistics to be effective in Impala, you also need to have table statistics for the
+          applicable tables, as described in <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a>. When you use
+          the Impala <code class="ph codeph">COMPUTE STATS</code> statement, both table and column statistics are automatically
+          gathered at the same time, for all columns in the table.
+        </p>
+      </div>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span>  Prior to Impala 1.4.0,
+          <code class="ph codeph">COMPUTE STATS</code> counted the number of
+          <code class="ph codeph">NULL</code> values in each column and recorded that figure
+        in the metastore database. Because Impala does not currently use the
+          <code class="ph codeph">NULL</code> count during query planning, Impala 1.4.0 and
+        higher speeds up the <code class="ph codeph">COMPUTE STATS</code> statement by
+        skipping this <code class="ph codeph">NULL</code> counting. </div>
+
+
+
+      <p class="p">
+        To check whether column statistics are available for a particular set of columns, use the <code class="ph codeph">SHOW
+        COLUMN STATS <var class="keyword varname">table_name</var></code> statement, or check the extended
+        <code class="ph codeph">EXPLAIN</code> output for a query against that table that refers to those columns. See
+        <a class="xref" href="impala_show.html#show">SHOW Statement</a> and <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> for details.
+      </p>
+
+      <p class="p">
+        If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>,
+        Impala can only use the resulting column statistics if the table is unpartitioned.
+        Impala cannot use Hive-generated column statistics for a partitioned table.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="perf_stats_partitions__stats_partitions" id="perf_stats__perf_stats_partitions">
+    <h2 class="title topictitle2" id="perf_stats_partitions__stats_partitions">How Table and Column Statistics Work for Partitioned Tables</h2>
+    <div class="body conbody">
+
+      <p class="p">
+        When you use Impala for <span class="q">"big data"</span>, you are highly likely to use partitioning
+        for your biggest tables, the ones representing data that can be logically divided
+        based on dates, geographic regions, or similar criteria. The table and column statistics
+        are especially useful for optimizing queries on such tables. For example, a query involving
+        one year might involve substantially more or less data than a query involving a different year,
+        or a range of several years. Each query might be optimized differently as a result.
+      </p>
+
+      <p class="p">
+        The following examples show how table and column stats work with a partitioned table.
+        The table for this example is partitioned by year, month, and day.
+        For simplicity, the sample data consists of 5 partitions, all from the same year and month.
+        Table stats are collected independently for each partition. (In fact, the
+        <code class="ph codeph">SHOW PARTITIONS</code> statement displays exactly the same information as
+        <code class="ph codeph">SHOW TABLE STATS</code> for a partitioned table.) Column stats apply to
+        the entire table, not to individual partitions. Because the partition key column values
+        are represented as HDFS directories, their characteristics are typically known in advance,
+        even when the values for non-key columns are shown as -1.
+      </p>
+
+<pre class="pre codeblock"><code>
+show partitions year_month_day;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| 2013  | 12    | 1   | -1    | 1      | 2.51MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 2013  | 12    | 2   | -1    | 1      | 2.53MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 2013  | 12    | 3   | -1    | 1      | 2.52MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 2013  | 12    | 4   | -1    | 1      | 2.51MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 2013  | 12    | 5   | -1    | 1      | 2.52MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| Total |       |     | -1    | 5      | 12.58MB | 0B           |                   |         |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+
+show table stats year_month_day;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format  |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| 2013  | 12    | 1   | -1    | 1      | 2.51MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 2013  | 12    | 2   | -1    | 1      | 2.53MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 2013  | 12    | 3   | -1    | 1      | 2.52MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 2013  | 12    | 4   | -1    | 1      | 2.51MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 2013  | 12    | 5   | -1    | 1      | 2.52MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| Total |       |     | -1    | 5      | 12.58MB | 0B           |                   |         |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+
+show column stats year_month_day;
++-----------+---------+------------------+--------+----------+----------+
+| Column    | Type    | #Distinct Values | #Nulls | Max Size | Avg Size |
++-----------+---------+------------------+--------+----------+----------+
+| id        | INT     | -1               | -1     | 4        | 4        |
+| val       | INT     | -1               | -1     | 4        | 4        |
+| zfill     | STRING  | -1               | -1     | -1       | -1       |
+| name      | STRING  | -1               | -1     | -1       | -1       |
+| assertion | BOOLEAN | -1               | -1     | 1        | 1        |
+| year      | INT     | 1                | 0      | 4        | 4        |
+| month     | INT     | 1                | 0      | 4        | 4        |
+| day       | INT     | 5                | 0      | 4        | 4        |
++-----------+---------+------------------+--------+----------+----------+
+
+compute stats year_month_day;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 5 partition(s) and 5 column(s). |
++-----------------------------------------+
+
+show table stats year_month_day;
++-------+-------+-----+--------+--------+---------+--------------+-------------------+---------+...
+| year  | month | day | #Rows  | #Files | Size    | Bytes Cached | Cache Replication | Format  |...
++-------+-------+-----+--------+--------+---------+--------------+-------------------+---------+...
+| 2013  | 12    | 1   | 93606  | 1      | 2.51MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 2013  | 12    | 2   | 94158  | 1      | 2.53MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 2013  | 12    | 3   | 94122  | 1      | 2.52MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 2013  | 12    | 4   | 93559  | 1      | 2.51MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| 2013  | 12    | 5   | 93845  | 1      | 2.52MB  | NOT CACHED   | NOT CACHED        | PARQUET |...
+| Total |       |     | 469290 | 5      | 12.58MB | 0B           |                   |         |...
++-------+-------+-----+--------+--------+---------+--------------+-------------------+---------+...
+
+show column stats year_month_day;
++-----------+---------+------------------+--------+----------+-------------------+
+| Column    | Type    | #Distinct Values | #Nulls | Max Size | Avg Size          |
++-----------+---------+------------------+--------+----------+-------------------+
+| id        | INT     | 511129           | -1     | 4        | 4                 |
+| val       | INT     | 364853           | -1     | 4        | 4                 |
+| zfill     | STRING  | 311430           | -1     | 6        | 6                 |
+| name      | STRING  | 471975           | -1     | 22       | 13.00160026550293 |
+| assertion | BOOLEAN | 2                | -1     | 1        | 1                 |
+| year      | INT     | 1                | 0      | 4        | 4                 |
+| month     | INT     | 1                | 0      | 4        | 4                 |
+| day       | INT     | 5                | 0      | 4        | 4                 |
++-----------+---------+------------------+--------+----------+-------------------+
+</code></pre>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        Partitioned tables can grow so large that scanning the entire table, as the <code class="ph codeph">COMPUTE STATS</code>
+        statement does, is impractical just to update the statistics for a new partition. The standard
+        <code class="ph codeph">COMPUTE STATS</code> statement might take hours, or even days. That situation is where you switch
+        to using incremental statistics, a feature available in <span class="keyword">Impala 2.1</span> and higher.
+        See <a class="xref" href="impala_perf_stats.html#perf_stats_incremental">Overview of Incremental Statistics</a> for details about this feature
+        and the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax.
+      </div>
+
+      <p class="p">
+        If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>,
+        Impala can only use the resulting column statistics if the table is unpartitioned.
+        Impala cannot use Hive-generated column statistics for a partitioned table.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="perf_stats_incremental__incremental_stats" id="perf_stats__perf_stats_incremental">
+
+    <h2 class="title topictitle2" id="perf_stats_incremental__incremental_stats">Overview of Incremental Statistics</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        In Impala 2.1.0 and higher, you can use the syntax <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> and
+        <code class="ph codeph">DROP INCREMENTAL STATS</code>. The <code class="ph codeph">INCREMENTAL</code> clauses work with incremental
+        statistics, a specialized feature for partitioned tables that are large or frequently updated with new
+        partitions.
+      </p>
+
+      <p class="p">
+        When you compute incremental statistics for a partitioned table, by default Impala only processes those
+        partitions that do not yet have incremental statistics. By processing only newly added partitions, you can
+        keep statistics up to date for large partitioned tables, without incurring the overhead of reprocessing the
+        entire table each time.
+      </p>
+
+      <p class="p">
+        You can also compute or drop statistics for a single partition by including a <code class="ph codeph">PARTITION</code>
+        clause in the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or <code class="ph codeph">DROP INCREMENTAL STATS</code>
+        statement.
+      </p>
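+
+      <p class="p">
+        The following example illustrates the general form of these statements. (The table name
+        <code class="ph codeph">sales_data</code> and the <code class="ph codeph">year</code> and <code class="ph codeph">month</code>
+        partition key columns are hypothetical names used for illustration.)
+      </p>
+
+<pre class="pre codeblock"><code>-- Process only partitions that do not yet have incremental stats.
+COMPUTE INCREMENTAL STATS sales_data;
+
+-- Compute or drop stats for one specific partition.
+COMPUTE INCREMENTAL STATS sales_data PARTITION (year=2013, month=12);
+DROP INCREMENTAL STATS sales_data PARTITION (year=2013, month=12);
+</code></pre>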
+
+      <p class="p">
+        The metadata for incremental statistics is handled differently from the original style of statistics:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            If you have an existing partitioned table for which you have already computed statistics, issuing
+            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> without a partition clause causes Impala to rescan the
+            entire table. Once the incremental statistics are computed, any future <code class="ph codeph">COMPUTE INCREMENTAL
+            STATS</code> statements scan only new partitions, plus any partitions on which you performed
+            <code class="ph codeph">DROP INCREMENTAL STATS</code>.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW PARTITIONS</code> statements now include an
+            additional column showing whether incremental statistics are available for each partition. A partition
+            could already be covered by the original type of statistics based on a prior <code class="ph codeph">COMPUTE
+            STATS</code> statement, as indicated by a value other than <code class="ph codeph">-1</code> under the
+            <code class="ph codeph">#Rows</code> column. Impala query planning uses either kind of statistics when available.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> takes more time than <code class="ph codeph">COMPUTE STATS</code> for the
+            same volume of data. Therefore it is most suitable for tables with large data volume where new
+            partitions are added frequently, making it impractical to run a full <code class="ph codeph">COMPUTE STATS</code>
+            operation for each new partition. For unpartitioned tables, or partitioned tables that are loaded once
+            and not updated with new partitions, use the original <code class="ph codeph">COMPUTE STATS</code> syntax.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> uses some memory in the <span class="keyword cmdname">catalogd</span> process,
+            proportional to the number of partitions and number of columns in the applicable table. The memory
+            overhead is approximately 400 bytes for each column in each partition. This memory is reserved in the
+            <span class="keyword cmdname">catalogd</span> daemon, the <span class="keyword cmdname">statestored</span> daemon, and in each instance of
+            the <span class="keyword cmdname">impalad</span> daemon.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            In cases where new files are added to an existing partition, issue a <code class="ph codeph">REFRESH</code> statement
+            for the table, followed by a <code class="ph codeph">DROP INCREMENTAL STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL
+            STATS</code> sequence for the changed partition.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">DROP INCREMENTAL STATS</code> statement operates only on a single partition at a time. To
+            remove statistics (whether incremental or not) from all partitions of a table, issue a <code class="ph codeph">DROP
+            STATS</code> statement with no <code class="ph codeph">INCREMENTAL</code> or <code class="ph codeph">PARTITION</code> clauses.
+          </p>
+        </li>
+      </ul>
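+
+      <p class="p">
+        For example, after new files are added to an existing partition, the sequence described above
+        might look like the following. (The table and partition names are illustrative.)
+      </p>
+
+<pre class="pre codeblock"><code>-- Make Impala aware of the new data files in the partition.
+REFRESH sales_data;
+
+-- Recompute incremental stats for just the changed partition.
+DROP INCREMENTAL STATS sales_data PARTITION (year=2013, month=12);
+COMPUTE INCREMENTAL STATS sales_data PARTITION (year=2013, month=12);
+
+-- To remove all statistics (incremental or not) from every partition:
+DROP STATS sales_data;
+</code></pre>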
+
+      <p class="p">
+        The following considerations apply to incremental statistics when the structure of an existing table is
+        changed (known as <dfn class="term">schema evolution</dfn>):
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            If you use an <code class="ph codeph">ALTER TABLE</code> statement to drop a column, the existing statistics remain
+            valid and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> does not rescan any partitions.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you use an <code class="ph codeph">ALTER TABLE</code> statement to add a column, Impala rescans all partitions and
+            fills in the appropriate column-level values the next time you run <code class="ph codeph">COMPUTE INCREMENTAL
+            STATS</code>.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you use an <code class="ph codeph">ALTER TABLE</code> statement to change the data type of a column, Impala
+            rescans all partitions and fills in the appropriate column-level values the next time you run
+            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you use an <code class="ph codeph">ALTER TABLE</code> statement to change the file format of a table, the existing
+            statistics remain valid and a subsequent <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> does not rescan any
+            partitions.
+          </p>
+        </li>
+      </ul>
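+
+      <p class="p">
+        For example, with a hypothetical partitioned table <code class="ph codeph">t1</code>, the first pair of
+        statements below triggers no rescan, while the second pair causes all partitions to be rescanned:
+      </p>
+
+<pre class="pre codeblock"><code>-- Dropping a column leaves the existing statistics valid;
+-- the subsequent COMPUTE INCREMENTAL STATS does not rescan any partitions.
+ALTER TABLE t1 DROP COLUMN obsolete_col;
+COMPUTE INCREMENTAL STATS t1;
+
+-- Adding a column causes all partitions to be rescanned the next time
+-- you run COMPUTE INCREMENTAL STATS.
+ALTER TABLE t1 ADD COLUMNS (new_col STRING);
+COMPUTE INCREMENTAL STATS t1;
+</code></pre>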
+
+      <p class="p">
+        See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> and
+        <a class="xref" href="impala_drop_stats.html#drop_stats">DROP STATS Statement</a> for syntax details.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="perf_stats__perf_stats_computing">
+    <h2 class="title topictitle2" id="ariaid-title6">Generating Table and Column Statistics (COMPUTE STATS Statement)</h2>
+    <div class="body conbody">
+
+      <p class="p">
+        To gather table statistics after loading data into a table or partition, you typically use the
+        <code class="ph codeph">COMPUTE STATS</code> statement. This statement is available in Impala 1.2.2 and higher.
+        It gathers both table statistics and column statistics for all columns in a single operation.
+        For large partitioned tables, where you frequently need to update statistics and it is impractical
+        to scan the entire table each time, use the syntax <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>,
+        which is available in <span class="keyword">Impala 2.1</span> and higher.
+      </p>
+
+      <p class="p">
+        If you use Hive as part of your ETL workflow, you can also use Hive to generate table and
+        column statistics. You might need to do extra configuration within Hive itself, the metastore,
+        or even set up a separate database to hold Hive-generated statistics. You might need to run
+        multiple statements to generate all the necessary statistics. Therefore, prefer the
+        Impala <code class="ph codeph">COMPUTE STATS</code> statement where that technique is practical.
+        For details about collecting statistics through Hive, see
+        <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/StatsDev" target="_blank">the Hive wiki</a>.
+      </p>
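+
+      <p class="p">
+        The Hive statements for gathering statistics typically take the following form. (Run these in
+        Hive, not in <span class="keyword cmdname">impala-shell</span>; the table name is illustrative.)
+      </p>
+
+<pre class="pre codeblock"><code>-- In Hive: gather table-level statistics.
+ANALYZE TABLE sales_data COMPUTE STATISTICS;
+
+-- In Hive: gather column-level statistics.
+ANALYZE TABLE sales_data COMPUTE STATISTICS FOR COLUMNS;
+</code></pre>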
+
+      <p class="p">
+        If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>,
+        Impala can only use the resulting column statistics if the table is unpartitioned.
+        Impala cannot use Hive-generated column statistics for a partitioned table.
+      </p>
+
+
+
+      <p class="p">
+
+
+        For your very largest tables, you might find that <code class="ph codeph">COMPUTE STATS</code>, or even <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>,
+        takes so long to scan the data that it is impractical to run regularly. In such a case, after adding a partition or inserting new data,
+        you can update just the number of rows property through an <code class="ph codeph">ALTER TABLE</code> statement.
+        See <a class="xref" href="impala_perf_stats.html#perf_table_stats_manual">Setting the NUMROWS Value Manually through ALTER TABLE</a> for details.
+        Because the column statistics might be left in a stale state, do not use this technique as a replacement
+        for <code class="ph codeph">COMPUTE STATS</code>. Only use this technique if all other means of collecting statistics are impractical, or as a
+        low-overhead operation that you run in between periodic <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operations.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="perf_stats__perf_stats_checking">
+
+    <h2 class="title topictitle2" id="ariaid-title7">Detecting Missing Statistics</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can check whether a specific table has statistics using the <code class="ph codeph">SHOW TABLE STATS</code> statement
+        (for any table) or the <code class="ph codeph">SHOW PARTITIONS</code> statement (for a partitioned table). Both
+        statements display the same information. If a table or a partition does not have any statistics, the
+        <code class="ph codeph">#Rows</code> field contains <code class="ph codeph">-1</code>. Once you compute statistics for the table or
+        partition, the <code class="ph codeph">#Rows</code> field changes to an accurate value.
+      </p>
+
+      <p class="p">
+        The following example shows a table that initially does not have any statistics. The <code class="ph codeph">SHOW TABLE
+        STATS</code> statement displays different values for <code class="ph codeph">#Rows</code> before and after the
+        <code class="ph codeph">COMPUTE STATS</code> operation.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table no_stats (x int);
+[localhost:21000] &gt; show table stats no_stats;
++-------+--------+------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+--------+------+--------------+--------+-------------------+
+| -1    | 0      | 0B   | NOT CACHED   | TEXT   | false             |
++-------+--------+------+--------------+--------+-------------------+
+[localhost:21000] &gt; compute stats no_stats;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+[localhost:21000] &gt; show table stats no_stats;
++-------+--------+------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+--------+------+--------------+--------+-------------------+
+| 0     | 0      | 0B   | NOT CACHED   | TEXT   | false             |
++-------+--------+------+--------------+--------+-------------------+
+</code></pre>
+
+      <p class="p">
+        The following example shows a similar progression with a partitioned table. Initially,
+        <code class="ph codeph">#Rows</code> is <code class="ph codeph">-1</code>. After a <code class="ph codeph">COMPUTE STATS</code> operation,
+        <code class="ph codeph">#Rows</code> changes to an accurate value. Any newly added partition starts with no statistics,
+        meaning that you must collect statistics after adding a new partition.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table no_stats_partitioned (x int) partitioned by (year smallint);
+[localhost:21000] &gt; show table stats no_stats_partitioned;
++-------+-------+--------+------+--------------+--------+-------------------+
+| year  | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+-------+--------+------+--------------+--------+-------------------+
+| Total | -1    | 0      | 0B   | 0B           |        |                   |
++-------+-------+--------+------+--------------+--------+-------------------+
+[localhost:21000] &gt; show partitions no_stats_partitioned;
++-------+-------+--------+------+--------------+--------+-------------------+
+| year  | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+-------+--------+------+--------------+--------+-------------------+
+| Total | -1    | 0      | 0B   | 0B           |        |                   |
++-------+-------+--------+------+--------------+--------+-------------------+
+[localhost:21000] &gt; alter table no_stats_partitioned add partition (year=2013);
+[localhost:21000] &gt; compute stats no_stats_partitioned;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+[localhost:21000] &gt; alter table no_stats_partitioned add partition (year=2014);
+[localhost:21000] &gt; show partitions no_stats_partitioned;
++-------+-------+--------+------+--------------+--------+-------------------+
+| year  | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+-------+--------+------+--------------+--------+-------------------+
+| 2013  | 0     | 0      | 0B   | NOT CACHED   | TEXT   | false             |
+| 2014  | -1    | 0      | 0B   | NOT CACHED   | TEXT   | false             |
+| Total | 0     | 0      | 0B   | 0B           |        |                   |
++-------+-------+--------+------+--------------+--------+-------------------+
+</code></pre>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        Because the default <code class="ph codeph">COMPUTE STATS</code> statement creates and updates statistics for all
+        partitions in a table, if you expect to frequently add new partitions, use the <code class="ph codeph">COMPUTE INCREMENTAL
+        STATS</code> syntax instead, which lets you compute stats for a single specified partition, or only for
+        those partitions that do not already have incremental stats.
+      </div>
+
+      <p class="p">
+        If checking each individual table is impractical, due to a large number of tables or views that hide the
+        underlying base tables, you can also check for missing statistics for a particular query. Use the
+        <code class="ph codeph">EXPLAIN</code> statement to preview query efficiency before actually running the query. Use the
+        query profile output available through the <code class="ph codeph">PROFILE</code> command in
+        <span class="keyword cmdname">impala-shell</span> or the web UI to verify query execution and timing after running the query.
+        Both the <code class="ph codeph">EXPLAIN</code> plan and the <code class="ph codeph">PROFILE</code> output display a warning if any
+        tables or partitions involved in the query do not have statistics.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table no_stats (x int);
+[localhost:21000] &gt; explain select count(*) from no_stats;
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=10.00MB VCores=1                           |
+| WARNING: The following tables are missing relevant table and/or column statistics. |
+| incremental_stats.no_stats                                                         |
+|                                                                                    |
+| 03:AGGREGATE [FINALIZE]                                                            |
+| |  output: count:merge(*)                                                          |
+| |                                                                                  |
+| 02:EXCHANGE [UNPARTITIONED]                                                        |
+| |                                                                                  |
+| 01:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+| 00:SCAN HDFS [incremental_stats.no_stats]                                          |
+|    partitions=1/1 files=0 size=0B                                                  |
++------------------------------------------------------------------------------------+
+</code></pre>
+
+      <p class="p">
+        Because Impala uses the <dfn class="term">partition pruning</dfn> technique when possible to only evaluate certain
+        partitions, if you have a partitioned table with statistics for some partitions and not others, whether or
+        not the <code class="ph codeph">EXPLAIN</code> statement shows the warning depends on the actual partitions used by the
+        query. For example, you might see warnings or not for different queries against the same table:
+      </p>
+
+<pre class="pre codeblock"><code>-- No warning because all the partitions for the year 2012 have stats.
+EXPLAIN SELECT ... FROM t1 WHERE year = 2012;
+
+-- Missing stats warning because one or more partitions in this range
+-- do not have stats.
+EXPLAIN SELECT ... FROM t1 WHERE year BETWEEN 2006 AND 2009;
+</code></pre>
+
+      <p class="p">
+        To confirm whether any partitions at all in the table are missing statistics, you might explain a query that
+        scans the entire table, such as <code class="ph codeph">SELECT COUNT(*) FROM <var class="keyword varname">table_name</var></code>.
+      </p>
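+
+      <p class="p">
+        For example, assuming a partitioned table named <code class="ph codeph">t1</code>:
+      </p>
+
+<pre class="pre codeblock"><code>-- A full-table scan touches every partition, so the EXPLAIN output
+-- shows the missing-stats warning if any partition lacks statistics.
+EXPLAIN SELECT COUNT(*) FROM t1;
+</code></pre>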
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="perf_stats__perf_stats_collecting">
+
+    <h2 class="title topictitle2" id="ariaid-title8">Keeping Statistics Up to Date</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        When the contents of a table or partition change significantly, recompute the stats for the relevant table
+        or partition. The degree of change that qualifies as <span class="q">"significant"</span> varies, depending on the absolute
+        and relative sizes of the tables. Typically, if you add more than 30% more data to a table, it is
+        worthwhile to recompute stats, because the differences in number of rows and number of distinct values
+        might cause Impala to choose a different join order when that table is used in join queries. This guideline
+        is most important for the largest tables. For example, adding 30% new data to a table containing 1 TB has a
+        greater effect on join order than adding 30% to a table containing only a few megabytes, and the larger
+        table has a greater effect on query performance if Impala chooses a suboptimal join order as a result of
+        outdated statistics.
+      </p>
+
+      <p class="p">
+        If you reload a complete new set of data for a table, but the number of rows and number of distinct values
+        for each column is relatively unchanged from before, you do not need to recompute stats for the table.
+      </p>
+
+      <p class="p">
+        If the statistics for a table are out of date, and the table's large size makes it impractical to recompute
+        new stats immediately, you can use the <code class="ph codeph">DROP STATS</code> statement to remove the obsolete
+        statistics, making it easier to identify tables that need a new <code class="ph codeph">COMPUTE STATS</code> operation.
+      </p>
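+
+      <p class="p">
+        For example, with a hypothetical table whose statistics have become stale:
+      </p>
+
+<pre class="pre codeblock"><code>-- Remove the obsolete statistics now. The table then shows #Rows = -1
+-- in SHOW TABLE STATS until a new COMPUTE STATS operation runs.
+DROP STATS huge_table;
+
+-- Later, when convenient, recompute the statistics.
+COMPUTE STATS huge_table;
+</code></pre>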
+
+      <p class="p">
+        For a large partitioned table, consider using the incremental stats feature available in Impala 2.1.0 and
+        higher, as explained in <a class="xref" href="impala_perf_stats.html#perf_stats_incremental">Overview of Incremental Statistics</a>. If you add a new
+        partition to a table, it is worthwhile to recompute incremental stats, because the operation only scans the
+        data for that one new partition.
+      </p>
+    </div>
+  </article>
+
+
+
+  
+
+
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="perf_stats__perf_table_stats_manual">
+
+    <h2 class="title topictitle2" id="ariaid-title9">Setting the NUMROWS Value Manually through ALTER TABLE</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The most crucial piece of data in all the statistics is the number of rows in the table (for an
+        unpartitioned or partitioned table) and for each partition (for a partitioned table). The <code class="ph codeph">COMPUTE STATS</code>
+        statement always gathers statistics about all columns, as well as overall table statistics. If it is not
+        practical to do a full <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+        operation after adding a partition or inserting data, or if you can see that Impala would produce a more
+        efficient plan if the number of rows was different, you can manually set the number of rows through an
+        <code class="ph codeph">ALTER TABLE</code> statement:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Set total number of rows. Applies to both unpartitioned and partitioned tables.
+alter table <var class="keyword varname">table_name</var> set tblproperties('numRows'='<var class="keyword varname">new_value</var>', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+
+-- Set total number of rows for a specific partition. Applies to partitioned tables only.
+-- You must specify all the partition key columns in the PARTITION clause.
+alter table <var class="keyword varname">table_name</var> partition (<var class="keyword varname">keycol1</var>=<var class="keyword varname">val1</var>,<var class="keyword varname">keycol2</var>=<var class="keyword varname">val2</var>...) set tblproperties('numRows'='<var class="keyword varname">new_value</var>', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+</code></pre>
+
+      <p class="p">
+        This statement avoids re-scanning any data files. (The requirement to include the <code class="ph codeph">STATS_GENERATED_VIA_STATS_TASK</code> property is relatively new, as a
+        result of the issue <a class="xref" href="https://issues.apache.org/jira/browse/HIVE-8648" target="_blank">HIVE-8648</a>
+        for the Hive metastore.)
+      </p>
+
+<pre class="pre codeblock"><code>create table analysis_data stored as parquet as select * from raw_data;
+Inserted 1000000000 rows in 181.98s
+compute stats analysis_data;
+insert into analysis_data select * from smaller_table_we_forgot_before;
+Inserted 1000000 rows in 15.32s
+-- Now there are 1001000000 rows. We can update this single data point in the stats.
+alter table analysis_data set tblproperties('numRows'='1001000000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+
+      <p class="p">
+        For a partitioned table, update both the per-partition number of rows and the number of rows for the whole
+        table:
+      </p>
+
+<pre class="pre codeblock"><code>-- If the table originally contained 1 million rows, and we add another partition with 30 thousand rows,
+-- change the numRows property for the partition and the overall table.
+alter table partitioned_data partition(year=2009, month=4) set tblproperties ('numRows'='30000', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+
+      <p class="p">
+        In practice, the <code class="ph codeph">COMPUTE STATS</code> statement, or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+        for a partitioned table, should be fast and convenient enough that this technique is only useful for the very
+        largest partitioned tables.
+        
+        
+        Because the column statistics might be left in a stale state, do not use this technique as a replacement
+        for <code class="ph codeph">COMPUTE STATS</code>. Only use this technique if all other means of collecting statistics are impractical, or as a
+        low-overhead operation that you run in between periodic <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operations.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="perf_stats__perf_column_stats_manual">
+    <h2 class="title topictitle2" id="ariaid-title10">Setting Column Stats Manually through ALTER TABLE</h2>
+    <div class="body conbody">
+      <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, you can also use the <code class="ph codeph">SET COLUMN STATS</code>
+        clause of <code class="ph codeph">ALTER TABLE</code> to manually set or change column statistics.
+        Only use this technique in cases where it is impractical to run
+        <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+        frequently enough to keep up with data changes for a huge table.
+      </p>
+      <div class="p">
+        You specify a case-insensitive symbolic name for the kind of statistics:
+        <code class="ph codeph">numDVs</code>, <code class="ph codeph">numNulls</code>, <code class="ph codeph">avgSize</code>, <code class="ph codeph">maxSize</code>.
+        The key names and values are both quoted. This operation applies to an entire table,
+        not a specific partition. For example:
+<pre class="pre codeblock"><code>
+create table t1 (x int, s string);
+insert into t1 values (1, 'one'), (2, 'two'), (2, 'deux');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x      | INT    | -1               | -1     | 4        | 4        |
+| s      | STRING | -1               | -1     | -1       | -1       |
++--------+--------+------------------+--------+----------+----------+
+alter table t1 set column stats x ('numDVs'='2','numNulls'='0');
+alter table t1 set column stats s ('numdvs'='3','maxsize'='4');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x      | INT    | 2                | 0      | 4        | 4        |
+| s      | STRING | 3                | -1     | 4        | -1       |
++--------+--------+------------------+--------+----------+----------+
+</code></pre>
+      </div>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="perf_stats__perf_stats_examples">
+
+    <h2 class="title topictitle2" id="ariaid-title11">Examples of Using Table and Column Statistics with Impala</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following examples walk through a sequence of <code class="ph codeph">SHOW TABLE STATS</code>, <code class="ph codeph">SHOW COLUMN
+        STATS</code>, <code class="ph codeph">ALTER TABLE</code>, and <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT</code>
+        statements to illustrate various aspects of how Impala uses statistics to help optimize queries.
+      </p>
+
+      <p class="p">
+        This example shows table and column statistics for the <code class="ph codeph">STORE</code> table used in the
+        <a class="xref" href="http://www.tpc.org/tpcds/" target="_blank">TPC-DS benchmarks for decision
+        support</a> systems. It is a tiny table holding data for 12 stores. Initially, before any statistics are
+        gathered by a <code class="ph codeph">COMPUTE STATS</code> statement, most of the numeric fields show placeholder values
+        of -1, indicating that the figures are unknown. The figures that are filled in are values that are easily
+        countable or deducible at the physical level, such as the number of files, total data size of the files,
+        and the maximum and average sizes for data types that have a constant size such as <code class="ph codeph">INT</code>,
+        <code class="ph codeph">FLOAT</code>, and <code class="ph codeph">TIMESTAMP</code>.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; show table stats store;
++-------+--------+--------+--------+
+| #Rows | #Files | Size   | Format |
++-------+--------+--------+--------+
+| -1    | 1      | 3.08KB | TEXT   |
++-------+--------+--------+--------+
+Returned 1 row(s) in 0.03s
+[localhost:21000] &gt; show column stats store;
++--------------------+-----------+------------------+--------+----------+----------+
+| Column             | Type      | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------------------+-----------+------------------+--------+----------+----------+
+| s_store_sk         | INT       | -1               | -1     | 4        | 4        |
+| s_store_id         | STRING    | -1               | -1     | -1       | -1       |
+| s_rec_start_date   | TIMESTAMP | -1               | -1     | 16       | 16       |
+| s_rec_end_date     | TIMESTAMP | -1               | -1     | 16       | 16       |
+| s_closed_date_sk   | INT       | -1               | -1     | 4        | 4        |
+| s_store_name       | STRING    | -1               | -1     | -1       | -1       |
+| s_number_employees | INT       | -1               | -1     | 4        | 4        |
+| s_floor_space      | INT       | -1               | -1     | 4        | 4        |
+| s_hours            | STRING    | -1               | -1     | -1       | -1       |
+| s_manager          | STRING    | -1               | -1     | -1       | -1       |
+| s_market_id        | INT       | -1               | -1     | 4        | 4        |
+| s_geography_class  | STRING    | -1               | -1     | -1       | -1       |
+| s_market_desc      | STRING    | -1               | -1     | -1       | -1       |
+| s_market_manager   | STRING    | -1               | -1     | -1       | -1       |
+| s_division_id      | INT       | -1               | -1     | 4        | 4        |
+| s_division_name    | STRING    | -1               | -1     | -1       | -1       |
+| s_company_id       | INT       | -1               | -1     | 4        | 4        |
+| s_company_name     | STRING    | -1               | -1     | -1       | -1       |
+| s_street_number    | STRING    | -1               | -1     | -1       | -1       |
+| s_street_name      | STRING    | -1               | -1     | -1       | -1       |
+| s_street_type      | STRING    | -1               | -1     | -1       | -1       |
+| s_suite_number     | STRING    | -1               | -1     | -1       | -1       |
+| s_city             | STRING    | -1               | -1     | -1       | -1       |
+| s_county           | STRING    | -1               | -1     | -1       | -1       |
+| s_state            | STRING    | -1               | -1     | -1       | -1       |
+| s_zip              | STRING    | -1               | -1     | -1       | -1       |
+| s_country          | STRING    | -1               | -1     | -1       | -1       |
+| s_gmt_offset       | FLOAT     | -1               | -1     | 4        | 4        |
+| s_tax_percentage   | FLOAT     | -1               | -1     | 4        | 4        |
++--------------------+-----------+------------------+--------+----------+----------+
+Returned 29 row(s) in 0.04s</code></pre>
+
+      <p class="p">
+        With the Hive <code class="ph codeph">ANALYZE TABLE</code> statement for column statistics, you had to specify each
+        column for which to gather statistics. The Impala <code class="ph codeph">COMPUTE STATS</code> statement automatically
+        gathers statistics for all columns, because it reads through the entire table relatively quickly and can
+        efficiently compute the values for all the columns. This example shows how after running the
+        <code class="ph codeph">COMPUTE STATS</code> statement, statistics are filled in for both the table and all its columns:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats store;
++------------------------------------------+
+| summary                                  |
++------------------------------------------+
+| Updated 1 partition(s) and 29 column(s). |
++------------------------------------------+
+Returned 1 row(s) in 1.88s
+[localhost:21000] &gt; show table stats store;
++-------+--------+--------+--------+
+| #Rows | #Files | Size   | Format |
++-------+--------+--------+--------+
+| 12    | 1      | 3.08KB | TEXT   |
++-------+--------+--------+--------+
+Returned 1 row(s) in 0.02s
+[localhost:21000] &gt; show column stats store;
++--------------------+-----------+------------------+--------+----------+-------------------+
+| Column             | Type      | #Distinct Values | #Nulls | Max Size | Avg Size          |
++--------------------+-----------+------------------+--------+----------+-------------------+
+| s_store_sk         | INT       | 12               | -1     | 4        | 4                 |
+| s_store_id         | STRING    | 6                | -1     | 16       | 16                |
+| s_rec_start_date   | TIMESTAMP | 4                | -1     | 16       | 16                |
+| s_rec_end_date     | TIMESTAMP | 3                | -1     | 16       | 16                |
+| s_closed_date_sk   | INT       | 3                | -1     | 4        | 4                 |
+| s_store_name       | STRING    | 8                | -1     | 5        | 4.25              |
+| s_number_employees | INT       | 9                | -1     | 4        | 4                 |
+| s_floor_space      | INT       | 10               | -1     | 4        | 4                 |
+| s_hours            | STRING    | 2                | -1     | 8        | 7.083300113677979 |
+| s_manager          | STRING    | 7                | -1     | 15       | 12                |
+| s_market_id        | INT       | 7                | -1     | 4        | 4                 |
+| s_geography_class  | STRING    | 1                | -1     | 7        | 7                 |
+| s_market_desc      | STRING    | 10               | -1     | 94       | 55.5              |
+| s_market_manager   | STRING    | 7                | -1     | 16       | 14                |
+| s_division_id      | INT       | 1                | -1     | 4        | 4                 |
+| s_division_name    | STRING    | 1                | -1     | 7        | 7                 |
+| s_company_id       | INT       | 1                | -1     | 4        | 4                 |
+| s_company_name     | STRING    | 1                | -1     | 7        | 7                 |
+| s_street_number    | STRING    | 9                | -1     | 3        | 2.833300113677979 |
+| s_street_name      | STRING    | 12               | -1     | 11       | 6.583300113677979 |
+| s_street_type      | STRING    | 8                | -1     | 9        | 4.833300113677979 |
+| s_suite_number     | STRING    | 11               | -1     | 9        | 8.25              |
+| s_city             | STRING    | 2                | -1     | 8        | 6.5               |
+| s_county           | STRING    | 1                | -1     | 17       | 17                |
+| s_state            | STRING    | 1                | -1     | 2        | 2                 |
+| s_zip              | STRING    | 2                | -1     | 5        | 5                 |
+| s_country          | STRING    | 1                | -1     | 13       | 13                |
+| s_gmt_offset       | FLOAT     | 1                | -1     | 4        | 4                 |
+| s_tax_percentage   | FLOAT     | 5                | -1     | 4        | 4                 |
++--------------------+-----------+------------------+--------+----------+-------------------+
+Returned 29 row(s) in 0.04s</code></pre>
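+      <p class="p">
+        Statistics are not refreshed automatically as the underlying data changes. As a sketch of the statistics
+        lifecycle (output omitted here), you can discard stale figures with the <code class="ph codeph">DROP
+        STATS</code> statement, after which the placeholder -1 values reappear until you run
+        <code class="ph codeph">COMPUTE STATS</code> again:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; drop stats store;
+[localhost:21000] &gt; show table stats store;    -- #Rows reverts to -1
+[localhost:21000] &gt; compute stats store;       -- regather after loading new data
+[localhost:21000] &gt; show table stats store;    -- #Rows is filled in again</code></pre>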
+
+      <p class="p">
+        The following example shows how statistics are represented for a partitioned table. In this case, we have
+        set up a table to hold the world's most trivial census data, a single <code class="ph codeph">STRING</code> field,
+        partitioned by a <code class="ph codeph">YEAR</code> column. The table statistics include a separate entry for each
+        partition, plus final totals for the numeric fields. The column statistics include some easily deducible
+        facts for the partitioning column, such as the number of distinct values (the number of partition
+        subdirectories).
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; describe census;
++------+----------+---------+
+| name | type     | comment |
++------+----------+---------+
+| name | string   |         |
+| year | smallint |         |
++------+----------+---------+
+Returned 2 row(s) in 0.02s
+[localhost:21000] &gt; show table stats census;
++-------+-------+--------+------+---------+
+| year  | #Rows | #Files | Size | Format  |
++-------+-------+--------+------+---------+
+| 2000  | -1    | 0      | 0B   | TEXT    |
+| 2004  | -1    | 0      | 0B   | TEXT    |
+| 2008  | -1    | 0      | 0B   | TEXT    |
+| 2010  | -1    | 0      | 0B   | TEXT    |
+| 2011  | 0     | 1      | 22B  | TEXT    |
+| 2012  | -1    | 1      | 22B  | TEXT    |
+| 2013  | -1    | 1      | 231B | PARQUET |
+| Total | 0     | 3      | 275B |         |
++-------+-------+--------+------+---------+
+Returned 8 row(s) in 0.02s
+[localhost:21000] &gt; show column stats census;
++--------+----------+------------------+--------+----------+----------+
+| Column | Type     | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+----------+------------------+--------+----------+----------+
+| name   | STRING   | -1               | -1     | -1       | -1       |
+| year   | SMALLINT | 7                | -1     | 2        | 2        |
++--------+----------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.02s</code></pre>
+
+      <p class="p">
+        The following example shows how the statistics are filled in by a <code class="ph codeph">COMPUTE STATS</code> statement
+        in Impala.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats census;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 3 partition(s) and 1 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 2.16s
+[localhost:21000] &gt; show table stats census;
++-------+-------+--------+------+---------+
+| year  | #Rows | #Files | Size | Format  |
++-------+-------+--------+------+---------+
+| 2000  | -1    | 0      | 0B   | TEXT    |
+| 2004  | -1    | 0      | 0B   | TEXT    |
+| 2008  | -1    | 0      | 0B   | TEXT    |
+| 2010  | -1    | 0      | 0B   | TEXT    |
+| 2011  | 4     | 1      | 22B  | TEXT    |
+| 2012  | 4     | 1      | 22B  | TEXT    |
+| 2013  | 1     | 1      | 231B | PARQUET |
+| Total | 9     | 3      | 275B |         |
++-------+-------+--------+------+---------+
+Returned 8 row(s) in 0.02s
+[localhost:21000] &gt; show column stats census;
++--------+----------+------------------+--------+----------+----------+
+| Column | Type     | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+----------+------------------+--------+----------+----------+
+| name   | STRING   | 4                | -1     | 5        | 4.5      |
+| year   | SMALLINT | 7                | -1     | 2        | 2        |
++--------+----------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.02s</code></pre>
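+      <p class="p">
+        For a partitioned table that grows over time, rerunning a full <code class="ph codeph">COMPUTE STATS</code>
+        rescans every partition. As a sketch (assuming Impala 2.1 or higher), the
+        <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> variant processes only partitions that do not
+        already have incremental stats, or a single partition that you name explicitly:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; compute incremental stats census;
+[localhost:21000] &gt; compute incremental stats census partition (year=2013);</code></pre>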
+
+      <p class="p">
+        For examples showing how some queries work differently when statistics are available, see
+        <a class="xref" href="impala_perf_joins.html#perf_joins_examples">Examples of Join Order Optimization</a>. You can see how Impala executes a query
+        differently in each case by observing the <code class="ph codeph">EXPLAIN</code> output before and after collecting
+        statistics. Measure the before and after query times, and examine the throughput numbers in before and
+        after <code class="ph codeph">SUMMARY</code> or <code class="ph codeph">PROFILE</code> output, to verify how much the improved plan
+        speeds up performance.
+      </p>
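+      <p class="p">
+        For example, a session along the following lines (output omitted; <code class="ph codeph">store_sales</code>
+        is another table from the same TPC-DS schema) contrasts the plan for a join before and after statistics
+        exist. Without statistics, the <code class="ph codeph">EXPLAIN</code> output warns that cardinality estimates
+        are unavailable; afterwards, the estimates and possibly the join order change:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set explain_level=extended;
+[localhost:21000] &gt; explain select count(*) from store join store_sales on (s_store_sk = ss_store_sk);
+[localhost:21000] &gt; compute stats store_sales;
+[localhost:21000] &gt; explain select count(*) from store join store_sales on (s_store_sk = ss_store_sk);</code></pre>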
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_perf_testing.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_perf_testing.html b/docs/build/html/topics/impala_perf_testing.html
new file mode 100644
index 0000000..0663ae9
--- /dev/null
+++ b/docs/build/html/topics/impala_perf_testing.html
@@ -0,0 +1,152 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="performance_testing"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Testing Impala Performance</title></head><body id="performance_testing"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Testing Impala Performance</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Test to ensure that Impala is configured for optimal performance. Even if you installed Impala with cluster
+      management software, complete the procedures described in this topic to verify that Impala is set up
+      correctly.
+    </p>
+
+    <section class="section" id="performance_testing__checking_config_performance"><h2 class="title sectiontitle">Checking Impala Configuration Values</h2>
+
+      
+
+      <p class="p">
+        You can inspect Impala configuration values by connecting to your Impala server using a browser.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">To check Impala configuration values:</strong>
+      </p>
+
+      <ol class="ol">
+        <li class="li">
+          Use a browser to connect to one of the hosts running <code class="ph codeph">impalad</code> in your environment.
+          Connect using an address of the form
+          <code class="ph codeph">http://<var class="keyword varname">hostname</var>:<var class="keyword varname">port</var>/varz</code>.
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            In the preceding example, replace <code class="ph codeph">hostname</code> and <code class="ph codeph">port</code> with the name and
+            port of your Impala server. The default port is 25000.
+          </div>
+        </li>
+
+        <li class="li">
+          Review the configured values.
+          <p class="p">
+            For example, to check that your system is configured to use block locality tracking information, you
+            would check that the value for <code class="ph codeph">dfs.datanode.hdfs-blocks-metadata.enabled</code> is
+            <code class="ph codeph">true</code>.
+          </p>
+        </li>
+      </ol>
+
+      <p class="p" id="performance_testing__p_31">
+        <strong class="ph b">To check data locality:</strong>
+      </p>
+
+      <ol class="ol">
+        <li class="li">
+          Execute a query on a dataset that is available across multiple nodes. For example, for a table named
+          <code class="ph codeph">MyTable</code> that has a reasonable chance of being spread across multiple DataNodes:
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt; SELECT COUNT (*) FROM MyTable</code></pre>
+        </li>
+
+        <li class="li">
+          After the query completes, review the contents of the Impala logs. You should find a recent message
+          similar to the following:
+<pre class="pre codeblock"><code>Total remote scan volume = 0</code></pre>
+        </li>
+      </ol>
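+      <p class="p">
+        As an alternative sketch, the <code class="ph codeph">PROFILE</code> command in
+        <code class="ph codeph">impala-shell</code> reports scan details for the most recent query, so you can
+        check locality without searching the log files. Look for the counters related to remote reads; the exact
+        counter names vary by release:
+      </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt; SELECT COUNT(*) FROM MyTable;
+[impalad-host:21000] &gt; PROFILE;</code></pre>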
+
+      <p class="p">
+        The presence of remote scans may indicate that <code class="ph codeph">impalad</code> is not running on the
+        correct nodes. This can happen because some DataNodes are not running <code class="ph codeph">impalad</code>,
+        or because the <code class="ph codeph">impalad</code> instance that starts the query cannot contact one or
+        more of the other <code class="ph codeph">impalad</code> instances.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">To understand the causes of this issue:</strong>
+      </p>
+
+      <ol class="ol">
+        <li class="li">
+          Connect to the debugging web server. By default, this server runs on port 25000. This page lists all
+          <code class="ph codeph">impalad</code> instances running in your cluster. If there are fewer instances than you expect,
+          this often indicates some DataNodes are not running <code class="ph codeph">impalad</code>. Ensure
+          <code class="ph codeph">impalad</code> is started on all DataNodes.
+        </li>
+
+        <li class="li">
+          
+          If you are using multi-homed hosts, ensure that the Impala daemon's hostname resolves to the interface on
+          which <code class="ph codeph">impalad</code> is running. The hostname Impala is using is displayed when
+          <code class="ph codeph">impalad</code> starts. To explicitly set the hostname, use the <code class="ph codeph">--hostname</code>&nbsp;flag.
+        </li>
+
+        <li class="li">
+          Check that <code class="ph codeph">statestored</code> is running as expected. Review the contents of the state store
+          log to ensure all instances of <code class="ph codeph">impalad</code> are listed as having connected to the state
+          store.
+        </li>
+      </ol>
+    </section>
+
+    <section class="section" id="performance_testing__checking_config_logs"><h2 class="title sectiontitle">Reviewing Impala Logs</h2>
+
+      
+
+      <p class="p">
+        You can review the contents of the Impala logs for signs that short-circuit reads or block location
+        tracking are not functioning. Before checking logs, execute a simple query against a small HDFS dataset.
+        Completing a query task generates log messages using current settings. Information on starting Impala and
+        executing queries can be found in <a class="xref" href="impala_processes.html#processes">Starting Impala</a> and
+        <a class="xref" href="impala_impala_shell.html#impala_shell">Using the Impala Shell (impala-shell Command)</a>. Information on logging can be found in
+        <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>. Log messages and their interpretations are as follows:
+      </p>
+
+      <table class="table"><caption></caption><colgroup><col style="width:75%"><col style="width:25%"></colgroup><thead class="thead">
+            <tr class="row">
+              <th class="entry nocellnorowborder" id="performance_testing__checking_config_logs__entry__1">
+                Log Message
+              </th>
+              <th class="entry nocellnorowborder" id="performance_testing__checking_config_logs__entry__2">
+                Interpretation
+              </th>
+            </tr>
+          </thead><tbody class="tbody">
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__1 ">
+                <div class="p">
+<pre class="pre">Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata
+</pre>
+                </div>
+              </td>
+              <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__2 ">
+                <p class="p">
+                  Tracking block locality is not enabled.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__1 ">
+                <div class="p">
+<pre class="pre">Unable to load native-hadoop library for your platform... using builtin-java classes where applicable</pre>
+                </div>
+              </td>
+              <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__2 ">
+                <p class="p">
+                  Native checksumming is not enabled.
+                </p>
+              </td>
+            </tr>
+          </tbody></table>
+    </section>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_performance.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_performance.html b/docs/build/html/topics/impala_performance.html
new file mode 100644
index 0000000..13d44b3
--- /dev/null
+++ b/docs/build/html/topics/impala_performance.html
@@ -0,0 +1,116 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_cookbook.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_joins.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_stats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_benchmarking.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_resources.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filtering.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_hdfs_caching.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_testing.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain_plan.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_skew.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="performance"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Tuning Impala for Performance</title></head><body id="performance"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Tuning Impala for Performance</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The following sections explain the factors affecting the performance of Impala features, and procedures for
+      tuning, monitoring, and benchmarking Impala queries and other SQL operations.
+    </p>
+
+    <p class="p">
+      This section also describes techniques for maximizing Impala scalability. Scalability is tied to performance:
+      it means that performance remains high as the system workload increases. For example, reducing the disk I/O
+      performed by a query can speed up an individual query, and at the same time improve scalability by making it
+      practical to run more queries simultaneously. Sometimes, an optimization technique improves scalability more
+      than performance. For example, reducing memory usage for a query might not change the query performance much,
+      but might improve scalability by allowing more Impala queries or other kinds of jobs to run at the same time
+      without running out of memory.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        Before starting any performance tuning or benchmarking, make sure your system is configured with all the
+        recommended minimum hardware requirements from <a class="xref" href="impala_prereqs.html#prereqs_hardware">Hardware Requirements</a> and
+        software settings from <a class="xref" href="impala_config_performance.html#config_performance">Post-Installation Configuration for Impala</a>.
+      </p>
+    </div>
+
+    <ul class="ul">
+      <li class="li">
+        <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>. This technique physically divides the data based on
+        the different values in frequently queried columns, allowing queries to skip reading a large percentage of
+        the data in a table.
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>. Joins are the main class of queries that you can tune at
+        the SQL level, as opposed to changing physical factors such as the file format or the hardware
+        configuration. The related topics <a class="xref" href="impala_perf_stats.html#perf_column_stats">Overview of Column Statistics</a> and
+        <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a> are also important primarily for join performance.
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a> and
+        <a class="xref" href="impala_perf_stats.html#perf_column_stats">Overview of Column Statistics</a>. Gathering table and column statistics, using the
+        <code class="ph codeph">COMPUTE STATS</code> statement, helps Impala automatically optimize the performance for join
+        queries, without requiring changes to SQL query statements. (This process is greatly simplified in Impala
+        1.2.2 and higher, because the <code class="ph codeph">COMPUTE STATS</code> statement gathers both kinds of statistics in
+        one operation, and does not require any setup and configuration as was previously necessary for the
+        <code class="ph codeph">ANALYZE TABLE</code> statement in Hive.)
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_perf_testing.html#performance_testing">Testing Impala Performance</a>. Do some post-setup testing to ensure Impala is
+        using optimal settings for performance, before conducting any benchmark tests.
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_perf_benchmarking.html#perf_benchmarks">Benchmarking Impala Queries</a>. The configuration and sample data that you use
+        for initial experiments with Impala is often not appropriate for doing performance tests.
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_perf_resources.html#mem_limits">Controlling Impala Resource Usage</a>. The more memory Impala can utilize, the better query
+        performance you can expect. In a cluster running other kinds of workloads as well, you must make tradeoffs
+        to make sure all Hadoop components have enough memory to perform well, so you might cap the memory that
+        Impala can use.
+      </li>
+
+      
+
+      <li class="li">
+        <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a>. Queries against data stored in the Amazon Simple Storage Service (S3)
+        have different performance characteristics than when the data is stored in HDFS.
+      </li>
+    </ul>
+
+    <p class="p toc"></p>
+
+    <p class="p">
+        A good source of tips related to scalability and performance tuning is the
+        <a class="xref" href="http://www.slideshare.net/cloudera/the-impala-cookbook-42530186" target="_blank">Impala Cookbook</a>
+        presentation. These slides are updated periodically as new features come out and new benchmarks are performed.
+      </p>
+
+  </div>
+
+
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+
+  
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_perf_cookbook.html">Impala Performance Guidelines and Best Practices</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_joins.html">Performance Considerations for Join Queries</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_stats.html">Table and Column Statistics</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_benchmarking.html">Benchmarking Impala Queries</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_resources.html">Controlling Impala Resource Usage</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_hdfs_caching.html">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_testing.html">Testing Impala Performance</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain_plan.html">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_skew.html">Detecting and Correcting HDFS Block Skew Conditions</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_planning.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_planning.html b/docs/build/html/topics/impala_planning.html
new file mode 100644
index 0000000..8707957
--- /dev/null
+++ b/docs/build/html/topics/impala_planning.html
@@ -0,0 +1,20 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prereqs.html#prereqs"><meta name="DC.Relation" scheme="URI" content="../topics/impala_cluster_sizing.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_design.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="planning"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Planning for Impala Deployment</title></head><body id="planning"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Planning for Impala Deployment</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Before you set up Impala in production, do some planning to make sure that your hardware setup has sufficient
+      capacity, that your cluster topology is optimal for Impala queries, and that your schema design and ETL
+      processes follow the best practices for Impala.
+    </p>
+
+    <p class="p toc"></p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_prereqs.html#prereqs">Impala Requirements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_cluster_sizing.html">Cluster Sizing Guidelines for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schema_design.html">Guidelines for Designing Impala Schemas</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file


[45/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_avro.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_avro.html b/docs/build/html/topics/impala_avro.html
new file mode 100644
index 0000000..fd38294
--- /dev/null
+++ b/docs/build/html/topics/impala_avro.html
@@ -0,0 +1,565 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta
  name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="avro"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the Avro File Format with Impala Tables</title></head><body id="avro"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using the Avro File Format with Impala Tables</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Impala supports tables whose data files use the Avro file format. Impala can query Avro
+      tables, and in Impala 1.4.0 and higher can create them, but currently cannot insert data into them. For
+      insert operations, use Hive, then switch back to Impala to run queries.
+    </p>
+
+    <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">Avro Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+          <tr class="row">
+            <th class="entry nocellnorowborder" id="avro__entry__1">
+              File Type
+            </th>
+            <th class="entry nocellnorowborder" id="avro__entry__2">
+              Format
+            </th>
+            <th class="entry nocellnorowborder" id="avro__entry__3">
+              Compression Codecs
+            </th>
+            <th class="entry nocellnorowborder" id="avro__entry__4">
+              Impala Can CREATE?
+            </th>
+            <th class="entry nocellnorowborder" id="avro__entry__5">
+              Impala Can INSERT?
+            </th>
+          </tr>
+        </thead><tbody class="tbody">
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="avro__entry__1 ">
+              <a class="xref" href="impala_avro.html#avro">Avro</a>
+            </td>
+            <td class="entry nocellnorowborder" headers="avro__entry__2 ">
+              Structured
+            </td>
+            <td class="entry nocellnorowborder" headers="avro__entry__3 ">
+              Snappy, gzip, deflate, bzip2
+            </td>
+            <td class="entry nocellnorowborder" headers="avro__entry__4 ">
+              Yes, in Impala 1.4.0 and higher. Before that, create the table using Hive.
+            </td>
+            <td class="entry nocellnorowborder" headers="avro__entry__5 ">
+              No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+              <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+            </td>
+
+          </tr>
+        </tbody></table>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="avro__avro_create_table">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Creating Avro Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        To create a new table using the Avro file format, issue the <code class="ph codeph">CREATE TABLE</code> statement through
+        Impala with the <code class="ph codeph">STORED AS AVRO</code> clause, or through Hive. If you create the table through
+        Impala, you must include column definitions that match the fields specified in the Avro schema. With Hive,
+        you can omit the columns and just specify the Avro schema.
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">CREATE TABLE</code> for Avro tables can include
+        SQL-style column definitions rather than specifying Avro notation through the <code class="ph codeph">TBLPROPERTIES</code>
+        clause. Impala issues warning messages if there are any mismatches between the types specified in the
+        SQL column definitions and the underlying types; for example, any <code class="ph codeph">TINYINT</code> or
+        <code class="ph codeph">SMALLINT</code> columns are treated as <code class="ph codeph">INT</code> in the underlying Avro files,
+        and therefore are displayed as <code class="ph codeph">INT</code> in any <code class="ph codeph">DESCRIBE</code> or
+        <code class="ph codeph">SHOW CREATE TABLE</code> output.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+        Currently, Avro tables cannot contain <code class="ph codeph">TIMESTAMP</code> columns. If you need to store date and
+        time values in Avro tables, as a workaround you can use a <code class="ph codeph">STRING</code> representation of the
+        values, convert the values to <code class="ph codeph">BIGINT</code> with the <code class="ph codeph">UNIX_TIMESTAMP()</code> function,
+        or create separate numeric columns for individual date and time fields using the <code class="ph codeph">EXTRACT()</code>
+        function.
+      </p>
+      </div>
+
+      
+
+      <p class="p">
+        The following examples demonstrate creating an Avro table in Impala, using either an inline column
+        specification or one taken from a JSON file stored in HDFS:
+      </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] &gt; CREATE TABLE avro_only_sql_columns
+                  &gt; (
+                  &gt;   id INT,
+                  &gt;   bool_col BOOLEAN,
+                  &gt;   tinyint_col TINYINT, /* Gets promoted to INT */
+                  &gt;   smallint_col SMALLINT, /* Gets promoted to INT */
+                  &gt;   int_col INT,
+                  &gt;   bigint_col BIGINT,
+                  &gt;   float_col FLOAT,
+                  &gt;   double_col DOUBLE,
+                  &gt;   date_string_col STRING,
+                  &gt;   string_col STRING
+                  &gt; )
+                  &gt; STORED AS AVRO;
+
+[localhost:21000] &gt; CREATE TABLE impala_avro_table
+                  &gt; (bool_col BOOLEAN, int_col INT, long_col BIGINT, float_col FLOAT, double_col DOUBLE, string_col STRING, nullable_int INT)
+                  &gt; STORED AS AVRO
+                  &gt; TBLPROPERTIES ('avro.schema.literal'='{
+                  &gt;    "name": "my_record",
+                  &gt;    "type": "record",
+                  &gt;    "fields": [
+                  &gt;       {"name":"bool_col", "type":"boolean"},
+                  &gt;       {"name":"int_col", "type":"int"},
+                  &gt;       {"name":"long_col", "type":"long"},
+                  &gt;       {"name":"float_col", "type":"float"},
+                  &gt;       {"name":"double_col", "type":"double"},
+                  &gt;       {"name":"string_col", "type":"string"},
+                  &gt;       {"name": "nullable_int", "type": ["null", "int"]}]}');
+
+[localhost:21000] &gt; CREATE TABLE avro_examples_of_all_types (
+                  &gt;     id INT,
+                  &gt;     bool_col BOOLEAN,
+                  &gt;     tinyint_col TINYINT,
+                  &gt;     smallint_col SMALLINT,
+                  &gt;     int_col INT,
+                  &gt;     bigint_col BIGINT,
+                  &gt;     float_col FLOAT,
+                  &gt;     double_col DOUBLE,
+                  &gt;     date_string_col STRING,
+                  &gt;     string_col STRING
+                  &gt;   )
+                  &gt;   STORED AS AVRO
+                  &gt;   TBLPROPERTIES ('avro.schema.url'='hdfs://localhost:8020/avro_schemas/alltypes.json');
+
+</code></pre>
+
+      <p class="p">
+        The following example demonstrates creating an Avro table in Hive:
+      </p>
+
+<pre class="pre codeblock"><code>
+hive&gt; CREATE TABLE hive_avro_table
+    &gt; ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+    &gt; STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+    &gt; OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+    &gt; TBLPROPERTIES ('avro.schema.literal'='{
+    &gt;    "name": "my_record",
+    &gt;    "type": "record",
+    &gt;    "fields": [
+    &gt;       {"name":"bool_col", "type":"boolean"},
+    &gt;       {"name":"int_col", "type":"int"},
+    &gt;       {"name":"long_col", "type":"long"},
+    &gt;       {"name":"float_col", "type":"float"},
+    &gt;       {"name":"double_col", "type":"double"},
+    &gt;       {"name":"string_col", "type":"string"},
+    &gt;       {"name": "nullable_int", "type": ["null", "int"]}]}');
+
+</code></pre>
+
+      <p class="p">
+        Each field of the record becomes a column of the table. Note that any other information, such as the record
+        name, is ignored.
+      </p>
+
+
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        For nullable Avro columns, make sure to put the <code class="ph codeph">"null"</code> entry before the actual type name.
+        In Impala, all columns are nullable; Impala currently does not have a <code class="ph codeph">NOT NULL</code> clause. Any
+        non-nullable property is only enforced on the Avro side.
+      </div>
+
+      <p class="p">
+        Most column types map directly from Avro to Impala under the same names. These are the exceptions and
+        special cases to consider:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The <code class="ph codeph">DECIMAL</code> type is defined in Avro as a <code class="ph codeph">BYTE</code> type with the
+          <code class="ph codeph">logicalType</code> property set to <code class="ph codeph">"decimal"</code> and a specified precision and
+          scale.
+        </li>
+
+        <li class="li">
+          The Avro <code class="ph codeph">long</code> type maps to <code class="ph codeph">BIGINT</code> in Impala.
+        </li>
+      </ul>
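As an illustration of the decimal mapping described above, the following Python sketch (field and record names here are hypothetical, and this is only a schema-building example, not the Avro library itself) constructs an Avro field that uses the `"decimal"` logicalType on a `bytes` base type, which Impala surfaces as a `DECIMAL` column:

```python
import json

# Hypothetical Avro field using the "decimal" logical type:
# bytes + logicalType "decimal" maps to DECIMAL(precision, scale) in Impala.
price_field = {
    "name": "price",
    "type": {
        "type": "bytes",
        "logicalType": "decimal",
        "precision": 10,
        "scale": 2,
    },
}

schema = {
    "name": "my_record",
    "type": "record",
    "fields": [price_field],
}

# Serialize to the JSON string form used in 'avro.schema.literal'.
literal = json.dumps(schema)
print(literal)
```

The resulting string could be pasted into a `TBLPROPERTIES ('avro.schema.literal'=...)` clause like the ones shown earlier.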
+
+      <p class="p">
+        If you create the table through Hive, switch back to <span class="keyword cmdname">impala-shell</span> and issue an
+        <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> statement. Then you can run queries for
+        that table through <span class="keyword cmdname">impala-shell</span>.
+      </p>
+
+      <div class="p">
+        In rare instances, a mismatch could occur between the Avro schema and the column definitions in the
+        metastore database. In <span class="keyword">Impala 2.3</span> and higher, Impala checks for such inconsistencies during
+        a <code class="ph codeph">CREATE TABLE</code> statement and each time it loads the metadata for a table (for example,
+        after <code class="ph codeph">INVALIDATE METADATA</code>). Impala uses the following rules to determine how to treat
+        mismatching columns, a process known as <dfn class="term">schema reconciliation</dfn>:
+        <ul class="ul">
+        <li class="li">
+          If there is a mismatch in the number of columns, Impala uses the column
+          definitions from the Avro schema.
+        </li>
+        <li class="li">
+          If there is a mismatch in column name or type, Impala uses the column definition from the Avro schema.
+          Because a <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> column in Impala maps to an Avro <code class="ph codeph">STRING</code>,
+          this case is not considered a mismatch and the column is preserved as <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code>
+          in the reconciled schema. <span class="ph">Prior to <span class="keyword">Impala 2.7</span> the column
+          name and comment for such <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> columns was also taken from the SQL column definition.
+          In <span class="keyword">Impala 2.7</span> and higher, the column name and comment from the Avro schema file take precedence for such columns,
+          and only the <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> type is preserved from the SQL column definition.</span>
+        </li>
+        <li class="li">
+          An Impala <code class="ph codeph">TIMESTAMP</code> column definition maps to an Avro <code class="ph codeph">STRING</code> and is presented as a <code class="ph codeph">STRING</code>
+          in the reconciled schema, because Avro has no binary <code class="ph codeph">TIMESTAMP</code> representation.
+          As a result, no Avro table can have a <code class="ph codeph">TIMESTAMP</code> column; this restriction is the same as
+          in earlier Impala releases.
+        </li>
+        </ul>
+      </div>
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+        Although you can create tables in this file format using
+        the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+        and <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher,
+        currently, Impala can query these types only in Parquet tables.
+        <span class="ph">
+        The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types.
+        Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher.
+        </span>
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="avro__avro_map_table">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Using a Hive-Created Avro Table in Impala</h2>
+
+    <div class="body conbody">
+
+      <div class="p">
+        If you have an Avro table created through Hive, you can use it in Impala as long as it contains only
+        Impala-compatible data types. It cannot contain:
+        <ul class="ul">
+          <li class="li">
+            Complex types: <code class="ph codeph">array</code>, <code class="ph codeph">map</code>, <code class="ph codeph">record</code>,
+            <code class="ph codeph">struct</code>, <code class="ph codeph">union</code> other than
+            <code class="ph codeph">[<var class="keyword varname">supported_type</var>,null]</code> or
+            <code class="ph codeph">[null,<var class="keyword varname">supported_type</var>]</code>
+          </li>
+
+          <li class="li">
+            The Avro-specific types <code class="ph codeph">enum</code>, <code class="ph codeph">bytes</code>, and <code class="ph codeph">fixed</code>
+          </li>
+
+          <li class="li">
+            Any scalar type other than those listed in <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a>
+          </li>
+        </ul>
+        Because Impala and Hive share the same metastore database, Impala can directly access the table definitions
+        and data for tables that were created in Hive.
+      </div>
+
+      <p class="p">
+        If you create an Avro table in Hive, issue an <code class="ph codeph">INVALIDATE METADATA</code> statement the next time you
+        connect to Impala through <span class="keyword cmdname">impala-shell</span>. This is a one-time operation to make Impala
+        aware of the new table. You can issue the statement while connected to any Impala node, and the catalog
+        service broadcasts the change to all other Impala nodes.
+      </p>
+
+      <p class="p">
+        If you load new data into an Avro table through Hive, either through a Hive <code class="ph codeph">LOAD DATA</code> or
+        <code class="ph codeph">INSERT</code> statement, or by manually copying or moving files into the data directory for the
+        table, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> statement the next time you connect
+        to Impala through <span class="keyword cmdname">impala-shell</span>. You can issue the statement while connected to any
+        Impala node, and the catalog service broadcasts the change to all other Impala nodes. If you issue the
+        <code class="ph codeph">LOAD DATA</code> statement through Impala, you do not need a <code class="ph codeph">REFRESH</code> afterward.
+      </p>
+
+      <p class="p">
+        Impala only supports fields of type <code class="ph codeph">boolean</code>, <code class="ph codeph">int</code>, <code class="ph codeph">long</code>,
+        <code class="ph codeph">float</code>, <code class="ph codeph">double</code>, and <code class="ph codeph">string</code>, or unions of these types with
+        null; for example, <code class="ph codeph">["null", "string"]</code>. Unions with <code class="ph codeph">null</code> essentially
+        create a nullable type.
+      </p>
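The null-first ordering convention for nullable unions can be checked or normalized mechanically. This is a minimal Python sketch (the function name is made up for illustration), reordering a union so that `"null"` comes first:

```python
def normalize_nullable(avro_type):
    """Reorder an Avro union like ["string", "null"] so that "null" comes
    first, matching the ordering convention for nullable columns.
    Non-union types are returned unchanged."""
    if isinstance(avro_type, list) and "null" in avro_type:
        return ["null"] + [t for t in avro_type if t != "null"]
    return avro_type

print(normalize_nullable(["string", "null"]))  # ['null', 'string']
print(normalize_nullable("int"))               # int
```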
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="avro__avro_json">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Specifying the Avro Schema through JSON</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        While you can embed a schema directly in your <code class="ph codeph">CREATE TABLE</code> statement, as shown above,
+        column width restrictions in the Hive metastore limit the length of schema you can specify. If you
+        encounter problems with long schema literals, try storing your schema as a <code class="ph codeph">JSON</code> file in
+        HDFS instead. Specify your schema in HDFS using table properties similar to the following:
+      </p>
+
+<pre class="pre codeblock"><code>tblproperties ('avro.schema.url'='hdfs://your-name-node:port/path/to/schema.json');</code></pre>
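One way to produce such a schema file is to serialize it from a small script. This Python sketch (the file name and field list are hypothetical) writes a JSON schema to a local file, which you could then copy into HDFS, for example with <code class="ph codeph">hdfs dfs -put</code>:

```python
import json
import os
import tempfile

# Hypothetical schema matching a simple table definition.
schema = {
    "name": "alltypes",
    "type": "record",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "bool_col", "type": "boolean"},
        {"name": "string_col", "type": "string"},
    ],
}

# Write the schema locally; in practice you would then upload it to HDFS
# (for example: hdfs dfs -put alltypes.json /avro_schemas/).
path = os.path.join(tempfile.gettempdir(), "alltypes.json")
with open(path, "w") as f:
    json.dump(schema, f, indent=2)

print(path)
```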
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="avro__avro_load_data">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Loading Data into an Avro Table</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Currently, Impala cannot write Avro data files. Therefore, an Avro table cannot be used as the destination
+        of an Impala <code class="ph codeph">INSERT</code> statement or <code class="ph codeph">CREATE TABLE AS SELECT</code>.
+      </p>
+
+      <p class="p">
+        To copy data from another table, issue any <code class="ph codeph">INSERT</code> statements through Hive. For information
+        about loading data into Avro tables through Hive, see
+        <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/AvroSerDe" target="_blank">Avro
+        page on the Hive wiki</a>.
+      </p>
+
+      <p class="p">
+        If you already have data files in Avro format, you can also issue <code class="ph codeph">LOAD DATA</code> in either
+        Impala or Hive. Impala can move existing Avro data files into an Avro table; it just cannot create new
+        Avro data files.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="avro__avro_compression">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Enabling Compression for Avro Tables</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        To enable compression for Avro tables, specify settings in the Hive shell to enable compression and to
+        specify a codec, then issue a <code class="ph codeph">CREATE TABLE</code> statement as in the preceding examples. Impala
+        supports the <code class="ph codeph">snappy</code> and <code class="ph codeph">deflate</code> codecs for Avro tables.
+      </p>
+
+      <p class="p">
+        For example:
+      </p>
+
+<pre class="pre codeblock"><code>hive&gt; set hive.exec.compress.output=true;
+hive&gt; set avro.output.codec=snappy;</code></pre>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="avro__avro_schema_evolution">
+
+    <h2 class="title topictitle2" id="ariaid-title7">How Impala Handles Avro Schema Evolution</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Starting in Impala 1.1, Impala can deal with Avro data files that employ <dfn class="term">schema evolution</dfn>,
+        where different data files within the same table use slightly different type definitions. (You would
+        perform the schema evolution operation by issuing an <code class="ph codeph">ALTER TABLE</code> statement in the Hive
+        shell.) The old and new types for any changed columns must be compatible; for example, a column might start
+        as an <code class="ph codeph">int</code> and later change to a <code class="ph codeph">bigint</code> or <code class="ph codeph">float</code>.
+      </p>
+
+      <p class="p">
+        As with any other tables where the definitions are changed or data is added outside of the current
+        <span class="keyword cmdname">impalad</span> node, ensure that Impala loads the latest metadata for the table if the Avro
+        schema is modified through Hive. Issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+        <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> statement. <code class="ph codeph">REFRESH</code>
+        reloads the metadata immediately, while <code class="ph codeph">INVALIDATE METADATA</code> reloads it the next time
+        the table is accessed.
+      </p>
+
+      <p class="p">
+        When Avro data files or columns are not consulted during a query, Impala does not check for consistency.
+        Thus, if you issue <code class="ph codeph">SELECT c1, c2 FROM t1</code>, Impala does not return any error if the column
+        <code class="ph codeph">c3</code> changed in an incompatible way. If a query retrieves data from some partitions but not
+        others, Impala does not check the data files for the unused partitions.
+      </p>
+
+      <p class="p">
+        In the Hive DDL statements, you can specify an <code class="ph codeph">avro.schema.literal</code> table property (if the
+        schema definition is short) or an <code class="ph codeph">avro.schema.url</code> property (if the schema definition is
+        long, or to allow convenient editing for the definition).
+      </p>
+
+      <p class="p">
+        For example, running the following SQL code in the Hive shell creates a table using the Avro file format
+        and puts some sample data into it:
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE avro_table (a string, b string)
+ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+TBLPROPERTIES (
+  'avro.schema.literal'='{
+    "type": "record",
+    "name": "my_record",
+    "fields": [
+      {"name": "a", "type": "int"},
+      {"name": "b", "type": "string"}
+    ]}');
+
+INSERT OVERWRITE TABLE avro_table SELECT 1, "avro" FROM functional.alltypes LIMIT 1;
+</code></pre>
+
+      <p class="p">
+        Once the Avro table is created and contains data, you can query it through the
+        <span class="keyword cmdname">impala-shell</span> command:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select * from avro_table;
++---+------+
+| a | b    |
++---+------+
+| 1 | avro |
++---+------+
+</code></pre>
+
+      <p class="p">
+        Now in the Hive shell, you change the type of a column and add a new column with a default value:
+      </p>
+
+<pre class="pre codeblock"><code>-- Promote column "a" from INT to FLOAT (no need to update Avro schema)
+ALTER TABLE avro_table CHANGE A A FLOAT;
+
+-- Add column "c" with default
+ALTER TABLE avro_table ADD COLUMNS (c int);
+ALTER TABLE avro_table SET TBLPROPERTIES (
+  'avro.schema.literal'='{
+    "type": "record",
+    "name": "my_record",
+    "fields": [
+      {"name": "a", "type": "int"},
+      {"name": "b", "type": "string"},
+      {"name": "c", "type": "int", "default": 10}
+    ]}');
+</code></pre>
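On the reader side, schema resolution fills in the new column's default value for rows written under the old schema. The following Python sketch is not the actual Avro library, just an illustration of that resolution rule, with a hypothetical helper name:

```python
def resolve_record(record, new_schema):
    """Fill in missing fields from the new schema's defaults, the way an
    Avro reader resolves data written under an older schema version."""
    resolved = dict(record)
    for field in new_schema["fields"]:
        if field["name"] not in resolved:
            if "default" in field:
                resolved[field["name"]] = field["default"]
            else:
                raise ValueError("no default for new field " + field["name"])
    return resolved

new_schema = {
    "type": "record",
    "name": "my_record",
    "fields": [
        {"name": "a", "type": "int"},
        {"name": "b", "type": "string"},
        {"name": "c", "type": "int", "default": 10},
    ],
}

old_row = {"a": 1, "b": "avro"}   # written before column "c" existed
print(resolve_record(old_row, new_schema))  # {'a': 1, 'b': 'avro', 'c': 10}
```

This mirrors the query result shown below, where the pre-existing row reports the default value 10 for the new column <code class="ph codeph">c</code>.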
+
+      <p class="p">
+        Once again in <span class="keyword cmdname">impala-shell</span>, you can query the Avro table based on its latest schema
+        definition. Because the table metadata was changed outside of Impala, you issue a <code class="ph codeph">REFRESH</code>
+        statement first so that Impala has up-to-date metadata for the table.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; refresh avro_table;
+[localhost:21000] &gt; select * from avro_table;
++---+------+----+
+| a | b    | c  |
++---+------+----+
+| 1 | avro | 10 |
++---+------+----+
+</code></pre>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="avro__avro_data_types">
+
+    <h2 class="title topictitle2" id="ariaid-title8">Data Type Considerations for Avro Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Avro format defines a set of data types whose names differ from the names of the corresponding Impala
+        data types. If you are preparing Avro files using other Hadoop components such as Pig or MapReduce, you
+        might need to work with the type names defined by Avro. The following figure lists the Avro-defined types
+        and the equivalent types in Impala.
+      </p>
+
+<pre class="pre codeblock"><code>Primitive Types (Avro -&gt; Impala)
+--------------------------------
+STRING -&gt; STRING
+STRING -&gt; CHAR
+STRING -&gt; VARCHAR
+INT -&gt; INT
+BOOLEAN -&gt; BOOLEAN
+LONG -&gt; BIGINT
+FLOAT -&gt; FLOAT
+DOUBLE -&gt; DOUBLE
+
+Logical Types
+-------------
+BYTES + logicalType = "decimal" -&gt; DECIMAL
+
+Avro Types with No Impala Equivalent
+------------------------------------
+RECORD, MAP, ARRAY, UNION, ENUM, FIXED, NULL
+
+Impala Types with No Avro Equivalent
+------------------------------------
+TIMESTAMP
+
+</code></pre>
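The figure above can also be expressed as a simple lookup table, which may be handy in scripts that generate DDL from Avro schemas. This is a minimal sketch transcribed from the figure (the dictionary name is made up for illustration):

```python
# Avro primitive/logical types and their Impala equivalents,
# transcribed from the mapping figure above.
AVRO_TO_IMPALA = {
    "string": "STRING",          # also surfaces as CHAR/VARCHAR via SQL column definitions
    "int": "INT",
    "boolean": "BOOLEAN",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "bytes+decimal": "DECIMAL",  # bytes with logicalType "decimal"
}

print(AVRO_TO_IMPALA["long"])  # BIGINT
```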
+
+      <p class="p">
+        The Avro specification allows string values up to 2**64 bytes in length.
+        Impala queries for Avro tables use 32-bit integers to hold string lengths.
+        In <span class="keyword">Impala 2.5</span> and higher, Impala truncates <code class="ph codeph">CHAR</code>
+        and <code class="ph codeph">VARCHAR</code> values in Avro tables to (2**31)-1 bytes.
+        If a query encounters a <code class="ph codeph">STRING</code> value longer than (2**31)-1
+        bytes in an Avro table, the query fails. In earlier releases,
+        encountering such long values in an Avro table could cause a crash.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="avro__avro_performance">
+
+    <h2 class="title topictitle2" id="ariaid-title9">Query Performance for Impala Avro Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        In general, expect query performance with Avro tables to be
+        faster than with tables using text data, but slower than with
+        Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+        for information about using the Parquet file format for
+        high-performance analytic queries.
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+        For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+        Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+        in the <span class="ph filepath">core-site.xml</span> configuration file determines
+        how Impala divides the I/O work of reading the data files. This configuration
+        setting is specified in bytes. By default, this
+        value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+        as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+        Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+        to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+        Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+        to 268435456 (256 MB) to match the row group size produced by Impala.
+      </p>
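The byte values quoted above are just megabyte counts expressed in bytes; a quick arithmetic check:

```python
MB = 1024 * 1024

# fs.s3a.block.size values from the discussion above, in bytes.
default_block = 32 * MB    # default S3 block size used by Impala
mr_hive_rg    = 128 * MB   # typical Parquet row group from MapReduce or Hive
impala_rg     = 256 * MB   # Parquet row group size produced by Impala

print(default_block, mr_hive_rg, impala_rg)  # 33554432 134217728 268435456
```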
+
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_batch_size.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_batch_size.html b/docs/build/html/topics/impala_batch_size.html
new file mode 100644
index 0000000..52ceff0
--- /dev/null
+++ b/docs/build/html/topics/impala_batch_size.html
@@ -0,0 +1,29 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="batch_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BATCH_SIZE Query Option</title></head><body id="batch_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">BATCH_SIZE Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Number of rows evaluated at a time by SQL operators. If this option is unspecified or set to 0, Impala uses a predefined default
+      size. Using a large number improves responsiveness, especially for scan operations, at the cost of a higher memory footprint.
+    </p>
+
+    <p class="p">
+      This option is primarily for testing during Impala development, or for use under the direction of <span class="keyword">the appropriate support channel</span>.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0 (meaning the predefined default of 1024)
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_bigint.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_bigint.html b/docs/build/html/topics/impala_bigint.html
new file mode 100644
index 0000000..d0f9c2c
--- /dev/null
+++ b/docs/build/html/topics/impala_bigint.html
@@ -0,0 +1,136 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="bigint"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BIGINT Data Type</title></head><body id="bigint"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">BIGINT Data Type</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      An 8-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+      statements.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> BIGINT</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Range:</strong> -9223372036854775808 .. 9223372036854775807. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+    </p>
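These endpoints are the two's-complement bounds of a signed 64-bit integer; a quick check of the arithmetic (in Python, for illustration only):

```python
# BIGINT is a signed 64-bit integer, so its range is -2^63 .. 2^63 - 1.
BIGINT_MIN = -2**63
BIGINT_MAX = 2**63 - 1

print(BIGINT_MIN)  # -9223372036854775808
print(BIGINT_MAX)  # 9223372036854775807
```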
+
+    <p class="p">
+      <strong class="ph b">Conversions:</strong> Impala automatically converts <code class="ph codeph">BIGINT</code> to a floating-point type (<code class="ph codeph">FLOAT</code> or
+      <code class="ph codeph">DOUBLE</code>). Use <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>,
+      <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">STRING</code>, or <code class="ph codeph">TIMESTAMP</code>.
+      <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to
+        <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+        date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+        If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+        the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span>
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x BIGINT);
+SELECT CAST(1000 AS BIGINT);
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      <code class="ph codeph">BIGINT</code> is a convenient type to use for column declarations because you can use any kind of
+      integer values in <code class="ph codeph">INSERT</code> statements and they are promoted to <code class="ph codeph">BIGINT</code> where
+      necessary. However, <code class="ph codeph">BIGINT</code> also requires the most bytes of any integer type on disk and in
+      memory, meaning your queries are not as efficient and scalable as possible if you overuse this type.
+      Therefore, prefer to use the smallest integer type with sufficient range to hold all input values, and
+      <code class="ph codeph">CAST()</code> when necessary to the appropriate type.
+    </p>
+
+    <p class="p">
+      For a convenient and automated way to check the bounds of the <code class="ph codeph">BIGINT</code> type, call the
+      functions <code class="ph codeph">MIN_BIGINT()</code> and <code class="ph codeph">MAX_BIGINT()</code>.
+    </p>
+
+    <p class="p">
+      If an integer value is too large to be represented as a <code class="ph codeph">BIGINT</code>, use a
+      <code class="ph codeph">DECIMAL</code> instead with sufficient digits of precision.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+        value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric
+        type more efficiently than a <code class="ph codeph">STRING</code> representation of the value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+        using Parquet or other binary formats.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong> Represented in memory as an 8-byte value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+        fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+        statement.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Sqoop considerations:</strong>
+      </p>
+
+    <p class="p"> If you use Sqoop to
+        convert RDBMS data to Parquet, be careful with interpreting any
+        resulting values from <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>,
+        or <code class="ph codeph">TIMESTAMP</code> columns. The underlying values are
+        represented as the Parquet <code class="ph codeph">INT64</code> type, which is
+        represented as <code class="ph codeph">BIGINT</code> in the Impala table. The Parquet
+        values represent the time in milliseconds, while Impala interprets
+          <code class="ph codeph">BIGINT</code> as the time in seconds. Therefore, if you have
+        a <code class="ph codeph">BIGINT</code> column in a Parquet table that was imported
+        this way from Sqoop, divide the values by 1000 when interpreting as the
+          <code class="ph codeph">TIMESTAMP</code> type.</p>
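The divide-by-1000 adjustment can be sketched as follows (a Python illustration of the arithmetic only; the sample value is hypothetical):

```python
from datetime import datetime, timezone

# Hypothetical BIGINT value imported via Sqoop: milliseconds since the epoch.
millis_since_epoch = 1_400_000_000_000

# Impala treats a BIGINT cast to TIMESTAMP as seconds, so divide by 1000 first.
seconds_since_epoch = millis_since_epoch // 1000

ts = datetime.fromtimestamp(seconds_since_epoch, tz=timezone.utc)
print(ts.isoformat())  # 2014-05-13T16:53:20+00:00
```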
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+      <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+      <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>,
+      <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_bit_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_bit_functions.html b/docs/build/html/topics/impala_bit_functions.html
new file mode 100644
index 0000000..80d9f55
--- /dev/null
+++ b/docs/build/html/topics/impala_bit_functions.html
@@ -0,0 +1,848 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="bit_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Bit Functions</title></head><body id="bit_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Bit Functions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Bit manipulation functions perform bitwise operations involved in scientific processing or computer science algorithms.
+      For example, these functions include setting, clearing, or testing bits within an integer value, or changing the
+      positions of bits with or without wraparound.
+    </p>
+
+    <p class="p">
+      If a function takes two integer arguments that are required to be of the same type, the smaller argument is promoted
+      to the type of the larger one if required. For example, <code class="ph codeph">BITAND(1,4096)</code> treats both arguments as
+      <code class="ph codeph">SMALLINT</code>, because 1 can be represented as a <code class="ph codeph">TINYINT</code> but 4096 requires a <code class="ph codeph">SMALLINT</code>.
+    </p>
+
+    <p class="p">
+     Remember that all Impala integer values are signed. Therefore, when dealing with binary values where the most significant
+     bit is 1, the specified or returned values might be negative when represented in base 10.
+    </p>
+
+    <p class="p">
+      If any argument (the input value, the bit position, or the number of shift or rotate positions) is <code class="ph codeph">NULL</code>,
+      the return value from any of these functions is also <code class="ph codeph">NULL</code>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      The bit functions operate on all the integral data types: <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+      <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, and
+      <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Function reference:</strong>
+    </p>
+
+    <p class="p">
+      Impala supports the following bit functions:
+    </p>
+
+
+
+    <dl class="dl">
+
+      
+
+        <dt class="dt dlterm" id="bit_functions__bitand">
+          <code class="ph codeph">bitand(integer_type a, same_type b)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns an integer value representing the bits that are set to 1 in both of the arguments.
+          If the arguments are of different sizes, the smaller is promoted to the type of the larger.
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitand()</code> function is equivalent to the <code class="ph codeph">&amp;</code> binary operator.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show the results of ANDing integer values.
+            255 contains all 1 bits in its lowermost 8 bits.
+            32767 contains all 1 bits in its lowermost 15 bits.
+            
+            You can use the <code class="ph codeph">bin()</code> function to check the binary representation of any
+            integer value, although the result is always represented as a 64-bit value.
+            If necessary, the smaller argument is promoted to the
+            type of the larger one.
+          </p>
+<pre class="pre codeblock"><code>select bitand(255, 32767); /* 0000000011111111 &amp; 0111111111111111 */
++--------------------+
+| bitand(255, 32767) |
++--------------------+
+| 255                |
++--------------------+
+
+select bitand(32767, 1); /* 0111111111111111 &amp; 0000000000000001 */
++------------------+
+| bitand(32767, 1) |
++------------------+
+| 1                |
++------------------+
+
+select bitand(32, 16); /* 00100000 &amp; 00010000 */
++----------------+
+| bitand(32, 16) |
++----------------+
+| 0              |
++----------------+
+
+select bitand(12,5); /* 00001100 &amp; 00000101 */
++---------------+
+| bitand(12, 5) |
++---------------+
+| 4             |
++---------------+
+
+select bitand(-1,15); /* 11111111 &amp; 00001111 */
++----------------+
+| bitand(-1, 15) |
++----------------+
+| 15             |
++----------------+
+</code></pre>
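Outside of Impala, the same results can be reproduced with an ordinary bitwise AND; for example, Python's `&` operator on its arbitrary-precision two's-complement integers matches the table above (a sketch, not part of Impala):

```python
# Python's & mirrors Impala's bitand(); negative operands behave as
# infinitely sign-extended two's-complement values.
print(255 & 32767)  # 255
print(32767 & 1)    # 1
print(32 & 16)      # 0
print(12 & 5)       # 4
print(-1 & 15)      # 15
```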
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="bit_functions__bitnot">
+          <code class="ph codeph">bitnot(integer_type a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Inverts all the bits of the input argument.
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitnot()</code> function is equivalent to the <code class="ph codeph">~</code> unary operator.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            These examples illustrate what happens when you flip all the bits of an integer value.
+            The sign always changes, because inverting all the bits of <code class="ph codeph">x</code> produces
+            <code class="ph codeph">-(x+1)</code> in two's complement arithmetic; the magnitude of the decimal value differs by one from the original.
+            
+          </p>
+<pre class="pre codeblock"><code>select bitnot(127); /* 01111111 -&gt; 10000000 */
++-------------+
+| bitnot(127) |
++-------------+
+| -128        |
++-------------+
+
+select bitnot(16); /* 00010000 -&gt; 11101111 */
++------------+
+| bitnot(16) |
++------------+
+| -17        |
++------------+
+
+select bitnot(0); /* 00000000 -&gt; 11111111 */
++-----------+
+| bitnot(0) |
++-----------+
+| -1        |
++-----------+
+
+select bitnot(-128); /* 10000000 -&gt; 01111111 */
++--------------+
+| bitnot(-128) |
++--------------+
+| 127          |
++--------------+
+</code></pre>
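The "differs by one" behavior follows directly from two's complement, where flipping every bit of `x` yields `-(x + 1)`; Python's `~` operator demonstrates the same identity (illustration only):

```python
# Python's ~ mirrors Impala's bitnot(): ~x == -(x + 1).
print(~127)   # -128
print(~16)    # -17
print(~0)     # -1
print(~-128)  # 127
```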
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="bit_functions__bitor">
+          <code class="ph codeph">bitor(integer_type a, same_type b)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns an integer value representing the bits that are set to 1 in either of the arguments.
+          If the arguments are of different sizes, the smaller is promoted to the type of the larger.
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitor()</code> function is equivalent to the <code class="ph codeph">|</code> binary operator.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show the results of ORing integer values.
+          </p>
+<pre class="pre codeblock"><code>select bitor(1,4); /* 00000001 | 00000100 */
++-------------+
+| bitor(1, 4) |
++-------------+
+| 5           |
++-------------+
+
+select bitor(16,48); /* 00010000 | 00110000 */
++---------------+
+| bitor(16, 48) |
++---------------+
+| 48            |
++---------------+
+
+select bitor(0,7); /* 00000000 | 00000111 */
++-------------+
+| bitor(0, 7) |
++-------------+
+| 7           |
++-------------+
+</code></pre>
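As with `bitand()`, these results match an ordinary bitwise OR; Python's `|` operator reproduces them (illustration only):

```python
# Python's | mirrors Impala's bitor().
print(1 | 4)    # 5
print(16 | 48)  # 48
print(0 | 7)    # 7
```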
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="bit_functions__bitxor">
+          <code class="ph codeph">bitxor(integer_type a, same_type b)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns an integer value representing the bits that are set to 1 in one but not both of the arguments.
+          If the arguments are of different sizes, the smaller is promoted to the type of the larger.
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitxor()</code> function is equivalent to the <code class="ph codeph">^</code> binary operator.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show the results of XORing integer values.
+            XORing a non-zero value with zero returns the non-zero value.
+            XORing two identical values returns zero, because all the 1 bits from the first argument are also 1 bits in the second argument.
+            XORing different non-zero values turns off some bits and leaves others turned on, based on whether the same bit is set in both arguments.
+          </p>
+<pre class="pre codeblock"><code>select bitxor(0,15); /* 00000000 ^ 00001111 */
++---------------+
+| bitxor(0, 15) |
++---------------+
+| 15            |
++---------------+
+
+select bitxor(7,7); /* 00000111 ^ 00000111 */
++--------------+
+| bitxor(7, 7) |
++--------------+
+| 0            |
++--------------+
+
+select bitxor(8,4); /* 00001000 ^ 00000100 */
++--------------+
+| bitxor(8, 4) |
++--------------+
+| 12           |
++--------------+
+
+select bitxor(3,7); /* 00000011 ^ 00000111 */
++--------------+
+| bitxor(3, 7) |
++--------------+
+| 4            |
++--------------+
+</code></pre>
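The same XOR identities (x ^ 0 == x, x ^ x == 0) can be checked with Python's `^` operator (illustration only):

```python
# Python's ^ mirrors Impala's bitxor().
print(0 ^ 15)  # 15 (XOR with zero returns the non-zero value)
print(7 ^ 7)   # 0  (XOR of identical values is zero)
print(8 ^ 4)   # 12
print(3 ^ 7)   # 4
```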
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="bit_functions__countset">
+          <code class="ph codeph">countset(integer_type a [, int zero_or_one])</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> By default, returns the number of 1 bits in the specified integer value.
+          If the optional second argument is set to zero, it returns the number of 0 bits instead.
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            In discussions of information theory, this operation is referred to as the
+            <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Hamming_weight" target="_blank">population count</a>"</span>
+            or <span class="q">"popcount"</span>.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show how to count the number of 1 bits in an integer value.
+          </p>
+<pre class="pre codeblock"><code>select countset(1); /* 00000001 */
++-------------+
+| countset(1) |
++-------------+
+| 1           |
++-------------+
+
+select countset(3); /* 00000011 */
++-------------+
+| countset(3) |
++-------------+
+| 2           |
++-------------+
+
+select countset(16); /* 00010000 */
++--------------+
+| countset(16) |
++--------------+
+| 1            |
++--------------+
+
+select countset(17); /* 00010001 */
++--------------+
+| countset(17) |
++--------------+
+| 2            |
++--------------+
+
+select countset(7,1); /* 00000111 = 3 1 bits; the function counts 1 bits by default */
++----------------+
+| countset(7, 1) |
++----------------+
+| 3              |
++----------------+
+
+select countset(7,0); /* 00000111 = 5 0 bits; second argument can only be 0 or 1 */
++----------------+
+| countset(7, 0) |
++----------------+
+| 5              |
++----------------+
+</code></pre>
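A minimal emulation of `countset()` shows why counting 0 bits depends on the argument's width; the explicit `width` parameter below is our addition, since Impala infers the width from the argument's integer type (a sketch, not the Impala implementation):

```python
def countset(a: int, zero_or_one: int = 1, width: int = 8) -> int:
    """Count 1 bits (default) or 0 bits in a signed integer of `width` bits."""
    ones = bin(a & ((1 << width) - 1)).count("1")
    return ones if zero_or_one == 1 else width - ones

print(countset(1))     # 1
print(countset(3))     # 2
print(countset(17))    # 2
print(countset(7, 0))  # 5 (a TINYINT has 8 bits: 3 ones, 5 zeros)
```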
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="bit_functions__getbit">
+          <code class="ph codeph">getbit(integer_type a, int position)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a 0 or 1 representing the bit at a
+          specified position. The positions are numbered right to left, starting at zero.
+          The position argument cannot be negative.
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            When you use a literal input value, it is treated as the smallest
+            integer type that can hold it (8-bit, 16-bit, and so on).
+            The type of the input value limits the range of valid bit positions.
+            Cast the input value to the appropriate type if you need to
+            ensure it is treated as a 64-bit, 32-bit, and so on value.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show how to test a specific bit within an integer value.
+          </p>
+<pre class="pre codeblock"><code>select getbit(1,0); /* 00000001 */
++--------------+
+| getbit(1, 0) |
++--------------+
+| 1            |
++--------------+
+
+select getbit(16,1); /* 00010000 */
++---------------+
+| getbit(16, 1) |
++---------------+
+| 0             |
++---------------+
+
+select getbit(16,4); /* 00010000 */
++---------------+
+| getbit(16, 4) |
++---------------+
+| 1             |
++---------------+
+
+select getbit(16,5); /* 00010000 */
++---------------+
+| getbit(16, 5) |
++---------------+
+| 0             |
++---------------+
+
+select getbit(-1,3); /* 11111111 */
++---------------+
+| getbit(-1, 3) |
++---------------+
+| 1             |
++---------------+
+
+select getbit(-1,25); /* 11111111 */
+ERROR: Invalid bit position: 25
+
+select getbit(cast(-1 as int),25); /* 11111111111111111111111111111111 */
++-----------------------------+
+| getbit(cast(-1 as int), 25) |
++-----------------------------+
+| 1                           |
++-----------------------------+
+</code></pre>
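The right-to-left numbering and the width-limited position range can be emulated with a shift and a mask; the `width` parameter below stands in for Impala's inference of the width from the argument type (a sketch only):

```python
def getbit(a: int, position: int, width: int = 8) -> int:
    """Return the bit at `position`, numbering right to left from zero."""
    if not 0 <= position < width:
        raise ValueError(f"Invalid bit position: {position}")
    return (a >> position) & 1   # Python sign-extends negative values

print(getbit(1, 0))              # 1
print(getbit(16, 4))             # 1
print(getbit(16, 5))             # 0
print(getbit(-1, 3))             # 1
print(getbit(-1, 25, width=32))  # 1, like casting the argument to INT
```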
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="bit_functions__rotateleft">
+          <code class="ph codeph">rotateleft(integer_type a, int positions)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Rotates an integer value left by a specified number of bits.
+          Each bit shifted out of the most significant position is <span class="q">"rotated"</span> back into the least significant position.
+          Therefore, the final value has the same number of 1 bits as the original value,
+          just in different positions.
+          In computer science terms, this operation is a
+          <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Circular_shift" target="_blank">circular shift</a>"</span>.
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Specifying a second argument of zero leaves the original value unchanged.
+            Rotating a -1 value by any number of positions still returns -1,
+            because the original value has all 1 bits and all the 1 bits are
+            preserved during rotation.
+            Similarly, rotating a 0 value by any number of positions still returns 0.
+            Rotating a value by the same number of bits as in the value returns the same value.
+            Because this is a circular operation, the number of positions is not limited
+            to the number of bits in the input value.
+            For example, rotating an 8-bit value by 1, 9, 17, and so on positions returns an
+            identical result in each case.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>select rotateleft(1,4); /* 00000001 -&gt; 00010000 */
++------------------+
+| rotateleft(1, 4) |
++------------------+
+| 16               |
++------------------+
+
+select rotateleft(-1,155); /* 11111111 -&gt; 11111111 */
++---------------------+
+| rotateleft(-1, 155) |
++---------------------+
+| -1                  |
++---------------------+
+
+select rotateleft(-128,1); /* 10000000 -&gt; 00000001 */
++---------------------+
+| rotateleft(-128, 1) |
++---------------------+
+| 1                   |
++---------------------+
+
+select rotateleft(-127,3); /* 10000001 -&gt; 00001100 */
++---------------------+
+| rotateleft(-127, 3) |
++---------------------+
+| 12                  |
++---------------------+
+
+</code></pre>
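The circular shift can be emulated by reinterpreting the signed value as unsigned, rotating, and converting back; the fixed 8-bit width below is an assumption matching these TINYINT-sized examples, not part of the Impala function:

```python
def rotateleft(x: int, n: int, width: int = 8) -> int:
    """Circular left shift of a signed `width`-bit value."""
    mask = (1 << width) - 1
    u = x & mask                 # reinterpret as unsigned
    n %= width                   # rotation is circular, so reduce modulo width
    r = ((u << n) | (u >> (width - n))) & mask
    return r - (1 << width) if r >= (1 << (width - 1)) else r  # back to signed

print(rotateleft(1, 4))     # 16
print(rotateleft(-1, 155))  # -1
print(rotateleft(-128, 1))  # 1
print(rotateleft(-127, 3))  # 12
```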
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="bit_functions__rotateright">
+          <code class="ph codeph">rotateright(integer_type a, int positions)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Rotates an integer value right by a specified number of bits.
+          Each bit shifted out of the least significant position is <span class="q">"rotated"</span> back into the most significant position.
+          Therefore, the final value has the same number of 1 bits as the original value,
+          just in different positions.
+          In computer science terms, this operation is a
+          <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Circular_shift" target="_blank">circular shift</a>"</span>.
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Specifying a second argument of zero leaves the original value unchanged.
+            Rotating a -1 value by any number of positions still returns -1,
+            because the original value has all 1 bits and all the 1 bits are
+            preserved during rotation.
+            Similarly, rotating a 0 value by any number of positions still returns 0.
+            Rotating a value by the same number of bits as in the value returns the same value.
+            Because this is a circular operation, the number of positions is not limited
+            to the number of bits in the input value.
+            For example, rotating an 8-bit value by 1, 9, 17, and so on positions returns an
+            identical result in each case.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>select rotateright(16,4); /* 00010000 -&gt; 00000001 */
++--------------------+
+| rotateright(16, 4) |
++--------------------+
+| 1                  |
++--------------------+
+
+select rotateright(-1,155); /* 11111111 -&gt; 11111111 */
++----------------------+
+| rotateright(-1, 155) |
++----------------------+
+| -1                   |
++----------------------+
+
+select rotateright(-128,1); /* 10000000 -&gt; 01000000 */
++----------------------+
+| rotateright(-128, 1) |
++----------------------+
+| 64                   |
++----------------------+
+
+select rotateright(-127,3); /* 10000001 -&gt; 00110000 */
++----------------------+
+| rotateright(-127, 3) |
++----------------------+
+| 48                   |
++----------------------+
+</code></pre>
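The same unsigned-rotate-and-convert-back technique works in the other direction; again, the fixed 8-bit width is an assumption matching these TINYINT-sized examples:

```python
def rotateright(x: int, n: int, width: int = 8) -> int:
    """Circular right shift of a signed `width`-bit value."""
    mask = (1 << width) - 1
    u = x & mask                 # reinterpret as unsigned
    n %= width                   # rotation is circular, so reduce modulo width
    r = ((u >> n) | (u << (width - n))) & mask
    return r - (1 << width) if r >= (1 << (width - 1)) else r  # back to signed

print(rotateright(16, 4))    # 1
print(rotateright(-1, 155))  # -1
print(rotateright(-128, 1))  # 64
print(rotateright(-127, 3))  # 48
```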
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="bit_functions__setbit">
+          <code class="ph codeph">setbit(integer_type a, int position [, int zero_or_one])</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> By default, changes the bit at a specified position to 1, if it is not already 1.
+          If the optional third argument is set to zero, the specified bit is set to 0 instead.
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          If the bit at the specified position was already 1 (by default)
+          or 0 (with a third argument of zero), the return value is
+          the same as the first argument.
+          The positions are numbered right to left, starting at zero.
+          (Therefore, the return value could be different from the first argument
+          even if the position argument is zero.)
+          The position argument cannot be negative.
+          <p class="p">
+            When you use a literal input value, it is treated as the smallest
+            appropriate type: an 8-bit, 16-bit, and so on value.
+            The type of the input value limits the range of the positions.
+            Cast the input value to the appropriate type if you need to
+            ensure it is treated as a 64-bit, 32-bit, and so on value.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>select setbit(0,0); /* 00000000 -&gt; 00000001 */
++--------------+
+| setbit(0, 0) |
++--------------+
+| 1            |
++--------------+
+
+select setbit(0,3); /* 00000000 -&gt; 00001000 */
++--------------+
+| setbit(0, 3) |
++--------------+
+| 8            |
++--------------+
+
+select setbit(7,3); /* 00000111 -&gt; 00001111 */
++--------------+
+| setbit(7, 3) |
++--------------+
+| 15           |
++--------------+
+
+select setbit(15,3); /* 00001111 -&gt; 00001111 */
++---------------+
+| setbit(15, 3) |
++---------------+
+| 15            |
++---------------+
+
+select setbit(0,32); /* By default, 0 is a TINYINT with only 8 bits. */
+ERROR: Invalid bit position: 32
+
+select setbit(cast(0 as bigint),32); /* For BIGINT, the position can be 0..63. */
++-------------------------------+
+| setbit(cast(0 as bigint), 32) |
++-------------------------------+
+| 4294967296                    |
++-------------------------------+
+
+select setbit(7,3,1); /* 00000111 -&gt; 00001111; setting to 1 is the default */
++-----------------+
+| setbit(7, 3, 1) |
++-----------------+
+| 15              |
++-----------------+
+
+select setbit(7,2,0); /* 00000111 -&gt; 00000011; third argument of 0 clears instead of sets */
++-----------------+
+| setbit(7, 2, 0) |
++-----------------+
+| 3               |
++-----------------+
+</code></pre>
+        </dd>
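The position numbering and width limits described above can be modeled with a short sketch. This is hypothetical Python (not Impala code; the function name and `width` parameter are illustrative), raising the same kind of "invalid position" error the SQL examples show:

```python
def setbit(value, position, zero_or_one=1, width=8):
    """Set or clear the bit at the given position (numbered right to
    left, starting at zero), mimicking the documented setbit() behavior."""
    if position < 0 or position >= width:
        raise ValueError("Invalid bit position: %d" % position)
    mask = (1 << width) - 1
    bits = value & mask
    if zero_or_one:
        bits |= (1 << position)         # set the bit to 1 (the default)
    else:
        bits &= ~(1 << position)        # third argument of 0 clears the bit
    # Re-interpret the top bit as a sign bit (two's complement).
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits

print(setbit(0, 3))            # 00000000 -> 00001000 = 8
print(setbit(7, 3))            # 00000111 -> 00001111 = 15
print(setbit(7, 2, 0))         # 00000111 -> 00000011 = 3
print(setbit(0, 32, 1, 64))    # with a 64-bit width: 4294967296
```

Passing `width=64` plays the same role as the `cast(0 as bigint)` in the SQL example: it widens the value so position 32 is in range.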
+
+      
+
+      
+
+        <dt class="dt dlterm" id="bit_functions__shiftleft">
+          <code class="ph codeph">shiftleft(integer_type a, int positions)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Shifts an integer value left by a specified number of bits.
+          As the most significant bit is taken out of the original value,
+          it is discarded and the least significant bit becomes 0.
+          In computer science terms, this operation is a <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Logical_shift" target="_blank">logical shift</a>"</span>.
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            The final value has either the same number of 1 bits as the original value, or fewer.
+            Shifting an 8-bit value by 8 positions, a 16-bit value by 16 positions, and so on produces
+            a result of zero.
+          </p>
+          <p class="p">
+            Specifying a second argument of zero leaves the original value unchanged;
+            shifting by 0 positions is a no-op.
+            Shifting any value by 1 is the same as multiplying it by 2,
+            as long as the value is small enough; larger values eventually
+            become negative when shifted, as the sign bit is set.
+            Starting with the value 1 and shifting it left by N positions gives
+            the same result as 2 to the Nth power, or <code class="ph codeph">pow(2,<var class="keyword varname">N</var>)</code>.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>select shiftleft(1,0); /* 00000001 -&gt; 00000001 */
++-----------------+
+| shiftleft(1, 0) |
++-----------------+
+| 1               |
++-----------------+
+
+select shiftleft(1,3); /* 00000001 -&gt; 00001000 */
++-----------------+
+| shiftleft(1, 3) |
++-----------------+
+| 8               |
++-----------------+
+
+select shiftleft(8,2); /* 00001000 -&gt; 00100000 */
++-----------------+
+| shiftleft(8, 2) |
++-----------------+
+| 32              |
++-----------------+
+
+select shiftleft(127,1); /* 01111111 -&gt; 11111110 */
++-------------------+
+| shiftleft(127, 1) |
++-------------------+
+| -2                |
++-------------------+
+
+select shiftleft(127,5); /* 01111111 -&gt; 11100000 */
++-------------------+
+| shiftleft(127, 5) |
++-------------------+
+| -32               |
++-------------------+
+
+select shiftleft(-1,4); /* 11111111 -&gt; 11110000 */
++------------------+
+| shiftleft(-1, 4) |
++------------------+
+| -16              |
++------------------+
+</code></pre>
+        </dd>
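The sign-bit effects in the examples above (such as `shiftleft(127,1)` yielding -2) follow from shifting within a fixed width. This hypothetical Python sketch (not Impala code; the function name and `width` parameter are assumptions) models the 8-bit case:

```python
def shiftleft(value, positions, width=8):
    """Logical shift left within a fixed width: bits shifted past the
    most significant position are discarded, and 0 bits fill the low end."""
    mask = (1 << width) - 1
    bits = (value << positions) & mask
    # Re-interpret the top bit as a sign bit (two's complement).
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits

print(shiftleft(1, 3))     # 00000001 -> 00001000 = 8
print(shiftleft(127, 1))   # 01111111 -> 11111110 = -2 (sign bit now set)
print(shiftleft(-1, 4))    # 11111111 -> 11110000 = -16
print(shiftleft(1, 8))     # shifting an 8-bit value by 8 positions -> 0
```

The last call illustrates the usage note that shifting a value by its full width produces zero, unlike the circular `rotateleft()` and `rotateright()` functions.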
+
+      
+
+      
+
+        <dt class="dt dlterm" id="bit_functions__shiftright">
+          <code class="ph codeph">shiftright(integer_type a, int positions)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Shifts an integer value right by a specified number of bits.
+          As the least significant bit is taken out of the original value,
+          it is discarded and the most significant bit becomes 0.
+          In computer science terms, this operation is a <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Logical_shift" target="_blank">logical shift</a>"</span>.
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+          The final value has either the same number of 1 bits as the original value, or fewer.
+          Shifting an 8-bit value by 8 positions, a 16-bit value by 16 positions, and so on produces
+          a result of zero.
+          </p>
+          <p class="p">
+            Specifying a second argument of zero leaves the original value unchanged;
+            shifting by 0 positions is a no-op.
+            Shifting any positive value right by 1 is the same as dividing it by 2.
+            Negative values become positive when shifted right.
+          </p>
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>select shiftright(16,0); /* 00010000 -&gt; 00010000 */
++-------------------+
+| shiftright(16, 0) |
++-------------------+
+| 16                |
++-------------------+
+
+select shiftright(16,4); /* 00010000 -&gt; 00000001 */
++-------------------+
+| shiftright(16, 4) |
++-------------------+
+| 1                 |
++-------------------+
+
+select shiftright(16,5); /* 00010000 -&gt; 00000000 */
++-------------------+
+| shiftright(16, 5) |
++-------------------+
+| 0                 |
++-------------------+
+
+select shiftright(-1,1); /* 11111111 -&gt; 01111111 */
++-------------------+
+| shiftright(-1, 1) |
++-------------------+
+| 127               |
++-------------------+
+
+select shiftright(-1,5); /* 11111111 -&gt; 00000111 */
++-------------------+
+| shiftright(-1, 5) |
++-------------------+
+| 7                 |
++-------------------+
+</code></pre>
+        </dd>
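The key point in the purpose statement, that this is a logical (unsigned) shift rather than an arithmetic one, explains why `shiftright(-1,1)` returns 127. This hypothetical Python sketch (not Impala code; names are illustrative) makes the masking step explicit:

```python
def shiftright(value, positions, width=8):
    """Logical shift right: the bit pattern is treated as unsigned,
    so a 0 bit is always brought in at the most significant position."""
    mask = (1 << width) - 1
    bits = (value & mask) >> positions   # mask first: logical, not arithmetic
    # Re-interpret the top bit as a sign bit (two's complement).
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits

print(shiftright(16, 4))   # 00010000 -> 00000001 = 1
print(shiftright(-1, 1))   # 11111111 -> 01111111 = 127
print(shiftright(-1, 5))   # 11111111 -> 00000111 = 7
```

Masking before shifting is what turns a negative input into a positive result; a plain Python `-1 >> 1` would instead perform an arithmetic shift and return -1.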
+
+      
+
+    </dl>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_boolean.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_boolean.html b/docs/build/html/topics/impala_boolean.html
new file mode 100644
index 0000000..51a91ba
--- /dev/null
+++ b/docs/build/html/topics/impala_boolean.html
@@ -0,0 +1,170 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="boolean"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BOOLEAN Data Type</title></head><body id="boolean"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">BOOLEAN Data Type</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements, representing a
+      single true/false choice.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> BOOLEAN</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Range:</strong> <code class="ph codeph">TRUE</code> or <code class="ph codeph">FALSE</code>. Do not use quotation marks around the
+      <code class="ph codeph">TRUE</code> and <code class="ph codeph">FALSE</code> literal values. You can write the literal values in
+      uppercase, lowercase, or mixed case. The values queried from a table are always returned in lowercase,
+      <code class="ph codeph">true</code> or <code class="ph codeph">false</code>.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Conversions:</strong> Impala does not automatically convert any other type to <code class="ph codeph">BOOLEAN</code>. All
+      conversions must use an explicit call to the <code class="ph codeph">CAST()</code> function.
+    </p>
+
+    <p class="p">
+      You can use <code class="ph codeph">CAST()</code> to convert
+
+      any integer or floating-point type to
+      <code class="ph codeph">BOOLEAN</code>: a value of 0 represents <code class="ph codeph">false</code>, and any non-zero value is converted
+      to <code class="ph codeph">true</code>.
+    </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(42 AS BOOLEAN) AS nonzero_int, CAST(99.44 AS BOOLEAN) AS nonzero_decimal,
+  CAST(000 AS BOOLEAN) AS zero_int, CAST(0.0 AS BOOLEAN) AS zero_decimal;
++-------------+-----------------+----------+--------------+
+| nonzero_int | nonzero_decimal | zero_int | zero_decimal |
++-------------+-----------------+----------+--------------+
+| true        | true            | false    | false        |
++-------------+-----------------+----------+--------------+
+</code></pre>
+
+    <p class="p">
+      When you cast the opposite way, from <code class="ph codeph">BOOLEAN</code> to a numeric type,
+      the result becomes either 1 or 0:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(true AS INT) AS true_int, CAST(true AS DOUBLE) AS true_double,
+  CAST(false AS INT) AS false_int, CAST(false AS DOUBLE) AS false_double;
++----------+-------------+-----------+--------------+
+| true_int | true_double | false_int | false_double |
++----------+-------------+-----------+--------------+
+| 1        | 1           | 0         | 0            |
++----------+-------------+-----------+--------------+
+</code></pre>
+
+    <p class="p">
+
+      You can cast <code class="ph codeph">DECIMAL</code> values to <code class="ph codeph">BOOLEAN</code>, with the same treatment of zero and
+      non-zero values as the other numeric types. You cannot cast a <code class="ph codeph">BOOLEAN</code> to a
+      <code class="ph codeph">DECIMAL</code>.
+    </p>
+
+    <p class="p">
+      You cannot cast a <code class="ph codeph">STRING</code> value to <code class="ph codeph">BOOLEAN</code>, although you can cast a
+      <code class="ph codeph">BOOLEAN</code> value to <code class="ph codeph">STRING</code>, returning <code class="ph codeph">'1'</code> for
+      <code class="ph codeph">true</code> values and <code class="ph codeph">'0'</code> for <code class="ph codeph">false</code> values.
+    </p>
+
+    <p class="p">
+      Although you can cast a <code class="ph codeph">TIMESTAMP</code> to a <code class="ph codeph">BOOLEAN</code> or a
+      <code class="ph codeph">BOOLEAN</code> to a <code class="ph codeph">TIMESTAMP</code>, the results are unlikely to be useful. Any non-zero
+      <code class="ph codeph">TIMESTAMP</code> (that is, any value other than <code class="ph codeph">1970-01-01 00:00:00</code>) becomes
+      <code class="ph codeph">TRUE</code> when converted to <code class="ph codeph">BOOLEAN</code>, while <code class="ph codeph">1970-01-01 00:00:00</code>
+      becomes <code class="ph codeph">FALSE</code>. A value of <code class="ph codeph">FALSE</code> becomes <code class="ph codeph">1970-01-01
+      00:00:00</code> when converted to <code class="ph codeph">TIMESTAMP</code>, and <code class="ph codeph">TRUE</code> becomes one second
+      past this epoch date, that is, <code class="ph codeph">1970-01-01 00:00:01</code>.
+    </p>
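The epoch-based mapping in the preceding paragraph can be summarized in a small model. This is hypothetical Python (not Impala code; function names are illustrative) expressing only the documented conversion rules:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def boolean_to_timestamp(b):
    # TRUE maps to one second past the epoch; FALSE maps to the epoch itself.
    return EPOCH + timedelta(seconds=1) if b else EPOCH

def timestamp_to_boolean(ts):
    # Any value other than the epoch converts to TRUE.
    return ts != EPOCH

print(boolean_to_timestamp(False))                  # 1970-01-01 00:00:00
print(boolean_to_timestamp(True))                   # 1970-01-01 00:00:01
print(timestamp_to_boolean(datetime(2017, 4, 12)))  # True
```

Note that the round trip is lossy: every non-epoch timestamp collapses to TRUE, which is why the document calls these conversions "unlikely to be useful".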
+
+    <p class="p">
+        <strong class="ph b">NULL considerations:</strong> An expression of this type produces a <code class="ph codeph">NULL</code> value if any
+        argument of the expression is <code class="ph codeph">NULL</code>.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Partitioning:</strong>
+      </p>
+
+    <p class="p">
+      Do not use a <code class="ph codeph">BOOLEAN</code> column as a partition key. Although you can create such a table,
+      subsequent operations produce errors:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table truth_table (assertion string) partitioned by (truth boolean);
+[localhost:21000] &gt; insert into truth_table values ('Pigs can fly',false);
+ERROR: AnalysisException: INSERT into table with BOOLEAN partition column (truth) is not supported: partitioning.truth_table
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>SELECT 1 &lt; 2;
+SELECT 2 = 5;
+SELECT 100 &lt; NULL, 100 &gt; NULL;
+CREATE TABLE assertions (claim STRING, really BOOLEAN);
+INSERT INTO assertions VALUES
+  ("1 is less than 2", 1 &lt; 2),
+  ("2 is the same as 5", 2 = 5),
+  ("Grass is green", true),
+  ("The moon is made of green cheese", false);
+SELECT claim FROM assertions WHERE really = TRUE;
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+        using Parquet or other binary formats.
+      </p>
+
+
+
+
+
+
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+        fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+        statement.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Currently, the data types <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">FLOAT</code>,
+        and <code class="ph codeph">DOUBLE</code> cannot be used for primary key columns in Kudu tables.
+      </p>
+
+
+
+    <p class="p">
+      <strong class="ph b">Related information:</strong> <a class="xref" href="impala_literals.html#boolean_literals">Boolean Literals</a>,
+      <a class="xref" href="impala_operators.html#operators">SQL Operators</a>,
+      <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_insert.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_insert.html b/docs/build/html/topics/impala_insert.html
new file mode 100644
index 0000000..557ab70
--- /dev/null
+++ b/docs/build/html/topics/impala_insert.html
@@ -0,0 +1,798 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="insert"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>INSERT Statement</title></head><body id="insert"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">INSERT Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Impala supports inserting into tables and partitions that you create with the Impala <code class="ph codeph">CREATE
+      TABLE</code> statement, or pre-defined tables and partitions created through Hive.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>[<var class="keyword varname">with_clause</var>]
+INSERT { INTO | OVERWRITE } [TABLE] <var class="keyword varname">table_name</var>
+  [(<var class="keyword varname">column_list</var>)]
+  [ PARTITION (<var class="keyword varname">partition_clause</var>)]
+{
+    [<var class="keyword varname">hint_clause</var>] <var class="keyword varname">select_statement</var>
+  | VALUES (<var class="keyword varname">value</var> [, <var class="keyword varname">value</var> ...]) [, (<var class="keyword varname">value</var> [, <var class="keyword varname">value</var> ...]) ...]
+}
+
+partition_clause ::= <var class="keyword varname">col_name</var> [= <var class="keyword varname">constant</var>] [, <var class="keyword varname">col_name</var> [= <var class="keyword varname">constant</var>] ...]
+
+hint_clause ::= [SHUFFLE] | [NOSHUFFLE]    (Note: the square brackets are part of the syntax.)
+</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Appending or replacing (INTO and OVERWRITE clauses):</strong>
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">INSERT INTO</code> syntax appends data to a table. The existing data files are left as-is, and
+      the inserted data is put into one or more new data files.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">INSERT OVERWRITE</code> syntax replaces the data in a table.
+
+
+      Currently, the overwritten data files are deleted immediately; they do not go through the HDFS trash
+      mechanism.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      The <code class="ph codeph">INSERT</code> statement currently does not support writing data files
+      containing complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>).
+      To prepare Parquet data for such tables, you generate the data files outside Impala and then
+      use <code class="ph codeph">LOAD DATA</code> or <code class="ph codeph">CREATE EXTERNAL TABLE</code> to associate those
+      data files with the table. Currently, such tables must use the Parquet file format.
+      See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about working with complex types.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+
+    <p class="p">
+        Currently, the <code class="ph codeph">INSERT OVERWRITE</code> syntax cannot be used with Kudu tables.
+      </p>
+
+    <p class="p">
+      Kudu tables require a unique primary key for each row. If an <code class="ph codeph">INSERT</code>
+      statement attempts to insert a row with the same values for the primary key columns
+      as an existing row, that row is discarded and the insert operation continues.
+      When rows are discarded due to duplicate primary keys, the statement finishes
+      with a warning, not an error. (This is a change from early releases of Kudu
+      where the default was to return an error in such cases, and the syntax
+      <code class="ph codeph">INSERT IGNORE</code> was required to make the statement succeed.
+      The <code class="ph codeph">IGNORE</code> clause is no longer part of the <code class="ph codeph">INSERT</code>
+      syntax.)
+    </p>
+
+    <p class="p">
+      For situations where you prefer to replace rows with duplicate primary key values,
+      rather than discarding the new data, you can use the <code class="ph codeph">UPSERT</code>
+      statement instead of <code class="ph codeph">INSERT</code>. <code class="ph codeph">UPSERT</code> inserts
+      rows that are entirely new, and for rows that match an existing primary key in the
+      table, the non-primary-key columns are updated to reflect the values in the
+      <span class="q">"upserted"</span> data.
+    </p>
+
+    <p class="p">
+      If you really want to store new rows, not replace existing ones, but cannot do so
+      because of the primary key uniqueness constraint, consider recreating the table
+      with additional columns included in the primary key.
+    </p>
+
+    <p class="p">
+      See <a class="xref" href="impala_kudu.html#impala_kudu">Using Impala to Query Kudu Tables</a> for more details about using Impala with Kudu.
+    </p>
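The INSERT-versus-UPSERT behavior described above can be sketched as a toy model. This is hypothetical Python (not Impala or Kudu code; the function and parameter names are illustrative) showing how duplicate primary keys are discarded with a warning under INSERT but replaced under UPSERT:

```python
def apply_rows(table, rows, key_cols, mode="insert"):
    """Model the documented Kudu semantics: with INSERT, a row whose
    primary key already exists is discarded (counted as a warning);
    with UPSERT, its non-key columns replace the stored values."""
    warnings = 0
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key in table:
            if mode == "upsert":
                table[key] = row          # replace the non-key columns
            else:
                warnings += 1             # duplicate key: row discarded
        else:
            table[key] = row
    return warnings

table = {}
apply_rows(table, [{"id": 1, "name": "a"}], ["id"])
w = apply_rows(table, [{"id": 1, "name": "b"}], ["id"])           # discarded
print(table[(1,)]["name"], w)                                     # a 1
apply_rows(table, [{"id": 1, "name": "b"}], ["id"], mode="upsert")
print(table[(1,)]["name"])                                        # b
```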
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Impala currently supports:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Copy data from another table using a <code class="ph codeph">SELECT</code> query. In Impala 1.2.1 and higher, you can
+        combine <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">INSERT</code> operations into a single step with the
+        <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax, which bypasses the actual <code class="ph codeph">INSERT</code> keyword.
+      </li>
+
+      <li class="li">
+        An optional <a class="xref" href="impala_with.html#with"><code class="ph codeph">WITH</code> clause</a> before the
+        <code class="ph codeph">INSERT</code> keyword, to define a subquery referenced in the <code class="ph codeph">SELECT</code> portion.
+      </li>
+
+      <li class="li">
+        Create one or more new rows using constant expressions through the <code class="ph codeph">VALUES</code> clause. (The
+        <code class="ph codeph">VALUES</code> clause was added in Impala 1.0.1.)
+      </li>
+
+      <li class="li">
+        <p class="p">
+          By default, the first column of each newly inserted row goes into the first column of the table, the
+          second column into the second column, and so on.
+        </p>
+        <p class="p">
+          You can also specify the columns to be inserted, an arbitrarily ordered subset of the columns in the
+          destination table, by specifying a column list immediately after the name of the destination table. This
+          feature lets you adjust the inserted columns to match the layout of a <code class="ph codeph">SELECT</code> statement,
+          rather than the other way around. (This feature was added in Impala 1.1.)
+        </p>
+        <p class="p">
+          The number of columns mentioned in the column list (known as the <span class="q">"column permutation"</span>) must match
+          the number of columns in the <code class="ph codeph">SELECT</code> list or the <code class="ph codeph">VALUES</code> tuples. The
+          order of columns in the column permutation can be different than in the underlying table, and the columns
+          of each input row are reordered to match. If the number of columns in the column permutation is less than
+          in the destination table, all unmentioned columns are set to <code class="ph codeph">NULL</code>.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          For a partitioned table, the optional <code class="ph codeph">PARTITION</code> clause identifies which partition or
+          partitions the new values go into. If a partition key column is given a constant value such as
+          <code class="ph codeph">PARTITION (year=2012)</code> or <code class="ph codeph">PARTITION (year=2012, month=2)</code>, all the
+          inserted rows use those same values for those partition key columns and you omit any corresponding
+          columns in the source table from the <code class="ph codeph">SELECT</code> list. This form is known as <span class="q">"static
+          partitioning"</span>.
+        </p>
+        <p class="p">
+          If a partition key column is mentioned but not assigned a value, such as in <code class="ph codeph">PARTITION (year,
+          region)</code> (both columns unassigned) or <code class="ph codeph">PARTITION(year, region='CA')</code>
+          (<code class="ph codeph">year</code> column unassigned), the unassigned columns are filled in with the final columns of
+          the <code class="ph codeph">SELECT</code> list. In this case, the number of columns in the <code class="ph codeph">SELECT</code> list
+          must equal the number of columns in the column permutation plus the number of partition key columns not
+          assigned a constant value. This form is known as <span class="q">"dynamic partitioning"</span>.
+        </p>
+        <p class="p">
+          See <a class="xref" href="impala_partitioning.html#partition_static_dynamic">Static and Dynamic Partitioning Clauses</a> for examples and performance
+          characteristics of static and dynamic partitioned inserts.
+        </p>
+      </li>
+
+      <li class="li">
+        An optional hint clause immediately before the <code class="ph codeph">SELECT</code> keyword, to fine-tune the behavior
+        when doing an <code class="ph codeph">INSERT ... SELECT</code> operation into partitioned Parquet tables. The hint
+        keywords are <code class="ph codeph">[SHUFFLE]</code> and <code class="ph codeph">[NOSHUFFLE]</code>, including the square brackets.
+        Inserting into partitioned Parquet tables can be a resource-intensive operation because it potentially
+        involves many files being written to HDFS simultaneously, and separate
+        <span class="ph">large</span> memory buffers being allocated to buffer the data for each
+        partition. For usage details, see <a class="xref" href="impala_parquet.html#parquet_etl">Loading Data into Parquet Tables</a>.
+      </li>
+    </ul>
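The column-count rule for dynamic partitioning stated above (SELECT-list columns must equal the column permutation plus the partition key columns not assigned a constant) can be expressed as a small check. This is hypothetical Python (not part of Impala; all names are illustrative):

```python
def check_dynamic_partition_insert(select_cols, permutation_cols, partition_clause):
    """Sketch of the documented rule: the SELECT list must supply one
    value per column in the column permutation plus one per partition
    key column left unassigned (dynamic partitioning).

    partition_clause maps each partition key column to a constant,
    or to None when the column is left unassigned.
    """
    dynamic_keys = [c for c, v in partition_clause.items() if v is None]
    expected = len(permutation_cols) + len(dynamic_keys)
    if len(select_cols) != expected:
        raise ValueError(
            "SELECT list has %d columns, expected %d"
            % (len(select_cols), expected))
    return True

# PARTITION (year, region='CA'): year is dynamic, region is static.
print(check_dynamic_partition_insert(
    select_cols=["id", "name", "year"],
    permutation_cols=["id", "name"],
    partition_clause={"year": None, "region": "CA"}))   # True
```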
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <ul class="ul">
+        <li class="li">
+          Insert commands that partition or add files result in changes to Hive metadata. Because Impala uses Hive
+          metadata, such changes may necessitate a metadata refresh. For more information, see the
+          <a class="xref" href="impala_refresh.html#refresh">REFRESH</a> statement.
+        </li>
+
+        <li class="li">
+          Currently, Impala can only insert data into tables that use the text and Parquet formats. For other file
+          formats, insert the data using Hive and use Impala to query it.
+        </li>
+
+        <li class="li">
+          As an alternative to the <code class="ph codeph">INSERT</code> statement, if you have existing data files elsewhere in
+          HDFS, the <code class="ph codeph">LOAD DATA</code> statement can move those files into a table. This statement works
+          with tables of any file format.
+        </li>
+      </ul>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DML (but still affected by
+        <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query option)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      When you insert the results of an expression, particularly of a built-in function call, into a small numeric
+      column such as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">TINYINT</code>, or
+      <code class="ph codeph">FLOAT</code>, you might need to use a <code class="ph codeph">CAST()</code> expression to coerce values into the
+      appropriate type. Impala does not automatically convert from a larger type to a smaller one. For example, to
+      insert cosine values into a <code class="ph codeph">FLOAT</code> column, write <code class="ph codeph">CAST(COS(angle) AS FLOAT)</code>
+      in the <code class="ph codeph">INSERT</code> statement to make the conversion explicit.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">File format considerations:</strong>
+      </p>
+
+    <p class="p">
+      Because Impala can read certain file formats that it cannot write,
+      the <code class="ph codeph">INSERT</code> statement does not work for all kinds of
+      Impala tables. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>
+      for details about what file formats are supported by the
+      <code class="ph codeph">INSERT</code> statement.
+    </p>
+
+    <p class="p">
+        Any <code class="ph codeph">INSERT</code> statement for a Parquet table requires enough free space in the HDFS filesystem
+        to write one block. Because Parquet data files use a block size of 1 GB by default, an
+        <code class="ph codeph">INSERT</code> might fail (even for a very small amount of data) if your HDFS is running low on
+        space.
+      </p>
+
+    <p class="p">
+        If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+        load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+        statement wait before returning, until the new or changed metadata has been received by all the Impala
+        nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+      </p>
+
+    <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+        After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+        STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+        table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+        SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+        <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+        are very large, used in join queries, or both.
+      </div>
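+
+    <p class="p">
+      For example, a typical load-then-analyze sequence looks like the following; the table names are
+      illustrative only:
+    </p>
+
+<pre class="pre codeblock"><code>insert into sales_fact select * from staging_sales;
+compute stats sales_fact;</code></pre>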
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example sets up new tables with the same definition as the <code class="ph codeph">TAB1</code> table from the
+      <a class="xref" href="impala_tutorial.html#tutorial">Tutorial</a> section, using different file
+      formats, and demonstrates inserting data into the tables created with the <code class="ph codeph">STORED AS TEXTFILE</code>
+      and <code class="ph codeph">STORED AS PARQUET</code> clauses:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE DATABASE IF NOT EXISTS file_formats;
+USE file_formats;
+
+DROP TABLE IF EXISTS text_table;
+CREATE TABLE text_table
+( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
+STORED AS TEXTFILE;
+
+DROP TABLE IF EXISTS parquet_table;
+CREATE TABLE parquet_table
+( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
+STORED AS PARQUET;</code></pre>
+
+    <p class="p">
+      With the <code class="ph codeph">INSERT INTO TABLE</code> syntax, each new set of inserted rows is appended to any existing
+      data in the table. This is how you would record small amounts of data that arrive continuously, or ingest new
+      batches of data alongside the existing data. For example, after running two <code class="ph codeph">INSERT INTO TABLE</code>
+      statements with 5 rows each, the table contains 10 rows total:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; insert into table text_table select * from default.tab1;
+Inserted 5 rows in 0.41s
+
+[localhost:21000] &gt; insert into table text_table select * from default.tab1;
+Inserted 5 rows in 0.46s
+
+[localhost:21000] &gt; select count(*) from text_table;
++----------+
+| count(*) |
++----------+
+| 10       |
++----------+
+Returned 1 row(s) in 0.26s</code></pre>
+
+    <p class="p">
+      With the <code class="ph codeph">INSERT OVERWRITE TABLE</code> syntax, each new set of inserted rows replaces any existing
+      data in the table. This is how you load data to query in a data warehousing scenario where you analyze just
+      the data for a particular day, quarter, and so on, discarding the previous data each time. You might keep the
+      entire set of data in one raw table, and transfer and transform certain rows into a more compact and
+      efficient form to perform intensive analysis on that subset.
+    </p>
+
+    <p class="p">
+      For example, here we insert 5 rows into a table using an <code class="ph codeph">INSERT INTO</code> statement, then replace
+      the data by inserting 3 rows with an <code class="ph codeph">INSERT OVERWRITE</code> statement. Afterward, the table only
+      contains the 3 rows from the final <code class="ph codeph">INSERT</code> statement.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; insert into table parquet_table select * from default.tab1;
+Inserted 5 rows in 0.35s
+
+[localhost:21000] &gt; insert overwrite table parquet_table select * from default.tab1 limit 3;
+Inserted 3 rows in 0.43s
+[localhost:21000] &gt; select count(*) from parquet_table;
++----------+
+| count(*) |
++----------+
+| 3        |
++----------+
+Returned 1 row(s) in 0.43s</code></pre>
+
+    <p class="p">
+      The <code class="ph codeph"><a class="xref" href="impala_insert.html#values">VALUES</a></code> clause lets you insert one or more
+      rows by specifying constant values for all the columns. The number, types, and order of the expressions must
+      match the table definition.
+    </p>
+
+    <div class="note note note_note" id="insert__insert_values_warning"><span class="note__title notetitle">Note:</span> 
+      The <code class="ph codeph">INSERT ... VALUES</code> technique is not suitable for loading large quantities of data into
+      HDFS-based tables, because the insert operations cannot be parallelized, and each one produces a separate
+      data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL
+      syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do not
+      run scripts with thousands of <code class="ph codeph">INSERT ... VALUES</code> statements that insert a single row each
+      time. If you do run <code class="ph codeph">INSERT ... VALUES</code> operations to load data into a staging table as one
+      stage in an ETL pipeline, include multiple row values if possible within each <code class="ph codeph">VALUES</code> clause,
+      and use a separate database to make cleanup easier if the operation does produce many tiny files.
+    </div>
+
+    <p class="p">
+      The following example shows how to insert one row or multiple rows, with expressions of different types,
+      using literal values, expressions, and function return values:
+    </p>
+
+<pre class="pre codeblock"><code>create table val_test_1 (c1 int, c2 float, c3 string, c4 boolean, c5 timestamp);
+insert into val_test_1 values (100, 99.9/10, 'abc', true, now());
+create table val_test_2 (id int, token string);
+insert overwrite val_test_2 values (1, 'a'), (2, 'b'), (-1,'xyzzy');</code></pre>
+
+    <p class="p">
+      These examples show the type of <span class="q">"not implemented"</span> error that you see when attempting to insert data into
+      a table with a file format that Impala currently does not write to:
+    </p>
+
+<pre class="pre codeblock"><code>DROP TABLE IF EXISTS sequence_table;
+CREATE TABLE sequence_table
+( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
+STORED AS SEQUENCEFILE;
+
+DROP TABLE IF EXISTS rc_table;
+CREATE TABLE rc_table
+( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
+STORED AS RCFILE;
+
+[localhost:21000] &gt; insert into table rc_table select * from default.tab1;
+Remote error
+Backend 0:RC_FILE not implemented.
+
+[localhost:21000] &gt; insert into table sequence_table select * from default.tab1;
+Remote error
+Backend 0:SEQUENCE_FILE not implemented. </code></pre>
+
+    <p class="p">
+      Inserting data into partitioned tables requires slightly different syntax that divides the partitioning
+      columns from the others:
+    </p>
+
+<pre class="pre codeblock"><code>create table t1 (i int) <strong class="ph b">partitioned by (x int, y string)</strong>;
+-- Select an INT column from another table.
+-- All inserted rows will have the same x and y values, as specified in the INSERT statement.
+-- This technique of specifying all the partition key values is known as static partitioning.
+insert into t1 <strong class="ph b">partition(x=10, y='a')</strong> select c1 from some_other_table;
+-- Select two INT columns from another table.
+-- All inserted rows will have the same y value, as specified in the INSERT statement.
+-- Values from c2 go into t1.x.
+-- Any partitioning columns whose value is not specified are filled in
+-- from the columns specified last in the SELECT list.
+-- This technique of omitting some partition key values is known as dynamic partitioning.
+insert into t1 <strong class="ph b">partition(x, y='b')</strong> select c1, c2 from some_other_table;
+-- Select an INT and a STRING column from another table.
+-- All inserted rows will have the same x value, as specified in the INSERT statement.
+-- Values from c3 go into t1.y.
+insert into t1 <strong class="ph b">partition(x=20, y)</strong> select c1, c3 from some_other_table;</code></pre>
+
+    <p class="p">
+      The following examples show how you can copy the data in all the columns from one table to another, copy the
+      data from only some columns, or specify the columns in the select list in a different order than they
+      actually appear in the table:
+    </p>
+
+<pre class="pre codeblock"><code>-- Start with 2 identical tables.
+create table t1 (c1 int, c2 int);
+create table t2 like t1;
+
+-- If there is no () part after the destination table name,
+-- all columns must be specified, either as * or by name.
+insert into t2 select * from t1;
+insert into t2 select c1, c2 from t1;
+
+-- With the () notation following the destination table name,
+-- you can omit columns (all values for that column are NULL
+-- in the destination table), and/or reorder the values
+-- selected from the source table. This is the "column permutation" feature.
+insert into t2 (c1) select c1 from t1;
+insert into t2 (c2, c1) select c1, c2 from t1;
+
+-- The column names can be entirely different in the source and destination tables.
+-- You can copy any columns, not just the corresponding ones, from the source table.
+-- But the number and type of selected columns must match the columns mentioned in the () part.
+alter table t2 replace columns (x int, y int);
+insert into t2 (y) select c1 from t1;
+
+-- For partitioned tables, all the partitioning columns must be mentioned in the () column list
+-- or a PARTITION clause; these columns cannot be defaulted to NULL.
+create table pt1 (x int, y int) partitioned by (z int);
+-- The values from c1 are copied into the column x in the new table,
+-- all in the same partition based on a constant value for z.
+-- The values of y in the new table are all NULL.
+insert into pt1 (x) partition (z=5) select c1 from t1;
+-- Again we omit the values for column y so they are all NULL.
+-- The inserted x values can go into different partitions, based on
+-- the different values inserted into the partitioning column z.
+insert into pt1 (x,z) select x, z from t2;
+</code></pre>
+
+    <p class="p">
+      <code class="ph codeph">SELECT *</code> for a partitioned table requires that all partition key columns in the source table
+      be declared as the last columns in the <code class="ph codeph">CREATE TABLE</code> statement. You still include a
+      <code class="ph codeph">PARTITION</code> clause in the <code class="ph codeph">INSERT</code> statement listing all the partition
+      key columns. These partition columns are
+      automatically mapped to the last columns from the <code class="ph codeph">SELECT *</code> list.
+    </p>
+
+<pre class="pre codeblock"><code>create table source (x int, y int, year int, month int, day int);
+create table destination (x int, y int) partitioned by (year int, month int, day int);
+...load some data into the unpartitioned source table...
+-- Insert a single partition of data.
+-- The SELECT * means you cannot specify partition (year=2014, month, day).
+insert overwrite destination partition (year, month, day) select * from source where year=2014;
+-- Insert the data for all year/month/day combinations.
+insert overwrite destination partition (year, month, day) select * from source;
+
+-- If one of the partition columns is omitted from the source table,
+-- then you can specify a specific value for that column in the PARTITION clause.
+-- Here the source table holds only data from 2014, and so does not include a year column.
+create table source_2014 (x int, y int, month int, day int);
+...load some data into the unpartitioned source_2014 table...
+insert overwrite destination partition (year=2014, month, day) select * from source_2014;
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an
+        <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the
+        results are not necessarily sorted. An <code class="ph codeph">INSERT ... SELECT</code> operation potentially creates
+        many different data files, prepared on different data nodes, and therefore the notion of the data being
+        stored in sorted order is impractical.
+      </p>
+
+    <p class="p">
+      <strong class="ph b">Concurrency considerations:</strong> Each <code class="ph codeph">INSERT</code> operation creates new data files with unique
+      names, so you can run multiple <code class="ph codeph">INSERT INTO</code> statements simultaneously without filename
+      conflicts.
+
+      While data is being inserted into an Impala table, the data is staged temporarily in a subdirectory inside
+      the data directory; during this period, you cannot issue queries against that table in Hive. If an
+      <code class="ph codeph">INSERT</code> operation fails, the temporary data file and the subdirectory could be left behind in
+      the data directory. If so, remove the relevant subdirectory and any data files it contains manually, by
+      issuing an <code class="ph codeph">hdfs dfs -rm -r</code> command, specifying the full path of the work subdirectory, whose
+      name ends in <code class="ph codeph">_dir</code>.
+    </p>
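+
+    <p class="p">
+      For example, assuming a hypothetical table whose data directory is
+      <span class="ph filepath">/user/hive/warehouse/mydb.db/mytable</span>, the cleanup might look like the following;
+      the exact path and subdirectory name depend on your deployment:
+    </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -ls /user/hive/warehouse/mydb.db/mytable
+$ hdfs dfs -rm -r /user/hive/warehouse/mydb.db/mytable/<var class="keyword varname">work_subdirectory</var>_dir</code></pre>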
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="insert__values">
+
+    <h2 class="title topictitle2" id="ariaid-title2">VALUES Clause</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph">VALUES</code> clause is a general-purpose way to specify the columns of one or more rows,
+        typically within an <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        The <code class="ph codeph">INSERT ... VALUES</code> technique is not suitable for loading large quantities of data into
+        HDFS-based tables, because the insert operations cannot be parallelized, and each one produces a separate
+        data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL
+        syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do
+        not run scripts with thousands of <code class="ph codeph">INSERT ... VALUES</code> statements that insert a single row
+        each time. If you do run <code class="ph codeph">INSERT ... VALUES</code> operations to load data into a staging table as
+        one stage in an ETL pipeline, include multiple row values if possible within each <code class="ph codeph">VALUES</code>
+        clause, and use a separate database to make cleanup easier if the operation does produce many tiny files.
+      </div>
+
+      <p class="p">
+        The following examples illustrate:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          How to insert a single row using a <code class="ph codeph">VALUES</code> clause.
+        </li>
+
+        <li class="li">
+          How to insert multiple rows using a <code class="ph codeph">VALUES</code> clause.
+        </li>
+
+        <li class="li">
+          How the row or rows from a <code class="ph codeph">VALUES</code> clause can be appended to a table through
+          <code class="ph codeph">INSERT INTO</code>, or replace the contents of the table through <code class="ph codeph">INSERT
+          OVERWRITE</code>.
+        </li>
+
+        <li class="li">
+          How the entries in a <code class="ph codeph">VALUES</code> clause can be literals, function results, or any other kind
+          of expression. See <a class="xref" href="impala_literals.html#literals">Literals</a> for the notation to use for literal
+          values, especially <a class="xref" href="impala_literals.html#string_literals">String Literals</a> for quoting and escaping
+          conventions for strings. See <a class="xref" href="impala_operators.html#operators">SQL Operators</a> and
+          <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a> for other things you can include in expressions with the
+          <code class="ph codeph">VALUES</code> clause.
+        </li>
+      </ul>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; describe val_example;
+Query: describe val_example
+Query finished, fetching results ...
++-------+---------+---------+
+| name  | type    | comment |
++-------+---------+---------+
+| id    | int     |         |
+| col_1 | boolean |         |
+| col_2 | double  |         |
++-------+---------+---------+
+
+[localhost:21000] &gt; insert into val_example values (1,true,100.0);
+Inserted 1 rows in 0.30s
+[localhost:21000] &gt; select * from val_example;
++----+-------+-------+
+| id | col_1 | col_2 |
++----+-------+-------+
+| 1  | true  | 100   |
++----+-------+-------+
+
+[localhost:21000] &gt; insert overwrite val_example values (10,false,pow(2,5)), (50,true,10/3);
+Inserted 2 rows in 0.16s
+[localhost:21000] &gt; select * from val_example;
++----+-------+-------------------+
+| id | col_1 | col_2             |
++----+-------+-------------------+
+| 10 | false | 32                |
+| 50 | true  | 3.333333333333333 |
++----+-------+-------------------+</code></pre>
+
+      <p class="p">
+        When used in an <code class="ph codeph">INSERT</code> statement, the Impala <code class="ph codeph">VALUES</code> clause can specify
+        some or all of the columns in the destination table, and the columns can be specified in a different order
+        than they actually appear in the table. To specify a different set or order of columns than in the table,
+        use the syntax:
+      </p>
+
+<pre class="pre codeblock"><code>INSERT INTO <var class="keyword varname">destination</var>
+  (<var class="keyword varname">col_x</var>, <var class="keyword varname">col_y</var>, <var class="keyword varname">col_z</var>)
+  VALUES
+  (<var class="keyword varname">val_x</var>, <var class="keyword varname">val_y</var>, <var class="keyword varname">val_z</var>);
+</code></pre>
+
+      <p class="p">
+        Any columns in the table that are not listed in the <code class="ph codeph">INSERT</code> statement are set to
+        <code class="ph codeph">NULL</code>.
+      </p>
+
+
+
+      <p class="p">
+        To use a <code class="ph codeph">VALUES</code> clause like a table in other statements, wrap it in parentheses and use
+        <code class="ph codeph">AS</code> clauses to specify aliases for the entire object and any columns you need to refer to:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select * from (values(4,5,6),(7,8,9)) as t;
++---+---+---+
+| 4 | 5 | 6 |
++---+---+---+
+| 4 | 5 | 6 |
+| 7 | 8 | 9 |
++---+---+---+
+[localhost:21000] &gt; select * from (values(1 as c1, true as c2, 'abc' as c3),(100,false,'xyz')) as t;
++-----+-------+-----+
+| c1  | c2    | c3  |
++-----+-------+-----+
+| 1   | true  | abc |
+| 100 | false | xyz |
++-----+-------+-----+</code></pre>
+
+      <p class="p">
+        For example, you might use a tiny table constructed like this from constant literals or function return
+        values as part of a longer statement involving joins or <code class="ph codeph">UNION ALL</code>.
+      </p>
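+
+      <p class="p">
+        For example, this query maps numeric codes to labels by joining against an inline
+        <code class="ph codeph">VALUES</code> list; the <code class="ph codeph">events</code> table and its columns are
+        hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>select events.id, lookup.label
+  from events
+  join (values (1 as code, 'low' as label), (2, 'medium'), (3, 'high')) as lookup
+  on events.severity = lookup.code;</code></pre>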
+
+      <p class="p">
+        <strong class="ph b">HDFS considerations:</strong>
+      </p>
+
+      <p class="p">
+        Impala physically writes all inserted files under the ownership of its default user, typically
+        <code class="ph codeph">impala</code>. Therefore, this user must have HDFS write permission in the corresponding table
+        directory.
+      </p>
+
+      <p class="p">
+        The permission requirement is independent of the authorization performed by the Sentry framework. (If the
+        connected user is not authorized to insert into a table, Sentry blocks that operation immediately,
+        regardless of the privileges available to the <code class="ph codeph">impala</code> user.) Files created by Impala are
+        not owned by and do not inherit permissions from the connected user.
+      </p>
+
+      <p class="p">
+        The number of data files produced by an <code class="ph codeph">INSERT</code> statement depends on the size of the
+        cluster, the number of data blocks that are processed, the partition key columns in a partitioned table,
+        and the mechanism Impala uses for dividing the work in parallel. Do not assume that an
+        <code class="ph codeph">INSERT</code> statement will produce some particular number of output files. In case of
+        performance issues with data written by Impala, check that the output files do not suffer from issues such
+        as many tiny files or many tiny partitions. (In the Hadoop context, even files or partitions of a few tens
+        of megabytes are considered <span class="q">"tiny"</span>.)
+      </p>
+
+      <p class="p">
+        The <code class="ph codeph">INSERT</code> statement has always left behind a hidden work directory inside the data
+        directory of the table. Formerly, this hidden work directory was named
+        <span class="ph filepath">.impala_insert_staging</span>. In Impala 2.0.1 and later, this directory name is changed to
+        <span class="ph filepath">_impala_insert_staging</span>. (While HDFS tools are expected to treat names beginning
+        with either an underscore or a dot as hidden, in practice names beginning with an underscore are more widely
+        supported.) If you have any scripts, cleanup jobs, and so on that rely on the name of this work directory,
+        adjust them to use the new name.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">HBase considerations:</strong>
+      </p>
+
+      <p class="p">
+        You can use the <code class="ph codeph">INSERT</code> statement with HBase tables as follows:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            You can insert a single row or a small set of rows into an HBase table with the <code class="ph codeph">INSERT ...
+            VALUES</code> syntax. This is a good use case for HBase tables with Impala, because HBase tables are
+            not subject to the same kind of fragmentation from many small insert operations as HDFS tables are.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            You can insert any number of rows at once into an HBase table using the <code class="ph codeph">INSERT ...
+            SELECT</code> syntax.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If more than one inserted row has the same value for the HBase key column, only the last inserted row
+            with that value is visible to Impala queries. You can take advantage of this fact with <code class="ph codeph">INSERT
+            ... VALUES</code> statements to effectively update rows one at a time, by inserting new rows with the
+            same key values as existing rows. Be aware that after an <code class="ph codeph">INSERT ... SELECT</code> operation
+            copying from an HDFS table, the HBase table might contain fewer rows than were inserted, if the key
+            column in the source table contained duplicate values.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            You cannot <code class="ph codeph">INSERT OVERWRITE</code> into an HBase table. New rows are always appended.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            When you create an Impala or Hive table that maps to an HBase table, the column order you specify with
+            the <code class="ph codeph">INSERT</code> statement might be different than the order you declare with the
+            <code class="ph codeph">CREATE TABLE</code> statement. Behind the scenes, HBase arranges the columns based on how
+            they are divided into column families. This might cause a mismatch during insert operations, especially
+            if you use the syntax <code class="ph codeph">INSERT INTO <var class="keyword varname">hbase_table</var> SELECT * FROM
+            <var class="keyword varname">hdfs_table</var></code>. Before inserting data, verify the column order by issuing a
+            <code class="ph codeph">DESCRIBE</code> statement for the table, and adjust the order of the select list in the
+            <code class="ph codeph">INSERT</code> statement.
+          </p>
+        </li>
+      </ul>
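+
+      <p class="p">
+        For example, because a later row with the same key value hides the earlier one, an
+        <code class="ph codeph">INSERT ... VALUES</code> statement can act as a single-row update for an HBase table;
+        the table below is hypothetical, with a string key column followed by one value column:
+      </p>
+
+<pre class="pre codeblock"><code>insert into hbase_users values ('user123', 'new_email@example.com');
+-- A later INSERT with the same key value replaces the row visible to queries.
+insert into hbase_users values ('user123', 'corrected_email@example.com');</code></pre>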
+
+      <p class="p">
+        See <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a> for more details about using Impala with HBase.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+      <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+        and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+        Amazon Simple Storage Service (S3).
+        The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and
+        partitions is specified by an <code class="ph codeph">s3a://</code> prefix in the
+        <code class="ph codeph">LOCATION</code> attribute of
+        <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+        If you bring data into S3 using the normal S3 transfer mechanisms instead of Impala DML statements,
+        issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the S3 data.
+      </p>
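+      <p class="p">
+        For example, the following statements create a table backed by S3, insert into it with Impala DML,
+        and refresh it after files arrive through S3 tools instead; the bucket name is illustrative only:
+      </p>
+
+<pre class="pre codeblock"><code>create table s3_sales (id bigint, amount decimal(10,2))
+  location 's3a://example-bucket/sales/';
+insert into s3_sales values (1, 19.99);
+-- After copying files into the bucket outside of Impala:
+refresh s3_sales;</code></pre>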
+      <p class="p">
+        Because of differences between S3 and traditional filesystems, DML operations
+        for S3 tables can take longer than for tables on HDFS. For example, both the
+        <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+        and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+        to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+        the files are moved from a temporary staging directory to the final destination directory.)
+        Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+        actually copies the data files from one location to another and then removes the original files.
+        In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+        to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+        that a problem during statement execution could leave data in an inconsistent state.
+        It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+        See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+      </p>
+      <p class="p">See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.</p>
+
+      <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+      <p class="p">
+        If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+        identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+        other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Cancellation:</strong> Can be cancelled. To cancel this statement, use Ctrl-C from the
+        <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
+        <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
+        in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
+        (port 25000).
+      </p>
+
+      <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+      <p class="p">
+        The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+        typically the <code class="ph codeph">impala</code> user, must have read
+        permission for the files in the source directory of an <code class="ph codeph">INSERT ... SELECT</code>
+        operation, and write permission for all affected directories in the destination table.
+        (An <code class="ph codeph">INSERT</code> operation could write files to multiple different HDFS directories
+        if the destination table is partitioned.)
+        This user must also have write permission to create a temporary work directory
+        in the top-level HDFS directory of the destination table.
+        An <code class="ph codeph">INSERT OVERWRITE</code> operation does not require write permission on
+        the original data files in the table, only on the table directories themselves.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+      <p class="p">
+        For <code class="ph codeph">INSERT</code> operations into <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> columns, you
+        must cast all <code class="ph codeph">STRING</code> literals or expressions returning <code class="ph codeph">STRING</code> to a
+        <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> type with the appropriate length.
+      </p>
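+
+      <p class="p">
+        For example (the table is hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>create table char_demo (c1 char(3), c2 varchar(10));
+-- A plain STRING literal such as 'abc' is not implicitly converted;
+-- cast each value to the destination CHAR or VARCHAR type.
+insert into char_demo values (cast('abc' as char(3)), cast('hello' as varchar(10)));</code></pre>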
+
+      <p class="p">
+        <strong class="ph b">Related startup options:</strong>
+      </p>
+
+      <p class="p">
+        By default, if an <code class="ph codeph">INSERT</code> statement creates any new subdirectories underneath a partitioned
+        table, those subdirectories are assigned default HDFS permissions for the <code class="ph codeph">impala</code> user. To
+        make each subdirectory have the same permissions as its parent directory in HDFS, specify the
+        <code class="ph codeph">--insert_inherit_permissions</code> startup option for the <span class="keyword cmdname">impalad</span> daemon.
+      </p>
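+
+      <p class="p">
+        For example, a hypothetical <span class="keyword cmdname">impalad</span> invocation with this
+        option (other startup options omitted) might look like:
+      </p>
+
+<pre class="pre codeblock"><code>impalad --insert_inherit_permissions=true <var class="keyword varname">other_options</var></code></pre>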
+    </div>
+  </article>
+
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_install.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_install.html b/docs/build/html/topics/impala_install.html
new file mode 100644
index 0000000..561d3b4
--- /dev/null
+++ b/docs/build/html/topics/impala_install.html
@@ -0,0 +1,126 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="install"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Installing Impala</title></head><body id="install"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Installing Impala</span></h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      
+      
+      
+      
+      
+      
+      
+      Impala is an open-source analytic database for Apache Hadoop
+      that returns rapid responses to queries.
+    </p>
+
+    <p class="p">
+      Follow these steps to set up Impala on a cluster by building from source:
+    </p>
+
+
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          Download the latest release. See
+          <a class="xref" href="http://impala.apache.org/downloads.html" target="_blank">the Impala downloads page</a>
+          for the link to the latest release.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          Check the <span class="ph filepath">README.md</span> file for a pointer
+          to the build instructions.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          Verify the MD5 and SHA1 checksums and the GPG signature, checking the signature
+          against the code signing keys of the release managers.
+        </p>
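+        <div class="p">
+          For example (file names are placeholders for the actual release artifacts):
+<pre class="pre codeblock"><code>
+gpg --import KEYS
+gpg --verify <var class="keyword varname">release</var>.tar.gz.asc <var class="keyword varname">release</var>.tar.gz
+md5sum <var class="keyword varname">release</var>.tar.gz   # compare with the published .md5 file
+sha1sum <var class="keyword varname">release</var>.tar.gz  # compare with the published .sha1 file
+</code></pre>
+        </div>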
+      </li>
+      <li class="li">
+        <div class="p">
+          Developers interested in working on Impala can clone the Impala source repository:
+<pre class="pre codeblock"><code>
+git clone https://git-wip-us.apache.org/repos/asf/incubator-impala.git
+</code></pre>
+        </div>
+      </li>
+    </ul>
+
+  </div>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="install__install_details">
+
+    <h2 class="title topictitle2" id="ariaid-title2">What is Included in an Impala Installation</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Impala is made up of a set of components that can be installed on multiple nodes throughout your cluster.
+        The key installation step for performance is to install the <span class="keyword cmdname">impalad</span> daemon (which does
+        most of the query processing work) on <em class="ph i">all</em> DataNodes in the cluster.
+      </p>
+
+      <p class="p">
+        The Impala package installs these binaries:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            <span class="keyword cmdname">impalad</span> - The Impala daemon. Plans and executes queries against HDFS, HBase, <span class="ph">and Amazon S3 data</span>.
+            <a class="xref" href="impala_processes.html#processes">Run one impalad process</a> on each node in the cluster
+            that has a DataNode.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            <span class="keyword cmdname">statestored</span> - Name service that tracks location and status of all
+            <code class="ph codeph">impalad</code> instances in the cluster. <a class="xref" href="impala_processes.html#processes">Run one
+            instance of this daemon</a> on a node in your cluster. Most production deployments run this daemon
+            on the NameNode.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            <span class="keyword cmdname">catalogd</span> - Metadata coordination service that broadcasts changes from Impala DDL and
+            DML statements to all affected Impala nodes, so that new tables, newly loaded data, and so on are
+            immediately visible to queries submitted through any Impala node.
+
+            (Prior to Impala 1.2, you had to run the <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE
+            METADATA</code> statement on each node to synchronize changed metadata. Now those statements are only
+            required if you perform the DDL or DML through an external mechanism such as Hive <span class="ph">or by uploading
+            data to the Amazon S3 filesystem</span>.)
+            <a class="xref" href="impala_processes.html#processes">Run one instance of this daemon</a> on a node in your cluster,
+            preferably on the same host as the <code class="ph codeph">statestored</code> daemon.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            <span class="keyword cmdname">impala-shell</span> - <a class="xref" href="impala_impala_shell.html#impala_shell">Command-line
+            interface</a> for issuing queries to the Impala daemon. You install this on one or more hosts
+            anywhere on your network, not necessarily DataNodes or even within the same cluster as Impala. It can
+            connect remotely to any instance of the Impala daemon.
+          </p>
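+          <p class="p">
+            For example, to connect to a remote Impala daemon (host name hypothetical):
+          </p>
+<pre class="pre codeblock"><code>impala-shell -i <var class="keyword varname">impalad_host</var>:21000</code></pre>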
+        </li>
+      </ul>
+
+      <p class="p">
+        Before doing the installation, ensure that you have all necessary prerequisites. See
+        <a class="xref" href="impala_prereqs.html#prereqs">Impala Requirements</a> for details.
+      </p>
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_int.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_int.html b/docs/build/html/topics/impala_int.html
new file mode 100644
index 0000000..2fcd403
--- /dev/null
+++ b/docs/build/html/topics/impala_int.html
@@ -0,0 +1,119 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="int"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>INT Data Type</title></head><body id="int"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">INT Data Type</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A 4-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> INT</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Range:</strong> -2147483648 .. 2147483647. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Conversions:</strong> Impala automatically converts to a larger integer type (<code class="ph codeph">BIGINT</code>) or a
+      floating-point type (<code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>). Use
+      <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>,
+      <code class="ph codeph">STRING</code>, or <code class="ph codeph">TIMESTAMP</code>.
+      <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to
+        <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+        date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+        If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+        the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span>
+    </p>
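+
+    <p class="p">
+      For example, casting 0 interprets it as the start of the epoch:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(0 AS TIMESTAMP);     -- 1970-01-01 00:00:00, in UTC by default
+SELECT CAST(86400 AS TIMESTAMP); -- one day (86400 seconds) past the epoch
+</code></pre>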
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      The data type <code class="ph codeph">INTEGER</code> is an alias for <code class="ph codeph">INT</code>.
+    </p>
+
+    <p class="p">
+      For a convenient and automated way to check the bounds of the <code class="ph codeph">INT</code> type, call the functions
+      <code class="ph codeph">MIN_INT()</code> and <code class="ph codeph">MAX_INT()</code>.
+    </p>
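+
+    <p class="p">
+      For example:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT MIN_INT(), MAX_INT();  -- -2147483648 and 2147483647
+</code></pre>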
+
+    <p class="p">
+      If an integer value is too large to be represented as an <code class="ph codeph">INT</code>, use a <code class="ph codeph">BIGINT</code>
+      instead.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+        value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x INT);
+SELECT CAST(1000 AS INT);
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric
+        type more efficiently than a <code class="ph codeph">STRING</code> representation of the value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Parquet considerations:</strong>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+        using Parquet or other binary formats.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong> Represented in memory as a 4-byte value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+        fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+        statement.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+      <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+      <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>,
+      <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_intro.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_intro.html b/docs/build/html/topics/impala_intro.html
new file mode 100644
index 0000000..cdf05b7
--- /dev/null
+++ b/docs/build/html/topics/impala_intro.html
@@ -0,0 +1,198 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Introducing Apache Impala (incubating)</title></head><body id="intro"><main role="main"><article role="article" aria-labelledby="intro__impala">
+
+  <h1 class="title topictitle1" id="intro__impala"><span class="ph">Introducing Apache Impala (incubating)</span></h1>
+  
+
+  <div class="body conbody" id="intro__intro_body">
+
+      <p class="p">
+        Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS,
+        HBase, <span class="ph">or the Amazon Simple Storage Service (S3)</span>.
+        In addition to using the same unified storage platform,
+        Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface
+        (Impala query UI in Hue) as Apache Hive. This
+        provides a familiar and unified platform for real-time or batch-oriented queries.
+      </p>
+
+      <p class="p">
+        Impala is an addition to tools available for querying big data. Impala does not replace the batch
+        processing frameworks built on MapReduce such as Hive. Hive and other frameworks built on MapReduce are
+        best suited for long-running batch jobs, such as Extract, Transform,
+        and Load (ETL) processing.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        Impala was accepted into the Apache incubator on December 2, 2015.
+        In places where the documentation formerly referred to <span class="q">"Cloudera Impala"</span>,
+        now the official name is <span class="q">"Apache Impala (incubating)"</span>.
+      </div>
+
+  </div>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro__benefits">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Impala Benefits</h2>
+
+    <div class="body conbody">
+
+      <div class="p">
+        Impala provides:
+
+        <ul class="ul">
+          <li class="li">
+            Familiar SQL interface that data scientists and analysts already know.
+          </li>
+
+          <li class="li">
+            Ability to query high volumes of data (<span class="q">"big data"</span>) in Apache Hadoop.
+          </li>
+
+          <li class="li">
+            Distributed queries in a cluster environment, for convenient scaling and to make use of cost-effective
+            commodity hardware.
+          </li>
+
+          <li class="li">
+            Ability to share data files between different components with no copy or export/import step; for example,
+            to write with Pig, transform with Hive, and query with Impala. Impala can read from and write to Hive
+            tables, enabling simple data interchange using Impala for analytics on Hive-produced data.
+          </li>
+
+          <li class="li">
+            Single system for big data processing and analytics, so customers can avoid costly modeling and ETL just
+            for analytics.
+          </li>
+        </ul>
+      </div>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro__impala_hadoop">
+
+    <h2 class="title topictitle2" id="ariaid-title3">How Impala Works with <span class="keyword">Apache Hadoop</span></h2>
+  
+
+    <div class="body conbody">
+
+      
+
+      <div class="p">
+        The Impala solution is composed of the following components:
+        <ul class="ul">
+          <li class="li">
+            Clients - Entities including Hue, ODBC clients, JDBC clients, and the Impala Shell can all interact
+            with Impala. These interfaces are typically used to issue queries or complete administrative tasks such
+            as connecting to Impala.
+          </li>
+
+          <li class="li">
+            Hive Metastore - Stores information about the data available to Impala. For example, the metastore lets
+            Impala know what databases are available and what the structure of those databases is. As you create,
+            drop, and alter schema objects, load data into tables, and so on through Impala SQL statements, the
+            relevant metadata changes are automatically broadcast to all Impala nodes by the dedicated catalog
+            service introduced in Impala 1.2.
+          </li>
+
+          <li class="li">
+            Impala - This process, which runs on DataNodes, coordinates and executes queries. Each
+            instance of Impala can receive, plan, and coordinate queries from Impala clients. Queries are
+            distributed among Impala nodes, and these nodes then act as workers, executing parallel query
+            fragments.
+          </li>
+
+          <li class="li">
+            HBase and HDFS - Storage for data to be queried.
+          </li>
+        </ul>
+      </div>
+
+      <div class="p">
+        Queries executed using Impala are handled as follows:
+        <ol class="ol">
+          <li class="li">
+            User applications send SQL queries to Impala through ODBC or JDBC, which provide standardized querying
+            interfaces. The user application may connect to any <code class="ph codeph">impalad</code> in the cluster. This
+            <code class="ph codeph">impalad</code> becomes the coordinator for the query.
+          </li>
+
+          <li class="li">
+            Impala parses the query and analyzes it to determine what tasks need to be performed by
+            <code class="ph codeph">impalad</code> instances across the cluster. Execution is planned for optimal efficiency.
+          </li>
+
+          <li class="li">
+            Services such as HDFS and HBase are accessed by local <code class="ph codeph">impalad</code> instances to provide
+            data.
+          </li>
+
+          <li class="li">
+            Each <code class="ph codeph">impalad</code> returns data to the coordinating <code class="ph codeph">impalad</code>, which sends
+            these results to the client.
+          </li>
+        </ol>
+      </div>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro__features">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Primary Impala Features</h2>
+
+    <div class="body conbody">
+
+      <div class="p">
+        Impala provides support for:
+        <ul class="ul">
+          <li class="li">
+            Most common SQL-92 features of Hive Query Language (HiveQL) including
+            <a class="xref" href="../shared/../topics/impala_select.html#select">SELECT</a>,
+            <a class="xref" href="../shared/../topics/impala_joins.html#joins">joins</a>, and
+            <a class="xref" href="../shared/../topics/impala_aggregate_functions.html#aggregate_functions">aggregate functions</a>.
+          </li>
+
+          <li class="li">
+            HDFS, HBase, <span class="ph">and Amazon Simple Storage Service (S3)</span> storage, including:
+            <ul class="ul">
+              <li class="li">
+                <a class="xref" href="../shared/../topics/impala_file_formats.html#file_formats">HDFS file formats</a>: delimited text files, Parquet,
+                Avro, SequenceFile, and RCFile.
+              </li>
+
+              <li class="li">
+                Compression codecs: Snappy, GZIP, Deflate, BZIP.
+              </li>
+            </ul>
+          </li>
+
+          <li class="li">
+            Common data access interfaces including:
+            <ul class="ul">
+              <li class="li">
+                <a class="xref" href="../shared/../topics/impala_jdbc.html#impala_jdbc">JDBC driver</a>.
+              </li>
+
+              <li class="li">
+                <a class="xref" href="../shared/../topics/impala_odbc.html#impala_odbc">ODBC driver</a>.
+              </li>
+
+              <li class="li">
+                Hue Beeswax and the Impala Query UI.
+              </li>
+            </ul>
+          </li>
+
+          <li class="li">
+            <a class="xref" href="../shared/../topics/impala_impala_shell.html#impala_shell">impala-shell command-line interface</a>.
+          </li>
+
+          <li class="li">
+            <a class="xref" href="../shared/../topics/impala_security.html#security">Kerberos authentication</a>.
+          </li>
+        </ul>
+      </div>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_invalidate_metadata.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_invalidate_metadata.html b/docs/build/html/topics/impala_invalidate_metadata.html
new file mode 100644
index 0000000..c06b9c9
--- /dev/null
+++ b/docs/build/html/topics/impala_invalidate_metadata.html
@@ -0,0 +1,294 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="invalidate_metadata"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>INVALIDATE METADATA Statement</title></head><body id="invalidate_metadata"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">INVALIDATE METADATA Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Marks the metadata for one or all tables as stale. Required after a table is created through the Hive shell,
+      before the table is available for Impala queries. The next time the current Impala node performs a query
+      against a table whose metadata is invalidated, Impala reloads the associated metadata before the query
+      proceeds. This is a relatively expensive operation compared to the incremental metadata update done by the
+      <code class="ph codeph">REFRESH</code> statement, so in the common scenario of adding new data files to an existing table,
+      prefer <code class="ph codeph">REFRESH</code> rather than <code class="ph codeph">INVALIDATE METADATA</code>. If you are not familiar
+      with the way Impala uses metadata and how it shares the same metastore database as Hive, see
+      <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a> for background information.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>INVALIDATE METADATA [[<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>]</code></pre>
+
+    <p class="p">
+      By default, the cached metadata for all tables is flushed. If you specify a table name, only the metadata for
+      that one table is flushed. Even for a single table, <code class="ph codeph">INVALIDATE METADATA</code> is more expensive
+      than <code class="ph codeph">REFRESH</code>, so prefer <code class="ph codeph">REFRESH</code> in the common case where you add new data
+      files for an existing table.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong>
+      </p>
+
+    <p class="p">
+      To accurately respond to queries, Impala must have current metadata about those databases and tables that
+      clients query directly. Therefore, if some other entity modifies information used by Impala in the metastore
+      that Impala and Hive share, the information cached by Impala must be updated. However, this does not mean
+      that all metadata updates require an Impala update.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        In Impala 1.2.4 and higher, you can specify a table name with <code class="ph codeph">INVALIDATE METADATA</code> after
+        the table is created in Hive, allowing you to make individual tables visible to Impala without doing a full
+        reload of the catalog metadata. Impala 1.2.4 also includes other changes to make the metadata broadcast
+        mechanism faster and more responsive, especially during Impala startup. See
+        <a class="xref" href="../shared/../topics/impala_new_features.html#new_features_124">New Features in Impala 1.2.4</a> for details.
+      </p>
+      <p class="p">
+        In Impala 1.2 and higher, a dedicated daemon (<span class="keyword cmdname">catalogd</span>) broadcasts DDL changes made
+        through Impala to all Impala nodes. Formerly, after you created a database or table while connected to one
+        Impala node, you needed to issue an <code class="ph codeph">INVALIDATE METADATA</code> statement on another Impala node
+        before accessing the new database or table from the other node. Now, newly created or altered objects are
+        picked up automatically by all Impala nodes. You must still use the <code class="ph codeph">INVALIDATE METADATA</code>
+        technique after creating or altering objects through Hive. See
+        <a class="xref" href="impala_components.html#intro_catalogd">The Impala Catalog Service</a> for more information on the catalog service.
+      </p>
+      <p class="p">
+        The <code class="ph codeph">INVALIDATE METADATA</code> statement is new in Impala 1.1 and higher, and takes over some of
+        the use cases of the Impala 1.0 <code class="ph codeph">REFRESH</code> statement. Because <code class="ph codeph">REFRESH</code> now
+        requires a table name parameter, to flush the metadata for all tables at once, use the <code class="ph codeph">INVALIDATE
+        METADATA</code> statement.
+      </p>
+      <p class="p">
+      Because <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> only works for tables that the current
+      Impala node is already aware of, when you create a new table in the Hive shell, enter
+      <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">new_table</var></code> before you can see the new table in
+      <span class="keyword cmdname">impala-shell</span>. Once the table is known by Impala, you can issue <code class="ph codeph">REFRESH
+      <var class="keyword varname">table_name</var></code> after you add data files for that table.
+    </p>
+    </div>
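+
+    <p class="p">
+      For example, after creating a table through the Hive shell (table name hypothetical),
+      the sequence of statements in <span class="keyword cmdname">impala-shell</span> is:
+    </p>
+
+<pre class="pre codeblock"><code>-- After creating new_table in the Hive shell:
+INVALIDATE METADATA new_table;
+-- Later, after adding data files for new_table:
+REFRESH new_table;
+</code></pre>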
+
+    <p class="p">
+      <code class="ph codeph">INVALIDATE METADATA</code> and <code class="ph codeph">REFRESH</code> are counterparts: <code class="ph codeph">INVALIDATE
+      METADATA</code> waits to reload the metadata when needed for a subsequent query, but reloads all the
+      metadata for the table, which can be an expensive operation, especially for large tables with many
+      partitions. <code class="ph codeph">REFRESH</code> reloads the metadata immediately, but only loads the block location
+      data for newly added data files, making it a less expensive operation overall. If data was altered in some
+      more extensive way, such as being reorganized by the HDFS balancer, use <code class="ph codeph">INVALIDATE
+      METADATA</code> to avoid a performance penalty from reduced local reads. If you used Impala version 1.0,
+      the <code class="ph codeph">INVALIDATE METADATA</code> statement works just like the Impala 1.0 <code class="ph codeph">REFRESH</code>
+      statement did, while the Impala 1.1 <code class="ph codeph">REFRESH</code> is optimized for the common use case of adding
+      new data files to an existing table, thus the table name argument is now required.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      A metadata update for an <code class="ph codeph">impalad</code> instance <strong class="ph b">is</strong> required if:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        A metadata change occurs.
+      </li>
+
+      <li class="li">
+        <strong class="ph b">and</strong> the change is made from another <code class="ph codeph">impalad</code> instance in your cluster, or through
+        Hive.
+      </li>
+
+      <li class="li">
+        <strong class="ph b">and</strong> the change is made to a metastore database to which clients such as the Impala shell or ODBC directly
+        connect.
+      </li>
+    </ul>
+
+    <p class="p">
+      A metadata update for an Impala node is <strong class="ph b">not</strong> required when you issue queries from the same Impala node
+      where you ran <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-modifying statement.
+    </p>
+
+    <p class="p">
+      Database and table metadata is typically modified by:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Hive - via <code class="ph codeph">ALTER</code>, <code class="ph codeph">CREATE</code>, <code class="ph codeph">DROP</code> or
+        <code class="ph codeph">INSERT</code> operations.
+      </li>
+
+      <li class="li">
+        Impalad - via <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, and <code class="ph codeph">INSERT</code>
+        operations.
+      </li>
+    </ul>
+
+    <p class="p">
+      <code class="ph codeph">INVALIDATE METADATA</code> causes the metadata for that table to be marked as stale and to be
+      reloaded the next time the table is referenced. For a huge table, that process could take a noticeable amount
+      of time; thus you might prefer to use <code class="ph codeph">REFRESH</code> where practical, to avoid an
+      unpredictable delay later, for example if the next reference to the table is during a benchmark test.
+    </p>
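+    <p class="p">
+      For example, after appending new data files to the directory of an existing table (the table name and
+      warehouse path here are illustrative), a <code class="ph codeph">REFRESH</code> is typically sufficient and cheaper
+      than <code class="ph codeph">INVALIDATE METADATA</code>:
+    </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -put new_data.parq /user/hive/warehouse/t1/
+$ impala-shell
+[impalad-host:21000] &gt; refresh t1;</code></pre>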
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example shows how you might use the <code class="ph codeph">INVALIDATE METADATA</code> statement after
+      creating new tables (such as SequenceFile or HBase tables) through the Hive shell. Before the
+      <code class="ph codeph">INVALIDATE METADATA</code> statement was issued, Impala would give a <span class="q">"table not found"</span> error
+      if you tried to refer to those table names. The <code class="ph codeph">DESCRIBE</code> statements cause the latest
+      metadata to be immediately loaded for the tables, avoiding a delay the next time those tables are queried.
+    </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt; invalidate metadata;
+[impalad-host:21000] &gt; describe t1;
+...
+[impalad-host:21000] &gt; describe t2;
+... </code></pre>
+
+    <p class="p">
+      For more examples of using <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> with a
+      combination of Impala and Hive operations, see <a class="xref" href="impala_tutorial.html#tutorial_impala_hive">Switching Back and Forth Between Impala and Hive</a>.
+    </p>
+
+    <p class="p">
+      If you need to ensure that the metadata is up-to-date when you start an <span class="keyword cmdname">impala-shell</span>
+      session, run <span class="keyword cmdname">impala-shell</span> with the <code class="ph codeph">-r</code> or
+      <code class="ph codeph">--refresh_after_connect</code> command-line option. Because this operation adds a delay to the next
+      query against each table, potentially expensive for large tables with many partitions, try to avoid using
+      this option for day-to-day operations in a production environment.
+    </p>
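+    <p class="p">
+      For example, either of the following invocations loads up-to-date metadata for all tables at connection
+      time:
+    </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -r
+$ impala-shell --refresh_after_connect</code></pre>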
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, must have execute
+      permissions for all the relevant directories holding table data.
+      (A table could have data spread across multiple directories,
+      or in unexpected paths, if it uses partitioning or
+      specifies a <code class="ph codeph">LOCATION</code> attribute for
+      individual partitions or the entire table.)
+      Issues with permissions might not cause an immediate error for this statement,
+      but subsequent statements such as <code class="ph codeph">SELECT</code>
+      or <code class="ph codeph">SHOW TABLE STATS</code> could fail.
+    </p>
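+    <p class="p">
+      When troubleshooting such failures, you might first verify the permissions on the table's directories
+      from the command line (the warehouse path here is illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -ls -d /user/hive/warehouse/t1
+$ hdfs dfs -ls -R /user/hive/warehouse/t1</code></pre>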
+
+    <p class="p">
+        <strong class="ph b">HDFS considerations:</strong>
+      </p>
+
+    <p class="p">
+      By default, the <code class="ph codeph">INVALIDATE METADATA</code> command checks HDFS permissions of the underlying data
+      files and directories, caching this information so that a statement can be cancelled immediately if, for
+      example, the <code class="ph codeph">impala</code> user does not have permission to write to the data directory for the
+      table. (This checking does not apply if you have set the <span class="keyword cmdname">catalogd</span> configuration option
+      <code class="ph codeph">--load_catalog_in_background=false</code>.) Impala reports any lack of write permissions as an
+      <code class="ph codeph">INFO</code> message in the log file, in case that represents an oversight. If you change HDFS
+      permissions to make data readable or writeable by the Impala user, issue another <code class="ph codeph">INVALIDATE
+      METADATA</code> to make Impala aware of the change.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      This example illustrates creating a new database and new table in Hive, then doing an <code class="ph codeph">INVALIDATE
+      METADATA</code> statement in Impala using the fully qualified table name, after which both the new table
+      and the new database are visible to Impala. The ability to specify <code class="ph codeph">INVALIDATE METADATA
+      <var class="keyword varname">table_name</var></code> for a table created in Hive is a new capability in Impala 1.2.4. In
+      earlier releases, that statement would have returned an error indicating an unknown table, requiring you to
+      do <code class="ph codeph">INVALIDATE METADATA</code> with no table name, a more expensive operation that reloaded metadata
+      for all tables and databases.
+    </p>
+
+<pre class="pre codeblock"><code>$ hive
+hive&gt; create database new_db_from_hive;
+OK
+Time taken: 4.118 seconds
+hive&gt; create table new_db_from_hive.new_table_from_hive (x int);
+OK
+Time taken: 0.618 seconds
+hive&gt; quit;
+$ impala-shell
+[localhost:21000] &gt; show databases like 'new*';
+[localhost:21000] &gt; refresh new_db_from_hive.new_table_from_hive;
+ERROR: AnalysisException: Database does not exist: new_db_from_hive
+[localhost:21000] &gt; invalidate metadata new_db_from_hive.new_table_from_hive;
+[localhost:21000] &gt; show databases like 'new*';
++--------------------+
+| name               |
++--------------------+
+| new_db_from_hive   |
++--------------------+
+[localhost:21000] &gt; show tables in new_db_from_hive;
++---------------------+
+| name                |
++---------------------+
+| new_table_from_hive |
++---------------------+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+    <p class="p">
+        The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements also cache metadata
+        for tables where the data resides in the Amazon Simple Storage Service (S3).
+        In particular, issue a <code class="ph codeph">REFRESH</code> for a table after adding or removing files
+        in the associated S3 data directory.
+        See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about working with S3 tables.
+      </p>
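+    <p class="p">
+      For example, after copying new objects into the S3 location of an existing table (the bucket and table
+      names here are hypothetical), make Impala aware of the new files with a <code class="ph codeph">REFRESH</code>:
+    </p>
+
+<pre class="pre codeblock"><code>$ aws s3 cp new_data.parq s3://impala-demo-bucket/sales/
+$ impala-shell
+[impalad-host:21000] &gt; refresh sales;</code></pre>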
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Much of the metadata for Kudu tables is handled by the underlying
+        storage layer. Kudu tables have less reliance on the metastore
+        database, and require less metadata caching on the Impala side.
+        For example, information about partitions in Kudu tables is managed
+        by Kudu, and Impala does not cache any block locality metadata
+        for Kudu tables.
+      </p>
+    <p class="p">
+        The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code>
+        statements are needed less frequently for Kudu tables than for
+        HDFS-backed tables. Neither statement is needed when data is
+        added to, removed, or updated in a Kudu table, even if the changes
+        are made directly to Kudu through a client program using the Kudu API.
+        Run <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+        <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+        for a Kudu table only after making a change to the Kudu table schema,
+        such as adding or dropping a column, by a mechanism other than
+        Impala.
+      </p>
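+    <p class="p">
+      For example, if a column had been added to a Kudu table through the Kudu API or another client outside
+      Impala (the table name here is hypothetical), you could make Impala aware of the new schema with:
+    </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt; refresh kudu_table_altered_externally;</code></pre>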
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a>,
+      <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[39/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_create_table.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_create_table.html b/docs/build/html/topics/impala_create_table.html
new file mode 100644
index 0000000..2f88c58
--- /dev/null
+++ b/docs/build/html/topics/impala_create_table.html
@@ -0,0 +1,1250 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_table"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE TABLE Statement</title></head><body class="impala sql_statement" id="create_table"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1 impala_title sql_statement_title" id="ariaid-title1">CREATE TABLE Statement</h1>
+
+  
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Creates a new table and specifies its characteristics. While creating a table, you
+      optionally specify aspects such as:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Whether the table is internal or external.
+      </li>
+
+      <li class="li">
+        The columns and associated data types.
+      </li>
+
+      <li class="li">
+        The columns used for physically partitioning the data.
+      </li>
+
+      <li class="li">
+        The file format for data files.
+      </li>
+
+      <li class="li">
+        The HDFS directory where the data files are located.
+      </li>
+    </ul>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      The general syntax for creating a table and specifying its columns is as follows:
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Explicit column definitions:</strong>
+    </p>
+
+<pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+  (<var class="keyword varname">col_name</var> <var class="keyword varname">data_type</var>
+    [COMMENT '<var class="keyword varname">col_comment</var>']
+    [, ...]
+  )
+  [PARTITIONED BY (<var class="keyword varname">col_name</var> <var class="keyword varname">data_type</var> [COMMENT '<var class="keyword varname">col_comment</var>'], ...)]
+  [COMMENT '<var class="keyword varname">table_comment</var>']
+  [WITH SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+  [
+   [ROW FORMAT <var class="keyword varname">row_format</var>] [STORED AS <var class="keyword varname">file_format</var>]
+  ]
+  [LOCATION '<var class="keyword varname">hdfs_path</var>']
+  [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+<span class="ph">  [CACHED IN '<var class="keyword varname">pool_name</var>'</span> <span class="ph">[WITH REPLICATION = <var class="keyword varname">integer</var>]</span> | UNCACHED]
+</code></pre>
+
+    <p class="p">
+      <strong class="ph b">CREATE TABLE AS SELECT:</strong>
+    </p>
+
+<pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+  <span class="ph">[PARTITIONED BY (<var class="keyword varname">col_name</var>[, ...])]</span>
+  [COMMENT '<var class="keyword varname">table_comment</var>']
+  [WITH SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+  [
+   [ROW FORMAT <var class="keyword varname">row_format</var>] <span class="ph">[STORED AS <var class="keyword varname">ctas_file_format</var>]</span>
+  ]
+  [LOCATION '<var class="keyword varname">hdfs_path</var>']
+  [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+<span class="ph">  [CACHED IN '<var class="keyword varname">pool_name</var>'</span> <span class="ph">[WITH REPLICATION = <var class="keyword varname">integer</var>]</span> | UNCACHED]
+AS
+  <var class="keyword varname">select_statement</var></code></pre>
+
+<pre class="pre codeblock"><code>primitive_type:
+    TINYINT
+  | SMALLINT
+  | INT
+  | BIGINT
+  | BOOLEAN
+  | FLOAT
+  | DOUBLE
+  <span class="ph">| DECIMAL</span>
+  | STRING
+  <span class="ph">| CHAR</span>
+  <span class="ph">| VARCHAR</span>
+  | TIMESTAMP
+
+<span class="ph">complex_type:
+    struct_type
+  | array_type
+  | map_type
+
+struct_type: STRUCT &lt; <var class="keyword varname">name</var> : <var class="keyword varname">primitive_or_complex_type</var> [COMMENT '<var class="keyword varname">comment_string</var>'], ... &gt;
+
+array_type: ARRAY &lt; <var class="keyword varname">primitive_or_complex_type</var> &gt;
+
+map_type: MAP &lt; <var class="keyword varname">primitive_type</var>, <var class="keyword varname">primitive_or_complex_type</var> &gt;
+</span>
+row_format:
+  DELIMITED [FIELDS TERMINATED BY '<var class="keyword varname">char</var>' [ESCAPED BY '<var class="keyword varname">char</var>']]
+  [LINES TERMINATED BY '<var class="keyword varname">char</var>']
+
+file_format:
+    PARQUET
+  | TEXTFILE
+  | AVRO
+  | SEQUENCEFILE
+  | RCFILE
+
+<span class="ph">ctas_file_format:
+    PARQUET
+  | TEXTFILE</span>
+</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Column definitions inferred from data file:</strong>
+    </p>
+
+<pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+  LIKE PARQUET '<var class="keyword varname">hdfs_path_of_parquet_file</var>'
+  [COMMENT '<var class="keyword varname">table_comment</var>']
+  [PARTITIONED BY (<var class="keyword varname">col_name</var> <var class="keyword varname">data_type</var> [COMMENT '<var class="keyword varname">col_comment</var>'], ...)]
+  [WITH SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+  [
+   [ROW FORMAT <var class="keyword varname">row_format</var>] [STORED AS <var class="keyword varname">file_format</var>]
+  ]
+  [LOCATION '<var class="keyword varname">hdfs_path</var>']
+  [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+<span class="ph">  [CACHED IN '<var class="keyword varname">pool_name</var>'</span> <span class="ph">[WITH REPLICATION = <var class="keyword varname">integer</var>]</span> | UNCACHED]
+data_type:
+    <var class="keyword varname">primitive_type</var>
+  | array_type
+  | map_type
+  | struct_type
+</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Kudu tables:</strong>
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+  (<var class="keyword varname">col_name</var> <var class="keyword varname">data_type</var>
+    <span class="ph">[<var class="keyword varname">kudu_column_attribute</var> ...]</span>
+    [COMMENT '<var class="keyword varname">col_comment</var>']
+    [, ...]
+    [PRIMARY KEY (<var class="keyword varname">col_name</var>[, ...])]
+  )
+  <span class="ph">[PARTITION BY <var class="keyword varname">kudu_partition_clause</var>]</span>
+  [COMMENT '<var class="keyword varname">table_comment</var>']
+  STORED AS KUDU
+  [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+</code></pre>
+
+    <div class="p">
+      <strong class="ph b">Kudu column attributes:</strong>
+<pre class="pre codeblock"><code>
+  PRIMARY KEY
+| [NOT] NULL
+| ENCODING <var class="keyword varname">codec</var>
+| COMPRESSION <var class="keyword varname">algorithm</var>
+| DEFAULT <var class="keyword varname">constant</var>
+| BLOCK_SIZE <var class="keyword varname">number</var>
+</code></pre>
+    </div>
+
+    <div class="p">
+      <strong class="ph b">kudu_partition_clause:</strong>
+<pre class="pre codeblock"><code>
+kudu_partition_clause ::= PARTITION BY [<var class="keyword varname">hash_clause</var>] [, <var class="keyword varname">range_clause</var> [ , <var class="keyword varname">range_clause</var> ] ]
+
+hash_clause ::=
+  HASH [ (<var class="keyword varname">pk_col</var> [, ...]) ]
+    PARTITIONS <var class="keyword varname">n</var>
+
+range_clause ::=
+  RANGE [ (<var class="keyword varname">pk_col</var> [, ...]) ]
+  (
+    {
+      PARTITION <var class="keyword varname">constant_expression</var> <var class="keyword varname">range_comparison_operator</var> VALUES <var class="keyword varname">range_comparison_operator</var> <var class="keyword varname">constant_expression</var>
+      | PARTITION VALUE = <var class="keyword varname">constant_expression_or_tuple</var>
+    }
+   [, ...]
+  )
+
+range_comparison_operator ::= { &lt; | &lt;= }
+</code></pre>
+    </div>
+
+    <p class="p">
+      <strong class="ph b">External Kudu tables:</strong>
+    </p>
+
+<pre class="pre codeblock"><code>CREATE EXTERNAL TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+  [COMMENT '<var class="keyword varname">table_comment</var>']
+  STORED AS KUDU
+  [TBLPROPERTIES ('kudu.table_name'='<var class="keyword varname">internal_kudu_name</var>')]
+</code></pre>
+
+    <p class="p">
+      <strong class="ph b">CREATE TABLE AS SELECT for Kudu tables:</strong>
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+  [PRIMARY KEY (<var class="keyword varname">col_name</var>[, ...])]
+  [PARTITION BY <var class="keyword varname">kudu_partition_clause</var>]
+  [COMMENT '<var class="keyword varname">table_comment</var>']
+  STORED AS KUDU
+  [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+AS
+  <var class="keyword varname">select_statement</var></code></pre>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+
+
+    <p class="p">
+      <strong class="ph b">Column definitions:</strong>
+    </p>
+
+    <p class="p">
+      Depending on the form of the <code class="ph codeph">CREATE TABLE</code> statement, the column
+      definitions are required or not allowed.
+    </p>
+
+    <p class="p">
+      With the <code class="ph codeph">CREATE TABLE AS SELECT</code> and <code class="ph codeph">CREATE TABLE LIKE</code>
+      syntax, you do not specify the columns at all; the column names and types are derived from
+      the source table, query, or data file.
+    </p>
+
+    <p class="p">
+      With the basic <code class="ph codeph">CREATE TABLE</code> syntax, you must list one or more columns,
+      specifying for each its name, type, and optionally a comment, in addition to any columns used as partitioning
+      keys. There is one exception where the column list is not required: when creating an Avro
+      table with the <code class="ph codeph">STORED AS AVRO</code> clause, you can omit the list of columns
+      and specify the same metadata as part of the <code class="ph codeph">TBLPROPERTIES</code> clause.
+    </p>
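+    <p class="p">
+      As a minimal sketch of that Avro exception (the table name and schema here are illustrative), the column
+      definitions come entirely from the Avro schema supplied in <code class="ph codeph">TBLPROPERTIES</code>:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE avro_examples
+  STORED AS AVRO
+  TBLPROPERTIES ('avro.schema.literal'='{
+    "name": "my_record",
+    "type": "record",
+    "fields": [
+      {"name": "id", "type": "int"},
+      {"name": "val", "type": "string"}]}');</code></pre>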
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      The Impala complex types (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or
+      <code class="ph codeph">MAP</code>) are available in <span class="keyword">Impala 2.3</span> and higher.
+      Because you can nest these types (for example, to make an array of maps or a struct with
+      an array field), these types are also sometimes referred to as nested types. See
+      <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for usage details.
+    </p>
+
+
+
+    <p class="p">
+      Impala can create tables containing complex type columns, with any supported file format.
+      Because currently Impala can only query complex type columns in Parquet tables, creating
+      tables with complex type columns and other file formats such as text is of limited use.
+      For example, you might create a text table including some columns with complex types with
+      Impala, and use Hive as part of your ETL pipeline to ingest the nested type data and copy it to an
+      identical Parquet table. Or you might create a partitioned table containing complex type
+      columns using one file format, and use <code class="ph codeph">ALTER TABLE</code> to change the file
+      format of individual partitions to Parquet; Impala can then query only the Parquet-format
+      partitions in that table.
+    </p>
+
+    <p class="p">
+        Partitioned tables can contain complex type columns.
+        All the partition key columns must be scalar types.
+      </p>
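+    <p class="p">
+      For example, the following hypothetical table combines a complex type column with a scalar partition key
+      column:
+    </p>
+
+<pre class="pre codeblock"><code>-- The partition key column YEAR is a scalar type;
+-- the TAGS column uses a complex type.
+CREATE TABLE events (id BIGINT, tags ARRAY&lt;STRING&gt;)
+  PARTITIONED BY (year INT)
+  STORED AS PARQUET;</code></pre>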
+
+    <p class="p">
+      <strong class="ph b">Internal and external tables (EXTERNAL and LOCATION clauses):</strong>
+    </p>
+
+    <p class="p">
+      By default, Impala creates an <span class="q">"internal"</span> table, where Impala manages the underlying
+      data files for the table, and physically deletes the data files when you drop the table.
+      If you specify the <code class="ph codeph">EXTERNAL</code> clause, Impala treats the table as an
+      <span class="q">"external"</span> table, where the data files are typically produced outside Impala and
+      queried from their original locations in HDFS, and Impala leaves the data files in place
+      when you drop the table. For details about internal and external tables, see
+      <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>.
+    </p>
+
+    <p class="p">
+      Typically, for an external table you include a <code class="ph codeph">LOCATION</code> clause to specify
+      the path to the HDFS directory where Impala reads and writes files for the table. For
+      example, if your data pipeline produces Parquet files in the HDFS directory
+      <span class="ph filepath">/user/etl/destination</span>, you might create an external table as follows:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE EXTERNAL TABLE external_parquet (c1 INT, c2 STRING, c3 TIMESTAMP)
+  STORED AS PARQUET LOCATION '/user/etl/destination';
+</code></pre>
+
+    <p class="p">
+      Although the <code class="ph codeph">EXTERNAL</code> and <code class="ph codeph">LOCATION</code> clauses are often
+      specified together, <code class="ph codeph">LOCATION</code> is optional for external tables, and you can
+      also specify <code class="ph codeph">LOCATION</code> for internal tables. The difference is all about
+      whether Impala <span class="q">"takes control"</span> of the underlying data files and moves them when you
+      rename the table, or deletes them when you drop the table. For more about internal and
+      external tables and how they interact with the <code class="ph codeph">LOCATION</code> attribute, see
+      <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>.
+    </p>
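+    <p class="p">
+      For example, an internal table can point at a custom directory; because the table is internal, Impala
+      deletes those files when you drop the table (the path here is illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE managed_at_custom_path (c1 INT, c2 STRING)
+  STORED AS PARQUET LOCATION '/user/etl/staging';</code></pre>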
+
+    <p class="p">
+      <strong class="ph b">Partitioned tables (PARTITIONED BY clause):</strong>
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">PARTITIONED BY</code> clause divides the data files based on the values from
+      one or more specified columns. Impala queries can use the partition metadata to minimize
+      the amount of data that is read from disk or transmitted across the network, particularly
+      during join queries. For details about partitioning, see
+      <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>.
+    </p>
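+    <p class="p">
+      For example, a table partitioned by year and month (the table and column names here are illustrative)
+      stores each combination of partition key values in its own HDFS subdirectory, so queries that filter on
+      those columns read only the matching directories:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE logs (msg STRING)
+  PARTITIONED BY (year INT, month INT)
+  STORED AS PARQUET;</code></pre>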
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        All Kudu tables require partitioning, which involves different syntax than non-Kudu
+        tables. See the <code class="ph codeph">PARTITION BY</code> clause, rather than <code class="ph codeph">PARTITIONED
+        BY</code>, for Kudu tables.
+      </p>
+    </div>
+
+    <p class="p">
+      Prior to <span class="keyword">Impala 2.5</span>, you could use a partitioned table as the
+      source and copy data from it, but could not specify any partitioning clauses for the new
+      table. In <span class="keyword">Impala 2.5</span> and higher, you can now use the
+      <code class="ph codeph">PARTITIONED BY</code> clause with a <code class="ph codeph">CREATE TABLE AS SELECT</code>
+      statement. See the examples under the following discussion of the <code class="ph codeph">CREATE TABLE AS
+      SELECT</code> syntax variation.
+    </p>
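+    <p class="p">
+      As a sketch of this syntax variation (using hypothetical table and column names):
+    </p>
+
+<pre class="pre codeblock"><code>-- The partition key columns named in PARTITIONED BY must appear
+-- last in the select list.
+CREATE TABLE sales_by_year
+  PARTITIONED BY (year)
+  STORED AS PARQUET
+AS
+  SELECT item, amount, year FROM sales_staging;</code></pre>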
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+
+    <p class="p">
+      Because Kudu tables do not support clauses related to HDFS and S3 data files and
+      partitioning mechanisms, the syntax associated with the <code class="ph codeph">STORED AS KUDU</code>
+      clause is shown separately in the above syntax descriptions. Kudu tables have their own
+      syntax for <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">CREATE EXTERNAL TABLE</code>, and
+      <code class="ph codeph">CREATE TABLE AS SELECT</code>. All internal Kudu tables require a
+      <code class="ph codeph">PARTITION BY</code> clause, different than the <code class="ph codeph">PARTITIONED BY</code>
+      clause for HDFS-backed tables.
+    </p>
+
+    <p class="p">
+      Here are some examples of creating empty Kudu tables:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Single-column primary key.
+CREATE TABLE kudu_t1 (id BIGINT PRIMARY KEY, s STRING, b BOOLEAN)
+  PARTITION BY HASH (id) PARTITIONS 20 STORED AS KUDU;
+
+-- Multi-column primary key.
+CREATE TABLE kudu_t2 (id BIGINT, s STRING, b BOOLEAN, PRIMARY KEY (id,s))
+  PARTITION BY HASH (s) PARTITIONS 30 STORED AS KUDU;
+
+-- Meaningful primary key column is good for range partitioning.
+CREATE TABLE kudu_t3 (id BIGINT, year INT, s STRING,
+    b BOOLEAN, PRIMARY KEY (id,year))
+  PARTITION BY HASH (id) PARTITIONS 20,
+  RANGE (year) (PARTITION 1980 &lt;= VALUES &lt; 1990,
+    PARTITION 1990 &lt;= VALUES &lt; 2000,
+    PARTITION VALUE = 2001,
+    PARTITION 2001 &lt; VALUES)
+  STORED AS KUDU;
+
+</code></pre>
+
+    <p class="p">
+      Here is an example of creating an external Kudu table:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Inherits column definitions from original table.
+-- For tables created through Impala, the kudu.table_name property
+-- comes from DESCRIBE FORMATTED output from the original table.
+CREATE EXTERNAL TABLE external_t1 STORED AS KUDU
+  TBLPROPERTIES ('kudu.table_name'='kudu_tbl_created_via_api');
+
+</code></pre>
+
+    <p class="p">
+      Here is an example of <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax for a Kudu table:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- The CTAS statement defines the primary key and partitioning scheme.
+-- The rest of the column definitions are derived from the select list.
+CREATE TABLE ctas_t1
+  PRIMARY KEY (id) PARTITION BY HASH (id) PARTITIONS 10
+  STORED AS KUDU
+  AS SELECT id, s FROM kudu_t1;
+
+</code></pre>
+
+    <p class="p">
+      The following <code class="ph codeph">CREATE TABLE</code> clauses are not supported for Kudu tables:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <code class="ph codeph">PARTITIONED BY</code> (Kudu tables use the clause <code class="ph codeph">PARTITION
+        BY</code> instead)
+      </li>
+
+      <li class="li">
+        <code class="ph codeph">LOCATION</code>
+      </li>
+
+      <li class="li">
+        <code class="ph codeph">ROWFORMAT</code>
+      </li>
+
+      <li class="li">
+        <code class="ph codeph">CACHED IN | UNCACHED</code>
+      </li>
+
+      <li class="li">
+        <code class="ph codeph">WITH SERDEPROPERTIES</code>
+      </li>
+    </ul>
+
+    <p class="p">
+      For more on the <code class="ph codeph">PRIMARY KEY</code> clause, see
+      <a class="xref" href="impala_kudu.html#kudu_primary_key">Primary Key Columns for Kudu Tables</a> and
+      <a class="xref" href="impala_kudu.html#kudu_primary_key_attribute">PRIMARY KEY Attribute</a>.
+    </p>
+
+    <p class="p">
+      For more on the <code class="ph codeph">NULL</code> and <code class="ph codeph">NOT NULL</code> attributes, see
+      <a class="xref" href="impala_kudu.html#kudu_not_null_attribute">NULL | NOT NULL Attribute</a>.
+    </p>
+
+    <p class="p">
+      For more on the <code class="ph codeph">ENCODING</code> attribute, see
+      <a class="xref" href="impala_kudu.html#kudu_encoding_attribute">ENCODING Attribute</a>.
+    </p>
+
+    <p class="p">
+      For more on the <code class="ph codeph">COMPRESSION</code> attribute, see
+      <a class="xref" href="impala_kudu.html#kudu_compression_attribute">COMPRESSION Attribute</a>.
+    </p>
+
+    <p class="p">
+      For more on the <code class="ph codeph">DEFAULT</code> attribute, see
+      <a class="xref" href="impala_kudu.html#kudu_default_attribute">DEFAULT Attribute</a>.
+    </p>
+
+    <p class="p">
+      For more on the <code class="ph codeph">BLOCK_SIZE</code> attribute, see
+      <a class="xref" href="impala_kudu.html#kudu_block_size_attribute">BLOCK_SIZE Attribute</a>.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Partitioning for Kudu tables (PARTITION BY clause)</strong>
+    </p>
+
+    <p class="p">
+      For Kudu tables, you specify logical partitioning across one or more columns using the
+      <code class="ph codeph">PARTITION BY</code> clause. In contrast to partitioning for HDFS-based tables,
+      multiple values for a partition key column can be located in the same partition. The
+      optional <code class="ph codeph">HASH</code> clause lets you divide one or a set of partition key
+      columns into a specified number of buckets. You can use more than one
+      <code class="ph codeph">HASH</code> clause, specifying a distinct set of partition key columns for each.
+      The optional <code class="ph codeph">RANGE</code> clause further subdivides the partitions, based on a
+      set of comparison operations for the partition key columns.
+    </p>
+
+    <p class="p">
+      Here are some examples of the <code class="ph codeph">PARTITION BY HASH</code> syntax:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Apply hash function to 1 primary key column.
+create table hash_t1 (x bigint, y bigint, s string, primary key (x,y))
+  partition by hash (x) partitions 10
+  stored as kudu;
+
+-- Apply hash function to a different primary key column.
+create table hash_t2 (x bigint, y bigint, s string, primary key (x,y))
+  partition by hash (y) partitions 10
+  stored as kudu;
+
+-- Apply hash function to both primary key columns.
+-- In this case, the total number of partitions is 10.
+create table hash_t3 (x bigint, y bigint, s string, primary key (x,y))
+  partition by hash (x,y) partitions 10
+  stored as kudu;
+
+-- When the column list is omitted, apply hash function to all primary key columns.
+create table hash_t4 (x bigint, y bigint, s string, primary key (x,y))
+  partition by hash partitions 10
+  stored as kudu;
+
+-- Hash the X values independently from the Y values.
+-- In this case, the total number of partitions is 10 x 20.
+create table hash_t5 (x bigint, y bigint, s string, primary key (x,y))
+  partition by hash (x) partitions 10, hash (y) partitions 20
+  stored as kudu;
+
+</code></pre>
+
+    <p class="p">
+      Here are some examples of the <code class="ph codeph">PARTITION BY RANGE</code> syntax:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Create partitions that cover every possible value of X.
+-- Ranges that span multiple values use the keyword VALUES between
+-- a pair of &lt; and &lt;= comparisons.
+create table range_t1 (x bigint, s string, s2 string, primary key (x, s))
+  partition by range (x)
+    (
+      partition 0 &lt;= values &lt;= 49, partition 50 &lt;= values &lt;= 100,
+      partition values &lt; 0, partition 100 &lt; values
+    )
+  stored as kudu;
+
+-- Create partitions that cover some possible values of X.
+-- Values outside the covered range(s) are rejected.
+-- New range partitions can be added through ALTER TABLE.
+create table range_t2 (x bigint, s string, s2 string, primary key (x, s))
+  partition by range (x)
+    (
+      partition 0 &lt;= values &lt;= 49, partition 50 &lt;= values &lt;= 100
+    )
+  stored as kudu;
+
+-- A range can also specify a single specific value, using the keyword VALUE
+-- with an = comparison.
+create table range_t3 (x bigint, s string, s2 string, primary key (x, s))
+  partition by range (s)
+    (
+      partition value = 'Yes', partition value = 'No', partition value = 'Maybe'
+    )
+  stored as kudu;
+
+-- Using multiple columns in the RANGE clause and tuples inside the partition spec
+-- only works for partitions specified with the VALUE= syntax.
+create table range_t4 (x bigint, s string, s2 string, primary key (x, s))
+  partition by range (x,s)
+    (
+      partition value = (0,'zero'), partition value = (1,'one'), partition value = (2,'two')
+    )
+  stored as kudu;
+
+</code></pre>
+
+    <p class="p">
+      Here are some examples combining both <code class="ph codeph">HASH</code> and <code class="ph codeph">RANGE</code>
+      syntax for the <code class="ph codeph">PARTITION BY</code> clause:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Values from each range partition are hashed into 10 associated buckets.
+-- Total number of partitions in this case is 10 x 2.
+create table combined_t1 (x bigint, s string, s2 string, primary key (x, s))
+  partition by hash (x) partitions 10, range (x)
+    (
+      partition 0 &lt;= values &lt;= 49, partition 50 &lt;= values &lt;= 100
+    )
+  stored as kudu;
+
+-- The hash partitioning and range partitioning can apply to different columns.
+-- But all the columns used in either partitioning scheme must be from the primary key.
+create table combined_t2 (x bigint, s string, s2 string, primary key (x, s))
+  partition by hash (s) partitions 10, range (x)
+    (
+      partition 0 &lt;= values &lt;= 49, partition 50 &lt;= values &lt;= 100
+    )
+  stored as kudu;
+
+</code></pre>
+
+    <p class="p">
+      For more usage details and examples of the Kudu partitioning syntax, see
+      <a class="xref" href="impala_kudu.html">Using Impala to Query Kudu Tables</a>.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Specifying file format (STORED AS and ROW FORMAT clauses):</strong>
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">STORED AS</code> clause identifies the format of the underlying data files.
+      Currently, Impala can query more types of file formats than it can create or insert into.
+      Use Hive to perform any create or data load operations that are not currently available in
+      Impala. For example, Impala can create an Avro, SequenceFile, or RCFile table but cannot
+      insert data into it. There are also Impala-specific procedures for using compression with
+      each kind of file format. For details about working with data files of various formats,
+      see <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      In Impala 1.4.0 and higher, Impala can create Avro tables, which formerly required doing
+      the <code class="ph codeph">CREATE TABLE</code> statement in Hive. See
+      <a class="xref" href="impala_avro.html#avro">Using the Avro File Format with Impala Tables</a> for details and examples.
+    </div>
+
+    <p class="p">
+      By default (when no <code class="ph codeph">STORED AS</code> clause is specified), data files in Impala
+      tables are created as text files with Ctrl-A (hex 01) characters as the delimiter.
+
+      Specify the <code class="ph codeph">ROW FORMAT DELIMITED</code> clause to produce or ingest data files
+      that use a different delimiter character such as tab or <code class="ph codeph">|</code>, or a different
+      line end character such as carriage return or newline. When specifying delimiter and line
+      end characters with the <code class="ph codeph">FIELDS TERMINATED BY</code> and <code class="ph codeph">LINES TERMINATED
+      BY</code> clauses, use <code class="ph codeph">'\t'</code> for tab, <code class="ph codeph">'\n'</code> for newline
+      or linefeed, <code class="ph codeph">'\r'</code> for carriage return, and
+      <code class="ph codeph">\0</code> for ASCII <code class="ph codeph">nul</code> (hex 00). For more
+      examples of text tables, see <a class="xref" href="impala_txtfile.html#txtfile">Using Text Data Files with Impala Tables</a>.
+    </p>
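+
+    <p class="p">
+      For example, the following statement (the table and column names are illustrative)
+      creates a text table whose data files use comma-separated fields and
+      newline-terminated lines:
+    </p>
+
+<pre class="pre codeblock"><code>
+create table csv_t (id int, name string)
+  row format delimited fields terminated by ','
+  lines terminated by '\n'
+  stored as textfile;
+</code></pre>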
+
+    <p class="p">
+      The <code class="ph codeph">ESCAPED BY</code> clause applies both to text files that you create through
+      an <code class="ph codeph">INSERT</code> statement to an Impala <code class="ph codeph">TEXTFILE</code> table, and to
+      existing data files that you put into an Impala table directory. (You can ingest existing
+      data files either by creating the table with <code class="ph codeph">CREATE EXTERNAL TABLE ...
+      LOCATION</code>, the <code class="ph codeph">LOAD DATA</code> statement, or through an HDFS operation
+      such as <code class="ph codeph">hdfs dfs -put <var class="keyword varname">file</var>
+      <var class="keyword varname">hdfs_path</var></code>.) Choose an escape character that is not used
+      anywhere else in the file, and put it in front of each instance of the delimiter character
+      that occurs within a field value. Surrounding field values with quotation marks does not
+      help Impala to parse fields with embedded delimiter characters; the quotation marks are
+      considered to be part of the column value. If you want to use <code class="ph codeph">\</code> as the
+      escape character, specify the clause in <span class="keyword cmdname">impala-shell</span> as <code class="ph codeph">ESCAPED
+      BY '\\'</code>.
+    </p>
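+
+    <p class="p">
+      For example, the following statement (the table name is illustrative) creates a
+      comma-delimited text table that treats backslash as the escape character:
+    </p>
+
+<pre class="pre codeblock"><code>
+create table escaped_t (id int, s string)
+  row format delimited fields terminated by ',' escaped by '\\';
+</code></pre>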
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        The <code class="ph codeph">CREATE TABLE</code> clauses <code class="ph codeph">FIELDS TERMINATED BY</code>, <code class="ph codeph">ESCAPED
+        BY</code>, and <code class="ph codeph">LINES TERMINATED BY</code> have special rules for the string literal used for
+        their argument, because they all require a single character. You can use a regular character surrounded by
+        single or double quotation marks, an octal sequence such as <code class="ph codeph">'\054'</code> (representing a comma),
+        or an integer in the range '-127'..'128' (with quotation marks but no backslash), which is interpreted as a
+        single-byte ASCII character. Negative values are subtracted from 256; for example, <code class="ph codeph">FIELDS
+        TERMINATED BY '-2'</code> sets the field delimiter to ASCII code 254, the <span class="q">"Icelandic Thorn"</span>
+        character used as a delimiter by some data formats.
+      </div>
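+
+    <p class="p">
+      For example, the following statements (the table names are illustrative) show the
+      octal and integer forms of the delimiter string literal:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- The octal sequence '\054' represents a comma.
+create table octal_delim_t (x int, y int)
+  row format delimited fields terminated by '\054';
+
+-- The integer '-2' is interpreted as 256 - 2 = ASCII code 254.
+create table thorn_delim_t (x int, y int)
+  row format delimited fields terminated by '-2';
+</code></pre>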
+
+    <p class="p">
+      <strong class="ph b">Cloning tables (LIKE clause):</strong>
+    </p>
+
+    <p class="p">
+      To create an empty table with the same columns, comments, and other attributes as another
+      table, use the following variation. The <code class="ph codeph">CREATE TABLE ... LIKE</code> form allows
+      a restricted set of clauses, currently only the <code class="ph codeph">LOCATION</code>,
+      <code class="ph codeph">COMMENT</code>, and <code class="ph codeph">STORED AS</code> clauses.
+    </p>
+
+<pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+  <span class="ph">LIKE { [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> | PARQUET '<var class="keyword varname">hdfs_path_of_parquet_file</var>' }</span>
+  [COMMENT '<var class="keyword varname">table_comment</var>']
+  [STORED AS <var class="keyword varname">file_format</var>]
+  [LOCATION '<var class="keyword varname">hdfs_path</var>']</code></pre>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        To clone the structure of a table and transfer data into it in a single operation, use
+        the <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax described in the next subsection.
+      </p>
+    </div>
+
+    <p class="p">
+      When you clone the structure of an existing table using the <code class="ph codeph">CREATE TABLE ...
+      LIKE</code> syntax, the new table keeps the same file format as the original one, so you
+      only need to specify the <code class="ph codeph">STORED AS</code> clause if you want to use a different
+      file format, or when specifying a view as the original table. (Creating a table
+      <span class="q">"like"</span> a view produces a text table by default.)
+    </p>
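+
+    <p class="p">
+      For example, assuming a table <code class="ph codeph">t1</code> already exists, the following
+      statements clone its structure, override the inherited file format in the second case,
+      and then copy the data with a separate <code class="ph codeph">INSERT</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- New table is empty but has the same columns and file format as t1.
+create table t1_copy like t1;
+
+-- Override the inherited file format.
+create table t1_parquet like t1 stored as parquet;
+
+-- Copy the data, converting it to Parquet format in the process.
+insert into t1_parquet select * from t1;
+</code></pre>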
+
+    <p class="p">
+      Although normally Impala cannot create an HBase table directly, Impala can clone the
+      structure of an existing HBase table with the <code class="ph codeph">CREATE TABLE ... LIKE</code>
+      syntax, preserving the file format and metadata from the original table.
+    </p>
+
+    <p class="p">
+      There are some exceptions to the ability to use <code class="ph codeph">CREATE TABLE ... LIKE</code>
+      with an Avro table. For example, you cannot use this technique for an Avro table that is
+      specified with an Avro schema but no columns. When in doubt, check if a <code class="ph codeph">CREATE
+      TABLE ... LIKE</code> operation works in Hive; if not, it typically will not work in
+      Impala either.
+    </p>
+
+    <p class="p">
+      If the original table is partitioned, the new table inherits the same partition key
+      columns. Because the new table is initially empty, it does not inherit the actual
+      partitions that exist in the original one. To create partitions in the new table, insert
+      data or issue <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code> statements.
+    </p>
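+
+    <p class="p">
+      For example, assuming a partitioned table <code class="ph codeph">old_part_t</code> with a string
+      column <code class="ph codeph">s</code> and partition key columns <code class="ph codeph">year</code> and
+      <code class="ph codeph">month</code> (all names here are illustrative), the clone starts with no
+      partitions until you add them or insert data:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Inherits the YEAR and MONTH partition key columns, but no actual partitions.
+create table new_part_t like old_part_t;
+
+-- Create partitions explicitly...
+alter table new_part_t add partition (year=2017, month=1);
+
+-- ...or implicitly by inserting data.
+insert into new_part_t partition (year=2017, month=2)
+  select s from old_part_t where year = 2017 and month = 2;
+</code></pre>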
+
+    <p class="p">
+        Prior to Impala 1.4.0, it was not possible to use the <code class="ph codeph">CREATE TABLE LIKE
+        <var class="keyword varname">view_name</var></code> syntax. In Impala 1.4.0 and higher, you can create a table with the
+        same column definitions as a view using the <code class="ph codeph">CREATE TABLE LIKE</code> technique. Although
+        <code class="ph codeph">CREATE TABLE LIKE</code> normally inherits the file format of the original table, a view has no
+        underlying file format, so <code class="ph codeph">CREATE TABLE LIKE <var class="keyword varname">view_name</var></code> produces a text
+        table by default. To specify a different file format, include a <code class="ph codeph">STORED AS
+        <var class="keyword varname">file_format</var></code> clause at the end of the <code class="ph codeph">CREATE TABLE LIKE</code>
+        statement.
+      </p>
+
+    <p class="p">
+      Because <code class="ph codeph">CREATE TABLE ... LIKE</code> only manipulates table metadata, not the
+      physical data of the table, issue <code class="ph codeph">INSERT INTO TABLE</code> statements afterward
+      to copy any data from the original table into the new one, optionally converting the data
+      to a new file format. (For some file formats, Impala can do a <code class="ph codeph">CREATE TABLE ...
+      LIKE</code> to create the table, but Impala cannot insert data in that file format; in
+      these cases, you must load the data in Hive. See
+      <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details.)
+    </p>
+
+    <p class="p" id="create_table__ctas">
+      <strong class="ph b">CREATE TABLE AS SELECT:</strong>
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax is a shorthand notation to create a
+      table based on column definitions from another table, and copy data from the source table
+      to the destination table without issuing any separate <code class="ph codeph">INSERT</code> statement.
+      This idiom is so popular that it has its own acronym, <span class="q">"CTAS"</span>.
+    </p>
+
+    <p class="p">
+      The following examples show how to copy data from a source table <code class="ph codeph">T1</code> to a
+      variety of destination tables, applying various transformations to the table properties,
+      table layout, or the data itself as part of the operation:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Sample table to be the source of CTAS operations.
+CREATE TABLE t1 (x INT, y STRING);
+INSERT INTO t1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
+
+-- Clone all the columns and data from one table to another.
+CREATE TABLE clone_of_t1 AS SELECT * FROM t1;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+
+-- Clone the columns and data, and convert the data to a different file format.
+CREATE TABLE parquet_version_of_t1 STORED AS PARQUET AS SELECT * FROM t1;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+
+-- Copy only some rows to the new table.
+CREATE TABLE subset_of_t1 AS SELECT * FROM t1 WHERE x &gt;= 2;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 2 row(s) |
++-------------------+
+
+-- Same idea as CREATE TABLE LIKE: clone table layout but do not copy any data.
+CREATE TABLE empty_clone_of_t1 AS SELECT * FROM t1 WHERE 1=0;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 0 row(s) |
++-------------------+
+
+-- Reorder and rename columns and transform the data.
+CREATE TABLE t5 AS SELECT upper(y) AS s, x+1 AS a, 'Entirely new column' AS n FROM t1;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+SELECT * FROM t5;
++-------+---+---------------------+
+| s     | a | n                   |
++-------+---+---------------------+
+| ONE   | 2 | Entirely new column |
+| TWO   | 3 | Entirely new column |
+| THREE | 4 | Entirely new column |
++-------+---+---------------------+
+</code></pre>
+
+    <p class="p">
+      See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for details about query syntax for the
+      <code class="ph codeph">SELECT</code> portion of a <code class="ph codeph">CREATE TABLE AS SELECT</code> statement.
+    </p>
+
+    <p class="p">
+      The newly created table inherits the column names that you select from the original table,
+      which you can override by specifying column aliases in the query. Any column or table
+      comments from the original table are not carried over to the new table.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      When using the <code class="ph codeph">STORED AS</code> clause with a <code class="ph codeph">CREATE TABLE AS
+      SELECT</code> statement, the destination table must be a file format that Impala can
+      write to: currently, text or Parquet. You cannot specify an Avro, SequenceFile, or RCFile
+      table as the destination table for a CTAS operation.
+    </div>
+
+    <p class="p">
+      Prior to <span class="keyword">Impala 2.5</span>, you could use a partitioned table as the source
+      and copy data from it, but could not specify any partitioning clauses for the new table.
+      In <span class="keyword">Impala 2.5</span> and higher, you can now use the <code class="ph codeph">PARTITIONED
+      BY</code> clause with a <code class="ph codeph">CREATE TABLE AS SELECT</code> statement. The following
+      example demonstrates how you can copy data from an unpartitioned table in a <code class="ph codeph">CREATE
+      TABLE AS SELECT</code> operation, creating a new partitioned table in the process. The
+      main syntax consideration is the column order in the <code class="ph codeph">PARTITIONED BY</code>
+      clause and the select list: the partition key columns must be listed last in the select
+      list, in the same order as in the <code class="ph codeph">PARTITIONED BY</code> clause. Therefore, in
+      this case, the column order in the destination table is different from the source table.
+      You also only specify the column names in the <code class="ph codeph">PARTITIONED BY</code> clause, not
+      the data types or column comments.
+    </p>
+
+<pre class="pre codeblock"><code>
+create table partitions_no (year smallint, month tinyint, s string);
+insert into partitions_no values (2016, 1, 'January 2016'),
+  (2016, 2, 'February 2016'), (2016, 3, 'March 2016');
+
+-- Prove that the source table is not partitioned.
+show partitions partitions_no;
+ERROR: AnalysisException: Table is not partitioned: ctas_partition_by.partitions_no
+
+-- Create new table with partitions based on column values from source table.
+<strong class="ph b">create table partitions_yes partitioned by (year, month)
+  as select s, year, month from partitions_no;</strong>
++-------------------+
+| summary           |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+
+-- Prove that the destination table is partitioned.
+show partitions partitions_yes;
++-------+-------+-------+--------+------+...
+| year  | month | #Rows | #Files | Size |...
++-------+-------+-------+--------+------+...
+| 2016  | 1     | -1    | 1      | 13B  |...
+| 2016  | 2     | -1    | 1      | 14B  |...
+| 2016  | 3     | -1    | 1      | 11B  |...
+| Total |       | -1    | 3      | 38B  |...
++-------+-------+-------+--------+------+...
+</code></pre>
+
+    <p class="p">
+      The most convenient layout for partitioned tables is with all the partition key columns at
+      the end. The CTAS <code class="ph codeph">PARTITIONED BY</code> syntax requires that column order in the
+      select list, resulting in that same column order in the destination table.
+    </p>
+
+<pre class="pre codeblock"><code>
+describe partitions_no;
++-------+----------+---------+
+| name  | type     | comment |
++-------+----------+---------+
+| year  | smallint |         |
+| month | tinyint  |         |
+| s     | string   |         |
++-------+----------+---------+
+
+-- The CTAS operation forced us to put the partition key columns last.
+-- Having those columns last works better with idioms such as SELECT *
+-- for partitioned tables.
+describe partitions_yes;
++-------+----------+---------+
+| name  | type     | comment |
++-------+----------+---------+
+| s     | string   |         |
+| year  | smallint |         |
+| month | tinyint  |         |
++-------+----------+---------+
+</code></pre>
+
+    <p class="p">
+      Attempting to use a select list with the partition key columns not at the end results in
+      an error due to a column name mismatch:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- We expect this CTAS to fail because non-key column S
+-- comes after key columns YEAR and MONTH in the select list.
+create table partitions_maybe partitioned by (year, month)
+  as select year, month, s from partitions_no;
+ERROR: AnalysisException: Partition column name mismatch: year != month
+</code></pre>
+
+    <p class="p">
+      As the examples earlier in this section show, a CTAS operation can clone all the data
+      in a table or just a subset of the columns and/or rows, and can reorder columns,
+      rename them, or construct them out of expressions.
+    </p>
+
+    <p class="p">
+      As part of a CTAS operation, you can convert the data to any file format that Impala can
+      write (currently, <code class="ph codeph">TEXTFILE</code> and <code class="ph codeph">PARQUET</code>). You cannot
+      specify the lower-level properties of a text table, such as the delimiter.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an
+        <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the
+        results are not necessarily sorted. An <code class="ph codeph">INSERT ... SELECT</code> operation potentially creates
+        many different data files, prepared on different data nodes, and therefore the notion of the data being
+        stored in sorted order is impractical.
+      </p>
+
+    <p class="p">
+      <strong class="ph b">CREATE TABLE LIKE PARQUET:</strong>
+    </p>
+
+    <p class="p">
+      The variation <code class="ph codeph">CREATE TABLE ... LIKE PARQUET
+      '<var class="keyword varname">hdfs_path_of_parquet_file</var>'</code> lets you skip the column
+      definitions of the <code class="ph codeph">CREATE TABLE</code> statement. The column names and data
+      types are automatically configured based on the organization of the specified Parquet data
+      file, which must already reside in HDFS. You can use a data file located outside the
+      Impala database directories, or a file from an existing Impala Parquet table; either way,
+      Impala only uses the column definitions from the file and does not use the HDFS location
+      for the <code class="ph codeph">LOCATION</code> attribute of the new table. (Although you can also
+      specify the enclosing directory with the <code class="ph codeph">LOCATION</code> attribute, to both use
+      the same schema as the data file and point the Impala table at the associated directory
+      for querying.)
+    </p>
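+
+    <p class="p">
+      For example, the following statement (the HDFS path is illustrative) derives the
+      column definitions from an existing Parquet data file and makes the new table use
+      Parquet format as well:
+    </p>
+
+<pre class="pre codeblock"><code>
+create table new_from_parquet
+  like parquet '/user/impala/sample_data/data_file.parq'
+  stored as parquet;
+</code></pre>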
+
+    <p class="p">
+      The following considerations apply when you use the <code class="ph codeph">CREATE TABLE LIKE
+      PARQUET</code> technique:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Any column comments from the original table are not preserved in the new table. Each
+        column in the new table has a comment stating the low-level Parquet field type used to
+        deduce the appropriate SQL column type.
+      </li>
+
+      <li class="li">
+        If you use a data file from a partitioned Impala table, any partition key columns from
+        the original table are left out of the new table, because they are represented in HDFS
+        directory names rather than stored in the data file. To preserve the partition
+        information, repeat the same <code class="ph codeph">PARTITION</code> clause as in the original
+        <code class="ph codeph">CREATE TABLE</code> statement.
+      </li>
+
+      <li class="li">
+        The file format of the new table defaults to text, as with other kinds of <code class="ph codeph">CREATE
+        TABLE</code> statements. To make the new table also use Parquet format, include the
+        clause <code class="ph codeph">STORED AS PARQUET</code> in the <code class="ph codeph">CREATE TABLE LIKE
+        PARQUET</code> statement.
+      </li>
+
+      <li class="li">
+        If the Parquet data file comes from an existing Impala table, currently, any
+        <code class="ph codeph">TINYINT</code> or <code class="ph codeph">SMALLINT</code> columns are turned into
+        <code class="ph codeph">INT</code> columns in the new table. Internally, Parquet stores such values as
+        32-bit integers.
+      </li>
+
+      <li class="li">
+        When the destination table uses the Parquet file format, the <code class="ph codeph">CREATE TABLE AS
+        SELECT</code> and <code class="ph codeph">INSERT ... SELECT</code> statements always create at least
+        one data file, even if the <code class="ph codeph">SELECT</code> part of the statement does not match
+        any rows. You can use such an empty Parquet data file as a template for subsequent
+        <code class="ph codeph">CREATE TABLE LIKE PARQUET</code> statements.
+      </li>
+    </ul>
+
+    <p class="p">
+      For more details about creating Parquet tables, and examples of the <code class="ph codeph">CREATE TABLE
+      LIKE PARQUET</code> syntax, see <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Visibility and Metadata (TBLPROPERTIES and WITH SERDEPROPERTIES clauses):</strong>
+    </p>
+
+    <p class="p">
+      You can associate arbitrary items of metadata with a table by specifying the
+      <code class="ph codeph">TBLPROPERTIES</code> clause. This clause takes a comma-separated list of
+      key-value pairs and stores those items in the metastore database. You can also change the
+      table properties later with an <code class="ph codeph">ALTER TABLE</code> statement. You can observe the
+      table properties for different delimiter and escape characters using the <code class="ph codeph">DESCRIBE
+      FORMATTED</code> command, and change those settings for an existing table with
+      <code class="ph codeph">ALTER TABLE ... SET TBLPROPERTIES</code>.
+    </p>
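+
+    <p class="p">
+      For example, the following statements (the property name is an arbitrary illustration)
+      set a table property at creation time, change it later, and then display it:
+    </p>
+
+<pre class="pre codeblock"><code>
+create table prop_t (x int)
+  tblproperties ('owning_team' = 'analytics');
+
+alter table prop_t set tblproperties ('owning_team' = 'reporting');
+
+describe formatted prop_t;
+</code></pre>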
+
+    <p class="p">
+      You can also associate SerDes properties with the table by specifying key-value pairs
+      through the <code class="ph codeph">WITH SERDEPROPERTIES</code> clause. This metadata is not used by
+      Impala, which has its own built-in serializer and deserializer for the file formats it
+      supports. Particular property values might be needed for Hive compatibility with certain
+      variations of file formats, particularly Avro.
+    </p>
+
+    <p class="p">
+      Some DDL operations that interact with other Hadoop components require specifying
+      particular values in the <code class="ph codeph">SERDEPROPERTIES</code> or
+      <code class="ph codeph">TBLPROPERTIES</code> fields, such as creating an Avro table or an HBase table.
+      (You typically create HBase tables in Hive, because they require additional clauses not
+      currently available in Impala.)
+
+    </p>
+
+    <p class="p">
+      To see the column definitions and column comments for an existing table, for example
+      before issuing a <code class="ph codeph">CREATE TABLE ... LIKE</code> or a <code class="ph codeph">CREATE TABLE ... AS
+      SELECT</code> statement, issue the statement <code class="ph codeph">DESCRIBE
+      <var class="keyword varname">table_name</var></code>. To see even more detail, such as the location of
+      data files and the values for clauses such as <code class="ph codeph">ROW FORMAT</code> and
+      <code class="ph codeph">STORED AS</code>, issue the statement <code class="ph codeph">DESCRIBE FORMATTED
+      <var class="keyword varname">table_name</var></code>. <code class="ph codeph">DESCRIBE FORMATTED</code> is also needed
+      to see any overall table comment (as opposed to individual column comments).
+    </p>
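+
+    <p class="p">
+      For example, assuming a table named <code class="ph codeph">t1</code>:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Column definitions and column comments only.
+describe t1;
+
+-- Location, row format, stored-as clause, table comment, and other details.
+describe formatted t1;
+</code></pre>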
+
+    <p class="p">
+      After creating a table, your <span class="keyword cmdname">impala-shell</span> session or another
+      <span class="keyword cmdname">impala-shell</span> connected to the same node can immediately query that
+      table. There might be a brief interval (one statestore heartbeat) before the table can be
+      queried through a different Impala node. To make the <code class="ph codeph">CREATE TABLE</code>
+      statement return only when the table is recognized by all Impala nodes in the cluster,
+      enable the <code class="ph codeph">SYNC_DDL</code> query option.
+    </p>
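+
+    <p class="p">
+      For example, in <span class="keyword cmdname">impala-shell</span>:
+    </p>
+
+<pre class="pre codeblock"><code>
+set SYNC_DDL=1;
+
+-- Statement returns only after the new table is recognized across the cluster.
+create table synced_t (x int);
+</code></pre>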
+
+    <p class="p">
+      <strong class="ph b">HDFS caching (CACHED IN clause):</strong>
+    </p>
+
+    <p class="p">
+      If you specify the <code class="ph codeph">CACHED IN</code> clause, any existing or future data files in
+      the table directory or the partition subdirectories are designated to be loaded into
+      memory with the HDFS caching mechanism. See
+      <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a> for details about using the HDFS
+      caching feature.
+    </p>
+
+    <p class="p">
+        In <span class="keyword">Impala 2.2</span> and higher, the optional <code class="ph codeph">WITH REPLICATION</code> clause
+        for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> lets you specify
+        a <dfn class="term">replication factor</dfn>, the number of hosts on which to cache the same data blocks.
+        When Impala processes a cached data block and the cache replication factor is greater than 1, Impala randomly
+        selects a host that has a cached copy of that data block. This optimization avoids excessive CPU
+        usage on a single host when the same cached data block is processed multiple times.
+        Where practical, specify a value greater than or equal to the HDFS block replication factor.
+      </p>
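+
+    <p class="p">
+      For example, the following statement (assuming an HDFS cache pool named
+      <code class="ph codeph">four_gig_pool</code> already exists) caches the table data on three hosts:
+    </p>
+
+<pre class="pre codeblock"><code>
+create table cached_t (x int, s string)
+  cached in 'four_gig_pool' with replication = 3;
+</code></pre>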
+
+
+
+    <p class="p">
+      <strong class="ph b">Column order</strong>:
+    </p>
+
+    <p class="p">
+      If you intend to use the table to hold data files produced by some external source,
+      specify the columns in the same order as they appear in the data files.
+    </p>
+
+    <p class="p">
+      If you intend to insert or copy data into the table through Impala, or if you have control
+      over the way externally produced data files are arranged, use your judgment to specify
+      columns in the most convenient order:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          If certain columns are often <code class="ph codeph">NULL</code>, specify those columns last. You
+          might produce data files that omit these trailing columns entirely. In that case, Impala
+          automatically fills in <code class="ph codeph">NULL</code> values for the missing columns.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          If an unpartitioned table will be used as the source for an <code class="ph codeph">INSERT ...
+          SELECT</code> operation into a partitioned table, specify last in the unpartitioned
+          table any columns that correspond to partition key columns in the partitioned table,
+          and in the same order as the partition key columns are declared in the partitioned
+          table. This technique lets you use <code class="ph codeph">INSERT ... SELECT *</code> when copying
+          data to the partitioned table, rather than specifying each column name individually.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          If you specify columns in an order that you later discover is suboptimal, you can
+          sometimes work around the problem without recreating the table. You can create a view
+          that selects columns from the original table in a permuted order, then do a
+          <code class="ph codeph">SELECT *</code> from the view. When inserting data into a table, you can
+          specify a permuted order for the inserted columns to match the order in the
+          destination table.
+        </p>
+      </li>
+    </ul>
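+
+    <p class="p">
+      For example, the following hypothetical tables show the partition key columns declared last
+      in the unpartitioned source table, enabling <code class="ph codeph">INSERT ... SELECT *</code>:
+    </p>
+
+<pre class="pre codeblock"><code>-- Partition key columns (year, month) come last in the source table...
+CREATE TABLE sales_staging (id BIGINT, amount DOUBLE, year INT, month INT);
+
+-- ...in the same order as they are declared in the partitioned table...
+CREATE TABLE sales (id BIGINT, amount DOUBLE) PARTITIONED BY (year INT, month INT);
+
+-- ...so the copy operation does not need to name each column individually.
+INSERT INTO sales PARTITION (year, month) SELECT * FROM sales_staging;
+</code></pre>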
+
+    <p class="p">
+        <strong class="ph b">Hive considerations:</strong>
+      </p>
+
+    <p class="p">
+      Impala queries can make use of metadata about the table and columns, such as the number of
+      rows in a table or the number of different values in a column. Prior to Impala 1.2.2, to
+      create this metadata, you issued the <code class="ph codeph">ANALYZE TABLE</code> statement in Hive to
+      gather this information, after creating the table and loading representative data into it.
+      In Impala 1.2.2 and higher, the <code class="ph codeph">COMPUTE STATS</code> statement produces these
+      statistics within Impala, without needing to use Hive at all.
+    </p>
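+
+    <p class="p">
+      For example, after loading data into a hypothetical table <code class="ph codeph">t1</code>:
+    </p>
+
+<pre class="pre codeblock"><code>COMPUTE STATS t1;
+SHOW TABLE STATS t1;
+SHOW COLUMN STATS t1;
+</code></pre>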
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong>
+      </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        The Impala <code class="ph codeph">CREATE TABLE</code> statement cannot create an HBase table, because
+        it currently does not support the <code class="ph codeph">STORED BY</code> clause needed for HBase
+        tables. Create such tables in Hive, then query them through Impala. For information on
+        using Impala with HBase tables, see <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>.
+      </p>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+
+    <p class="p">
+      To create a table where the data resides in the Amazon Simple Storage Service (S3),
+      specify a <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">s3a://</code> prefix
+      pointing to the data files in S3.
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.6</span> and higher, you can use this special
+      <code class="ph codeph">LOCATION</code> syntax as part of a <code class="ph codeph">CREATE TABLE AS SELECT</code>
+      statement.
+    </p>
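+
+    <p class="p">
+      For example, using a hypothetical bucket name <code class="ph codeph">impala-demo</code>:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE s3_table (x INT, s STRING)
+  LOCATION 's3a://impala-demo/csv_data';
+
+-- In Impala 2.6 and higher, CREATE TABLE AS SELECT also accepts this LOCATION syntax.
+CREATE TABLE s3_copy LOCATION 's3a://impala-demo/copy_data'
+  AS SELECT * FROM s3_table;
+</code></pre>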
+
+    <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+        <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+        <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+        as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+        Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
+        See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an
+        <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the
+        results are not necessarily sorted. An <code class="ph codeph">INSERT ... SELECT</code> operation potentially creates
+        many different data files, prepared on different data nodes, and therefore the notion of the data being
+        stored in sorted order is impractical.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS considerations:</strong>
+      </p>
+
+    <p class="p">
+      The <code class="ph codeph">CREATE TABLE</code> statement for an internal table creates a directory in
+      HDFS. The <code class="ph codeph">CREATE EXTERNAL TABLE</code> statement associates the table with an
+      existing HDFS directory, and does not create any new directory in HDFS. To locate the HDFS
+      data directory for a table, issue a <code class="ph codeph">DESCRIBE FORMATTED
+      <var class="keyword varname">table</var></code> statement. To examine the contents of that HDFS
+      directory, use an OS command such as <code class="ph codeph">hdfs dfs -ls
+      hdfs://<var class="keyword varname">path</var></code>, either from the OS command line or through the
+      <code class="ph codeph">shell</code> or <code class="ph codeph">!</code> commands in <span class="keyword cmdname">impala-shell</span>.
+    </p>
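+
+    <p class="p">
+      For example, to locate and then list the data directory for a hypothetical table
+      <code class="ph codeph">t1</code>, all from within <span class="keyword cmdname">impala-shell</span>:
+    </p>
+
+<pre class="pre codeblock"><code>DESCRIBE FORMATTED t1;  -- Note the Location: value in the output.
+!hdfs dfs -ls hdfs://<var class="keyword varname">path_from_location_value</var>;
+</code></pre>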
+
+    <p class="p">
+      The <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax creates data files under the table data
+      directory to hold any data copied by the <code class="ph codeph">INSERT</code> portion of the statement.
+      (Even if no data is copied, Impala might create one or more empty data files.)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under, typically the
+      <code class="ph codeph">impala</code> user, must have both execute and write permission for the database
+      directory where the table is being created.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+
+    <p class="p">
+        If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+        identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+        other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Certain multi-stage statements (<code class="ph codeph">CREATE TABLE AS SELECT</code> and
+        <code class="ph codeph">COMPUTE STATS</code>) can be cancelled during some stages, when running <code class="ph codeph">INSERT</code>
+        or <code class="ph codeph">SELECT</code> operations internally. To cancel this statement, use Ctrl-C from the
+        <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
+        <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
+        in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
+        (port 25000).
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>,
+      <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>,
+      <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>,
+      <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>,
+      <a class="xref" href="impala_tables.html#internal_tables">Internal Tables</a>,
+      <a class="xref" href="impala_tables.html#external_tables">External Tables</a>,
+      <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>,
+      <a class="xref" href="impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a>, <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>,
+      <a class="xref" href="impala_show.html#show_create_table">SHOW CREATE TABLE Statement</a>,
+      <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>
+    </p>
+
+  </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_create_view.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_create_view.html b/docs/build/html/topics/impala_create_view.html
new file mode 100644
index 0000000..3d56969
--- /dev/null
+++ b/docs/build/html/topics/impala_create_view.html
@@ -0,0 +1,194 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_view"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE VIEW Statement</title></head><body id="create_view"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">CREATE VIEW Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">CREATE VIEW</code> statement lets you create a shorthand abbreviation for a more complicated
+      query. The base query can involve joins, expressions, reordered columns, column aliases, and other SQL
+      features that can make a query hard to understand or maintain.
+    </p>
+
+    <p class="p">
+      Because a view is purely a logical construct (an alias for a query) with no physical data behind it,
+      <code class="ph codeph">ALTER VIEW</code> only involves changes to metadata in the metastore database, not any data files
+      in HDFS.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>CREATE VIEW [IF NOT EXISTS] <var class="keyword varname">view_name</var> [(<var class="keyword varname">column_list</var>)]
+  AS <var class="keyword varname">select_statement</var></code></pre>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      The <code class="ph codeph">CREATE VIEW</code> statement can be useful in scenarios such as the following:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        To turn even the most lengthy and complicated SQL query into a one-liner. You can issue simple queries
+        against the view from applications, scripts, or interactive queries in <span class="keyword cmdname">impala-shell</span>.
+        For example:
+<pre class="pre codeblock"><code>select * from <var class="keyword varname">view_name</var>;
+select * from <var class="keyword varname">view_name</var> order by c1 desc limit 10;</code></pre>
+        The more complicated and hard-to-read the original query, the more benefit there is to simplifying the
+        query using a view.
+      </li>
+
+      <li class="li">
+        To hide the underlying table and column names, to minimize maintenance problems if those names change. In
+        that case, you re-create the view using the new names, and all queries that use the view rather than the
+        underlying tables keep running with no changes.
+      </li>
+
+      <li class="li">
+        To experiment with optimization techniques and make the optimized queries available to all applications.
+        For example, if you find a combination of <code class="ph codeph">WHERE</code> conditions, join order, join hints, and so
+        on that works the best for a class of queries, you can establish a view that incorporates the
+        best-performing techniques. Applications can then make relatively simple queries against the view, without
+        repeating the complicated and optimized logic over and over. If you later find a better way to optimize the
+        original query, when you re-create the view, all the applications immediately take advantage of the
+        optimized base query.
+      </li>
+
+      <li class="li">
+        To simplify a whole class of related queries, especially complicated queries involving joins between
+        multiple tables, complicated expressions in the column list, and other SQL syntax that makes the query
+        difficult to understand and debug. For example, you might create a view that joins several tables, filters
+        using several <code class="ph codeph">WHERE</code> conditions, and selects several columns from the result set.
+        Applications might issue queries against this view that only vary in their <code class="ph codeph">LIMIT</code>,
+        <code class="ph codeph">ORDER BY</code>, and similar simple clauses.
+      </li>
+    </ul>
+
+    <p class="p">
+      For queries that require repeating complicated clauses over and over again, for example in the select list,
+      <code class="ph codeph">ORDER BY</code>, and <code class="ph codeph">GROUP BY</code> clauses, you can use the <code class="ph codeph">WITH</code>
+      clause as an alternative to creating a view.
+    </p>
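+
+    <p class="p">
+      For example, this hypothetical query defines a derived column once in a
+      <code class="ph codeph">WITH</code> clause, instead of creating a permanent view:
+    </p>
+
+<pre class="pre codeblock"><code>WITH t AS (SELECT c1, c2, c3 * c4 AS total FROM t1 WHERE c5 IS NOT NULL)
+  SELECT c1, total FROM t ORDER BY total DESC LIMIT 10;
+</code></pre>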
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+    <p class="p">
+        For tables containing complex type columns (<code class="ph codeph">ARRAY</code>,
+        <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>), you typically use
+        join queries to refer to the complex values. You can use views to
+        hide the join notation, making such tables seem like traditional denormalized
+        tables, and making those tables queryable by business intelligence tools
+        that do not have built-in support for those complex types.
+        See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_views">Accessing Complex Type Data in Flattened Form Using Views</a> for details.
+      </p>
+    <p class="p">
+        Because you cannot directly issue <code class="ph codeph">SELECT <var class="keyword varname">col_name</var></code>
+        against a column of complex type, you cannot use a view or a <code class="ph codeph">WITH</code>
+        clause to <span class="q">"rename"</span> a column by selecting it with a column alias.
+      </p>
+
+    <p class="p">
+        If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+        load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+        statement wait before returning, until the new or changed metadata has been received by all the Impala
+        nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+      </p>
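+
+    <p class="p">
+      For example:
+    </p>
+
+<pre class="pre codeblock"><code>SET SYNC_DDL=1;
+-- The statement now returns only after all Impala nodes recognize the new view.
+CREATE VIEW v_recent AS SELECT c1, c2 FROM t1 WHERE c3 IS NOT NULL;
+</code></pre>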
+
+    <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+    <p class="p">
+        If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+        identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+        other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+
+
+<pre class="pre codeblock"><code>-- Create a view that is exactly the same as the underlying table.
+create view v1 as select * from t1;
+
+-- Create a view that includes only certain columns from the underlying table.
+create view v2 as select c1, c3, c7 from t1;
+
+-- Create a view that filters the values from the underlying table.
+create view v3 as select distinct c1, c3, c7 from t1 where c1 is not null and c5 &gt; 0;
+
+-- Create a view that reorders and renames columns from the underlying table.
+create view v4 as select c4 as last_name, c6 as address, c2 as birth_date from t1;
+
+-- Create a view that runs functions to convert or transform certain columns.
+create view v5 as select c1, cast(c3 as string) c3, concat(c4,c5) c5, trim(c6) c6, "Constant" c8 from t1;
+
+-- Create a view that hides the complexity of a view query.
+create view v6 as select t1.c1, t2.c2 from t1 join t2 on t1.id = t2.id;
+</code></pre>
+
+
+
+    <div class="p">
+        The following example creates a series of views and then drops them. These examples illustrate how views
+        are associated with a particular database, and both the view definitions and the view names for
+        <code class="ph codeph">CREATE VIEW</code> and <code class="ph codeph">DROP VIEW</code> can refer to a view in the current database or
+        a fully qualified view name.
+<pre class="pre codeblock"><code>
+-- Create and drop a view in the current database.
+CREATE VIEW few_rows_from_t1 AS SELECT * FROM t1 LIMIT 10;
+DROP VIEW few_rows_from_t1;
+
+-- Create and drop a view referencing a table in a different database.
+CREATE VIEW table_from_other_db AS SELECT x FROM db1.foo WHERE x IS NOT NULL;
+DROP VIEW table_from_other_db;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Switch into the other database and drop the view.
+USE db2;
+DROP VIEW v1;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Drop a view in the other database.
+DROP VIEW db2.v1;
+</code></pre>
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_views.html#views">Overview of Impala Views</a>, <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a>,
+      <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_databases.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_databases.html b/docs/build/html/topics/impala_databases.html
new file mode 100644
index 0000000..83e9911
--- /dev/null
+++ b/docs/build/html/topics/impala_databases.html
@@ -0,0 +1,62 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="databases"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Databases</title></head><body id="databases"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Databases</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      In Impala, a database is a logical container for a group of tables. Each database defines a separate
+      namespace. Within a database, you can refer to the tables inside it using their unqualified names. Different
+      databases can contain tables with identical names.
+    </p>
+
+    <p class="p">
+      Creating a database is a lightweight operation. The only database-specific properties you can configure
+      are <code class="ph codeph">LOCATION</code> and <code class="ph codeph">COMMENT</code>. There is no <code class="ph codeph">ALTER DATABASE</code> statement.
+    </p>
+
+    <p class="p">
+      Typically, you create a separate database for each project or application, to avoid naming conflicts between
+      tables and to make clear which tables are related to each other. The <code class="ph codeph">USE</code> statement lets
+      you switch between databases. Unqualified references to tables, views, and functions refer to objects
+      within the current database. You can also refer to objects in other databases by using qualified names
+      of the form <code class="ph codeph"><var class="keyword varname">dbname</var>.<var class="keyword varname">object_name</var></code>.
+    </p>
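+
+    <p class="p">
+      For example, with hypothetical databases <code class="ph codeph">db1</code> and <code class="ph codeph">db2</code>:
+    </p>
+
+<pre class="pre codeblock"><code>USE db1;
+-- An unqualified name refers to a table in the current database.
+SELECT COUNT(*) FROM t1;
+-- A qualified name refers to a table in another database.
+SELECT COUNT(*) FROM db2.t1;
+</code></pre>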
+
+    <p class="p">
+      Each database is physically represented by a directory in HDFS. When you do not specify a <code class="ph codeph">LOCATION</code>
+      attribute, the directory is located in the Impala data directory with the associated tables managed by Impala.
+      When you do specify a <code class="ph codeph">LOCATION</code> attribute, any read and write operations for tables in that
+      database are relative to the specified HDFS directory.
+    </p>
+
+    <p class="p">
+      There is a special database, named <code class="ph codeph">default</code>, where you begin when you connect to Impala.
+      Tables created in <code class="ph codeph">default</code> are physically located one level higher in HDFS than all the
+      user-created databases.
+    </p>
+
+    <div class="p">
+        Impala includes another predefined database, <code class="ph codeph">_impala_builtins</code>, that serves as the location
+        for the <a class="xref" href="../shared/../topics/impala_functions.html#builtins">built-in functions</a>. To see the built-in
+        functions, use a statement like the following:
+<pre class="pre codeblock"><code>show functions in _impala_builtins;
+show functions in _impala_builtins like '*<var class="keyword varname">substring</var>*';
+</code></pre>
+      </div>
+
+    <p class="p">
+      <strong class="ph b">Related statements:</strong>
+    </p>
+
+    <p class="p">
+      <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a>,
+      <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>, <a class="xref" href="impala_use.html#use">USE Statement</a>,
+      <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_datatypes.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_datatypes.html b/docs/build/html/topics/impala_datatypes.html
new file mode 100644
index 0000000..dfe9b71
--- /dev/null
+++ b/docs/build/html/topics/impala_datatypes.html
@@ -0,0 +1,33 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_array.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_bigint.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_boolean.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_char.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_decimal.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_double.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_float.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_int.html"><meta name="DC.Relation" scheme="URI" content=".
 ./topics/impala_map.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_real.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_smallint.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_string.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_struct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_timestamp.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_tinyint.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_varchar.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_complex_types.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="datatypes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Data Types</title></h
 ead><body id="datatypes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Data Types</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Impala supports a set of data types that you can use for table columns, expression values, and function
+      arguments and return values.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Prior to Impala 2.3, Impala supported only scalar types, not composite or nested types. In Impala 2.3
+      and higher, the complex types <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> are also supported. Accessing a table containing any
+      columns with unsupported types causes an error.
+    </div>
+
+    <p class="p toc"></p>
+
+    <p class="p">
+      For the notation to write literals of each of these data types, see
+      <a class="xref" href="impala_literals.html#literals">Literals</a>.
+    </p>
+
+    <p class="p">
+      See <a class="xref" href="impala_langref_unsupported.html#langref_hiveql_delta">SQL Differences Between Impala and Hive</a> for differences between Impala and
+      Hive data types.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_array.html">ARRAY Complex Type (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_bigint.html">BIGINT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_boolean.html">BOOLEAN Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_char.html">CHAR Data Type (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_decimal.html">DECIMAL Data Type (Impala 1.4 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_double.html">DOUBLE Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_float.html">FLOAT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../to
 pics/impala_int.html">INT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_map.html">MAP Complex Type (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_real.html">REAL Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_smallint.html">SMALLINT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_string.html">STRING Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_struct.html">STRUCT Complex Type (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_timestamp.html">TIMESTAMP Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_tinyint.html">TINYINT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_varchar.html">V
 ARCHAR Data Type (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_complex_types.html">Complex Types (Impala 2.3 or higher only)</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[20/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_num_nodes.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_num_nodes.html b/docs/build/html/topics/impala_num_nodes.html
new file mode 100644
index 0000000..9f98c68
--- /dev/null
+++ b/docs/build/html/topics/impala_num_nodes.html
@@ -0,0 +1,61 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="num_nodes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>NUM_NODES Query Option</title></head><body id="num_nodes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">NUM_NODES Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Limit the number of nodes that process a query, typically during debugging.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric
+    </p>
+
+<p class="p">
+      <strong class="ph b">Allowed values:</strong> 0
+      (meaning all nodes) or 1 (meaning all work is done on the coordinator node).
+</p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0
+    </p>
+
+     <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+     <p class="p">
+       If you are diagnosing a problem that you suspect is caused by a timing issue in
+       distributed query processing, you can set <code class="ph codeph">NUM_NODES=1</code> to verify
+       whether the problem still occurs when all the work is done on a single node.
+     </p>
+
+    <p class="p">
+        You might set the <code class="ph codeph">NUM_NODES</code> option to 1 briefly, during <code class="ph codeph">INSERT</code> or
+        <code class="ph codeph">CREATE TABLE AS SELECT</code> statements. Normally, those statements produce one or more data
+        files per data node. If the write operation involves small amounts of data, a Parquet table, and/or a
+        partitioned table, the default behavior could produce many small files when intuitively you might expect
+        only a single output file. <code class="ph codeph">SET NUM_NODES=1</code> turns off the <span class="q">"distributed"</span> aspect of the
+        write operation, making it more likely to produce only one or a few data files.
+      </p>
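+    <p class="p">
+      For example, the following sequence (illustrative; the table names are hypothetical) writes a
+      small amount of data into a partitioned Parquet table as a single file, then restores the
+      default distributed behavior:
+    </p>
+
+<pre class="pre codeblock"><code>set num_nodes=1;
+insert into sales_parquet partition (year=2017)
+  select * from sales_staging where year = 2017;
+set num_nodes=0;
+</code></pre>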
+
+    <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span> 
+    <p class="p">
+      Because this option results in increased resource utilization on a single host,
+      it could cause problems due to contention with other Impala statements or
+      high resource usage. Symptoms could include queries running slowly, exceeding the memory limit,
+      or appearing to hang. Use it only in a single-user development/test environment;
+      <strong class="ph b">do not</strong> use it in a production environment or in a cluster with a high-concurrency
+      or high-volume or performance-critical workload.
+    </p>
+    </div>
+
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_num_scanner_threads.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_num_scanner_threads.html b/docs/build/html/topics/impala_num_scanner_threads.html
new file mode 100644
index 0000000..9bd9375
--- /dev/null
+++ b/docs/build/html/topics/impala_num_scanner_threads.html
@@ -0,0 +1,27 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="num_scanner_threads"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>NUM_SCANNER_THREADS Query Option</title></head><body id="num_scanner_threads"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">NUM_SCANNER_THREADS Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Maximum number of scanner threads (on each node) used for each query. By default, Impala uses as many cores
+      as are available (one thread per core). You might lower this value if queries are using excessive resources
+      on a busy cluster. Impala imposes a maximum value automatically, so a high value has no practical effect.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0
+    </p>
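+    <p class="p">
+      For example, to restrict each node to two scanner threads for the current session and then
+      restore the default:
+    </p>
+
+<pre class="pre codeblock"><code>set num_scanner_threads=2;
+-- Run the resource-intensive query here.
+set num_scanner_threads=0;
+</code></pre>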
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_odbc.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_odbc.html b/docs/build/html/topics/impala_odbc.html
new file mode 100644
index 0000000..cd9aec9
--- /dev/null
+++ b/docs/build/html/topics/impala_odbc.html
@@ -0,0 +1,24 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_odbc"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring Impala to Work with ODBC</title></head><body id="impala_odbc"><main role="main"><article role="article" aria-labelledby="impala_odbc__odbc">
+
+  <h1 class="title topictitle1" id="impala_odbc__odbc">Configuring Impala to Work with ODBC</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Third-party products, especially business intelligence and reporting tools, can access Impala
+      using the ODBC protocol. For the best experience, ensure any third-party product you intend to use is supported.
+      Verifying support includes checking that the versions of Impala, ODBC, the operating system, the
+      Apache Hadoop distribution, and the third-party product have all been approved by the appropriate suppliers
+      for use together. To configure your systems to use ODBC, download and install a connector, typically from
+      the supplier of the third-party product or the Hadoop distribution.
+      You may need to sign in and accept license agreements before accessing the pages required for downloading
+      ODBC connectors.
+    </p>
+
+  </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_config.html">Managing Impala</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_offset.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_offset.html b/docs/build/html/topics/impala_offset.html
new file mode 100644
index 0000000..6de1515
--- /dev/null
+++ b/docs/build/html/topics/impala_offset.html
@@ -0,0 +1,67 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="offset"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>OFFSET Clause</title></head><body id="offset"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">OFFSET Clause</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The <code class="ph codeph">OFFSET</code> clause in a <code class="ph codeph">SELECT</code> query causes the result set to start some
+      number of rows after the logical first item. The result set is numbered starting from zero, so <code class="ph codeph">OFFSET
+      0</code> produces the same result as leaving out the <code class="ph codeph">OFFSET</code> clause. Always use this clause
+      in combination with <code class="ph codeph">ORDER BY</code> (so that it is clear which item should be first, second, and so
+      on) and <code class="ph codeph">LIMIT</code> (so that the result set covers a bounded range, such as items 0-9, 100-199,
+      and so on).
+    </p>
+
+    <p class="p">
+        In Impala 1.2.1 and higher, you can combine a <code class="ph codeph">LIMIT</code> clause with an <code class="ph codeph">OFFSET</code>
+        clause to produce a small result set that is different from a top-N query, for example, to return items 11
+        through 20. This technique can be used to simulate <span class="q">"paged"</span> results. Because Impala queries typically
+        involve substantial amounts of I/O, use this technique only for compatibility in cases where you cannot
+        rewrite the application logic. For best performance and scalability, wherever practical, query as many
+        items as you expect to need, cache them on the application side, and display small groups of results to
+        users using application logic.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example shows how you could run a <span class="q">"paging"</span> query originally written for a traditional
+      database application. Because typical Impala queries process megabytes or gigabytes of data and read large
+      data files from disk each time, it is inefficient to run a separate query to retrieve each small group of
+      items. Use this technique only for compatibility while porting older applications, then rewrite the
+      application code to use a single query with a large result set, and display pages of results from the cached
+      result set.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table numbers (x int);
+[localhost:21000] &gt; insert into numbers select x from very_long_sequence;
+Inserted 1000000 rows in 1.34s
+[localhost:21000] &gt; select x from numbers order by x limit 5 offset 0;
++----+
+| x  |
++----+
+| 1  |
+| 2  |
+| 3  |
+| 4  |
+| 5  |
++----+
+[localhost:21000] &gt; select x from numbers order by x limit 5 offset 5;
++----+
+| x  |
++----+
+| 6  |
+| 7  |
+| 8  |
+| 9  |
+| 10 |
++----+
+</code></pre>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[40/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_create_function.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_create_function.html b/docs/build/html/topics/impala_create_function.html
new file mode 100644
index 0000000..ee74515
--- /dev/null
+++ b/docs/build/html/topics/impala_create_function.html
@@ -0,0 +1,502 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_function"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE FUNCTION Statement</title></head><body id="create_function"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">CREATE FUNCTION Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Creates a user-defined function (UDF), which you can use to implement custom logic during
+      <code class="ph codeph">SELECT</code> or <code class="ph codeph">INSERT</code> operations.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      The syntax is different depending on whether you create a scalar UDF, which is called once for each row and
+      implemented by a single function, or a user-defined aggregate function (UDA), which is implemented by
+      multiple functions that compute intermediate results across sets of rows.
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.5</span> and higher, the syntax is also different for creating or dropping scalar Java-based UDFs.
+      The statements for Java UDFs use a new syntax, without any argument types or return type specified. Java-based UDFs
+      created using the new syntax persist across restarts of the Impala catalog server, and can be shared transparently
+      between Impala and Hive.
+    </p>
+
+    <p class="p">
+      To create a persistent scalar C++ UDF with <code class="ph codeph">CREATE FUNCTION</code>:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE FUNCTION [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var>([<var class="keyword varname">arg_type</var>[, <var class="keyword varname">arg_type</var>...]])
+  RETURNS <var class="keyword varname">return_type</var>
+  LOCATION '<var class="keyword varname">hdfs_path_to_dot_so</var>'
+  SYMBOL='<var class="keyword varname">symbol_name</var>'</code></pre>
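+    <p class="p">
+      For example, the following statement (with a hypothetical library path and entry point)
+      registers a scalar C++ UDF that takes a <code class="ph codeph">STRING</code> argument and
+      returns a <code class="ph codeph">STRING</code> result:
+    </p>
+
+<pre class="pre codeblock"><code>create function my_lower(string) returns string
+  location '/user/impala/udfs/libmyudfs.so' symbol='MyLower';
+</code></pre>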
+
+    <div class="p">
+      To create a persistent Java UDF with <code class="ph codeph">CREATE FUNCTION</code>:
+<pre class="pre codeblock"><code>CREATE FUNCTION [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var>
+  LOCATION '<var class="keyword varname">hdfs_path_to_jar</var>'
+  SYMBOL='<var class="keyword varname">class_name</var>'</code></pre>
+    </div>
+
+
+
+    <p class="p">
+      To create a persistent UDA, which must be written in C++, issue a <code class="ph codeph">CREATE AGGREGATE FUNCTION</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE [AGGREGATE] FUNCTION [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var>([<var class="keyword varname">arg_type</var>[, <var class="keyword varname">arg_type</var>...]])
+  RETURNS <var class="keyword varname">return_type</var>
+  LOCATION '<var class="keyword varname">hdfs_path</var>'
+  [INIT_FN='<var class="keyword varname">function</var>']
+  UPDATE_FN='<var class="keyword varname">function</var>'
+  MERGE_FN='<var class="keyword varname">function</var>'
+  [PREPARE_FN='<var class="keyword varname">function</var>']
+  [CLOSE_FN='<var class="keyword varname">function</var>']
+  <span class="ph">[SERIALIZE_FN='<var class="keyword varname">function</var>']</span>
+  [FINALIZE_FN='<var class="keyword varname">function</var>']
+  <span class="ph">[INTERMEDIATE <var class="keyword varname">type_spec</var>]</span></code></pre>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+      <strong class="ph b">Varargs notation:</strong>
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        Variable-length argument lists are supported for C++ UDFs, but currently not for Java UDFs.
+      </p>
+    </div>
+
+    <p class="p">
+      If the underlying implementation of your function accepts a variable number of arguments:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        The variable arguments must go last in the argument list.
+      </li>
+
+      <li class="li">
+        The variable arguments must all be of the same type.
+      </li>
+
+      <li class="li">
+        You must include at least one instance of the variable arguments in every function call invoked from SQL.
+      </li>
+
+      <li class="li">
+        You designate the variable portion of the argument list in the <code class="ph codeph">CREATE FUNCTION</code> statement
+        by including <code class="ph codeph">...</code> immediately after the type name of the first variable argument. For
+        example, to create a function that accepts an <code class="ph codeph">INT</code> argument, followed by a
+        <code class="ph codeph">BOOLEAN</code>, followed by one or more <code class="ph codeph">STRING</code> arguments, your <code class="ph codeph">CREATE
+        FUNCTION</code> statement would look like:
+<pre class="pre codeblock"><code>CREATE FUNCTION <var class="keyword varname">func_name</var> (INT, BOOLEAN, STRING ...)
+  RETURNS <var class="keyword varname">type</var> LOCATION '<var class="keyword varname">path</var>' SYMBOL='<var class="keyword varname">entry_point</var>';
+</code></pre>
+      </li>
+    </ul>
+
+    <p class="p">
+      See <a class="xref" href="impala_udf.html#udf_varargs">Variable-Length Argument Lists</a> for how to code a C++ UDF to accept
+      variable-length argument lists.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Scalar and aggregate functions:</strong>
+    </p>
+
+    <p class="p">
+      The simplest kind of user-defined function returns a single scalar value each time it is called, typically
+      once for each row in the result set. This general kind of function is what is usually meant by UDF.
+      User-defined aggregate functions (UDAs) are a specialized kind of UDF that produce a single value based on
+      the contents of multiple rows. You usually use UDAs in combination with a <code class="ph codeph">GROUP BY</code> clause to
+      condense a large result set into a smaller one, or even a single row summarizing column values across an
+      entire table.
+    </p>
+
+    <p class="p">
+      You create UDAs by using the <code class="ph codeph">CREATE AGGREGATE FUNCTION</code> syntax. The clauses
+      <code class="ph codeph">INIT_FN</code>, <code class="ph codeph">UPDATE_FN</code>, <code class="ph codeph">MERGE_FN</code>,
+      <span class="ph"><code class="ph codeph">SERIALIZE_FN</code>,</span> <code class="ph codeph">FINALIZE_FN</code>, and
+      <code class="ph codeph">INTERMEDIATE</code> only apply when you create a UDA rather than a scalar UDF.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">*_FN</code> clauses specify functions to call at different phases of function processing.
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <strong class="ph b">Initialize:</strong> The function you specify with the <code class="ph codeph">INIT_FN</code> clause does any initial
+        setup, such as initializing member variables in internal data structures. This function is often a stub for
+        simple UDAs. You can omit this clause and a default (no-op) function will be used.
+      </li>
+
+      <li class="li">
+        <strong class="ph b">Update:</strong> The function you specify with the <code class="ph codeph">UPDATE_FN</code> clause is called once for each
+        row in the original result set, that is, before any <code class="ph codeph">GROUP BY</code> clause is applied. A separate
+        instance of the function is called for each different value returned by the <code class="ph codeph">GROUP BY</code>
+        clause. The final argument passed to this function is a pointer, to which you write an updated value based
+        on its original value and the value of the first argument.
+      </li>
+
+      <li class="li">
+        <strong class="ph b">Merge:</strong> The function you specify with the <code class="ph codeph">MERGE_FN</code> clause is called an arbitrary
+        number of times, to combine intermediate values produced by different nodes or different threads as Impala
+        reads and processes data files in parallel. The final argument passed to this function is a pointer, to
+        which you write an updated value based on its original value and the value of the first argument.
+      </li>
+
+      <li class="li">
+        <strong class="ph b">Serialize:</strong> The function you specify with the <code class="ph codeph">SERIALIZE_FN</code> clause frees memory
+        allocated to intermediate results. It is required if any memory was allocated by the Allocate function in
+        the Init, Update, or Merge functions, or if the intermediate type contains any pointers. See
+        <span class="xref">the UDA code samples</span> for details.
+      </li>
+
+      <li class="li">
+        <strong class="ph b">Finalize:</strong> The function you specify with the <code class="ph codeph">FINALIZE_FN</code> clause does any required
+        teardown for resources acquired by your UDF, such as freeing memory, closing file handles if you explicitly
+        opened any files, and so on. This function is often a stub for simple UDAs. You can omit this clause and a
+        default (no-op) function will be used. It is required in UDAs where the final return type is different from
+        the intermediate type, or if any memory was allocated by the Allocate function in the Init, Update, or
+        Merge functions. See <span class="xref">the UDA code samples</span> for details.
+      </li>
+    </ul>
+
+    <p class="p">
+      If you use a consistent naming convention for each of the underlying functions, Impala can automatically
+      determine the names based on the first such clause, so the others are optional.
+    </p>
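+    <p class="p">
+      For example, in the following hypothetical statement, because the update function is named
+      <code class="ph codeph">SumUpdate</code>, Impala can infer the names
+      <code class="ph codeph">SumInit</code>, <code class="ph codeph">SumMerge</code>, and
+      <code class="ph codeph">SumFinalize</code> for the omitted clauses:
+    </p>
+
+<pre class="pre codeblock"><code>create aggregate function my_sum(bigint) returns bigint
+  location '/user/impala/udfs/libudasample.so' update_fn='SumUpdate';
+</code></pre>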
+
+    
+
+    <p class="p">
+      For end-to-end examples of UDAs, see <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+        Currently, Impala UDFs cannot accept arguments or return values of the Impala complex types
+        (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>).
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <ul class="ul">
+      <li class="li">
+        You can write Impala UDFs in either C++ or Java. C++ UDFs are new to Impala, and are the recommended
+        format for high performance because they run as native code. Java-based UDFs are compatible between
+        Impala and Hive, and are most suited to reusing existing Hive UDFs. (Impala can run Java-based Hive
+        UDFs but not Hive UDAs.)
+      </li>
+
+      <li class="li">
+        <span class="keyword">Impala 2.5</span> introduces UDF improvements to persistence for both C++ and Java UDFs,
+        and better compatibility between Impala and Hive for Java UDFs.
+        See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details.
+      </li>
+
+      <li class="li">
+        The body of the UDF is represented by a <code class="ph codeph">.so</code> or <code class="ph codeph">.jar</code> file, which you store
+        in HDFS and the <code class="ph codeph">CREATE FUNCTION</code> statement distributes to each Impala node.
+      </li>
+
+      <li class="li">
+        Impala calls the underlying code during SQL statement evaluation, as many times as needed to process all
+        the rows from the result set. All UDFs are assumed to be deterministic, that is, to always return the same
+        result when passed the same argument values. Impala might or might not skip some invocations of a UDF if
+        the result value is already known from a previous call. Therefore, do not rely on the UDF being called a
+        specific number of times, and do not return different result values based on some external factor such as
+        the current time, a random number function, or an external data source that could be updated while an
+        Impala query is in progress.
+      </li>
+
+      <li class="li">
+        The names of the function arguments in the UDF are not significant, only their number, positions, and data
+        types.
+      </li>
+
+      <li class="li">
+        You can overload the same function name by creating multiple versions of the function, each with a
+        different argument signature. For security reasons, you cannot make a UDF with the same name as any
+        built-in function.
+      </li>
+
+      <li class="li">
+        In the UDF code, you represent the function return result as a <code class="ph codeph">struct</code>. This
+        <code class="ph codeph">struct</code> contains 2 fields. The first field is a <code class="ph codeph">boolean</code> representing
+        whether the value is <code class="ph codeph">NULL</code> or not. (When this field is <code class="ph codeph">true</code>, the return
+        value is interpreted as <code class="ph codeph">NULL</code>.) The second field is the same type as the specified function
+        return type, and holds the return value when the function returns something other than
+        <code class="ph codeph">NULL</code>.
+      </li>
+
+      <li class="li">
+        In the UDF code, you represent the function arguments as an initial pointer to a UDF context structure,
+        followed by references to zero or more <code class="ph codeph">struct</code>s, corresponding to each of the arguments.
+        Each <code class="ph codeph">struct</code> has the same 2 fields as with the return value, a <code class="ph codeph">boolean</code>
+        field representing whether the argument is <code class="ph codeph">NULL</code>, and a field of the appropriate type
+        holding any non-<code class="ph codeph">NULL</code> argument value.
+      </li>
+
+      <li class="li">
+        For sample code and build instructions for UDFs,
+        see <span class="xref">the sample UDFs in the Impala github repo</span>.
+      </li>
+
+      <li class="li">
+        Because the file representing the body of the UDF is stored in HDFS, it is automatically available to all
+        the Impala nodes. You do not need to manually copy any UDF-related files between servers.
+      </li>
+
+      <li class="li">
+        Because Impala currently does not have any <code class="ph codeph">ALTER FUNCTION</code> statement, if you need to rename
+        a function, move it to a different database, or change its signature or other properties, issue a
+        <code class="ph codeph">DROP FUNCTION</code> statement for the original function followed by a <code class="ph codeph">CREATE
+        FUNCTION</code> with the desired properties.
+      </li>
+
+      <li class="li">
+        Because each UDF is associated with a particular database, either issue a <code class="ph codeph">USE</code> statement
+        before doing any <code class="ph codeph">CREATE FUNCTION</code> statements, or specify the name of the function as
+        <code class="ph codeph"><var class="keyword varname">db_name</var>.<var class="keyword varname">function_name</var></code>.
+      </li>
+    </ul>
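+    <p class="p">
+      The following simplified C++ sketch illustrates the argument and return-value convention for a
+      scalar UDF that adds 1 to an <code class="ph codeph">INT</code> argument. The
+      <code class="ph codeph">IntVal</code> shown here is a stand-in for the corresponding type in the
+      UDF SDK header <code class="ph codeph">udf/udf.h</code>, which also provides constructors and
+      the <code class="ph codeph">FunctionContext</code> type:
+    </p>
+
+<pre class="pre codeblock"><code>// Stand-in for the SDK's value wrapper: a NULL flag plus the value itself.
+struct IntVal {
+  bool is_null;
+  int val;
+};
+
+// Called once per row. A NULL argument produces a NULL result.
+IntVal AddOne(FunctionContext* context, const IntVal&amp; arg) {
+  IntVal result;
+  result.is_null = arg.is_null;
+  result.val = arg.is_null ? 0 : arg.val + 1;
+  return result;
+}
+</code></pre>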
+
+    <p class="p">
+        If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+        load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+        statement wait before returning, until the new or changed metadata has been received by all the Impala
+        nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Compatibility:</strong>
+      </p>
+
+    <p class="p">
+      Impala can run UDFs that were created through Hive, as long as they refer to Impala-compatible data types
+      (not composite or nested column types). Hive can run Java-based UDFs that were created through Impala, but
+      not Impala UDFs written in C++.
+    </p>
+
+    <p class="p">
+        The Hive <code class="ph codeph">current_user()</code> function cannot be
+        called from a Java UDF through Impala.
+      </p>
+
+    <p class="p"><strong class="ph b">Persistence:</strong></p>
+
+    <p class="p">
+        In <span class="keyword">Impala 2.5</span> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
+        Java UDFs are also persisted, if they were created with the new <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs,
+        where the Java function argument and return types are omitted.
+        Java-based UDFs created with the old <code class="ph codeph">CREATE FUNCTION</code> syntax do not persist across restarts
+        because they are held in the memory of the <span class="keyword cmdname">catalogd</span> daemon.
+        Until you re-create such Java UDFs using the new <code class="ph codeph">CREATE FUNCTION</code> syntax,
+        you must reload those Java-based UDFs by running the original <code class="ph codeph">CREATE FUNCTION</code> statements again each time
+        you restart the <span class="keyword cmdname">catalogd</span> daemon.
+        Prior to <span class="keyword">Impala 2.5</span> the requirement to reload functions after a restart applied to both C++ and Java functions.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      For additional examples of all kinds of user-defined functions, see <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>.
+    </p>
+
+    <p class="p">
+      The following example shows how to take a Java jar file and make all the functions inside one of its classes
+      into UDFs under a single (overloaded) function name in Impala. Each <code class="ph codeph">CREATE FUNCTION</code> or
+      <code class="ph codeph">DROP FUNCTION</code> statement applies to all the overloaded Java functions with the same name.
+      This example uses the signatureless syntax for <code class="ph codeph">CREATE FUNCTION</code> and <code class="ph codeph">DROP FUNCTION</code>,
+      which is available in <span class="keyword">Impala 2.5</span> and higher.
+    </p>
+    <p class="p">
+      At the start, the jar file is in the local filesystem. Then it is copied into HDFS, so that it is
+      available for Impala to reference through the <code class="ph codeph">CREATE FUNCTION</code> statement and
+      queries that refer to the Impala function name.
+    </p>
+<pre class="pre codeblock"><code>
+$ jar -tvf udf-examples.jar
+     0 Mon Feb 22 04:06:50 PST 2016 META-INF/
+   122 Mon Feb 22 04:06:48 PST 2016 META-INF/MANIFEST.MF
+     0 Mon Feb 22 04:06:46 PST 2016 org/
+     0 Mon Feb 22 04:06:46 PST 2016 org/apache/
+     0 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/
+  2460 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/IncompatibleUdfTest.class
+   541 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/TestUdfException.class
+  3438 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/JavaUdfTest.class
+  5872 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/TestUdf.class
+...
+$ hdfs dfs -put udf-examples.jar /user/impala/udfs
+$ hdfs dfs -ls /user/impala/udfs
+Found 2 items
+-rw-r--r--   3 jrussell supergroup        853 2015-10-09 14:05 /user/impala/udfs/hello_world.jar
+-rw-r--r--   3 jrussell supergroup       7366 2016-06-08 14:25 /user/impala/udfs/udf-examples.jar
+</code></pre>
+    <p class="p">
+      In <span class="keyword cmdname">impala-shell</span>, the <code class="ph codeph">CREATE FUNCTION</code> refers to the HDFS path of the jar file
+      and the fully qualified class name inside the jar. Each of the functions inside the class becomes an
+      Impala function, each one overloaded under the specified Impala function name.
+    </p>
+<pre class="pre codeblock"><code>
+[localhost:21000] &gt; create function testudf location '/user/impala/udfs/udf-examples.jar' symbol='org.apache.impala.TestUdf';
+[localhost:21000] &gt; show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature                             | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT      | testudf(BIGINT)                       | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN)                      | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN, BOOLEAN)             | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN, BOOLEAN, BOOLEAN)    | JAVA        | true          |
+| DOUBLE      | testudf(DOUBLE)                       | JAVA        | true          |
+| DOUBLE      | testudf(DOUBLE, DOUBLE)               | JAVA        | true          |
+| DOUBLE      | testudf(DOUBLE, DOUBLE, DOUBLE)       | JAVA        | true          |
+| FLOAT       | testudf(FLOAT)                        | JAVA        | true          |
+| FLOAT       | testudf(FLOAT, FLOAT)                 | JAVA        | true          |
+| FLOAT       | testudf(FLOAT, FLOAT, FLOAT)          | JAVA        | true          |
+| INT         | testudf(INT)                          | JAVA        | true          |
+| DOUBLE      | testudf(INT, DOUBLE)                  | JAVA        | true          |
+| INT         | testudf(INT, INT)                     | JAVA        | true          |
+| INT         | testudf(INT, INT, INT)                | JAVA        | true          |
+| SMALLINT    | testudf(SMALLINT)                     | JAVA        | true          |
+| SMALLINT    | testudf(SMALLINT, SMALLINT)           | JAVA        | true          |
+| SMALLINT    | testudf(SMALLINT, SMALLINT, SMALLINT) | JAVA        | true          |
+| STRING      | testudf(STRING)                       | JAVA        | true          |
+| STRING      | testudf(STRING, STRING)               | JAVA        | true          |
+| STRING      | testudf(STRING, STRING, STRING)       | JAVA        | true          |
+| TINYINT     | testudf(TINYINT)                      | JAVA        | true          |
++-------------+---------------------------------------+-------------+---------------+
+</code></pre>
+    <p class="p">
+      These are all simple functions that return their single argument, or that
+      combine multiple arguments by summing, concatenating, and so on. Impala determines which
+      overloaded function to use based on the number and types of the arguments.
+    </p>
+<pre class="pre codeblock"><code>
+insert into bigint_x values (1), (2), (4), (3);
+select testudf(x) from bigint_x;
++-----------------+
+| udfs.testudf(x) |
++-----------------+
+| 1               |
+| 2               |
+| 4               |
+| 3               |
++-----------------+
+
+insert into int_x values (1), (2), (4), (3);
+select testudf(x, x+1, x*x) from int_x;
++-------------------------------+
+| udfs.testudf(x, x + 1, x * x) |
++-------------------------------+
+| 4                             |
+| 9                             |
+| 25                            |
+| 16                            |
++-------------------------------+
+
+select testudf(x) from string_x;
++-----------------+
+| udfs.testudf(x) |
++-----------------+
+| one             |
+| two             |
+| four            |
+| three           |
++-----------------+
+select testudf(x,x) from string_x;
++--------------------+
+| udfs.testudf(x, x) |
++--------------------+
+| oneone             |
+| twotwo             |
+| fourfour           |
+| threethree         |
++--------------------+
+</code></pre>
+
+    <p class="p">
+      The previous example used the same Impala function name as the name of the class.
+      This example shows how the Impala function name is independent of the underlying
+      Java class or function names. A second <code class="ph codeph">CREATE FUNCTION</code> statement
+      results in a set of overloaded functions all named <code class="ph codeph">my_func</code>,
+      to go along with the overloaded functions all named <code class="ph codeph">testudf</code>.
+    </p>
+<pre class="pre codeblock"><code>
+create function my_func location '/user/impala/udfs/udf-examples.jar'
+  symbol='org.apache.impala.TestUdf';
+
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature                             | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT      | my_func(BIGINT)                       | JAVA        | true          |
+| BOOLEAN     | my_func(BOOLEAN)                      | JAVA        | true          |
+| BOOLEAN     | my_func(BOOLEAN, BOOLEAN)             | JAVA        | true          |
+...
+| BIGINT      | testudf(BIGINT)                       | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN)                      | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN, BOOLEAN)             | JAVA        | true          |
+...
+</code></pre>
+    <p class="p">
+      The corresponding <code class="ph codeph">DROP FUNCTION</code> statement with no signature
+      drops all the overloaded functions with that name.
+    </p>
+<pre class="pre codeblock"><code>
+drop function my_func;
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature                             | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT      | testudf(BIGINT)                       | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN)                      | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN, BOOLEAN)             | JAVA        | true          |
+...
+</code></pre>
+    <p class="p">
+      The signatureless <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs ensures that
+      the functions shown in this example remain available after the Impala service
+      (specifically, the Catalog Server) is restarted.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for more background information, usage instructions, and examples for
+      Impala UDFs; <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_create_role.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_create_role.html b/docs/build/html/topics/impala_create_role.html
new file mode 100644
index 0000000..ae4bbd8
--- /dev/null
+++ b/docs/build/html/topics/impala_create_role.html
@@ -0,0 +1,70 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_role"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE ROLE Statement (Impala 2.0 or higher only)</title></head><body id="create_role"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">CREATE ROLE Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+
+      The <code class="ph codeph">CREATE ROLE</code> statement creates a role to which privileges can be granted. Privileges can
+      be granted to roles, which can then be assigned to users. A user that has been assigned a role will only be
+      able to exercise the privileges of that role. Only users that have administrative privileges can create or drop
+      roles. By default, the <code class="ph codeph">hive</code>, <code class="ph codeph">impala</code>, and <code class="ph codeph">hue</code> users have
+      administrative privileges in Sentry.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>CREATE ROLE <var class="keyword varname">role_name</var>
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Required privileges:</strong>
+      </p>
+
+    <p class="p">
+      Only administrative users (those with <code class="ph codeph">ALL</code> privileges on the server, defined in the Sentry
+      policy file) can use this statement.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Compatibility:</strong>
+      </p>
+
+    <p class="p">
+      Impala makes use of any roles and privileges specified by the <code class="ph codeph">GRANT</code> and
+      <code class="ph codeph">REVOKE</code> statements in Hive, and Hive makes use of any roles and privileges specified by the
+      <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Impala. The Impala <code class="ph codeph">GRANT</code>
+      and <code class="ph codeph">REVOKE</code> statements for privileges do not require the <code class="ph codeph">ROLE</code> keyword to be
+      repeated before each role name, unlike the equivalent Hive statements.
+    </p>
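+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following sketch shows a typical sequence: create a role, grant a privilege to it,
+      and assign it to a group. The role, database, and group names here are hypothetical
+      placeholders rather than required values.
+    </p>
+
+<pre class="pre codeblock"><code>CREATE ROLE analyst_role;
+-- Privileges are granted to the role...
+GRANT ALL ON DATABASE analytics_db TO ROLE analyst_role;
+-- ...and the role is assigned to a group of users.
+GRANT ROLE analyst_role TO GROUP analysts;
+</code></pre>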
+
+
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories;
+        therefore, no HDFS permissions are required.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>,
+      <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>,
+      <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_smallint.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_smallint.html b/docs/build/html/topics/impala_smallint.html
new file mode 100644
index 0000000..cd48e90
--- /dev/null
+++ b/docs/build/html/topics/impala_smallint.html
@@ -0,0 +1,125 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="smallint"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SMALLINT Data Type</title></head><body id="smallint"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">SMALLINT Data Type</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A 2-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> SMALLINT</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Range:</strong> -32768 .. 32767. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Conversions:</strong> Impala automatically converts to a larger integer type (<code class="ph codeph">INT</code> or
+      <code class="ph codeph">BIGINT</code>) or a floating-point type (<code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>).
+      Use <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>, <code class="ph codeph">STRING</code>,
+      or <code class="ph codeph">TIMESTAMP</code>.
+      <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to
+        <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+        date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+        If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+        the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span>
+    </p>
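+    <p class="p">
+      For example, the following queries illustrate these conversions. The literal values are
+      arbitrary, and the exact <code class="ph codeph">TIMESTAMP</code> result depends on the
+      time zone settings in effect:
+    </p>
+
+<pre class="pre codeblock"><code>-- Implicit widening to a larger integer type.
+SELECT CAST(100 AS SMALLINT) + 1000000;
+-- Explicit CAST() required to narrow to TINYINT.
+SELECT CAST(CAST(100 AS SMALLINT) AS TINYINT);
+-- N seconds past the start of the epoch (January 1, 1970).
+SELECT CAST(CAST(100 AS SMALLINT) AS TIMESTAMP);
+</code></pre>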
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      For a convenient and automated way to check the bounds of the <code class="ph codeph">SMALLINT</code> type, call the
+      functions <code class="ph codeph">MIN_SMALLINT()</code> and <code class="ph codeph">MAX_SMALLINT()</code>.
+    </p>
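+    <p class="p">
+      For example, to check the bounds directly:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT MIN_SMALLINT(), MAX_SMALLINT();
+</code></pre>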
+
+    <p class="p">
+      If an integer value is too large to be represented as a <code class="ph codeph">SMALLINT</code>, use an
+      <code class="ph codeph">INT</code> instead.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+        value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x SMALLINT);
+SELECT CAST(1000 AS SMALLINT);
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Parquet considerations:</strong>
+      </p>
+
+
+
+    <p class="p">
+      Physically, Parquet files represent <code class="ph codeph">TINYINT</code> and <code class="ph codeph">SMALLINT</code> values as 32-bit
+      integers. Although Impala rejects attempts to insert out-of-range values into such columns, if you create a
+      new table with the <code class="ph codeph">CREATE TABLE ... LIKE PARQUET</code> syntax, any <code class="ph codeph">TINYINT</code> or
+      <code class="ph codeph">SMALLINT</code> columns in the original table turn into <code class="ph codeph">INT</code> columns in the new
+      table.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric
+        type more efficiently than a <code class="ph codeph">STRING</code> representation of the value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+        using Parquet or other binary formats.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong> Represented in memory as a 2-byte value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+        fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+        statement.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+      <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+      <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>,
+      <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_ssl.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_ssl.html b/docs/build/html/topics/impala_ssl.html
new file mode 100644
index 0000000..a9b4d25
--- /dev/null
+++ b/docs/build/html/topics/impala_ssl.html
@@ -0,0 +1,119 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ssl"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring TLS/SSL for Impala</title></head><body id="ssl"><main role="main"><article role="article" aria-labelledby="ssl__tls">
+
+  <h1 class="title topictitle1" id="ssl__tls">Configuring TLS/SSL for Impala</h1>
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Impala supports TLS/SSL network encryption between Impala and client
+      programs, and between the Impala-related daemons running on different nodes
+      in the cluster. This encryption is especially important when you also use features such as Kerberos
+      authentication or Sentry authorization, where credentials are
+      transmitted back and forth.
+    </p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="ssl__concept_q1p_j2d_rp">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Using the Command Line</h2>
+
+    <div class="body conbody">
+      <p class="p">
+        To enable SSL for client connections to Impala, add both of the following flags to the <span class="keyword cmdname">impalad</span> startup options:
+      </p>
+
+      <ul class="ul" id="concept_q1p_j2d_rp__ul_i2p_m2d_rp">
+        <li class="li">
+          <code class="ph codeph">--ssl_server_certificate</code>: the full path to the server certificate, on the local filesystem.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">--ssl_private_key</code>: the full path to the server private key, on the local filesystem.
+        </li>
+      </ul>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.3</span> and higher, Impala can also use SSL for its own internal communication between the
+        <span class="keyword cmdname">impalad</span>, <code class="ph codeph">statestored</code>, and <code class="ph codeph">catalogd</code> daemons.
+        To enable this additional SSL encryption, set the <code class="ph codeph">--ssl_server_certificate</code>
+        and <code class="ph codeph">--ssl_private_key</code> flags in the startup options for
+        <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">catalogd</span>, and <span class="keyword cmdname">statestored</span>,
+        and also add the <code class="ph codeph">--ssl_client_ca_certificate</code> flag for all three of those daemons.
+      </p>
+
+      <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span> 
+        Prior to <span class="keyword">Impala 2.3.2</span>, you could enable Kerberos authentication between Impala internal components,
+        or SSL encryption between Impala internal components, but not both at the same time.
+        This restriction has now been lifted.
+        See <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2598" target="_blank">IMPALA-2598</a>
+        to see the maintenance releases for different levels of Impala where the fix has been published.
+      </div>
+
+      <p class="p">
+        If either of these flags is set, both must be set. In that case, Impala starts listening for Beeswax and HiveServer2 requests on
+        SSL-secured ports only. (The port numbers stay the same; see <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> for details.)
+      </p>
+
+      <p class="p">
+        Since Impala uses passphrase-less certificates in PEM format, you can reuse a host's existing Java keystore
+        by using the <code class="ph codeph">openssl</code> toolkit to convert it to the PEM format.
+      </p>
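+      <p class="p">
+        For example, one possible conversion sequence uses <span class="keyword cmdname">keytool</span>
+        to export the keystore to PKCS12 format, then <code class="ph codeph">openssl pkcs12</code>
+        to extract the certificate and key. The keystore name and output filenames here are
+        placeholders for your own values:
+      </p>
+
+<pre class="pre codeblock"><code>$ keytool -importkeystore -srckeystore keystore.jks \
+    -destkeystore keystore.p12 -deststoretype PKCS12
+$ openssl pkcs12 -in keystore.p12 -nokeys -out server-cert.pem
+$ openssl pkcs12 -in keystore.p12 -nocerts -nodes -out server-key.pem
+</code></pre>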
+
+      <section class="section" id="concept_q1p_j2d_rp__secref"><h3 class="title sectiontitle">Configuring TLS/SSL Communication for the Impala Shell</h3>
+
+        
+
+        <p class="p">
+          With SSL enabled for Impala, use the following options when starting the
+          <span class="keyword cmdname">impala-shell</span> interpreter:
+        </p>
+
+        <ul class="ul" id="concept_q1p_j2d_rp__ul_kgp_m2d_rp">
+          <li class="li">
+            <code class="ph codeph">--ssl</code>: enables TLS/SSL for <span class="keyword cmdname">impala-shell</span>.
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">--ca_cert</code>: the local pathname pointing to the third-party CA certificate, or to a copy of the server
+            certificate for self-signed server certificates.
+          </li>
+        </ul>
+
+        <p class="p">
+          If <code class="ph codeph">--ca_cert</code> is not set, <span class="keyword cmdname">impala-shell</span> enables TLS/SSL, but does not validate the server
+          certificate. This is useful for connecting to a known-good Impala instance that only accepts TLS/SSL connections, when a copy of the
+          certificate is not available (such as when debugging customer installations).
+        </p>
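+        <p class="p">
+          For example, assuming a CA certificate at a hypothetical path, a secured session
+          might be started like this:
+        </p>
+
+<pre class="pre codeblock"><code>$ impala-shell --ssl --ca_cert=/etc/pki/tls/certs/ca-cert.pem -i impalad-host.example.com:21000
+</code></pre>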
+
+      </section>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="ssl__ssl_jdbc_odbc">
+    <h2 class="title topictitle2" id="ariaid-title3">Using TLS/SSL with Business Intelligence Tools</h2>
+    <div class="body conbody">
+      <p class="p">
+        You can use Kerberos authentication, TLS/SSL encryption, or both to secure
+        connections from JDBC and ODBC applications to Impala.
+        See <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a> and <a class="xref" href="impala_odbc.html#impala_odbc">Configuring Impala to Work with ODBC</a>
+        for details.
+      </p>
+
+      <p class="p">
+        Prior to <span class="keyword">Impala 2.5</span>, the Hive JDBC driver did not support connections that use both Kerberos authentication
+        and SSL encryption. If your cluster is running an older release that has this restriction,
+        use an alternative JDBC driver that supports
+        both of these security features.
+      </p>
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_stddev.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_stddev.html b/docs/build/html/topics/impala_stddev.html
new file mode 100644
index 0000000..4a14e14
--- /dev/null
+++ b/docs/build/html/topics/impala_stddev.html
@@ -0,0 +1,121 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="stddev"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>STDDEV, STDDEV_SAMP, STDDEV_POP Functions</title></head><body id="stddev"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">STDDEV, STDDEV_SAMP, STDDEV_POP Functions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      
+      
+      An aggregate function that computes the
+      <a class="xref" href="http://en.wikipedia.org/wiki/Standard_deviation" target="_blank">standard
+      deviation</a> of a set of numbers.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>{ STDDEV | STDDEV_SAMP | STDDEV_POP } ([DISTINCT | ALL] <var class="keyword varname">expression</var>)</code></pre>
+
+    <p class="p">
+      This function works with any numeric data type.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> in Impala 2.0 and higher; <code class="ph codeph">STRING</code> in earlier
+        releases
+      </p>
+
+    <p class="p">
+      This function is typically used in mathematical formulas related to probability distributions.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">STDDEV_POP()</code> and <code class="ph codeph">STDDEV_SAMP()</code> functions compute the population
+      standard deviation and sample standard deviation, respectively, of the input values.
+      (<code class="ph codeph">STDDEV()</code> is an alias for <code class="ph codeph">STDDEV_SAMP()</code>.) Both functions evaluate all input
+      rows matched by the query. The difference is that <code class="ph codeph">STDDEV_SAMP()</code> is scaled by
+      <code class="ph codeph">1/(N-1)</code> while <code class="ph codeph">STDDEV_POP()</code> is scaled by <code class="ph codeph">1/N</code>.
+    </p>
+
+    <p class="p">
+      If no input rows match the query, the result of any of these functions is <code class="ph codeph">NULL</code>. If a single
+      input row matches the query, the result of any of these functions is <code class="ph codeph">0.0</code>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      This example demonstrates how <code class="ph codeph">STDDEV()</code> and <code class="ph codeph">STDDEV_SAMP()</code> return the same
+      result, while <code class="ph codeph">STDDEV_POP()</code> uses a slightly different calculation to reflect that the input
+      data is considered part of a larger <span class="q">"population"</span>.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select stddev(score) from test_scores;
++---------------+
+| stddev(score) |
++---------------+
+| 28.5          |
++---------------+
+[localhost:21000] &gt; select stddev_samp(score) from test_scores;
++--------------------+
+| stddev_samp(score) |
++--------------------+
+| 28.5               |
++--------------------+
+[localhost:21000] &gt; select stddev_pop(score) from test_scores;
++-------------------+
+| stddev_pop(score) |
++-------------------+
+| 28.4858           |
++-------------------+
+</code></pre>
+
+    <p class="p">
+      This example demonstrates that, because the return value of these aggregate functions is a
+      <code class="ph codeph">STRING</code>, you must currently convert the result with <code class="ph codeph">CAST</code>.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table score_stats as select cast(stddev(score) as decimal(7,4)) `standard_deviation`, cast(variance(score) as decimal(7,4)) `variance` from test_scores;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] &gt; desc score_stats;
++--------------------+--------------+---------+
+| name               | type         | comment |
++--------------------+--------------+---------+
+| standard_deviation | decimal(7,4) |         |
+| variance           | decimal(7,4) |         |
++--------------------+--------------+---------+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+        These functions cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with these functions.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      The <code class="ph codeph">STDDEV()</code>, <code class="ph codeph">STDDEV_POP()</code>, and <code class="ph codeph">STDDEV_SAMP()</code> functions
+      compute the standard deviation (square root of the variance) based on the results of
+      <code class="ph codeph">VARIANCE()</code>, <code class="ph codeph">VARIANCE_POP()</code>, and <code class="ph codeph">VARIANCE_SAMP()</code>
+      respectively. See <a class="xref" href="impala_variance.html#variance">VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions</a> for details about the variance property.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_string.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_string.html b/docs/build/html/topics/impala_string.html
new file mode 100644
index 0000000..60714c7
--- /dev/null
+++ b/docs/build/html/topics/impala_string.html
@@ -0,0 +1,197 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="string"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>STRING Data Type</title></head><body id="string"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">STRING Data Type</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> STRING</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Length:</strong> Maximum of 32,767 bytes. Do not use any length constraint when declaring
+      <code class="ph codeph">STRING</code> columns, as you might with <code class="ph codeph">VARCHAR</code>,
+      <code class="ph codeph">CHAR</code>, or similar column types in relational database systems. <span class="ph">If you do
+      need to manipulate string values with precise or maximum lengths, in Impala 2.0 and higher you can declare
+      columns as <code class="ph codeph">VARCHAR(<var class="keyword varname">max_length</var>)</code> or
+      <code class="ph codeph">CHAR(<var class="keyword varname">length</var>)</code>, but for best performance use <code class="ph codeph">STRING</code>
+      where practical.</span>
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Character sets:</strong> For full support in all Impala subsystems, restrict string values to the ASCII
+      character set. Although some UTF-8 character data can be stored in Impala and retrieved through queries, UTF-8 strings
+      containing non-ASCII characters are not guaranteed to work properly in combination with many SQL aspects,
+      including but not limited to:
+    </p>
+    <ul class="ul">
+      <li class="li">
+        String manipulation functions.
+      </li>
+      <li class="li">
+        Comparison operators.
+      </li>
+      <li class="li">
+        The <code class="ph codeph">ORDER BY</code> clause.
+      </li>
+      <li class="li">
+        Values in partition key columns.
+      </li>
+    </ul>
+
+    <p class="p">
+      Impala does not include metadata about national language aspects, such as collation order
+      or extended ASCII variants like the ISO-8859-1 or ISO-8859-2 encodings, with the table definition.
+      If you need to sort, manipulate, or display data depending on those national language
+      characteristics of string data, use logic on the application side.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Conversions:</strong>
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          Impala does not automatically convert <code class="ph codeph">STRING</code> to any numeric type. Impala does
+          automatically convert <code class="ph codeph">STRING</code> to <code class="ph codeph">TIMESTAMP</code> if the value matches one of
+          the accepted <code class="ph codeph">TIMESTAMP</code> formats; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for
+          details.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          You can use <code class="ph codeph">CAST()</code> to convert <code class="ph codeph">STRING</code> values to
+          <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>,
+          <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, or <code class="ph codeph">TIMESTAMP</code>.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          You cannot directly cast a <code class="ph codeph">STRING</code> value to <code class="ph codeph">BOOLEAN</code>. You can use a
+          <code class="ph codeph">CASE</code> expression to evaluate string values such as <code class="ph codeph">'T'</code>,
+          <code class="ph codeph">'true'</code>, and so on and return Boolean <code class="ph codeph">true</code> and <code class="ph codeph">false</code>
+          values as appropriate.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          You can cast a <code class="ph codeph">BOOLEAN</code> value to <code class="ph codeph">STRING</code>, returning <code class="ph codeph">'1'</code>
+          for <code class="ph codeph">true</code> values and <code class="ph codeph">'0'</code> for <code class="ph codeph">false</code> values.
+        </p>
+      </li>
+    </ul>
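The conversion rules above can be sketched in a few statements. The table and column names in the second query (`t1`, `flag_column`) are hypothetical, for illustration only.

```sql
-- CAST() handles explicit STRING-to-numeric conversion:
SELECT CAST('42' AS INT) + 1;   -- 43

-- A STRING cannot be cast directly to BOOLEAN; use a CASE expression instead.
SELECT CASE WHEN upper(flag_column) IN ('T', 'TRUE', 'Y', 'YES') THEN TRUE
            WHEN upper(flag_column) IN ('F', 'FALSE', 'N', 'NO') THEN FALSE
            ELSE NULL
       END AS flag_as_boolean
  FROM t1;
```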
+
+    <p class="p">
+        <strong class="ph b">Partitioning:</strong>
+      </p>
+
+    <p class="p">
+      Although it might be convenient to use <code class="ph codeph">STRING</code> columns for partition keys, even when those
+      columns contain numbers, for performance and scalability it is much better to use numeric columns as
+      partition keys whenever practical. Although the underlying HDFS directory name might be the same in either
+      case, the in-memory storage for the partition key columns is more compact, and computations are faster, if
+      partition key columns such as <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, <code class="ph codeph">DAY</code> and so on
+      are declared as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>, and so on.
+    </p>
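As a sketch of this guideline (the table and column names are illustrative only), declare partition key columns with numeric types rather than `STRING`:

```sql
-- Numeric partition keys: compact in-memory representation, faster comparisons.
CREATE TABLE web_logs (msg STRING)
  PARTITIONED BY (year SMALLINT, month TINYINT, day TINYINT);

-- Avoid the equivalent declaration with STRING partition keys where practical:
--   PARTITIONED BY (year STRING, month STRING, day STRING)
```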
+
+    <p class="p">
+        <strong class="ph b">Zero-length strings:</strong> For purposes of clauses such as <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">GROUP
+        BY</code>, Impala considers zero-length strings (<code class="ph codeph">""</code>), <code class="ph codeph">NULL</code>, and space
+        to all be different values.
+      </p>
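A quick way to see these three values treated as distinct is a `GROUP BY` over an inline data set (a sketch; exact output formatting may vary):

```sql
-- '' (empty string), NULL, and ' ' (a single space) form three separate groups.
SELECT s, count(*) AS cnt
  FROM (SELECT '' AS s
        UNION ALL SELECT CAST(NULL AS STRING)
        UNION ALL SELECT ' ') t
 GROUP BY s;
```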
+
+
+
+
+
+    <p class="p">
+        <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+        using Parquet or other binary formats.
+      </p>
+
+    <p class="p"><strong class="ph b">Avro considerations:</strong></p>
+    <p class="p">
+        The Avro specification allows string values up to 2**64 bytes in length.
+        Impala queries for Avro tables use 32-bit integers to hold string lengths.
+        In <span class="keyword">Impala 2.5</span> and higher, Impala truncates <code class="ph codeph">CHAR</code>
+        and <code class="ph codeph">VARCHAR</code> values in Avro tables to (2**31)-1 bytes.
+        If a query encounters a <code class="ph codeph">STRING</code> value longer than (2**31)-1
+        bytes in an Avro table, the query fails. In earlier releases,
+        encountering such long values in an Avro table could cause a crash.
+      </p>
+
+
+
+
+
+
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong> Because the values of this type have variable size, none of the
+        column statistics fields are filled in until you run the <code class="ph codeph">COMPUTE STATS</code> statement.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following examples demonstrate double-quoted and single-quoted string literals, and required escaping for
+      quotation marks within string literals:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT 'I am a single-quoted string';
+SELECT "I am a double-quoted string";
+SELECT 'I\'m a single-quoted string with an apostrophe';
+SELECT "I\'m a double-quoted string with an apostrophe";
+SELECT 'I am a "short" single-quoted string containing quotes';
+SELECT "I am a \"short\" double-quoted string containing quotes";
+</code></pre>
+
+    <p class="p">
+      The following examples demonstrate calls to string manipulation functions to concatenate strings, convert
+      numbers to strings, or pull out substrings:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT CONCAT("Once upon a time, there were ", CAST(3 AS STRING), ' little pigs.');
+SELECT SUBSTR("hello world",7,5);
+</code></pre>
+
+    <p class="p">
+      The following examples show how to perform operations on <code class="ph codeph">STRING</code> columns within a table:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (s1 STRING, s2 STRING);
+INSERT INTO t1 VALUES ("hello", 'world'), (CAST(7 AS STRING), "wonders");
+SELECT s1, s2, length(s1) FROM t1 WHERE s2 LIKE 'w%';
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_literals.html#string_literals">String Literals</a>, <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>,
+      <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a>,
+      <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_string_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_string_functions.html b/docs/build/html/topics/impala_string_functions.html
new file mode 100644
index 0000000..aab1f35
--- /dev/null
+++ b/docs/build/html/topics/impala_string_functions.html
@@ -0,0 +1,1036 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="string_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala String Functions</title></head><body id="string_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala String Functions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <div class="p">
+      String functions are classified as those primarily accepting or returning <code class="ph codeph">STRING</code>,
+      <code class="ph codeph">VARCHAR</code>, or <code class="ph codeph">CHAR</code> data types, for example to measure the length of a string
+      or concatenate two strings together.
+      <ul class="ul">
+        <li class="li">
+          All the functions that accept <code class="ph codeph">STRING</code> arguments also accept the <code class="ph codeph">VARCHAR</code>
+          and <code class="ph codeph">CHAR</code> types introduced in Impala 2.0.
+        </li>
+
+        <li class="li">
+          Whenever <code class="ph codeph">VARCHAR</code> or <code class="ph codeph">CHAR</code> values are passed to a function that returns a
+          string value, the return type is normalized to <code class="ph codeph">STRING</code>. For example, a call to
+          <code class="ph codeph">concat()</code> with a mix of <code class="ph codeph">STRING</code>, <code class="ph codeph">VARCHAR</code>, and
+          <code class="ph codeph">CHAR</code> arguments produces a <code class="ph codeph">STRING</code> result.
+        </li>
+      </ul>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      The string functions operate mainly on these data types: <a class="xref" href="impala_string.html#string">STRING Data Type</a>,
+      <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>, and <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Function reference:</strong>
+    </p>
+
+    <p class="p">
+      Impala supports the following string functions:
+    </p>
+
+    <dl class="dl">
+      
+
+        <dt class="dt dlterm" id="string_functions__ascii">
+          <code class="ph codeph">ascii(string str)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the numeric ASCII code of the first character of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__btrim">
+          <code class="ph codeph">btrim(string a)</code>,
+          <code class="ph codeph">btrim(string a, string chars_to_trim)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Removes all instances of one or more characters
+          from the start and end of a <code class="ph codeph">STRING</code> value.
+          By default, removes only spaces.
+          If a non-<code class="ph codeph">NULL</code> optional second argument is specified, the function removes all
+          occurrences of characters in that second argument from the beginning and
+          end of the string.
+          <p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show the default <code class="ph codeph">btrim()</code> behavior,
+            and what changes when you specify the optional second argument.
+            All the examples bracket the output value with <code class="ph codeph">[ ]</code>
+            so that you can see any leading or trailing spaces in the <code class="ph codeph">btrim()</code> result.
+            By default, the function removes any number of both leading and trailing spaces.
+            When the second argument is specified, any number of occurrences of any
+            character in the second argument is removed from the start and end of the
+            input string; in this case, spaces are not removed (unless they are part of the second
+            argument), and instances of those characters are removed only when they occur
+            at the very beginning or end of the string.
+          </p>
+<pre class="pre codeblock"><code>-- Remove multiple spaces before and one space after.
+select concat('[',btrim('    hello '),']');
++---------------------------------------+
+| concat('[', btrim('    hello '), ']') |
++---------------------------------------+
+| [hello]                               |
++---------------------------------------+
+
+-- Remove any instances of x or y or z at beginning or end. Leave spaces alone.
+select concat('[',btrim('xy    hello zyzzxx','xyz'),']');
++------------------------------------------------------+
+| concat('[', btrim('xy    hello zyzzxx', 'xyz'), ']') |
++------------------------------------------------------+
+| [    hello ]                                         |
++------------------------------------------------------+
+
+-- Remove any instances of x or y or z at beginning or end.
+-- Leave x, y, z alone in the middle of the string.
+select concat('[',btrim('xyhelxyzlozyzzxx','xyz'),']');
++----------------------------------------------------+
+| concat('[', btrim('xyhelxyzlozyzzxx', 'xyz'), ']') |
++----------------------------------------------------+
+| [helxyzlo]                                         |
++----------------------------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__char_length">
+          <code class="ph codeph">char_length(string a), <span class="ph" id="string_functions__character_length">character_length(string a)</span></code>
+        </dt>
+
+        <dd class="dd">
+          
+          
+          <strong class="ph b">Purpose:</strong> Returns the length in characters of the argument string. Aliases for the
+          <code class="ph codeph">length()</code> function.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__chr">
+          <code class="ph codeph">chr(int character_code)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a character specified by a decimal code point value.
+          The interpretation and display of the resulting character depends on your system locale.
+          Because consistent processing of Impala string values is only guaranteed
+          for values within the ASCII range, only use this function for values
+          corresponding to ASCII characters.
+          In particular, parameter values greater than 255 return an empty string.
+          <p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p>
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> Can be used as the inverse of the <code class="ph codeph">ascii()</code> function, which
+            converts a character to its numeric ASCII code.
+          </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>SELECT chr(65);
++---------+
+| chr(65) |
++---------+
+| A       |
++---------+
+
+SELECT chr(97);
++---------+
+| chr(97) |
++---------+
+| a       |
++---------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__concat">
+          <code class="ph codeph">concat(string a, string b...)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a single string representing all the argument values joined together.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+        concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+        joins together values from different rows.
+      </p>
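For example, a brief sketch of the difference between the two forms:

```sql
SELECT concat('abc', 'def', 'ghi');          -- 'abcdefghi'
SELECT concat_ws(',', 'abc', 'def', 'ghi');  -- 'abc,def,ghi'
```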
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__concat_ws">
+          <code class="ph codeph">concat_ws(string sep, string a, string b...)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a single string representing the second and following argument values joined
+          together, delimited by a specified separator.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+        concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+        joins together values from different rows.
+      </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__find_in_set">
+          <code class="ph codeph">find_in_set(string str, string strList)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a specified string
+          within a comma-separated string. Returns <code class="ph codeph">NULL</code> if either argument is
+          <code class="ph codeph">NULL</code>, 0 if the search string is not found, or 0 if the search string contains a comma.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
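For example, a sketch based on the rules above:

```sql
SELECT find_in_set('b', 'a,b,c');    -- 2 (positions start at 1)
SELECT find_in_set('z', 'a,b,c');    -- 0 (not found)
SELECT find_in_set('a,b', 'a,b,c');  -- 0 (search string contains a comma)
```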
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__group_concat">
+          <code class="ph codeph">group_concat(string s [, string sep])</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a single string representing the argument value concatenated together for each
+          row of the result set. If the optional separator string is specified, the separator is added between each
+          pair of concatenated values.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+        concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+        joins together values from different rows.
+      </p>
+          <p class="p">
+            By default, returns a single string covering the whole result set. To include other columns or values
+            in the result set, or to produce multiple concatenated strings for subsets of rows, include a
+            <code class="ph codeph">GROUP BY</code> clause in the query.
+          </p>
+          <p class="p">
+            Strictly speaking, <code class="ph codeph">group_concat()</code> is an aggregate function, not a scalar
+            function like the others in this list.
+            For additional details and examples, see <a class="xref" href="impala_group_concat.html#group_concat">GROUP_CONCAT Function</a>.
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__initcap">
+          <code class="ph codeph">initcap(string str)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the input string with the first letter capitalized.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__instr">
+          <code class="ph codeph">instr(string str, string substr)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a substring within a
+          longer string.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
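For example (positions are 1-based):

```sql
SELECT instr('hello world', 'world');  -- 7
SELECT instr('banana', 'an');          -- 2 (first occurrence only)
```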
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__length">
+          <code class="ph codeph">length(string a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the length in characters of the argument string.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__locate">
+          <code class="ph codeph">locate(string substr, string str[, int pos])</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a substring within a
+          longer string, optionally after a particular position.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__lower">
+          <code class="ph codeph">lower(string a), <span class="ph" id="string_functions__lcase">lcase(string a)</span> </code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the argument string converted to all-lowercase.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+        In <span class="keyword">Impala 2.5</span> and higher, you can simplify queries that
+        use many <code class="ph codeph">UPPER()</code> and <code class="ph codeph">LOWER()</code> calls
+        to do case-insensitive comparisons, by using the <code class="ph codeph">ILIKE</code>
+        or <code class="ph codeph">IREGEXP</code> operators instead. See
+        <a class="xref" href="../shared/../topics/impala_operators.html#ilike">ILIKE Operator</a> and
+        <a class="xref" href="../shared/../topics/impala_operators.html#iregexp">IREGEXP Operator</a> for details.
+      </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__lpad">
+          <code class="ph codeph">lpad(string str, int len, string pad)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a string of a specified length, based on the first argument string. If the
+          specified string is too short, it is padded on the left with a repeating sequence of the characters from
+          the pad string. If the specified string is too long, it is truncated on the right.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
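A quick sketch of both the padding and truncation behavior:

```sql
SELECT lpad('hi', 5, 'xy');     -- 'xyxhi': the pad string repeats as needed
SELECT lpad('hello', 3, 'xy');  -- 'hel': a too-long input is truncated on the right
```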
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__ltrim">
+          <code class="ph codeph">ltrim(string a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the argument string with any leading spaces removed.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__parse_url">
+          <code class="ph codeph">parse_url(string urlString, string partToExtract [, string keyToExtract])</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the portion of a URL corresponding to a specified part. The part argument can be
+          <code class="ph codeph">'PROTOCOL'</code>, <code class="ph codeph">'HOST'</code>, <code class="ph codeph">'PATH'</code>, <code class="ph codeph">'REF'</code>,
+          <code class="ph codeph">'AUTHORITY'</code>, <code class="ph codeph">'FILE'</code>, <code class="ph codeph">'USERINFO'</code>, or
+          <code class="ph codeph">'QUERY'</code>. Uppercase is required for these literal values. When requesting the
+          <code class="ph codeph">QUERY</code> portion of the URL, you can optionally specify a key to retrieve just the
+          associated value from the key-value pairs in the query string.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> This function is important for the traditional Hadoop use case of interpreting web
+            logs. For example, if the web traffic data features raw URLs not divided into separate table columns,
+            you can count visitors to a particular page by extracting the <code class="ph codeph">'PATH'</code> or
+            <code class="ph codeph">'FILE'</code> field, or analyze search terms by extracting the corresponding key from the
+            <code class="ph codeph">'QUERY'</code> field.
+          </p>
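+          <p class="p">
+            <strong class="ph b">Examples:</strong>
+          </p>
+          <p class="p">
+            The following is an illustrative sketch rather than a captured transcript; the URL
+            is a made-up value and the expected results appear as comments:
+          </p>
+<pre class="pre codeblock"><code>-- 'HOST' extracts the host name portion of the URL.
+select parse_url('http://example.com/docs/index.html?name=networking', 'HOST');
+-- Returns: example.com
+
+-- With 'QUERY' and a key, only the value for that key is returned.
+select parse_url('http://example.com/docs/index.html?name=networking', 'QUERY', 'name');
+-- Returns: networking
+</code></pre>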
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__regexp_extract">
+          <code class="ph codeph">regexp_extract(string subject, string pattern, int index)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the specified <code class="ph codeph">()</code> group from a string based on a regular expression pattern. Group
+          0 refers to the entire extracted string, while groups 1, 2, and so on refer to the first, second, and
+          subsequent <code class="ph codeph">(...)</code> portions.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+        Expression syntax used by the Google RE2 library. For details, see
+        <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+        has most idioms familiar from regular expressions in Perl, Python, and so on, including
+        <code class="ph codeph">.*?</code> for non-greedy matches.
+      </p>
+          <p class="p">
+        In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+        way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+        adjust the expression patterns if necessary. See
+        <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+      </p>
+          <p class="p">
+        Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+        use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+        that you submit through <span class="keyword cmdname">impala-shell</span>. You might prefer to use the equivalent character
+        class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code>, which you would have to
+        escape as <code class="ph codeph">\\d</code>.
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            This example shows how group 0 matches the full pattern string, including the portion outside any
+            <code class="ph codeph">()</code> group:
+          </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',0);
++------------------------------------------------------+
+| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 0) |
++------------------------------------------------------+
+| abcdef123ghi456                                      |
++------------------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+          <p class="p">
+            This example shows how group 1 matches just the contents inside the first <code class="ph codeph">()</code> group in
+            the pattern string:
+          </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',1);
++------------------------------------------------------+
+| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 1) |
++------------------------------------------------------+
+| 456                                                  |
++------------------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+          <p class="p">
+            Unlike in earlier Impala releases, the regular expression library used in Impala 2.0 and later supports
+            the <code class="ph codeph">.*?</code> idiom for non-greedy matches. This example shows how a pattern string starting
+            with <code class="ph codeph">.*?</code> matches the shortest possible portion of the source string, returning the
+            rightmost set of lowercase letters. A pattern string both starting and ending with <code class="ph codeph">.*?</code>
+            finds two potential matches of equal length, and returns the first one found (the leftmost set of
+            lowercase letters).
+          </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+)',1);
++--------------------------------------------------------+
+| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+)', 1) |
++--------------------------------------------------------+
+| def                                                    |
++--------------------------------------------------------+
+[localhost:21000] &gt; select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+).*?',1);
++-----------------------------------------------------------+
+| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+).*?', 1) |
++-----------------------------------------------------------+
+| bcd                                                       |
++-----------------------------------------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__regexp_like">
+          <code class="ph codeph">regexp_like(string source, string pattern[, string options])</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns <code class="ph codeph">true</code> or <code class="ph codeph">false</code> to indicate
+          whether the source string contains, anywhere inside it, a match for the regular expression given by the pattern.
+          The optional third argument consists of letter flags that change how the match is performed,
+          such as <code class="ph codeph">i</code> for case-insensitive matching.
+          <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+          <p class="p">
+            The flags that you can include in the optional third argument are:
+          </p>
+          <ul class="ul">
+          <li class="li">
+          <code class="ph codeph">c</code>: Case-sensitive matching (the default).
+          </li>
+          <li class="li">
+          <code class="ph codeph">i</code>: Case-insensitive matching. If multiple instances of <code class="ph codeph">c</code> and <code class="ph codeph">i</code>
+          are included in the third argument, the last such option takes precedence.
+          </li>
+          <li class="li">
+          <code class="ph codeph">m</code>: Multi-line matching. The <code class="ph codeph">^</code> and <code class="ph codeph">$</code>
+          operators match the start or end of any line within the source string, not the
+          start and end of the entire string.
+          </li>
+          <li class="li">
+          <code class="ph codeph">n</code>: Newline matching. The <code class="ph codeph">.</code> operator can match the
+          newline character. A repetition operator such as <code class="ph codeph">.*</code> can
+          match a portion of the source string that spans multiple lines.
+          </li>
+          </ul>
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">boolean</code>
+          </p>
+          <p class="p">
+        In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+        Expression syntax used by the Google RE2 library. For details, see
+        <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+        has most idioms familiar from regular expressions in Perl, Python, and so on, including
+        <code class="ph codeph">.*?</code> for non-greedy matches.
+      </p>
+          <p class="p">
+        In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+        way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+        adjust the expression patterns if necessary. See
+        <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+      </p>
+          <p class="p">
+        Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+        use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+        that you submit through <span class="keyword cmdname">impala-shell</span>. You might prefer to use the equivalent character
+        class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code>, which you would have to
+        escape as <code class="ph codeph">\\d</code>.
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            This example shows how <code class="ph codeph">regexp_like()</code> can test for the existence
+            of various kinds of regular expression patterns within a source string:
+          </p>
+<pre class="pre codeblock"><code>
+-- Matches because the 'f' appears somewhere in 'foo'.
+select regexp_like('foo','f');
++-------------------------+
+| regexp_like('foo', 'f') |
++-------------------------+
+| true                    |
++-------------------------+
+
+-- Does not match because the comparison is case-sensitive by default.
+select regexp_like('foo','F');
++-------------------------+
+| regexp_like('foo', 'f') |
++-------------------------+
+| false                   |
++-------------------------+
+
+-- The 3rd argument can change the matching logic, such as 'i' meaning case-insensitive.
+select regexp_like('foo','F','i');
++------------------------------+
+| regexp_like('foo', 'f', 'i') |
++------------------------------+
+| true                         |
++------------------------------+
+
+-- The familiar regular expression notations work, such as ^ and $ anchors...
+select regexp_like('foo','f$');
++--------------------------+
+| regexp_like('foo', 'f$') |
++--------------------------+
+| false                    |
++--------------------------+
+
+select regexp_like('foo','o$');
++--------------------------+
+| regexp_like('foo', 'o$') |
++--------------------------+
+| true                     |
++--------------------------+
+
+-- ...and repetition operators such as * and +
+select regexp_like('foooooobar','fo+b');
++-----------------------------------+
+| regexp_like('foooooobar', 'fo+b') |
++-----------------------------------+
+| true                              |
++-----------------------------------+
+
+select regexp_like('foooooobar','fx*y*o*b');
++---------------------------------------+
+| regexp_like('foooooobar', 'fx*y*o*b') |
++---------------------------------------+
+| true                                  |
++---------------------------------------+
+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__regexp_replace">
+          <code class="ph codeph">regexp_replace(string initial, string pattern, string replacement)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the initial argument with the regular expression pattern replaced by the final
+          argument string.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+        Expression syntax used by the Google RE2 library. For details, see
+        <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+        has most idioms familiar from regular expressions in Perl, Python, and so on, including
+        <code class="ph codeph">.*?</code> for non-greedy matches.
+      </p>
+          <p class="p">
+        In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+        way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+        adjust the expression patterns if necessary. See
+        <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+      </p>
+          <p class="p">
+        Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+        use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+        that you submit through <span class="keyword cmdname">impala-shell</span>. You might prefer to use the equivalent character
+        class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code>, which you would have to
+        escape as <code class="ph codeph">\\d</code>.
+      </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            These examples show how you can replace parts of a string matching a pattern with replacement text,
+            which can include backreferences to any <code class="ph codeph">()</code> groups in the pattern string. The
+            backreference numbers start at 1, and any <code class="ph codeph">\</code> characters must be escaped as
+            <code class="ph codeph">\\</code>.
+          </p>
+          <p class="p">
+            Replace a character pattern with new text:
+          </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select regexp_replace('aaabbbaaa','b+','xyz');
++------------------------------------------+
+| regexp_replace('aaabbbaaa', 'b+', 'xyz') |
++------------------------------------------+
+| aaaxyzaaa                                |
++------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+          <p class="p">
+            Replace a character pattern with substitution text that includes the original matching text:
+          </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select regexp_replace('aaabbbaaa','(b+)','&lt;\\1&gt;');
++----------------------------------------------+
+| regexp_replace('aaabbbaaa', '(b+)', '&lt;\\1&gt;') |
++----------------------------------------------+
+| aaa&lt;bbb&gt;aaa                                  |
++----------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+          <p class="p">
+            Remove all characters that are not digits:
+          </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select regexp_replace('123-456-789','[^[:digit:]]','');
++---------------------------------------------------+
+| regexp_replace('123-456-789', '[^[:digit:]]', '') |
++---------------------------------------------------+
+| 123456789                                         |
++---------------------------------------------------+
+Returned 1 row(s) in 0.12s</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__repeat">
+          <code class="ph codeph">repeat(string str, int n)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the argument string repeated a specified number of times.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
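+          <p class="p">
+            <strong class="ph b">Examples:</strong>
+          </p>
+          <p class="p">
+            The following is an illustrative sketch rather than a captured transcript; the
+            expected result appears as a comment:
+          </p>
+<pre class="pre codeblock"><code>select repeat('abc', 3);
+-- Returns: abcabcabc
+</code></pre>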
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__reverse">
+          <code class="ph codeph">reverse(string a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the argument string with characters in reversed order.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
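+          <p class="p">
+            <strong class="ph b">Examples:</strong>
+          </p>
+          <p class="p">
+            The following is an illustrative sketch rather than a captured transcript; the
+            expected result appears as a comment:
+          </p>
+<pre class="pre codeblock"><code>select reverse('hello');
+-- Returns: olleh
+</code></pre>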
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__rpad">
+          <code class="ph codeph">rpad(string str, int len, string pad)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a string of a specified length, based on the first argument string. If the
+          specified string is too short, it is padded on the right with a repeating sequence of the characters from
+          the pad string. If the specified string is too long, it is truncated on the right.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
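+          <p class="p">
+            <strong class="ph b">Examples:</strong>
+          </p>
+          <p class="p">
+            The following is an illustrative sketch rather than a captured transcript; the
+            expected results appear as comments:
+          </p>
+<pre class="pre codeblock"><code>-- The pad characters repeat as needed to reach the target length.
+select rpad('hello', 8, 'xy');
+-- Returns: helloxyx
+
+-- A string longer than the target length is truncated on the right.
+select rpad('hello', 3, 'xy');
+-- Returns: hel
+</code></pre>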
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__rtrim">
+          <code class="ph codeph">rtrim(string a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the argument string with any trailing spaces removed.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
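+          <p class="p">
+            <strong class="ph b">Examples:</strong>
+          </p>
+          <p class="p">
+            The following is an illustrative sketch rather than a captured transcript; the
+            expected result appears as a comment, and the <code class="ph codeph">concat()</code>
+            call is used only to make the surviving leading spaces visible:
+          </p>
+<pre class="pre codeblock"><code>-- Trailing spaces are removed; leading spaces are kept.
+select concat('[', rtrim('  hello  '), ']');
+-- Returns: [  hello]
+</code></pre>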
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__space">
+          <code class="ph codeph">space(int n)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a string consisting of the specified number of spaces. Shorthand for
+          <code class="ph codeph">repeat(' ',<var class="keyword varname">n</var>)</code>.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
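+          <p class="p">
+            <strong class="ph b">Examples:</strong>
+          </p>
+          <p class="p">
+            The following is an illustrative sketch rather than a captured transcript; the
+            expected result appears as a comment, and the <code class="ph codeph">concat()</code>
+            call is used only to make the spaces visible:
+          </p>
+<pre class="pre codeblock"><code>select concat('[', space(4), ']');
+-- Returns: [    ]
+</code></pre>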
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__split_part">
+          <code class="ph codeph">split_part(string source, string delimiter, bigint n)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the nth field within a delimited string.
+          The fields are numbered starting from 1.
+          The delimiter can consist of multiple characters, not just a
+          single character. All matching of the delimiter is done exactly, not using any
+          regular expression patterns.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            These examples show how to retrieve the nth field from a delimited string:
+          </p>
+<pre class="pre codeblock"><code>
+select split_part('x,y,z',',',1);
++-----------------------------+
+| split_part('x,y,z', ',', 1) |
++-----------------------------+
+| x                           |
++-----------------------------+
+
+select split_part('x,y,z',',',2);
++-----------------------------+
+| split_part('x,y,z', ',', 2) |
++-----------------------------+
+| y                           |
++-----------------------------+
+
+select split_part('x,y,z',',',3);
++-----------------------------+
+| split_part('x,y,z', ',', 3) |
++-----------------------------+
+| z                           |
++-----------------------------+
+</code></pre>
+
+          <p class="p">
+            These examples show what happens for out-of-range field positions.
+            Specifying a value less than 1 produces an error. Specifying a value
+            greater than the number of fields returns a zero-length string
+            (which is not the same as <code class="ph codeph">NULL</code>).
+          </p>
+<pre class="pre codeblock"><code>select split_part('x,y,z',',',0);
+ERROR: Invalid field position: 0
+
+with t1 as (select split_part('x,y,z',',',4) nonexistent_field)
+  select
+      nonexistent_field
+    , concat('[',nonexistent_field,']')
+    , length(nonexistent_field)
+from t1;
++-------------------+-------------------------------------+---------------------------+
+| nonexistent_field | concat('[', nonexistent_field, ']') | length(nonexistent_field) |
++-------------------+-------------------------------------+---------------------------+
+|                   | []                                  | 0                         |
++-------------------+-------------------------------------+---------------------------+
+</code></pre>
+
+          <p class="p">
+            These examples show how the delimiter can be a multi-character value:
+          </p>
+<pre class="pre codeblock"><code>select split_part('one***two***three','***',2);
++-------------------------------------------+
+| split_part('one***two***three', '***', 2) |
++-------------------------------------------+
+| two                                       |
++-------------------------------------------+
+
+select split_part('one\|/two\|/three','\|/',3);
++-------------------------------------------+
+| split_part('one\|/two\|/three', '\|/', 3) |
++-------------------------------------------+
+| three                                     |
++-------------------------------------------+
+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__strleft">
+          <code class="ph codeph">strleft(string a, int num_chars)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the leftmost <code class="ph codeph">num_chars</code> characters of the string. Shorthand for a call to
+          <code class="ph codeph">substr()</code> with 2 arguments.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
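+          <p class="p">
+            <strong class="ph b">Examples:</strong>
+          </p>
+          <p class="p">
+            The following is an illustrative sketch rather than a captured transcript; the
+            expected result appears as a comment:
+          </p>
+<pre class="pre codeblock"><code>select strleft('impala', 3);
+-- Returns: imp
+</code></pre>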
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__strright">
+          <code class="ph codeph">strright(string a, int num_chars)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the rightmost <code class="ph codeph">num_chars</code> characters of the string. Shorthand for a call to
+          <code class="ph codeph">substr()</code> with 2 arguments.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
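+          <p class="p">
+            <strong class="ph b">Examples:</strong>
+          </p>
+          <p class="p">
+            The following is an illustrative sketch rather than a captured transcript; the
+            expected result appears as a comment:
+          </p>
+<pre class="pre codeblock"><code>select strright('impala', 3);
+-- Returns: ala
+</code></pre>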
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__substr">
+          <code class="ph codeph">substr(string a, int start [, int len]), <span class="ph" id="string_functions__substring">substring(string a, int start [, int
+          len])</span></code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the portion of the string starting at a specified point, optionally with a
+          specified maximum length. The characters in the string are indexed starting at 1.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
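+          <p class="p">
+            <strong class="ph b">Examples:</strong>
+          </p>
+          <p class="p">
+            The following is an illustrative sketch rather than a captured transcript; the
+            expected results appear as comments:
+          </p>
+<pre class="pre codeblock"><code>-- With two arguments, the result extends to the end of the string.
+select substr('hello world', 7);
+-- Returns: world
+
+-- The optional third argument caps the length of the result.
+select substr('hello world', 1, 5);
+-- Returns: hello
+</code></pre>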
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__translate">
+          <code class="ph codeph">translate(string input, string from, string to)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the <code class="ph codeph">input</code> string with each occurrence of a character from the <code class="ph codeph">from</code> string replaced by the corresponding character from the <code class="ph codeph">to</code> string.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
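+          <p class="p">
+            <strong class="ph b">Examples:</strong>
+          </p>
+          <p class="p">
+            The following is an illustrative sketch rather than a captured transcript; the
+            expected result appears as a comment:
+          </p>
+<pre class="pre codeblock"><code>-- Each 'l' is replaced by '0' and each 'o' by '1'.
+select translate('hello', 'lo', '01');
+-- Returns: he001
+</code></pre>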
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__trim">
+          <code class="ph codeph">trim(string a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the input string with both leading and trailing spaces removed. The same as
+          passing the string through both <code class="ph codeph">ltrim()</code> and <code class="ph codeph">rtrim()</code>.
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> Often used during data cleansing operations during the ETL cycle, if input values might still have surrounding spaces.
+            For a more general-purpose function that can remove other leading and trailing characters besides spaces, see <code class="ph codeph">btrim()</code>.
+          </p>
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
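+          <p class="p">
+            <strong class="ph b">Examples:</strong>
+          </p>
+          <p class="p">
+            The following is an illustrative sketch rather than a captured transcript; the
+            expected result appears as a comment, and the <code class="ph codeph">concat()</code>
+            call is used only to show that no surrounding spaces remain:
+          </p>
+<pre class="pre codeblock"><code>select concat('[', trim('  hello  '), ']');
+-- Returns: [hello]
+</code></pre>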
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="string_functions__upper">
+          <code class="ph codeph">upper(string a), <span class="ph" id="string_functions__ucase">ucase(string a)</span></code>
+        </dt>
+
+        <dd class="dd">
+          
+          
+          <strong class="ph b">Purpose:</strong> Returns the argument string converted to all-uppercase.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+        In <span class="keyword">Impala 2.5</span> and higher, you can simplify queries that
+        use many <code class="ph codeph">UPPER()</code> and <code class="ph codeph">LOWER()</code> calls
+        to do case-insensitive comparisons, by using the <code class="ph codeph">ILIKE</code>
+        or <code class="ph codeph">IREGEXP</code> operators instead. See
+        <a class="xref" href="../shared/../topics/impala_operators.html#ilike">ILIKE Operator</a> and
+        <a class="xref" href="../shared/../topics/impala_operators.html#iregexp">IREGEXP Operator</a> for details.
+      </p>
+        </dd>
+
+      
+    </dl>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[11/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_reservation_request_timeout.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_reservation_request_timeout.html b/docs/build/html/topics/impala_reservation_request_timeout.html
new file mode 100644
index 0000000..7bc4114
--- /dev/null
+++ b/docs/build/html/topics/impala_reservation_request_timeout.html
@@ -0,0 +1,21 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="reservation_request_timeout"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RESERVATION_REQUEST_TIMEOUT Query Option</title></head><body id="reservation_request_timeout"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">RESERVATION_REQUEST_TIMEOUT Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          This query option no longer has any effect.
+          The use of the Llama component for integrated resource management within YARN is no
+          longer supported with <span class="keyword">Impala 2.3</span> and higher, and the Llama
+          support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher.
+        </p>
+      </div>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_reserved_words.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_reserved_words.html b/docs/build/html/topics/impala_reserved_words.html
new file mode 100644
index 0000000..2516a6d
--- /dev/null
+++ b/docs/build/html/topics/impala_reserved_words.html
@@ -0,0 +1,357 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="reserved_words"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Reserved Words</title></head><body id="reserved_words"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Reserved Words</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The following are the reserved words for the current release of Impala. A reserved word is one that
+      cannot be used directly as an identifier; you must quote it with backticks. For example, a statement
+      <code class="ph codeph">CREATE TABLE select (x INT)</code> fails, while <code class="ph codeph">CREATE TABLE `select` (x INT)</code>
+      succeeds. Impala does not reserve the names of aggregate or scalar built-in functions. (Formerly, Impala did
+      reserve the names of some aggregate functions.)
+    </p>
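+    <p class="p">
+      For example, a hypothetical <span class="keyword cmdname">impala-shell</span> session might look like
+      the following (the exact error text varies by release):
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE select (x INT);   -- fails: SELECT is a reserved word
+CREATE TABLE `select` (x INT); -- succeeds, because the identifier is quoted with backticks
+DROP TABLE `select`;           -- the backticks are required in all later references too
+</code></pre>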
+
+    <p class="p">
+      Because different database systems have different sets of reserved words, and the reserved words change from
+      release to release, carefully consider database, table, and column names to ensure maximum compatibility
+      between products and versions.
+    </p>
+
+    <p class="p">
+      Because you might switch between Impala and Hive when doing analytics and ETL, also consider whether
+      your object names are the same as any Hive keywords, and rename or quote any that conflict. Consult the
+      <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords" target="_blank">list of Hive keywords</a>.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title2" id="reserved_words__reserved_words_current">
+<h2 class="title topictitle2" id="ariaid-title2">List of Current Reserved Words</h2>
+<div class="body conbody">
+
+
+<pre class="pre codeblock"><code>add
+aggregate
+all
+alter
+<span class="ph">analytic</span>
+and
+<span class="ph">anti</span>
+<span class="ph">api_version</span>
+as
+asc
+avro
+between
+bigint
+<span class="ph">binary</span>
+<span class="ph">blocksize</span>
+boolean
+
+by
+<span class="ph">cached</span>
+<span class="ph">cascade</span>
+case
+cast
+change
+<span class="ph">char</span>
+<span class="ph">class</span>
+<span class="ph">close_fn</span>
+column
+columns
+comment
+<span class="ph">compression</span>
+compute
+create
+cross
+<span class="ph">current</span>
+data
+database
+databases
+date
+datetime
+decimal
+<span class="ph">default</span>
+<span class="ph">delete</span>
+delimited
+desc
+describe
+distinct
+
+div
+double
+drop
+else
+<span class="ph">encoding</span>
+end
+escaped
+exists
+explain
+<span class="ph">extended</span>
+external
+false
+fields
+fileformat
+<span class="ph">finalize_fn</span>
+first
+float
+<span class="ph">following</span>
+<span class="ph">for</span>
+format
+formatted
+from
+full
+function
+functions
+<span class="ph">grant</span>
+group
+<span class="ph">hash</span>
+having
+if
+
+<span class="ph">ilike</span>
+in
+<span class="ph">incremental</span>
+<span class="ph">init_fn</span>
+inner
+inpath
+insert
+int
+integer
+intermediate
+interval
+into
+invalidate
+<span class="ph">iregexp</span>
+is
+join
+last
+left
+like
+limit
+lines
+load
+location
+<span class="ph">merge_fn</span>
+metadata
+not
+null
+nulls
+offset
+on
+or
+order
+outer
+<span class="ph">over</span>
+overwrite
+parquet
+parquetfile
+partition
+partitioned
+<span class="ph">partitions</span>
+<span class="ph">preceding</span>
+<span class="ph">prepare_fn</span>
+<span class="ph">produced</span>
+<span class="ph">purge</span>
+<span class="ph">range</span>
+rcfile
+real
+refresh
+regexp
+rename
+replace
+<span class="ph">restrict</span>
+returns
+<span class="ph">revoke</span>
+right
+rlike
+<span class="ph">role</span>
+<span class="ph">roles</span>
+row
+<span class="ph">rows</span>
+schema
+schemas
+select
+semi
+sequencefile
+serdeproperties
+<span class="ph">serialize_fn</span>
+set
+show
+smallint
+
+stats
+stored
+straight_join
+string
+symbol
+table
+tables
+tblproperties
+terminated
+textfile
+then
+timestamp
+tinyint
+to
+true
+<span class="ph">truncate</span>
+<span class="ph">unbounded</span>
+<span class="ph">uncached</span>
+union
+<span class="ph">update</span>
+<span class="ph">update_fn</span>
+<span class="ph">upsert</span>
+use
+using
+values
+<span class="ph">varchar</span>
+view
+when
+where
+with</code></pre>
+</div>
+</article>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title3" id="reserved_words__reserved_words_planning">
+<h2 class="title topictitle2" id="ariaid-title3">Planning for Future Reserved Words</h2>
+<div class="body conbody">
+<p class="p">
+The preceding list of reserved words includes all the keywords
+used in the current release of Impala SQL.
+To future-proof your code,
+avoid additional words that might become
+reserved words if
+Impala adds features in later releases.
+This kind of planning also helps to avoid
+name conflicts when you port SQL from other systems that
+have different sets of reserved words.
+</p>
+
+<p class="p">
+The following list contains additional words that you should
+avoid for table, column, or other object names,
+even though they are not currently reserved by Impala.
+</p>
+
+<pre class="pre codeblock"><code>any
+authorization
+backup
+begin
+break
+browse
+bulk
+cascade
+check
+checkpoint
+close
+clustered
+coalesce
+collate
+commit
+constraint
+contains
+continue
+convert
+current
+current_date
+current_time
+current_timestamp
+current_user
+cursor
+dbcc
+deallocate
+declare
+default
+deny
+disk
+distributed
+dump
+errlvl
+escape
+except
+exec
+execute
+exit
+fetch
+file
+fillfactor
+for
+foreign
+freetext
+goto
+holdlock
+identity
+index
+intersect
+key
+kill
+lineno
+merge
+national
+nocheck
+nonclustered
+nullif
+of
+off
+offsets
+open
+option
+percent
+pivot
+plan
+precision
+primary
+print
+proc
+procedure
+public
+raiserror
+read
+readtext
+reconfigure
+references
+replication
+restore
+restrict
+return
+revert
+rollback
+rowcount
+rule
+save
+securityaudit
+session_user
+setuser
+shutdown
+some
+statistics
+system_user
+tablesample
+textsize
+then
+top
+tran
+transaction
+trigger
+try_convert
+unique
+unpivot
+updatetext
+user
+varying
+waitfor
+while
+within
+writetext
+</code></pre>
+</div>
+</article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_resource_management.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_resource_management.html b/docs/build/html/topics/impala_resource_management.html
new file mode 100644
index 0000000..a89d7fd
--- /dev/null
+++ b/docs/build/html/topics/impala_resource_management.html
@@ -0,0 +1,97 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="resource_management"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Resource Management for Impala</title></head><body id="resource_management"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Resource Management for Impala</h1>
+  
+
+  <div class="body conbody">
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          The use of the Llama component for integrated resource management within YARN
+          is no longer supported with <span class="keyword">Impala 2.3</span> and higher.
+          The Llama support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher.
+        </p>
+        <p class="p">
+          For clusters running Impala alongside
+          other data management components, you define static service pools to define the resources
+          available to Impala and other components. Then within the area allocated for Impala,
+          you can create dynamic service pools, each with its own settings for the Impala admission control feature.
+        </p>
+      </div>
+
+    <p class="p">
+      To manage and prioritize workloads on clusters that run jobs from many Hadoop components,
+      you can limit the CPU and memory resources used by Impala.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="resource_management__rm_enforcement">
+
+    <h2 class="title topictitle2" id="ariaid-title2">How Resource Limits Are Enforced</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Limits on memory usage are enforced by Impala's process memory limit (the <code class="ph codeph">MEM_LIMIT</code>
+        query option setting). The admission control feature checks this setting to decide how many queries
+        can be safely run at the same time. Then the Impala daemon enforces the limit by activating the
+        spill-to-disk mechanism when necessary, or cancelling a query altogether if the limit is exceeded at runtime.
+      </p>
+
+    </div>
+  </article>
+
+
+
+    <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="resource_management__rm_query_options">
+
+      <h2 class="title topictitle2" id="ariaid-title3">impala-shell Query Options for Resource Management</h2>
+  
+
+      <div class="body conbody">
+
+        <p class="p">
+          Before issuing SQL statements through the <span class="keyword cmdname">impala-shell</span> interpreter, you can use the
+          <code class="ph codeph">SET</code> command to configure the following parameters related to resource management:
+        </p>
+
+        <ul class="ul" id="rm_query_options__ul_nzt_twf_jp">
+          <li class="li">
+            <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a>
+          </li>
+
+          <li class="li">
+            <a class="xref" href="impala_mem_limit.html#mem_limit">MEM_LIMIT Query Option</a>
+          </li>
+
+        </ul>
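+
+        <p class="p">
+          For example, a session might set a per-query memory limit before running a large join
+          (the table names here are hypothetical):
+        </p>
+
+<pre class="pre codeblock"><code>set mem_limit=3gb;
+select t1.id, t1.name from huge_table t1 join big_table t2 on t1.id = t2.id;
+</code></pre>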
+      </div>
+    </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="resource_management__rm_limitations">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Limitations of Resource Management for Impala</h2>
+
+    <div class="body conbody">
+
+
+
+      
+
+      
+
+      <p class="p">
+        In Impala 2.0 and higher, the <code class="ph codeph">MEM_LIMIT</code> query option and the other
+        resource-related query options are settable through the ODBC or JDBC interfaces. This former
+        limitation is now lifted.
+      </p>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_revoke.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_revoke.html b/docs/build/html/topics/impala_revoke.html
new file mode 100644
index 0000000..f2e0392
--- /dev/null
+++ b/docs/build/html/topics/impala_revoke.html
@@ -0,0 +1,117 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="revoke"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REVOKE Statement (Impala 2.0 or higher only)</title></head><body id="revoke"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">REVOKE Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+
+      The <code class="ph codeph">REVOKE</code> statement revokes a role from a group, or revokes privileges
+      on a specified object from a role. Only Sentry administrative users can revoke a role from a group.
+      The revocation has a cascading effect. For example, revoking the <code class="ph codeph">ALL</code>
+      privilege on a database also revokes the same privilege for all the tables in that database.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>REVOKE ROLE <var class="keyword varname">role_name</var> FROM GROUP <var class="keyword varname">group_name</var>
+
+REVOKE <var class="keyword varname">privilege</var> ON <var class="keyword varname">object_type</var> <var class="keyword varname">object_name</var>
+  FROM [ROLE] <var class="keyword varname">role_name</var>
+
+<span class="ph">privilege ::= SELECT | SELECT(<var class="keyword varname">column_name</var>) | INSERT | ALL</span>
+object_type ::= TABLE | DATABASE | SERVER | URI
+</code></pre>
+
+    <p class="p">
+      Typically, the object name is an identifier. For URIs, it is a string literal.
+    </p>
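+
+    <p class="p">
+      For example, the following statements match the preceding syntax (the role, group, and
+      table names here are hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>REVOKE ROLE analyst_role FROM GROUP analysts;
+REVOKE SELECT ON TABLE sales_db.orders FROM ROLE analyst_role;
+</code></pre>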
+
+    <p class="p">
+      The ability to grant or revoke <code class="ph codeph">SELECT</code> privilege on specific columns is available
+      in <span class="keyword">Impala 2.3</span> and higher. See
+      <span class="xref">the documentation for Apache Sentry</span> for details.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Required privileges:</strong>
+      </p>
+
+    <p class="p">
+      Only administrative users (those with <code class="ph codeph">ALL</code> privileges on the server, defined in the Sentry
+      policy file) can use this statement.
+    </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Compatibility:</strong>
+      </p>
+
+    <div class="p">
+      <ul class="ul">
+        <li class="li">
+          The Impala <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements are available in <span class="keyword">Impala 2.0</span> and
+          higher.
+        </li>
+
+        <li class="li">
+          In <span class="keyword">Impala 1.4</span> and higher, Impala makes use of any roles and privileges specified by the
+          <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Hive, when your system is configured to
+          use the Sentry service instead of the file-based policy mechanism.
+        </li>
+
+        <li class="li">
+          The Impala <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements do not require the
+          <code class="ph codeph">ROLE</code> keyword to be repeated before each role name, unlike the equivalent Hive
+          statements.
+        </li>
+
+        <li class="li">
+          Currently, each Impala <code class="ph codeph">GRANT</code> or <code class="ph codeph">REVOKE</code> statement can only grant or
+          revoke a single privilege to or from a single role.
+        </li>
+      </ul>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Access to Kudu tables must be granted to and revoked from roles as usual.
+        Only users with <code class="ph codeph">ALL</code> privileges on <code class="ph codeph">SERVER</code> can create external Kudu tables.
+        Currently, access to a Kudu table is <span class="q">"all or nothing"</span>:
+        enforced at the table level rather than the column level, and applying to all
+        SQL operations rather than individual statements such as <code class="ph codeph">INSERT</code>.
+        Because non-SQL APIs can access Kudu data without going through Sentry
+        authorization, currently the Sentry support is considered preliminary
+        and subject to change.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>,
+      <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>,
+      <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_runtime_bloom_filter_size.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_runtime_bloom_filter_size.html b/docs/build/html/topics/impala_runtime_bloom_filter_size.html
new file mode 100644
index 0000000..75dcb20
--- /dev/null
+++ b/docs/build/html/topics/impala_runtime_bloom_filter_size.html
@@ -0,0 +1,94 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_bloom_filter_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</title></head><body id="runtime_bloom_filter_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_BLOOM_FILTER_SIZE Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Size (in bytes) of the Bloom filter data structure used by the runtime filtering
+      feature.
+    </p>
+
+    <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+      <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, this query option only applies as a fallback, when statistics
+        are not available. By default, Impala estimates the optimal size of the Bloom filter structure
+        regardless of the setting for this option. (This is a change from the original behavior in
+        <span class="keyword">Impala 2.5</span>.)
+      </p>
+      <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, when the value of this query option is used for query planning,
+        it is constrained by the minimum and maximum sizes specified by the
+        <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code> and <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code> query options.
+        The filter size is adjusted upward or downward if necessary to fit within the minimum/maximum range.
+      </p>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> integer
+      </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 1048576 (1 MB)
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Maximum:</strong> 16 MB
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      This setting affects optimizations for large and complex queries, such
+      as dynamic partition pruning for partitioned tables, and join optimization
+      for queries that join large tables.
+      Larger filters are more effective at handling
+      higher cardinality input sets, but consume more memory per filter.
+      
+    </p>
+
+    <p class="p">
+      If your query filters on high-cardinality columns (for example, millions of different values)
+      and you do not get the expected speedup from the runtime filtering mechanism, consider
+      doing some benchmarks with a higher value for <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code>.
+      The extra memory devoted to the Bloom filter data structures can help make the filtering
+      more accurate.
+    </p>
+
+    <p class="p">
+        Because the runtime filtering feature applies mainly to resource-intensive
+        and long-running queries, only adjust this query option when tuning long-running queries
+        involving some combination of large partitioned tables and joins involving large tables.
+      </p>
+
+    <p class="p">
+      Because the effectiveness of this setting depends so much on query characteristics and data distribution,
+      you typically only use it for specific queries that need some extra tuning, and the ideal value depends
+      on the query. Consider setting this query option immediately before the expensive query and
+      unsetting it immediately afterward.
+    </p>
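+
+    <p class="p">
+      For example, a session might raise the filter size for a single expensive query and then
+      restore the default (1 MB) afterward; the table names here are hypothetical:
+    </p>
+
+<pre class="pre codeblock"><code>set runtime_bloom_filter_size=8388608;
+select count(*) from huge_partitioned_table t1 join big_table t2 on t1.id = t2.id;
+set runtime_bloom_filter_size=1048576;
+</code></pre>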
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+      
+      <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>,
+      <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>,
+      <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_runtime_filter_max_size.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_runtime_filter_max_size.html b/docs/build/html/topics/impala_runtime_filter_max_size.html
new file mode 100644
index 0000000..49d16ae
--- /dev/null
+++ b/docs/build/html/topics/impala_runtime_filter_max_size.html
@@ -0,0 +1,55 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_max_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</title></head><body id="runtime_filter_max_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_MAX_SIZE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code> query option
+      adjusts the settings for the runtime filtering feature.
+      This option defines the maximum size for a filter,
+      regardless of the estimates produced by the planner.
+      This value also overrides any higher value specified for the
+      <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> query option.
+      Filter sizes are rounded up to the nearest power of two.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> integer
+      </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0 (meaning use the value from the corresponding <span class="keyword cmdname">impalad</span> startup option)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+        Because the runtime filtering feature applies mainly to resource-intensive
+        and long-running queries, only adjust this query option when tuning long-running queries
+        involving some combination of large partitioned tables and joins involving large tables.
+      </p>
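+
+    <p class="p">
+      For example, the following hypothetical session requests a 16 MB Bloom filter but caps
+      all filters at 4 MB, so the smaller limit prevails:
+    </p>
+
+<pre class="pre codeblock"><code>set runtime_bloom_filter_size=16777216;
+set runtime_filter_max_size=4194304;  -- filters are capped at 4 MB despite the 16 MB request
+</code></pre>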
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+      <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>,
+      <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>,
+      <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_runtime_filter_min_size.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_runtime_filter_min_size.html b/docs/build/html/topics/impala_runtime_filter_min_size.html
new file mode 100644
index 0000000..d385b28
--- /dev/null
+++ b/docs/build/html/topics/impala_runtime_filter_min_size.html
@@ -0,0 +1,55 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_min_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</title></head><body id="runtime_filter_min_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_MIN_SIZE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code> query option
+      adjusts the settings for the runtime filtering feature.
+      This option defines the minimum size for a filter,
+      regardless of the estimates produced by the planner.
+      This value also overrides any lower number specified for the
+      <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> query option.
+      Filter sizes are rounded up to the nearest power of two.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> integer
+      </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0 (meaning use the value from the corresponding <span class="keyword cmdname">impalad</span> startup option)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+        Because the runtime filtering feature applies mainly to resource-intensive
+        and long-running queries, only adjust this query option when tuning long-running queries
+        involving some combination of large partitioned tables and joins involving large tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+      <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>,
+      <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>,
+      <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_runtime_filter_mode.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_runtime_filter_mode.html b/docs/build/html/topics/impala_runtime_filter_mode.html
new file mode 100644
index 0000000..417863c
--- /dev/null
+++ b/docs/build/html/topics/impala_runtime_filter_mode.html
@@ -0,0 +1,75 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_mode"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</title></head><body id="runtime_filter_mode"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_MODE Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">RUNTIME_FILTER_MODE</code> query option
+      adjusts the settings for the runtime filtering feature.
+      It turns this feature on and off, and controls how
+      extensively the filters are transmitted between hosts.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric (0, 1, 2)
+      or corresponding mnemonic strings (<code class="ph codeph">OFF</code>, <code class="ph codeph">LOCAL</code>, <code class="ph codeph">GLOBAL</code>).
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 2 (equivalent to <code class="ph codeph">GLOBAL</code>); formerly was 1 / <code class="ph codeph">LOCAL</code>, in <span class="keyword">Impala 2.5</span>
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.6</span> and higher, the default is <code class="ph codeph">GLOBAL</code>.
+      This setting is recommended for a wide variety of workloads, to provide best
+      performance with <span class="q">"out of the box"</span> settings.
+    </p>
+
+    <p class="p">
+      The lowest setting of <code class="ph codeph">LOCAL</code> does a similar level of optimization
+      (such as partition pruning) as in earlier Impala releases.
+      This setting was the default in <span class="keyword">Impala 2.5</span>,
+      to allow for a period of post-upgrade testing for existing workloads.
+      This setting is suitable for workloads with non-performance-critical queries,
+      or if the coordinator node is under heavy CPU or memory pressure.
+    </p>
+
+    <p class="p">
+      You might change the setting to <code class="ph codeph">OFF</code> if your workload contains
+      many queries involving partitioned tables or joins that do not experience a performance
+      increase from the runtime filters feature. If the overhead of producing the runtime filters
+      outweighs the performance benefit for queries, you can turn the feature off entirely.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for details about runtime filtering.
+      <a class="xref" href="impala_disable_row_runtime_filtering.html#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</a>,
+      <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>,
+      <a class="xref" href="impala_runtime_filter_wait_time_ms.html#runtime_filter_wait_time_ms">RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</a>,
+      and
+      <a class="xref" href="impala_max_num_runtime_filters.html#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a>
+      for tuning options for runtime filtering.
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_runtime_filter_wait_time_ms.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_runtime_filter_wait_time_ms.html b/docs/build/html/topics/impala_runtime_filter_wait_time_ms.html
new file mode 100644
index 0000000..31869dc
--- /dev/null
+++ b/docs/build/html/topics/impala_runtime_filter_wait_time_ms.html
@@ -0,0 +1,51 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_wait_time_ms"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</title></head><body id="runtime_filter_wait_time_ms"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_WAIT_TIME_MS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">RUNTIME_FILTER_WAIT_TIME_MS</code> query option
+      adjusts the settings for the runtime filtering feature.
+      It specifies a time in milliseconds that each scan node waits for
+      runtime filters to be produced by other plan fragments.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> integer
+      </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0 (meaning use the value from the corresponding <span class="keyword cmdname">impalad</span> startup option)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+        Because the runtime filtering feature applies mainly to resource-intensive
+        and long-running queries, only adjust this query option when tuning long-running queries
+        involving some combination of large partitioned tables and joins involving large tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+      <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+      
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_runtime_filtering.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_runtime_filtering.html b/docs/build/html/topics/impala_runtime_filtering.html
new file mode 100644
index 0000000..0b3bd16
--- /dev/null
+++ b/docs/build/html/topics/impala_runtime_filtering.html
@@ -0,0 +1,521 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filtering"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</title></head><body id="runtime_filtering"><main role="main"><article role="article" aria-labelledby="runtime_filtering__runtime_filters">
+
+  <h1 class="title topictitle1" id="runtime_filtering__runtime_filters">Runtime Filtering for Impala Queries (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      <dfn class="term">Runtime filtering</dfn> is a wide-ranging optimization feature available in
+      <span class="keyword">Impala 2.5</span> and higher. When only a fraction of the data in a table is
+      needed for a query against a partitioned table or to evaluate a join condition,
+      Impala determines the appropriate conditions while the query is running, and
+      broadcasts that information to all the <span class="keyword cmdname">impalad</span> nodes that are reading the table
+      so that they can avoid unnecessary I/O to read partition data, and avoid
+      unnecessary network transmission by sending only the subset of rows that match the join keys
+      across the network.
+    </p>
+
+    <p class="p">
+      This feature is primarily used to optimize queries against large partitioned tables
+      (under the name <dfn class="term">dynamic partition pruning</dfn>) and joins of large tables.
+      The information in this section includes concepts, internals, and troubleshooting
+      information for the entire runtime filtering feature.
+      For specific tuning steps for partitioned tables,
+      
+      see
+      <a class="xref" href="impala_partitioning.html#dynamic_partition_pruning">Dynamic Partition Pruning</a>.
+      
+    </p>
+
+    <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+      <p class="p">
+        When this feature made its debut in <span class="keyword">Impala 2.5</span>,
+        the default setting was <code class="ph codeph">RUNTIME_FILTER_MODE=LOCAL</code>.
+        Now the default is <code class="ph codeph">RUNTIME_FILTER_MODE=GLOBAL</code> in <span class="keyword">Impala 2.6</span> and higher,
+        which enables more wide-ranging and ambitious query optimization without requiring you to
+        explicitly set any query options.
+      </p>
+    </div>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="runtime_filtering__runtime_filtering_concepts">
+    <h2 class="title topictitle2" id="ariaid-title2">Background Information for Runtime Filtering</h2>
+    <div class="body conbody">
+      <p class="p">
+        To understand how runtime filtering works at a detailed level, you must
+        be familiar with some terminology from the field of distributed database technology:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            What a <dfn class="term">plan fragment</dfn> is.
+            Impala decomposes each query into smaller units of work that are distributed across the cluster.
+            Wherever possible, a data block is read, filtered, and aggregated by plan fragments executing
+            on the same host. For some operations, such as joins and combining intermediate results into
+            a final result set, data is transmitted across the network from one DataNode to another.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            What <code class="ph codeph">SCAN</code> and <code class="ph codeph">HASH JOIN</code> plan nodes are, and their role in computing query results:
+          </p>
+          <p class="p">
+            In the Impala query plan, a <dfn class="term">scan node</dfn> performs the I/O to read from the underlying data files.
+            Although this is an expensive operation from the traditional database perspective, Hadoop clusters and Impala are
+            optimized to do this kind of I/O in a highly parallel fashion. The major potential cost savings come from using
+            the columnar Parquet format (where Impala can avoid reading data for unneeded columns) and partitioned tables
+            (where Impala can avoid reading data for unneeded partitions).
+          </p>
+          <p class="p">
+            Most Impala joins use the
+            <a class="xref" href="https://en.wikipedia.org/wiki/Hash_join" target="_blank"><dfn class="term">hash join</dfn></a>
+            mechanism. (Impala adopted the nested-loop join technique relatively recently,
+            and uses it only for certain kinds of non-equijoin queries.)
+            In a hash join, when evaluating join conditions from two tables, Impala constructs a hash table in memory with all
+            the different column values from the table on one side of the join.
+            Then, for each row from the table on the other side of the join, Impala tests whether the relevant column values
+            are in this hash table or not.
+          </p>
+          <p class="p">
+            A <dfn class="term">hash join node</dfn> constructs such an in-memory hash table, then performs the comparisons to
+            identify which rows match the relevant join conditions
+            and should be included in the result set (or at least sent on to the subsequent intermediate stage of
+            query processing). Because some of the input for a hash join might be transmitted across the network from another host,
+            it is especially important from a performance perspective to prune out ahead of time any data that is known to be
+            irrelevant.
+          </p>
+          <p class="p">
+            The more distinct values there are in the join key columns, the larger the
+            in-memory hash table, and thus the more memory is required to process the query.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The difference between a <dfn class="term">broadcast join</dfn> and a <dfn class="term">shuffle join</dfn>.
+            (The Hadoop notion of a shuffle join is sometimes referred to in Impala as a <dfn class="term">partitioned join</dfn>.)
+            In a broadcast join, the table from one side of the join (typically the smaller table)
+            is sent in its entirety to all the hosts involved in the query. Then each host can compare its
+            portion of the data from the other (larger) table against the full set of possible join keys.
+            In a shuffle join, there is no obvious <span class="q">"smaller"</span> table, and so the contents of both tables
+            are divided up, and corresponding portions of the data are transmitted to each host involved in the query.
+            See <a class="xref" href="impala_hints.html#hints">Query Hints in Impala SELECT Statements</a> for information about how these different kinds of
+            joins are processed.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The notion of the build phase and probe phase when Impala processes a join query.
+            The <dfn class="term">build phase</dfn> is where the rows containing the join key columns, typically for the smaller table,
+            are transmitted across the network and built into an in-memory hash table data structure on one or
+            more destination nodes.
+            The <dfn class="term">probe phase</dfn> is where data is read locally (typically from the larger table) and the join key columns
+            are compared to the values in the in-memory hash table.
+            The corresponding input sources (tables, subqueries, and so on) for these
+            phases are referred to as the <dfn class="term">build side</dfn> and the <dfn class="term">probe side</dfn>.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            How to set Impala query options: interactively within an <span class="keyword cmdname">impala-shell</span> session through
+            the <code class="ph codeph">SET</code> command, for a JDBC or ODBC application through the <code class="ph codeph">SET</code> statement, or
+            globally for all <span class="keyword cmdname">impalad</span> daemons through the <code class="ph codeph">default_query_options</code> configuration
+            setting.
+          </p>
+        </li>
+      </ul>
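+      <p class="p">
+        For example, within an interactive <span class="keyword cmdname">impala-shell</span>
+        session, a sketch of inspecting and adjusting the runtime filtering options might
+        look like this:
+      </p>
+
+<pre class="pre codeblock"><code>
+set;                            -- Display all query options and current values.
+set runtime_filter_mode=local;  -- Applies only to this session.
+set runtime_filter_mode=global; -- Restore the Impala 2.6 default.
+</code></pre>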
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="runtime_filtering__runtime_filtering_internals">
+    <h2 class="title topictitle2" id="ariaid-title3">Runtime Filtering Internals</h2>
+    <div class="body conbody">
+      <p class="p">
+        The <dfn class="term">filter</dfn> that is transmitted between plan fragments is essentially a list
+        of values for join key columns. When this list of values is transmitted in time to a scan node,
+        Impala can filter out non-matching values immediately after reading them, rather than transmitting
+        the raw data to another host to compare against the in-memory hash table on that host.
+        This data structure is implemented as a <dfn class="term">Bloom filter</dfn>, which uses a probability-based
+        algorithm to determine all possible matching values. (The probability-based aspect means that the
+        filter might include some non-matching values, but if so, that does not cause any inaccuracy
+        in the final results.)
+      </p>
+      <p class="p">
+        There are different kinds of filters to match the different kinds of joins (partitioned and broadcast).
+        A broadcast filter is a complete list of relevant values that can be immediately evaluated by a scan node.
+        A partitioned filter is a partial list of relevant values (based on the data processed by one host in the
+        cluster); all the partitioned filters must be combined into one (by the coordinator node) before the
+        scan nodes can use the results to accurately filter the data as it is read from storage.
+      </p>
+      <p class="p">
+        Broadcast filters are also classified as local or global. With a local broadcast filter, the information
+        in the filter is used by a subsequent query fragment that is running on the same host that produced the filter.
+        A non-local broadcast filter must be transmitted across the network to a query fragment that is running on a
+        different host. Impala designates 3 hosts to each produce non-local broadcast filters, to guard against the
+        possibility of a single slow host taking too long. Depending on the setting of the <code class="ph codeph">RUNTIME_FILTER_MODE</code> query option
+        (<code class="ph codeph">LOCAL</code> or <code class="ph codeph">GLOBAL</code>), Impala either uses a conservative optimization
+        strategy where filters are only consumed on the same host that produced them, or a more aggressive strategy
+        where filters are eligible to be transmitted across the network.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        In <span class="keyword">Impala 2.6</span> and higher, the default for runtime filtering is the <code class="ph codeph">GLOBAL</code> setting.
+      </div>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="runtime_filtering__runtime_filtering_file_formats">
+    <h2 class="title topictitle2" id="ariaid-title4">File Format Considerations for Runtime Filtering</h2>
+    <div class="body conbody">
+      <p class="p">
+        Parquet tables get the most benefit from
+        the runtime filtering optimizations. Runtime filtering can speed up
+        join queries against partitioned or unpartitioned Parquet tables,
+        and single-table queries against partitioned Parquet tables.
+        See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for information about
+        using Parquet tables with Impala.
+      </p>
+      <p class="p">
+        For other file formats (text, Avro, RCFile, and SequenceFile),
+        runtime filtering speeds up queries against partitioned tables only.
+        Because partitioned tables can use a mixture of formats, Impala produces
+        the filters in all cases, even if they are not ultimately used to
+        optimize the query.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="runtime_filtering__runtime_filtering_timing">
+    <h2 class="title topictitle2" id="ariaid-title5">Wait Intervals for Runtime Filters</h2>
+    <div class="body conbody">
+      <p class="p">
+        Because it takes time to produce runtime filters, especially for
+        partitioned filters that must be combined by the coordinator node,
+        there is a time interval above which it is more efficient for
+        the scan nodes to go ahead and construct their intermediate result sets,
+        even if that intermediate data is larger than optimal. If it only takes
+        a few seconds to produce the filters, it is worth the extra time if pruning
+        the unnecessary data can save minutes in the overall query time.
+        You can specify the maximum wait time in milliseconds using the
+        <code class="ph codeph">RUNTIME_FILTER_WAIT_TIME_MS</code> query option.
+      </p>
+      <p class="p">
+        By default, each scan node waits for up to 1 second (1000 milliseconds)
+        for filters to arrive. If all filters have not arrived within the
+        specified interval, the scan node proceeds, using whatever filters
+        did arrive to help avoid reading unnecessary data. If a filter arrives
+        after the scan node begins reading data, the scan node applies that
+        filter to the data that is read after the filter arrives, but not to
+        the data that was already read.
+      </p>
+      <p class="p">
+        If the cluster is relatively busy and your workload contains many
+        resource-intensive or long-running queries, consider increasing the wait time
+        so that complicated queries do not miss opportunities for optimization.
+        If the cluster is lightly loaded and your workload contains many small queries
+        taking only a few seconds, consider decreasing the wait time to avoid the
+        1 second delay for each query.
+      </p>
+    </div>
+  </article>
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="runtime_filtering__runtime_filtering_query_options">
+    <h2 class="title topictitle2" id="ariaid-title6">Query Options for Runtime Filtering</h2>
+    <div class="body conbody">
+      <p class="p">
+        See the following sections for information about the query options that control runtime filtering:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            The first query option adjusts the <span class="q">"sensitivity"</span> of this feature.
+            <span class="ph">By default, it is set to the highest level (<code class="ph codeph">GLOBAL</code>).
+            (This default applies to <span class="keyword">Impala 2.6</span> and higher.
+            In previous releases, the default was <code class="ph codeph">LOCAL</code>.)</span>
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li class="li">
+          <p class="p">
+            The other query options are tuning knobs that you typically only adjust after doing
+            performance testing, and that you might want to change only for the duration of a single
+            expensive query:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                <a class="xref" href="impala_max_num_runtime_filters.html#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a>
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                <a class="xref" href="impala_disable_row_runtime_filtering.html#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</a>
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a> 
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a> 
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>;
+                in <span class="keyword">Impala 2.6</span> and higher, this setting acts as a fallback when
+                statistics are not available, rather than as a directive.
+              </p>
+            </li>
+          </ul>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="runtime_filtering__runtime_filtering_explain_plan">
+    <h2 class="title topictitle2" id="ariaid-title7">Runtime Filtering and Query Plans</h2>
+    <div class="body conbody">
+      <p class="p">
+        In the same way the query plan displayed by the
+        <code class="ph codeph">EXPLAIN</code> statement includes information
+        about predicates used by each plan fragment, it also
+        includes annotations showing whether a plan fragment
+        produces or consumes a runtime filter.
+        A plan fragment that produces a filter includes an
+        annotation such as
+        <code class="ph codeph">runtime filters: <var class="keyword varname">filter_id</var> &lt;- <var class="keyword varname">table</var>.<var class="keyword varname">column</var></code>,
+        while a plan fragment that consumes a filter includes an annotation such as
+        <code class="ph codeph">runtime filters: <var class="keyword varname">filter_id</var> -&gt; <var class="keyword varname">table</var>.<var class="keyword varname">column</var></code>.
+      </p>
+
+      <p class="p">
+        The following example shows a query that uses a single runtime filter (labelled <code class="ph codeph">RF00</code>)
+        to prune the partitions that are scanned in one stage of the query, based on evaluating the
+        result set of a subquery:
+      </p>
+
+<pre class="pre codeblock"><code>
+create table yy (s string) partitioned by (year int) stored as parquet;
+insert into yy partition (year) values ('1999', 1999), ('2000', 2000),
+  ('2001', 2001), ('2010',2010);
+compute stats yy;
+
+create table yy2 (s string) partitioned by (year int) stored as parquet;
+insert into yy2 partition (year) values ('1999', 1999), ('2000', 2000),
+  ('2001', 2001);
+compute stats yy2;
+
+-- The query reads an unknown number of partitions, whose key values are only
+-- known at run time. The 'runtime filters' lines show how the information about
+-- the partitions is calculated in query fragment 02, and then used in query
+-- fragment 00 to decide which partitions to skip.
+explain select s from yy2 where year in (select year from yy where year between 2000 and 2005);
++----------------------------------------------------------+
+| Explain String                                           |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=16.00MB VCores=2 |
+|                                                          |
+| 04:EXCHANGE [UNPARTITIONED]                              |
+| |                                                        |
+| 02:HASH JOIN [LEFT SEMI JOIN, BROADCAST]                 |
+| |  hash predicates: year = year                          |
+| |  <strong class="ph b">runtime filters: RF000 &lt;- year</strong>                        |
+| |                                                        |
+| |--03:EXCHANGE [BROADCAST]                               |
+| |  |                                                     |
+| |  01:SCAN HDFS [dpp.yy]                                 |
+| |     partitions=2/4 files=2 size=468B                   |
+| |                                                        |
+| 00:SCAN HDFS [dpp.yy2]                                   |
+|    partitions=2/3 files=2 size=468B                      |
+|    <strong class="ph b">runtime filters: RF000 -&gt; year</strong>                        |
++----------------------------------------------------------+
+</code></pre>
+
+      <p class="p">
+        The query profile (displayed by the <code class="ph codeph">PROFILE</code> command in <span class="keyword cmdname">impala-shell</span>)
+        contains both the <code class="ph codeph">EXPLAIN</code> plan and more detailed information about the internal
+        workings of the query. The profile output includes a section labelled the <span class="q">"filter routing table"</span>,
+        with information about each filter based on its ID.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="runtime_filtering__runtime_filtering_queries">
+    <h2 class="title topictitle2" id="ariaid-title8">Examples of Queries that Benefit from Runtime Filtering</h2>
+    <div class="body conbody">
+
+      <p class="p">
+        In this example, Impala would normally do extra work to interpret the columns
+        <code class="ph codeph">C1</code>, <code class="ph codeph">C2</code>, <code class="ph codeph">C3</code>, and <code class="ph codeph">ID</code>
+        for each row in <code class="ph codeph">HUGE_T1</code>, before checking the <code class="ph codeph">ID</code>
+        value against the in-memory hash table constructed from all the <code class="ph codeph">TINY_T2.ID</code>
+        values. By producing a filter containing all the <code class="ph codeph">TINY_T2.ID</code> values
+        even before the query starts scanning the <code class="ph codeph">HUGE_T1</code> table, Impala
+        can skip the unnecessary work to parse the column info as soon as it determines
+        that an <code class="ph codeph">ID</code> value does not match any of the values from the other table.
+      </p>
+
+      <p class="p">
+        The example shows <code class="ph codeph">COMPUTE STATS</code> statements for both the tables (even
+        though that is a one-time operation after loading data into those tables) because
+        Impala relies on up-to-date statistics to
+        determine which one has more distinct <code class="ph codeph">ID</code> values than the other.
+        That information lets Impala make effective decisions about which table to use to
+        construct the in-memory hash table, and which table to read from disk and
+        compare against the entries in the hash table.
+      </p>
+
+<pre class="pre codeblock"><code>
+COMPUTE STATS huge_t1;
+COMPUTE STATS tiny_t2;
+SELECT c1, c2, c3 FROM huge_t1 JOIN tiny_t2 WHERE huge_t1.id = tiny_t2.id;
+</code></pre>
+
+
+
+      <p class="p">
+        In this example, <code class="ph codeph">T1</code> is a table partitioned by year. The subquery
+        on <code class="ph codeph">T2</code> produces multiple values, and transmits those values as a filter to the plan
+        fragments that are reading from <code class="ph codeph">T1</code>. Any non-matching partitions in <code class="ph codeph">T1</code>
+        are skipped.
+      </p>
+
+<pre class="pre codeblock"><code>
+select c1 from t1 where year in (select distinct year from t2);
+</code></pre>
+
+      <p class="p">
+        Now the <code class="ph codeph">WHERE</code> clause contains an additional test that does not apply to
+        the partition key column.
+        A filter on a column that is not a partition key is called a per-row filter.
+        Because per-row filters only apply for Parquet, <code class="ph codeph">T1</code> must be a Parquet table.
+      </p>
+
+      <p class="p">
+        The subqueries result in two filters being transmitted to
+        the scan nodes that read from <code class="ph codeph">T1</code>. The filter on <code class="ph codeph">YEAR</code> helps the query eliminate
+        entire partitions based on non-matching years. The filter on <code class="ph codeph">C2</code> lets Impala discard
+        rows with non-matching <code class="ph codeph">C2</code> values immediately after reading them. Without runtime filtering,
+        Impala would have to keep the non-matching values in memory, assemble <code class="ph codeph">C1</code>, <code class="ph codeph">C2</code>,
+        and <code class="ph codeph">C3</code> into rows in the intermediate result set, and transmit all the intermediate rows
+        back to the coordinator node, where they would be eliminated only at the very end of the query.
+      </p>
+
+<pre class="pre codeblock"><code>
+select c1, c2, c3 from t1
+  where year in (select distinct year from t2)
+    and c2 in (select other_column from t3);
+</code></pre>
+
+      <p class="p">
+        This example involves a broadcast join.
+        The fact that the <code class="ph codeph">ON</code> clause would
+        return a small number of matching rows (because there
+        are not very many rows in <code class="ph codeph">TINY_T2</code>)
+        means that the corresponding filter is very selective.
+        Therefore, runtime filtering will probably be effective
+        in optimizing this query.
+      </p>
+
+<pre class="pre codeblock"><code>
+select c1 from huge_t1 join [broadcast] tiny_t2
+  on huge_t1.id = tiny_t2.id
+  where huge_t1.year in (select distinct year from tiny_t2)
+    and c2 in (select other_column from t3);
+</code></pre>
+
+      <p class="p">
+        This example involves a shuffle or partitioned join.
+        Assume that most rows in <code class="ph codeph">HUGE_T1</code>
+        have a corresponding row in <code class="ph codeph">HUGE_T2</code>.
+        The fact that the <code class="ph codeph">ON</code> clause could
+        return a large number of matching rows means that
+        the corresponding filter would not be very selective.
+        Therefore, runtime filtering might be less effective
+        in optimizing this query.
+      </p>
+
+<pre class="pre codeblock"><code>
+select c1 from huge_t1 join [shuffle] huge_t2
+  on huge_t1.id = huge_t2.id
+  where huge_t1.year in (select distinct year from huge_t2)
+    and c2 in (select other_column from t3);
+</code></pre>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="runtime_filtering__runtime_filtering_tuning">
+    <h2 class="title topictitle2" id="ariaid-title9">Tuning and Troubleshooting Queries that Use Runtime Filtering</h2>
+    <div class="body conbody">
+      <p class="p">
+        These tuning and troubleshooting procedures apply to queries that are
+        resource-intensive enough, long-running enough, and frequent enough
+        that you can devote special attention to optimizing them individually.
+      </p>
+
+      <p class="p">
+        Use the <code class="ph codeph">EXPLAIN</code> statement and examine the <code class="ph codeph">runtime filters:</code>
+        lines to determine whether runtime filters are being applied to the <code class="ph codeph">WHERE</code> predicates
+        and join clauses that you expect. For example, runtime filtering does not apply to queries that use
+        the nested loop join mechanism due to non-equijoin operators.
+      </p>
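+
+      <p class="p">
+        For example, using the hypothetical tables from the earlier examples, you might
+        compare the plans for an equijoin and a non-equijoin:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Look for "runtime filters:" annotations in the plan for an equijoin query...
+explain select c1 from huge_t1 join tiny_t2 on huge_t1.id = tiny_t2.id;
+
+-- ...but do not expect any for a non-equijoin query, which uses the
+-- nested loop join mechanism.
+explain select c1 from huge_t1 join tiny_t2 on huge_t1.id &lt; tiny_t2.id;
+</code></pre>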
+
+      <p class="p">
+        Make sure statistics are up-to-date for all tables involved in the queries.
+        Use the <code class="ph codeph">COMPUTE STATS</code> statement after loading data into non-partitioned tables,
+        and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> after adding new partitions to partitioned tables.
+      </p>
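+
+      <p class="p">
+        For example (the table names here are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>
+-- After loading data into a non-partitioned table.
+compute stats unpartitioned_t1;
+
+-- After adding new partitions to a partitioned table.
+compute incremental stats partitioned_t1;
+</code></pre>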
+
+      <p class="p">
+        If join queries involving large tables use unique columns as the join keys,
+        for example joining a primary key column with a foreign key column, the overhead of
+        producing and transmitting the filter might outweigh the performance benefit because
+        not much data could be pruned during the early stages of the query.
+        For such queries, consider setting the query option <code class="ph codeph">RUNTIME_FILTER_MODE=OFF</code>.
+      </p>
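+
+      <p class="p">
+        For example, this sketch disables runtime filtering for the session before
+        running a join between two large tables on unique key columns:
+      </p>
+
+<pre class="pre codeblock"><code>
+set runtime_filter_mode=off;
+-- A primary key / foreign key join between large tables prunes little data
+-- early, so skipping filter production avoids the associated overhead.
+select c1 from huge_t1 join huge_t2 on huge_t1.id = huge_t2.id;
+</code></pre>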
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="runtime_filtering__runtime_filtering_limits">
+    <h2 class="title topictitle2" id="ariaid-title10">Limitations and Restrictions for Runtime Filtering</h2>
+    <div class="body conbody">
+      <p class="p">
+        The runtime filtering feature is most effective for the Parquet file format.
+        For other file formats, filtering only applies for partitioned tables.
+        See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering_file_formats">File Format Considerations for Runtime Filtering</a>.
+      </p>
+
+      
+      <p class="p">
+        When the spill-to-disk mechanism is activated on a particular host during a query,
+        that host does not produce any filters while processing that query.
+        This limitation does not affect the correctness of results; it only reduces the
+        amount of optimization that can be applied to the query.
+      </p>
+
+    </div>
+  </article>
+
+
+</article></main></body></html>
\ No newline at end of file


[36/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_describe.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_describe.html b/docs/build/html/topics/impala_describe.html
new file mode 100644
index 0000000..963ef6e
--- /dev/null
+++ b/docs/build/html/topics/impala_describe.html
@@ -0,0 +1,802 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="describe"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DESCRIBE Statement</title></head><body id="describe"><main role="main"><article role="article" aria-labelledby="describe__desc">
+
+  <h1 class="title topictitle1" id="describe__desc">DESCRIBE Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">DESCRIBE</code> statement displays metadata about a table, such as the column names and their
+      data types.
+      <span class="ph">In <span class="keyword">Impala 2.3</span> and higher, you can specify the name of a complex type column, which takes
+      the form of a dotted path. The path might include multiple components in the case of a nested type definition.</span>
+      <span class="ph">In <span class="keyword">Impala 2.5</span> and higher, the <code class="ph codeph">DESCRIBE DATABASE</code> form can display
+      information about a database.</span>
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>DESCRIBE [DATABASE] [FORMATTED|EXTENDED] <var class="keyword varname">object_name</var>
+
+object_name ::=
+    [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>[.<var class="keyword varname">complex_col_name</var> ...]
+  | <var class="keyword varname">db_name</var>
+</code></pre>
+
+    <p class="p">
+      You can use the abbreviation <code class="ph codeph">DESC</code> for the <code class="ph codeph">DESCRIBE</code> statement.
+    </p>
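+
+<pre class="pre codeblock"><code>
+-- These two statements are equivalent.
+describe my_table;
+desc my_table;
+</code></pre>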
+
+    <p class="p">
+      The <code class="ph codeph">DESCRIBE FORMATTED</code> variation displays additional information, in a format familiar to
+      users of Apache Hive. The extra information includes low-level details such as whether the table is internal
+      or external, when it was created, the file format, the location of the data in HDFS, whether the object is a
+      table or a view, and (for views) the text of the query from the view definition.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      The <code class="ph codeph">Compressed</code> field is not a reliable indicator of whether the table contains compressed
+      data. It typically shows <code class="ph codeph">No</code>, because the compression settings only apply during the
+      session that loads data and are not stored persistently with the table metadata.
+    </div>
+
+<p class="p">
+  <strong class="ph b">Describing databases:</strong>
+</p>
+
+<p class="p">
+  By default, the <code class="ph codeph">DESCRIBE</code> output for a database includes the location
+  and the comment, which can be set by the <code class="ph codeph">LOCATION</code> and <code class="ph codeph">COMMENT</code>
+  clauses on the <code class="ph codeph">CREATE DATABASE</code> statement.
+</p>
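+
+<p class="p">
+  For example, the following sketch creates a hypothetical database with both
+  clauses and then displays them:
+</p>
+
+<pre class="pre codeblock"><code>
+create database reporting_db
+  location '/user/impala/reporting_db'
+  comment 'Tables for nightly reports';
+describe database reporting_db;
+</code></pre>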
+
+<p class="p">
+  The additional information displayed by the <code class="ph codeph">FORMATTED</code> or <code class="ph codeph">EXTENDED</code>
+  keyword includes the HDFS user ID that is considered the owner of the database, and any
+  optional database properties. The properties could be specified by the <code class="ph codeph">WITH DBPROPERTIES</code>
+  clause if the database is created using a Hive <code class="ph codeph">CREATE DATABASE</code> statement.
+  Impala currently does not set these properties itself, and does not do any special processing based on them.
+</p>
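+
+<p class="p">
+  For example, such properties could be attached through Hive (a hypothetical example):
+</p>
+
+<pre class="pre codeblock"><code>
+-- Issued through Hive, not Impala.
+CREATE DATABASE props_db WITH DBPROPERTIES ('creator' = 'etl_job');
+</code></pre>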
+
+<p class="p">
+The following examples show the variations in syntax and output for
+describing databases. This feature is available in <span class="keyword">Impala 2.5</span>
+and higher.
+</p>
+
+<pre class="pre codeblock"><code>
+describe database default;
++---------+----------------------+-----------------------+
+| name    | location             | comment               |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
++---------+----------------------+-----------------------+
+
+describe database formatted default;
++---------+----------------------+-----------------------+
+| name    | location             | comment               |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
+| Owner:  |                      |                       |
+|         | public               | ROLE                  |
++---------+----------------------+-----------------------+
+
+describe database extended default;
++---------+----------------------+-----------------------+
+| name    | location             | comment               |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
+| Owner:  |                      |                       |
+|         | public               | ROLE                  |
++---------+----------------------+-----------------------+
+</code></pre>
+
+<p class="p">
+  <strong class="ph b">Describing tables:</strong>
+</p>
+
+<p class="p">
+  If the <code class="ph codeph">DATABASE</code> keyword is omitted, the default
+  for the <code class="ph codeph">DESCRIBE</code> statement is to refer to a table.
+</p>
+
+<pre class="pre codeblock"><code>
+-- By default, the table is assumed to be in the current database.
+describe my_table;
++------+--------+---------+
+| name | type   | comment |
++------+--------+---------+
+| x    | int    |         |
+| s    | string |         |
++------+--------+---------+
+
+-- Use a fully qualified table name to specify a table in any database.
+describe my_database.my_table;
++------+--------+---------+
+| name | type   | comment |
++------+--------+---------+
+| x    | int    |         |
+| s    | string |         |
++------+--------+---------+
+
+-- The formatted or extended output includes additional useful information.
+-- The LOCATION field is especially useful to know for DDL statements and HDFS commands
+-- during ETL jobs. (The LOCATION includes a full hdfs:// URL, omitted here for readability.)
+describe formatted my_table;
++------------------------------+----------------------------------------------+----------------------+
+| name                         | type                                         | comment              |
++------------------------------+----------------------------------------------+----------------------+
+| # col_name                   | data_type                                    | comment              |
+|                              | NULL                                         | NULL                 |
+| x                            | int                                          | NULL                 |
+| s                            | string                                       | NULL                 |
+|                              | NULL                                         | NULL                 |
+| # Detailed Table Information | NULL                                         | NULL                 |
+| Database:                    | my_database                                  | NULL                 |
+| Owner:                       | jrussell                                     | NULL                 |
+| CreateTime:                  | Fri Mar 18 15:58:00 PDT 2016                 | NULL                 |
+| LastAccessTime:              | UNKNOWN                                      | NULL                 |
+| Protect Mode:                | None                                         | NULL                 |
+| Retention:                   | 0                                            | NULL                 |
+| Location:                    | /user/hive/warehouse/my_database.db/my_table | NULL                 |
+| Table Type:                  | MANAGED_TABLE                                | NULL                 |
+| Table Parameters:            | NULL                                         | NULL                 |
+|                              | transient_lastDdlTime                        | 1458341880           |
+|                              | NULL                                         | NULL                 |
+| # Storage Information        | NULL                                         | NULL                 |
+| SerDe Library:               | org. ... .LazySimpleSerDe                    | NULL                 |
+| InputFormat:                 | org.apache.hadoop.mapred.TextInputFormat     | NULL                 |
+| OutputFormat:                | org. ... .HiveIgnoreKeyTextOutputFormat      | NULL                 |
+| Compressed:                  | No                                           | NULL                 |
+| Num Buckets:                 | 0                                            | NULL                 |
+| Bucket Columns:              | []                                           | NULL                 |
+| Sort Columns:                | []                                           | NULL                 |
++------------------------------+----------------------------------------------+----------------------+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      Because the column definitions for complex types can become long, particularly when such types are nested,
+      the <code class="ph codeph">DESCRIBE</code> statement uses special formatting for complex type columns to make the output readable.
+    </p>
+
+    <p class="p">
+      For the <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> types available in
+      <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">DESCRIBE</code> output is formatted to avoid
+      excessively long lines for multiple fields within a <code class="ph codeph">STRUCT</code>, or a nested sequence of
+      complex types.
+    </p>
+
+    <p class="p">
+        You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+        to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+        column and visualize its structure as if it were a table.
+        For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+        <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+        If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+        and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+        you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+        An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+        <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+        A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+        representing a column in the table.
+        A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+        <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+      </p>
+
+    <p class="p">
+      For example, here is the <code class="ph codeph">DESCRIBE</code> output for a table containing a single top-level column
+      of each complex type:
+    </p>
+
+<pre class="pre codeblock"><code>create table t1 (x int, a array&lt;int&gt;, s struct&lt;f1: string, f2: bigint&gt;, m map&lt;string,int&gt;) stored as parquet;
+
+describe t1;
++------+-----------------+---------+
+| name | type            | comment |
++------+-----------------+---------+
+| x    | int             |         |
+| a    | array&lt;int&gt;      |         |
+| s    | struct&lt;         |         |
+|      |   f1:string,    |         |
+|      |   f2:bigint     |         |
+|      | &gt;               |         |
+| m    | map&lt;string,int&gt; |         |
++------+-----------------+---------+
+
+</code></pre>
+
+    <p class="p">
+      Here are examples showing how to <span class="q">"drill down"</span> into the layouts of complex types, including
+      using multi-part names to examine the definitions of nested types.
+      The <code class="ph codeph">&lt; &gt;</code> delimiters identify the columns with complex types;
+      these are the columns where you can descend another level to see the parts that make up
+      the complex type.
+      This technique helps you to understand the multi-part names you use as table references in queries
+      involving complex types, and the corresponding column names you refer to in the <code class="ph codeph">SELECT</code> list.
+      These tables are from the <span class="q">"nested TPC-H"</span> schema, shown in detail in
+      <a class="xref" href="impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">REGION</code> table contains an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+      elements:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          The first <code class="ph codeph">DESCRIBE</code> specifies the table name, to display the definition
+          of each top-level column.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The second <code class="ph codeph">DESCRIBE</code> specifies the name of a complex
+          column, <code class="ph codeph">REGION.R_NATIONS</code>, showing that when you include the name of an <code class="ph codeph">ARRAY</code>
+          column in a <code class="ph codeph">FROM</code> clause, that table reference acts like a two-column table with
+          columns <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The final <code class="ph codeph">DESCRIBE</code> specifies the fully qualified name of the <code class="ph codeph">ITEM</code> field,
+          to display the layout of its underlying <code class="ph codeph">STRUCT</code> type in table format, with the fields
+          mapped to column names.
+        </p>
+      </li>
+    </ul>
+
+<pre class="pre codeblock"><code>
+-- #1: The overall layout of the entire table.
+describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+-- #2: The ARRAY column within the table.
+describe region.r_nations;
++------+-------------------------+---------+
+| name | type                    | comment |
++------+-------------------------+---------+
+| item | struct&lt;                 |         |
+|      |   n_nationkey:smallint, |         |
+|      |   n_name:string,        |         |
+|      |   n_comment:string      |         |
+|      | &gt;                       |         |
+| pos  | bigint                  |         |
++------+-------------------------+---------+
+
+-- #3: The STRUCT that makes up each ARRAY element.
+--     The fields of the STRUCT act like columns of a table.
+describe region.r_nations.item;
++-------------+----------+---------+
+| name        | type     | comment |
++-------------+----------+---------+
+| n_nationkey | smallint |         |
+| n_name      | string   |         |
+| n_comment   | string   |         |
++-------------+----------+---------+
+
+</code></pre>
+
+    <p class="p">
+      The <code class="ph codeph">CUSTOMER</code> table contains an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+      elements, where one field in the <code class="ph codeph">STRUCT</code> is another <code class="ph codeph">ARRAY</code> of
+      <code class="ph codeph">STRUCT</code> elements:
+    </p>
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          Again, the initial <code class="ph codeph">DESCRIBE</code> specifies only the table name.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The second <code class="ph codeph">DESCRIBE</code> specifies the qualified name of the complex
+          column, <code class="ph codeph">CUSTOMER.C_ORDERS</code>, showing how an <code class="ph codeph">ARRAY</code>
+          is represented as a two-column table with columns <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The third <code class="ph codeph">DESCRIBE</code> specifies the qualified name of the <code class="ph codeph">ITEM</code>
+          of the <code class="ph codeph">ARRAY</code> column, to see the structure of the nested <code class="ph codeph">ARRAY</code>.
+          Again, it has two parts, <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>. Because the
+          <code class="ph codeph">ARRAY</code> contains a <code class="ph codeph">STRUCT</code>, the layout of the <code class="ph codeph">STRUCT</code>
+          is shown.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The fourth and fifth <code class="ph codeph">DESCRIBE</code> statements drill down into a <code class="ph codeph">STRUCT</code> field that
+          is itself a complex type, an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>.
+          The <code class="ph codeph">ITEM</code> portion of the qualified name is only required when the <code class="ph codeph">ARRAY</code>
+          elements are anonymous. The fields of the <code class="ph codeph">STRUCT</code> give names to any other complex types
+          nested inside the <code class="ph codeph">STRUCT</code>. Therefore, the <code class="ph codeph">DESCRIBE</code> parameters
+          <code class="ph codeph">CUSTOMER.C_ORDERS.ITEM.O_LINEITEMS</code> and <code class="ph codeph">CUSTOMER.C_ORDERS.O_LINEITEMS</code>
+          are equivalent. (For brevity, you can leave out the <code class="ph codeph">ITEM</code> portion of
+          a qualified name when it is not required.)
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The final <code class="ph codeph">DESCRIBE</code> shows the layout of the deeply nested <code class="ph codeph">STRUCT</code> type.
+          Because there are no more complex types nested inside this <code class="ph codeph">STRUCT</code>, this is as far
+          as you can drill down into the layout for this table.
+        </p>
+      </li>
+    </ul>
+
+<pre class="pre codeblock"><code>-- #1: The overall layout of the entire table.
+describe customer;
++--------------+------------------------------------+
+| name         | type                               |
++--------------+------------------------------------+
+| c_custkey    | bigint                             |
+... more scalar columns ...
+| c_orders     | array&lt;struct&lt;                      |
+|              |   o_orderkey:bigint,               |
+|              |   o_orderstatus:string,            |
+|              |   o_totalprice:decimal(12,2),      |
+|              |   o_orderdate:string,              |
+|              |   o_orderpriority:string,          |
+|              |   o_clerk:string,                  |
+|              |   o_shippriority:int,              |
+|              |   o_comment:string,                |
+|              |   o_lineitems:array&lt;struct&lt;        |
+|              |     l_partkey:bigint,              |
+|              |     l_suppkey:bigint,              |
+|              |     l_linenumber:int,              |
+|              |     l_quantity:decimal(12,2),      |
+|              |     l_extendedprice:decimal(12,2), |
+|              |     l_discount:decimal(12,2),      |
+|              |     l_tax:decimal(12,2),           |
+|              |     l_returnflag:string,           |
+|              |     l_linestatus:string,           |
+|              |     l_shipdate:string,             |
+|              |     l_commitdate:string,           |
+|              |     l_receiptdate:string,          |
+|              |     l_shipinstruct:string,         |
+|              |     l_shipmode:string,             |
+|              |     l_comment:string               |
+|              |   &gt;&gt;                               |
+|              | &gt;&gt;                                 |
++--------------+------------------------------------+
+
+-- #2: The ARRAY column within the table.
+describe customer.c_orders;
++------+------------------------------------+
+| name | type                               |
++------+------------------------------------+
+| item | struct&lt;                            |
+|      |   o_orderkey:bigint,               |
+|      |   o_orderstatus:string,            |
+... more struct fields ...
+|      |   o_lineitems:array&lt;struct&lt;        |
+|      |     l_partkey:bigint,              |
+|      |     l_suppkey:bigint,              |
+... more nested struct fields ...
+|      |     l_comment:string               |
+|      |   &gt;&gt;                               |
+|      | &gt;                                  |
+| pos  | bigint                             |
++------+------------------------------------+
+
+-- #3: The STRUCT that makes up each ARRAY element.
+--     The fields of the STRUCT act like columns of a table.
+describe customer.c_orders.item;
++-----------------+----------------------------------+
+| name            | type                             |
++-----------------+----------------------------------+
+| o_orderkey      | bigint                           |
+| o_orderstatus   | string                           |
+| o_totalprice    | decimal(12,2)                    |
+| o_orderdate     | string                           |
+| o_orderpriority | string                           |
+| o_clerk         | string                           |
+| o_shippriority  | int                              |
+| o_comment       | string                           |
+| o_lineitems     | array&lt;struct&lt;                    |
+|                 |   l_partkey:bigint,              |
+|                 |   l_suppkey:bigint,              |
+... more struct fields ...
+|                 |   l_comment:string               |
+|                 | &gt;&gt;                               |
++-----------------+----------------------------------+
+
+-- #4: The ARRAY nested inside the STRUCT elements of the first ARRAY.
+describe customer.c_orders.item.o_lineitems;
++------+----------------------------------+
+| name | type                             |
++------+----------------------------------+
+| item | struct&lt;                          |
+|      |   l_partkey:bigint,              |
+|      |   l_suppkey:bigint,              |
+... more struct fields ...
+|      |   l_comment:string               |
+|      | &gt;                                |
+| pos  | bigint                           |
++------+----------------------------------+
+
+-- #5: Shorter form of the previous DESCRIBE. Omits the .ITEM portion of the name
+--     because O_LINEITEMS and other field names provide a way to refer to things
+--     inside the ARRAY element.
+describe customer.c_orders.o_lineitems;
++------+----------------------------------+
+| name | type                             |
++------+----------------------------------+
+| item | struct&lt;                          |
+|      |   l_partkey:bigint,              |
+|      |   l_suppkey:bigint,              |
+... more struct fields ...
+|      |   l_comment:string               |
+|      | &gt;                                |
+| pos  | bigint                           |
++------+----------------------------------+
+
+-- #6: The STRUCT representing ARRAY elements nested inside
+--     another ARRAY of STRUCTs. The lack of any complex types
+--     in this output means this is as far as DESCRIBE can
+--     descend into the table layout.
+describe customer.c_orders.o_lineitems.item;
++-----------------+---------------+
+| name            | type          |
++-----------------+---------------+
+| l_partkey       | bigint        |
+| l_suppkey       | bigint        |
+... more scalar columns ...
+| l_comment       | string        |
++-----------------+---------------+
+
+</code></pre>
+
+<p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+<p class="p">
+  After the <span class="keyword cmdname">impalad</span> daemons are restarted, the first query against a table can take longer
+  than subsequent queries, because the metadata for the table is loaded before the query is processed. This
+  one-time delay for each table can cause misleading results in benchmark tests or cause unnecessary concern.
+  To <span class="q">"warm up"</span> the Impala metadata cache, you can issue a <code class="ph codeph">DESCRIBE</code> statement in advance
+  for each table you intend to access later.
+</p>
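+<p class="p">
+  For example, a warm-up script run right after a restart might look like the following
+  (the table names are hypothetical placeholders for your own tables):
+</p>
+
+<pre class="pre codeblock"><code>-- Issued once after the impalad daemons restart, before any
+-- benchmark or production queries; each DESCRIBE loads that
+-- table's metadata so later queries do not pay the one-time delay.
+describe web_logs;
+describe customer;
+describe store_sales;</code></pre>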
+
+<p class="p">
+  When you are dealing with data files stored in HDFS, sometimes it is important to know details such as the
+  path of the data files for an Impala table, and the hostname for the namenode. You can get this information
+  from the <code class="ph codeph">DESCRIBE FORMATTED</code> output. You specify HDFS URIs or path specifications with
+  statements such as <code class="ph codeph">LOAD DATA</code> and the <code class="ph codeph">LOCATION</code> clause of <code class="ph codeph">CREATE
+  TABLE</code> or <code class="ph codeph">ALTER TABLE</code>. You might also use HDFS URIs or paths with Linux commands
+  such as <span class="keyword cmdname">hadoop</span> and <span class="keyword cmdname">hdfs</span> to copy, rename, or otherwise manage data files in HDFS.
+</p>
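+<p class="p">
+  For example, you might copy the <code class="ph codeph">Location:</code> value from the
+  <code class="ph codeph">DESCRIBE FORMATTED</code> output into later statements and commands.
+  (The path and table names in this sketch are hypothetical.)
+</p>
+
+<pre class="pre codeblock"><code>describe formatted t1;
+-- Suppose the output includes:
+--   Location: hdfs://127.0.0.1:8020/user/hive/warehouse/d1.db/t1
+
+-- Reuse the path in a LOCATION clause:
+create external table t1_copy (x int, y int, s string)
+  location '/user/hive/warehouse/d1.db/t1';</code></pre>
+
+<p class="p">
+  The same path works with Linux commands, for example
+  <code class="ph codeph">hdfs dfs -ls /user/hive/warehouse/d1.db/t1</code>.
+</p>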
+
+<p class="p">
+        If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+        load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+        statement wait before returning, until the new or changed metadata has been received by all the Impala
+        nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+      </p>
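+<p class="p">
+        A minimal sketch of that technique in <span class="keyword cmdname">impala-shell</span>
+        (the table name is hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>set sync_ddl=1;
+-- This statement now returns only after every Impala node
+-- has received the new table metadata.
+create table events (id bigint, msg string);</code></pre>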
+
+<p class="p">
+  Each table can also have associated table statistics and column statistics. To see these categories of
+  information, use the <code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code> and <code class="ph codeph">SHOW COLUMN
+  STATS <var class="keyword varname">table_name</var></code> statements.
+
+  See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+</p>
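+<p class="p">
+  For example, for a hypothetical table named <code class="ph codeph">T1</code>:
+</p>
+
+<pre class="pre codeblock"><code>-- Table-level statistics: row count, file count, size, and format.
+show table stats t1;
+-- Column-level statistics, such as distinct and NULL value counts.
+show column stats t1;</code></pre>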
+
+<div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+        After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+        STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+        table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+        SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+        <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+        are very large, used in join queries, or both.
+      </div>
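+<p class="p">
+        A sketch of that workflow, assuming a hypothetical table named
+        <code class="ph codeph">sales_data</code> whose data was just loaded through Hive:
+      </p>
+
+<pre class="pre codeblock"><code>-- Make Impala aware of the new data files:
+refresh sales_data;
+-- Recompute the statistics used for join planning:
+compute stats sales_data;
+-- Confirm the statistics are now populated:
+show table stats sales_data;</code></pre>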
+
+<p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<p class="p">
+  The following example shows the results of both a standard <code class="ph codeph">DESCRIBE</code> and <code class="ph codeph">DESCRIBE
+  FORMATTED</code> for different kinds of schema objects:
+</p>
+
+  <ul class="ul">
+    <li class="li">
+      <code class="ph codeph">DESCRIBE</code> for a table or a view returns the name, type, and comment for each of the
+      columns. For a view, if the column value is computed by an expression, the column name is automatically
+      generated as <code class="ph codeph">_c0</code>, <code class="ph codeph">_c1</code>, and so on depending on the ordinal number of the
+      column.
+    </li>
+
+    <li class="li">
+      A table created with no special format or storage clauses is designated as a <code class="ph codeph">MANAGED_TABLE</code>
+      (an <span class="q">"internal table"</span> in Impala terminology). Its data files are stored in an HDFS directory under the
+      default Hive data directory. By default, it uses the Text data format.
+    </li>
+
+    <li class="li">
+      A view is designated as <code class="ph codeph">VIRTUAL_VIEW</code> in <code class="ph codeph">DESCRIBE FORMATTED</code> output. Some
+      of its properties are <code class="ph codeph">NULL</code> or blank because they are inherited from the base table. The
+      text of the query that defines the view is part of the <code class="ph codeph">DESCRIBE FORMATTED</code> output.
+    </li>
+
+    <li class="li">
+      A table with additional clauses in the <code class="ph codeph">CREATE TABLE</code> statement has differences in
+      <code class="ph codeph">DESCRIBE FORMATTED</code> output. The output for <code class="ph codeph">T2</code> includes the
+      <code class="ph codeph">EXTERNAL_TABLE</code> keyword because of the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax, and
+      different <code class="ph codeph">InputFormat</code> and <code class="ph codeph">OutputFormat</code> fields to reflect the Parquet file
+      format.
+    </li>
+  </ul>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table t1 (x int, y int, s string);
+Query: create table t1 (x int, y int, s string)
+[localhost:21000] &gt; describe t1;
+Query: describe t1
+Query finished, fetching results ...
++------+--------+---------+
+| name | type   | comment |
++------+--------+---------+
+| x    | int    |         |
+| y    | int    |         |
+| s    | string |         |
++------+--------+---------+
+Returned 3 row(s) in 0.13s
+[localhost:21000] &gt; describe formatted t1;
+Query: describe formatted t1
+Query finished, fetching results ...
++------------------------------+--------------------------------------------+------------+
+| name                         | type                                       | comment    |
++------------------------------+--------------------------------------------+------------+
+| # col_name                   | data_type                                  | comment    |
+|                              | NULL                                       | NULL       |
+| x                            | int                                        | None       |
+| y                            | int                                        | None       |
+| s                            | string                                     | None       |
+|                              | NULL                                       | NULL       |
+| # Detailed Table Information | NULL                                       | NULL       |
+| Database:                    | describe_formatted                         | NULL       |
+| Owner:                       | doc_demo                                   | NULL       |
+| CreateTime:                  | Mon Jul 22 17:03:16 EDT 2013               | NULL       |
+| LastAccessTime:              | UNKNOWN                                    | NULL       |
+| Protect Mode:                | None                                       | NULL       |
+| Retention:                   | 0                                          | NULL       |
+| Location:                    | hdfs://127.0.0.1:8020/user/hive/warehouse/ |            |
+|                              |   describe_formatted.db/t1                 | NULL       |
+| Table Type:                  | MANAGED_TABLE                              | NULL       |
+| Table Parameters:            | NULL                                       | NULL       |
+|                              | transient_lastDdlTime                      | 1374526996 |
+|                              | NULL                                       | NULL       |
+| # Storage Information        | NULL                                       | NULL       |
+| SerDe Library:               | org.apache.hadoop.hive.serde2.lazy.        |            |
+|                              |   LazySimpleSerDe                          | NULL       |
+| InputFormat:                 | org.apache.hadoop.mapred.TextInputFormat   | NULL       |
+| OutputFormat:                | org.apache.hadoop.hive.ql.io.              |            |
+|                              |   HiveIgnoreKeyTextOutputFormat            | NULL       |
+| Compressed:                  | No                                         | NULL       |
+| Num Buckets:                 | 0                                          | NULL       |
+| Bucket Columns:              | []                                         | NULL       |
+| Sort Columns:                | []                                         | NULL       |
++------------------------------+--------------------------------------------+------------+
+Returned 26 row(s) in 0.03s
+[localhost:21000] &gt; create view v1 as select x, upper(s) from t1;
+Query: create view v1 as select x, upper(s) from t1
+[localhost:21000] &gt; describe v1;
+Query: describe v1
+Query finished, fetching results ...
++------+--------+---------+
+| name | type   | comment |
++------+--------+---------+
+| x    | int    |         |
+| _c1  | string |         |
++------+--------+---------+
+Returned 2 row(s) in 0.10s
+[localhost:21000] &gt; describe formatted v1;
+Query: describe formatted v1
+Query finished, fetching results ...
++------------------------------+------------------------------+----------------------+
+| name                         | type                         | comment              |
++------------------------------+------------------------------+----------------------+
+| # col_name                   | data_type                    | comment              |
+|                              | NULL                         | NULL                 |
+| x                            | int                          | None                 |
+| _c1                          | string                       | None                 |
+|                              | NULL                         | NULL                 |
+| # Detailed Table Information | NULL                         | NULL                 |
+| Database:                    | describe_formatted           | NULL                 |
+| Owner:                       | doc_demo                     | NULL                 |
+| CreateTime:                  | Mon Jul 22 16:56:38 EDT 2013 | NULL                 |
+| LastAccessTime:              | UNKNOWN                      | NULL                 |
+| Protect Mode:                | None                         | NULL                 |
+| Retention:                   | 0                            | NULL                 |
+| Table Type:                  | VIRTUAL_VIEW                 | NULL                 |
+| Table Parameters:            | NULL                         | NULL                 |
+|                              | transient_lastDdlTime        | 1374526598           |
+|                              | NULL                         | NULL                 |
+| # Storage Information        | NULL                         | NULL                 |
+| SerDe Library:               | null                         | NULL                 |
+| InputFormat:                 | null                         | NULL                 |
+| OutputFormat:                | null                         | NULL                 |
+| Compressed:                  | No                           | NULL                 |
+| Num Buckets:                 | 0                            | NULL                 |
+| Bucket Columns:              | []                           | NULL                 |
+| Sort Columns:                | []                           | NULL                 |
+|                              | NULL                         | NULL                 |
+| # View Information           | NULL                         | NULL                 |
+| View Original Text:          | SELECT x, upper(s) FROM t1   | NULL                 |
+| View Expanded Text:          | SELECT x, upper(s) FROM t1   | NULL                 |
++------------------------------+------------------------------+----------------------+
+Returned 28 row(s) in 0.03s
+[localhost:21000] &gt; create external table t2 (x int, y int, s string) stored as parquet location '/user/doc_demo/sample_data';
+[localhost:21000] &gt; describe formatted t2;
+Query: describe formatted t2
+Query finished, fetching results ...
++------------------------------+----------------------------------------------------+------------+
+| name                         | type                                               | comment    |
++------------------------------+----------------------------------------------------+------------+
+| # col_name                   | data_type                                          | comment    |
+|                              | NULL                                               | NULL       |
+| x                            | int                                                | None       |
+| y                            | int                                                | None       |
+| s                            | string                                             | None       |
+|                              | NULL                                               | NULL       |
+| # Detailed Table Information | NULL                                               | NULL       |
+| Database:                    | describe_formatted                                 | NULL       |
+| Owner:                       | doc_demo                                           | NULL       |
+| CreateTime:                  | Mon Jul 22 17:01:47 EDT 2013                       | NULL       |
+| LastAccessTime:              | UNKNOWN                                            | NULL       |
+| Protect Mode:                | None                                               | NULL       |
+| Retention:                   | 0                                                  | NULL       |
+| Location:                    | hdfs://127.0.0.1:8020/user/doc_demo/sample_data    | NULL       |
+| Table Type:                  | EXTERNAL_TABLE                                     | NULL       |
+| Table Parameters:            | NULL                                               | NULL       |
+|                              | EXTERNAL                                           | TRUE       |
+|                              | transient_lastDdlTime                              | 1374526907 |
+|                              | NULL                                               | NULL       |
+| # Storage Information        | NULL                                               | NULL       |
+| SerDe Library:               | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL       |
+| InputFormat:                 | org.apache.impala.hive.serde.ParquetInputFormat    | NULL       |
+| OutputFormat:                | org.apache.impala.hive.serde.ParquetOutputFormat   | NULL       |
+| Compressed:                  | No                                                 | NULL       |
+| Num Buckets:                 | 0                                                  | NULL       |
+| Bucket Columns:              | []                                                 | NULL       |
+| Sort Columns:                | []                                                 | NULL       |
++------------------------------+----------------------------------------------------+------------+
+Returned 27 row(s) in 0.17s</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, must have read and execute
+      permissions for all directories that are part of the table.
+      (A table could span multiple different HDFS directories if it is partitioned.
+      The directories could be widely scattered because a partition can reside
+      in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+
+    <p class="p">
+      The information displayed for Kudu tables includes the additional attributes
+      that are only applicable for Kudu tables:
+    </p>
+    <ul class="ul">
+      <li class="li">
+        Whether or not the column is part of the primary key. Every Kudu table
+        has a <code class="ph codeph">true</code> value here for at least one column. Tables
+        with composite primary keys have multiple <code class="ph codeph">true</code> values.
+      </li>
+      <li class="li">
+        Whether or not the column is nullable. Specified by the <code class="ph codeph">NULL</code>
+        or <code class="ph codeph">NOT NULL</code> attributes on the <code class="ph codeph">CREATE TABLE</code> statement.
+        Columns that are part of the primary key are automatically non-nullable.
+      </li>
+      <li class="li">
+        The default value, if any, for the column. Specified by the <code class="ph codeph">DEFAULT</code>
+        attribute on the <code class="ph codeph">CREATE TABLE</code> statement. If the default value is
+        <code class="ph codeph">NULL</code>, that is not indicated in this column. It is implied by
+        <code class="ph codeph">nullable</code> being true and no other default value being specified.
+      </li>
+      <li class="li">
+        The encoding used for values in the column. Specified by the <code class="ph codeph">ENCODING</code>
+        attribute on the <code class="ph codeph">CREATE TABLE</code> statement.
+      </li>
+      <li class="li">
+        The compression used for values in the column. Specified by the <code class="ph codeph">COMPRESSION</code>
+        attribute on the <code class="ph codeph">CREATE TABLE</code> statement.
+      </li>
+      <li class="li">
+        The block size (in bytes) used for the underlying Kudu storage layer for the column.
+        Specified by the <code class="ph codeph">BLOCK_SIZE</code> attribute on the <code class="ph codeph">CREATE TABLE</code>
+        statement.
+      </li>
+    </ul>
+
+    <p class="p">
+      The following example shows <code class="ph codeph">DESCRIBE</code> output for a simple Kudu table, with 
+      a single-column primary key and all column attributes left with their default values:
+    </p>
+
+<pre class="pre codeblock"><code>
+describe million_rows;
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| name | type   | comment | primary_key | nullable | default_value | encoding      | compression         | block_size |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| id   | string |         | true        | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| s    | string |         | false       | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+</code></pre>
+
+    <p class="p">
+      The following example shows <code class="ph codeph">DESCRIBE</code> output for a Kudu table with a
+      two-column primary key, and Kudu-specific attributes applied to some columns:
+    </p>
+
+<pre class="pre codeblock"><code>
+create table kudu_describe_example
+(
+  c1 int, c2 int,
+  c3 string, c4 string not null, c5 string default 'n/a', c6 string default '',
+  c7 bigint not null, c8 bigint null default null, c9 bigint default -1 encoding bit_shuffle,
+  primary key(c1,c2)
+)
+partition by hash (c1, c2) partitions 10 stored as kudu;
+
+describe kudu_describe_example;
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| name | type   | comment | primary_key | nullable | default_value | encoding      | compression         | block_size |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| c1   | int    |         | true        | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c2   | int    |         | true        | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c3   | string |         | false       | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c4   | string |         | false       | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c5   | string |         | false       | true     | n/a           | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c6   | string |         | false       | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c7   | bigint |         | false       | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c8   | bigint |         | false       | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+| c9   | bigint |         | false       | true     | -1            | BIT_SHUFFLE   | DEFAULT_COMPRESSION | 0          |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+      <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>, <a class="xref" href="impala_show.html#show_create_table">SHOW CREATE TABLE Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_development.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_development.html b/docs/build/html/topics/impala_development.html
new file mode 100644
index 0000000..f8e0ae5
--- /dev/null
+++ b/docs/build/html/topics/impala_development.html
@@ -0,0 +1,197 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_dev"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Developing Impala Applications</title></head><body id="intro_dev"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Developing Impala Applications</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The core development language with Impala is SQL. You can also use Java or other languages to interact with
+      Impala through the standard JDBC and ODBC interfaces used by many business intelligence tools. For
+      specialized kinds of analysis, you can supplement the SQL built-in functions by writing
+      <a class="xref" href="impala_udf.html#udfs">user-defined functions (UDFs)</a> in C++ or Java.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_dev__intro_sql">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Overview of the Impala SQL Dialect</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala SQL dialect is highly compatible with the SQL syntax used in the Apache Hive component (HiveQL),
+        so it feels familiar to users who already run SQL queries on the Hadoop
+        infrastructure. Currently, Impala SQL supports a subset of HiveQL statements, data types, and built-in
+        functions. Impala also includes additional built-in functions for common industry features, to simplify
+        porting SQL from non-Hadoop systems.
+      </p>
+
+      <p class="p">
+        For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect
+        might seem familiar:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            The <a class="xref" href="impala_select.html#select">SELECT statement</a> includes familiar clauses such as <code class="ph codeph">WHERE</code>,
+            <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">ORDER BY</code>, and <code class="ph codeph">WITH</code>.
+            You will find familiar notions such as
+            <a class="xref" href="impala_joins.html#joins">joins</a>, <a class="xref" href="impala_functions.html#builtins">built-in
+            functions</a> for processing strings, numbers, and dates,
+            <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">aggregate functions</a>,
+            <a class="xref" href="impala_subqueries.html#subqueries">subqueries</a>, and
+            <a class="xref" href="impala_operators.html#comparison_operators">comparison operators</a>
+            such as <code class="ph codeph">IN()</code> and <code class="ph codeph">BETWEEN</code>.
+            The <code class="ph codeph">SELECT</code> statement is the place where SQL standards compliance is most important.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+          From the data warehousing world, you will recognize the notion of
+          <a class="xref" href="impala_partitioning.html#partitioning">partitioned tables</a>.
+          One or more columns serve as partition keys, and the data is physically arranged so that
+          queries that refer to the partition key columns in the <code class="ph codeph">WHERE</code> clause
+          can skip partitions that do not match the filter conditions. For example, if you have 10
+          years' worth of data and use a clause such as <code class="ph codeph">WHERE year = 2015</code>,
+          <code class="ph codeph">WHERE year &gt; 2010</code>, or <code class="ph codeph">WHERE year IN (2014, 2015)</code>,
+          Impala skips all the data for non-matching years, greatly reducing the amount of I/O
+          for the query.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+          In Impala 1.2 and higher, <a class="xref" href="impala_udf.html#udfs">UDFs</a> let you perform custom comparisons
+          and transformation logic during <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT...SELECT</code> statements.
+          </p>
+        </li>
+      </ul>
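+      <p class="p">
+        The partition pruning described above can be sketched as follows (the table and
+        column names are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>create table logs (msg string)
+  partitioned by (year int);
+
+-- Impala scans only the partitions for 2014 and 2015;
+-- the data files for all other years are skipped entirely.
+select count(*) from logs where year in (2014, 2015);</code></pre>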
+
+      <p class="p">
+        For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect
+        might require some learning and practice for you to become proficient in the Hadoop environment:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+          Impala SQL is focused on queries and includes relatively little DML. There is no <code class="ph codeph">UPDATE</code>
+          or <code class="ph codeph">DELETE</code> statement. Stale data is typically discarded (by <code class="ph codeph">DROP TABLE</code>
+          or <code class="ph codeph">ALTER TABLE ... DROP PARTITION</code> statements) or replaced (by <code class="ph codeph">INSERT
+          OVERWRITE</code> statements).
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+          All data creation is done by <code class="ph codeph">INSERT</code> statements, which typically insert data in bulk by
+          querying from other tables. There are two variations, <code class="ph codeph">INSERT INTO</code> which appends to the
+          existing data, and <code class="ph codeph">INSERT OVERWRITE</code> which replaces the entire contents of a table or
+          partition (similar to <code class="ph codeph">TRUNCATE TABLE</code> followed by a new <code class="ph codeph">INSERT</code>).
+          Although there is an <code class="ph codeph">INSERT ... VALUES</code> syntax to create a small number of values in
+          a single statement, it is far more efficient to use the <code class="ph codeph">INSERT ... SELECT</code> syntax to copy
+          and transform large amounts of data from one table to another in a single operation.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+          You often construct Impala table definitions and data files in some other environment, and then attach
+          Impala so that it can run real-time queries. The same data files and table metadata are shared with other
+          components of the Hadoop ecosystem. In particular, Impala can access tables created by Hive or data
+          inserted by Hive, and Hive can access tables and data produced by Impala. Many other Hadoop components
+          can write files in formats such as Parquet and Avro that can then be queried by Impala.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+          Because Hadoop and Impala are focused on data warehouse-style operations on large data sets, Impala SQL
+          includes some idioms that you might find in the import utilities for traditional database systems. For
+          example, you can create a table that reads comma-separated or tab-separated text files, specifying the
+          separator in the <code class="ph codeph">CREATE TABLE</code> statement. You can create <strong class="ph b">external tables</strong> that read
+          existing data files but do not move or transform them.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+          Because Impala reads large quantities of data that might not be perfectly tidy and predictable, it does
+          not require length constraints on string data types. For example, you can define a database column as
+          <code class="ph codeph">STRING</code> with unlimited length, rather than <code class="ph codeph">CHAR(1)</code> or
+          <code class="ph codeph">VARCHAR(64)</code>. <span class="ph">(Although in Impala 2.0 and later, you can also use
+          length-constrained <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> types.)</span>
+          </p>
+        </li>
+
+      </ul>
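A minimal sketch combining several of the idioms above: an external table over existing delimited files, plus a bulk <code class="ph codeph">INSERT ... SELECT</code>. The table names and HDFS path are hypothetical:

```sql
-- External table that reads existing comma-separated files in place,
-- without moving or transforming them.
CREATE EXTERNAL TABLE raw_events (event_time STRING, user_id BIGINT, detail STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/user/etl/raw_events';

-- Bulk copy-and-transform. INSERT OVERWRITE replaces the destination's
-- previous contents; INSERT INTO would append instead.
INSERT OVERWRITE TABLE events_parquet
  SELECT CAST(event_time AS TIMESTAMP), user_id, detail FROM raw_events;
```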
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong> <a class="xref" href="impala_langref.html#langref">Impala SQL Language Reference</a>, especially
+        <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a> and <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>
+      </p>
+    </div>
+  </article>
+
+
+
+  
+
+  
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_dev__intro_apis">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala Programming Interfaces</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can connect and submit requests to the Impala daemons through:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The <code class="ph codeph"><a class="xref" href="impala_impala_shell.html#impala_shell">impala-shell</a></code> interactive
+          command interpreter.
+        </li>
+
+        <li class="li">
+          The <a class="xref" href="http://gethue.com/" target="_blank">Hue</a> web-based user interface.
+        </li>
+
+        <li class="li">
+          <a class="xref" href="impala_jdbc.html#impala_jdbc">JDBC</a>.
+        </li>
+
+        <li class="li">
+          <a class="xref" href="impala_odbc.html#impala_odbc">ODBC</a>.
+        </li>
+      </ul>
+
+      <p class="p">
+        With these options, you can use Impala in heterogeneous environments, with JDBC or ODBC applications
+        running on non-Linux platforms. You can also use Impala in combination with various Business Intelligence
+        tools that use the JDBC and ODBC interfaces.
+      </p>
+
+      <p class="p">
+        Each <code class="ph codeph">impalad</code> daemon process, running on separate nodes in a cluster, listens to
+        <a class="xref" href="impala_ports.html#ports">several ports</a> for incoming requests. Requests from
+        <code class="ph codeph">impala-shell</code> and Hue are routed to the <code class="ph codeph">impalad</code> daemons through the same
+        port. The <code class="ph codeph">impalad</code> daemons listen on separate ports for JDBC and ODBC requests.
+      </p>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disable_codegen.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_disable_codegen.html b/docs/build/html/topics/impala_disable_codegen.html
new file mode 100644
index 0000000..f8766b7
--- /dev/null
+++ b/docs/build/html/topics/impala_disable_codegen.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_codegen"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_CODEGEN Query Option</title></head><body id="disable_codegen"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DISABLE_CODEGEN Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      This is a debug option, intended for diagnosing and working around issues that cause crashes. If a query
+      fails with an <span class="q">"illegal instruction"</span> or other hardware-specific message, try setting
+      <code class="ph codeph">DISABLE_CODEGEN=true</code> and running the query again. If the query succeeds only when the
+      <code class="ph codeph">DISABLE_CODEGEN</code> option is turned on, submit the problem to <span class="keyword">the appropriate support channel</span> and include that
+      detail in the problem report. Do not otherwise run with this setting turned on, because it results in lower
+      overall performance.
+    </p>
+
+    <p class="p">
+      Because the code generation phase adds a small amount of overhead for each query, you might turn on the
+      <code class="ph codeph">DISABLE_CODEGEN</code> option to achieve maximum throughput when running many short-lived queries
+      against small tables.
+    </p>
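For example, the workaround might look like this in <code class="ph codeph">impala-shell</code> (the query itself is a placeholder for whichever statement failed):

```sql
SET DISABLE_CODEGEN=true;   -- fall back to interpreted execution
SELECT COUNT(*) FROM t1;    -- hypothetical query that previously crashed
SET DISABLE_CODEGEN=false;  -- restore the default afterward
```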
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disable_row_runtime_filtering.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_disable_row_runtime_filtering.html b/docs/build/html/topics/impala_disable_row_runtime_filtering.html
new file mode 100644
index 0000000..11ccb80
--- /dev/null
+++ b/docs/build/html/topics/impala_disable_row_runtime_filtering.html
@@ -0,0 +1,72 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_row_runtime_filtering"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</title></head><body id="disable_row_runtime_filtering"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DISABLE_ROW_RUNTIME_FILTERING Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">DISABLE_ROW_RUNTIME_FILTERING</code> query option
+      reduces the scope of the runtime filtering feature. Queries still dynamically prune
+      partitions, but do not apply the filtering logic to individual rows within partitions.
+    </p>
+
+    <p class="p">
+      This option only applies to queries against Parquet tables. For other file formats, Impala
+      only prunes at the level of partitions, not individual rows.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Impala automatically evaluates whether the per-row filters are being
+      effective at reducing the amount of intermediate data. Therefore,
+      this option is typically only needed for the rare case where Impala
+      cannot accurately determine how effective the per-row filtering is
+      for a query.
+    </p>
+
+    <p class="p">
+        Because the runtime filtering feature applies mainly to resource-intensive
+        and long-running queries, only adjust this query option when tuning long-running queries
+        involving some combination of large partitioned tables and joins involving large tables.
+      </p>
+
+    <p class="p">
+      Because this setting only improves query performance in very specific
+      circumstances, depending on the query characteristics and data distribution,
+      only use it when you determine through benchmarking that it improves
+      performance of specific expensive queries.
+      Consider setting this query option immediately before the expensive query and
+      unsetting it immediately afterward.
+    </p>
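Following the advice above, a tuning session might bracket only the expensive statement with the option (the table and column names are hypothetical):

```sql
SET DISABLE_ROW_RUNTIME_FILTERING=true;
-- Hypothetical long-running join over large partitioned Parquet tables.
SELECT f.id, d.name FROM big_facts f JOIN big_dims d ON f.dim_id = d.id;
SET DISABLE_ROW_RUNTIME_FILTERING=false;
```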
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+      <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+      
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disable_streaming_preaggregations.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_disable_streaming_preaggregations.html b/docs/build/html/topics/impala_disable_streaming_preaggregations.html
new file mode 100644
index 0000000..98ea640
--- /dev/null
+++ b/docs/build/html/topics/impala_disable_streaming_preaggregations.html
@@ -0,0 +1,50 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_streaming_preaggregations"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</title></head><body id="disable_streaming_preaggregations"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DISABLE_STREAMING_PREAGGREGATIONS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Turns off the <span class="q">"streaming preaggregation"</span> optimization that is available in <span class="keyword">Impala 2.5</span>
+      and higher. This optimization reduces unnecessary work performed by queries that perform aggregation
+      operations on columns with few or no duplicate values, for example <code class="ph codeph">DISTINCT <var class="keyword varname">id_column</var></code>
+      or <code class="ph codeph">GROUP BY <var class="keyword varname">unique_column</var></code>. If the optimization causes regressions in
+      existing queries that use aggregation functions, you can turn it off as needed by setting this query option.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        In <span class="keyword">Impala 2.5.0</span>, only the value 1 enables the option, and the value
+        <code class="ph codeph">true</code> is not recognized. This limitation is
+        tracked by the issue
+        <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3334" target="_blank">IMPALA-3334</a>,
+        which shows the releases where the problem is fixed.
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+    <p class="p">
+      Typically, queries that would require enabling this option involve very large numbers of
+      aggregated values, such as a billion or more distinct keys being processed on each
+      worker node.
+    </p>
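A sketch of turning off the optimization for one high-cardinality aggregation (the table and column names are hypothetical; note the value <code class="ph codeph">1</code> rather than <code class="ph codeph">true</code> if running Impala 2.5.0, per IMPALA-3334):

```sql
SET DISABLE_STREAMING_PREAGGREGATIONS=1;
-- Hypothetical aggregation over a column with mostly unique values,
-- the pattern where streaming preaggregation can regress.
SELECT COUNT(DISTINCT id_column) FROM huge_table;
```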
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+      </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disable_unsafe_spills.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_disable_unsafe_spills.html b/docs/build/html/topics/impala_disable_unsafe_spills.html
new file mode 100644
index 0000000..01bc8fd
--- /dev/null
+++ b/docs/build/html/topics/impala_disable_unsafe_spills.html
@@ -0,0 +1,50 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_unsafe_spills"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</title></head><body id="disable_unsafe_spills"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DISABLE_UNSAFE_SPILLS Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Enable this option if you prefer to have queries fail when they exceed the Impala memory limit, rather than
+      write temporary data to disk.
+    </p>
+
+    <p class="p">
+      Queries that <span class="q">"spill"</span> to disk typically complete successfully, whereas in earlier Impala releases they would have failed.
+      However, queries with exorbitant memory requirements due to missing statistics or inefficient join clauses could
+      become so slow as a result that you would rather have them cancelled automatically and reduce the memory
+      usage through standard Impala tuning techniques.
+    </p>
+
+    <p class="p">
+      This option prevents only <span class="q">"unsafe"</span> spill operations, meaning that one or more tables are missing
+      statistics or the query does not include a hint to set the most efficient mechanism for a join or
+      <code class="ph codeph">INSERT ... SELECT</code> into a partitioned table. These are the tables most likely to result in
+      suboptimal execution plans that could cause unnecessary spilling. Therefore, leaving this option enabled is a
+      good way to find tables on which to run the <code class="ph codeph">COMPUTE STATS</code> statement.
+    </p>
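In practice, the option and the follow-up fix might look like this (the table name is hypothetical):

```sql
SET DISABLE_UNSAFE_SPILLS=true;
-- A query missing statistics on one of its tables now fails instead of
-- spilling; computing stats on that table addresses the root cause.
COMPUTE STATS t1;
```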
+
+    <p class="p">
+      See <a class="xref" href="impala_scalability.html#spill_to_disk">SQL Operations that Spill to Disk</a> for information about the <span class="q">"spill to disk"</span>
+      feature for queries processing large result sets with joins, <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP
+      BY</code>, <code class="ph codeph">DISTINCT</code>, aggregation functions, or analytic functions.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_disk_space.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_disk_space.html b/docs/build/html/topics/impala_disk_space.html
new file mode 100644
index 0000000..0b102e5
--- /dev/null
+++ b/docs/build/html/topics/impala_disk_space.html
@@ -0,0 +1,133 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disk_space"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Managing Disk Space for Impala Data</title></head><body id="disk_space"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Managing Disk Space for Impala Data</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Although Impala typically works with many large files in an HDFS storage system with plenty of capacity,
+      there are times when you might perform some file cleanup to reclaim space, or advise developers on techniques
+      to minimize space consumption and file duplication.
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          Use compact binary file formats where practical. Numeric and time-based data in particular can be stored
+          in more compact form in binary data files. Depending on the file format, various compression and encoding
+          features can reduce file size even further. You can specify the <code class="ph codeph">STORED AS</code> clause as part
+          of the <code class="ph codeph">CREATE TABLE</code> statement, or <code class="ph codeph">ALTER TABLE</code> with the <code class="ph codeph">SET
+          FILEFORMAT</code> clause for an existing table or partition within a partitioned table. See
+          <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details about file formats, especially
+          <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>. See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and
+          <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax details.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          You manage underlying data files differently depending on whether the corresponding Impala table is
+          defined as an <a class="xref" href="impala_tables.html#internal_tables">internal</a> or
+          <a class="xref" href="impala_tables.html#external_tables">external</a> table:
+        </p>
+        <ul class="ul">
+          <li class="li">
+            Use the <code class="ph codeph">DESCRIBE FORMATTED</code> statement to check if a particular table is internal
+            (managed by Impala) or external, and to see the physical location of the data files in HDFS. See
+            <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a> for details.
+          </li>
+
+          <li class="li">
+            For Impala-managed (<span class="q">"internal"</span>) tables, use <code class="ph codeph">DROP TABLE</code> statements to remove
+            data files. See <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> for details.
+          </li>
+
+          <li class="li">
+            For tables not managed by Impala (<span class="q">"external"</span> tables), use appropriate HDFS-related commands such
+            as <code class="ph codeph">hadoop fs</code>, <code class="ph codeph">hdfs dfs</code>, or <code class="ph codeph">distcp</code>, to create, move,
+            copy, or delete files within HDFS directories that are accessible by the <code class="ph codeph">impala</code> user.
+            Issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> statement after adding or removing any
+            files from the data directory of an external table. See <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for
+            details.
+          </li>
+
+          <li class="li">
+            Use external tables to reference HDFS data files in their original location. With this technique, you
+            avoid copying the files, and you can map more than one Impala table to the same set of data files. When
+            you drop the Impala table, the data files are left undisturbed. See
+            <a class="xref" href="impala_tables.html#external_tables">External Tables</a> for details.
+          </li>
+
+          <li class="li">
+            Use the <code class="ph codeph">LOAD DATA</code> statement to move HDFS files into the data directory for an Impala
+            table from inside Impala, without the need to specify the HDFS path of the destination directory. This
+            technique works for both internal and external tables. See
+            <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> for details.
+          </li>
+        </ul>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          Make sure that the HDFS trashcan is configured correctly. When you remove files from HDFS, the space
+          might not be reclaimed for use by other files until sometime later, when the trashcan is emptied. See
+          <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> for details. See
+          <a class="xref" href="impala_prereqs.html#prereqs_account">User Account Requirements</a> for permissions needed for the HDFS trashcan to operate
+          correctly.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          Drop all tables in a database before dropping the database itself. See
+          <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a> for details.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          Clean up temporary files after failed <code class="ph codeph">INSERT</code> statements. If an <code class="ph codeph">INSERT</code>
+          statement encounters an error, and you see a directory named <span class="ph filepath">.impala_insert_staging</span>
+          or <span class="ph filepath">_impala_insert_staging</span> left behind in the data directory for the table, it might
+          contain temporary data files taking up space in HDFS. You might be able to salvage these data files, for
+          example if they are complete but could not be moved into place due to a permission error. Or, you might
+          delete those files through commands such as <code class="ph codeph">hadoop fs</code> or <code class="ph codeph">hdfs dfs</code>, to
+          reclaim space before re-trying the <code class="ph codeph">INSERT</code>. Issue <code class="ph codeph">DESCRIBE FORMATTED
+          <var class="keyword varname">table_name</var></code> to see the HDFS path where you can check for temporary files.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+        By default, intermediate files used during large sort, join, aggregation, or analytic function operations
+        are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span>. These files are removed when the
+        operation finishes. (Multiple concurrent queries can perform operations that use the <span class="q">"spill to disk"</span>
+        technique, without any name conflicts for these temporary files.) You can specify a different location by
+        starting the <span class="keyword cmdname">impalad</span> daemon with the
+        <code class="ph codeph">--scratch_dirs="<var class="keyword varname">path_to_directory</var>"</code> configuration option.
+        You can specify a single directory, or a comma-separated list of directories. The scratch directories must
+        be on the local filesystem, not in HDFS. You might specify different directory paths for different hosts,
+        depending on the capacity and speed
+        of the available storage devices. In <span class="keyword">Impala 2.3</span> or higher, Impala successfully starts (with a warning
+        written to the log) if it cannot create or read and write files
+        in one of the scratch directories. If there is less than 1 GB free on the filesystem where that directory resides,
+        Impala still runs, but writes a warning message to its log.  If Impala encounters an error reading or writing
+        files in a scratch directory during a query, Impala logs the error and the query fails.
+      </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          If you use the Amazon Simple Storage Service (S3) as a place to offload
+          data to reduce the volume of local storage, Impala 2.2.0 and higher
+          can query the data directly from S3.
+          See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details.
+        </p>
+      </li>
+    </ul>
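The scratch-directory startup option mentioned above might be specified like this (the directory paths are hypothetical placeholders):

```
impalad --scratch_dirs="/data1/impala-scratch,/data2/impala-scratch"
```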
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[33/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_fixed_issues.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_fixed_issues.html b/docs/build/html/topics/impala_fixed_issues.html
new file mode 100644
index 0000000..9444528
--- /dev/null
+++ b/docs/build/html/topics/impala_fixed_issues.html
@@ -0,0 +1,5889 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="fixed_issues"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Fixed Issues in Apache Impala (incubating)</title></head><body id="fixed_issues"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Fixed Issues in Apache Impala (incubating)</span></h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The following sections describe the major issues fixed in each Impala release.
+    </p>
+
+    <p class="p">
+      For known issues that are currently unresolved, see <a class="xref" href="impala_known_issues.html#known_issues">Known Issues and Workarounds in Impala</a>.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="fixed_issues__fixed_issues_280">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Issues Fixed in <span class="keyword">Impala 2.8.0</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        For the full list of Impala fixed issues in <span class="keyword">Impala 2.8</span>, see
+        <a class="xref" href="https://issues.apache.org/jira/issues/?jql=type%20%3D%20bug%20and%20project%20%3D%20IMPALA%20AND%20resolution%20%3D%20fixed%20AND%20affectedVersion%20!%3D%20%22Impala%202.8.0%22%20AND%20fixVersion%20%3D%20%22Impala%202.8.0%22%20and%20not%20labels%20%3D%20broken-build%20order%20by%20priority%20desc" target="_blank">this report in the Impala JIRA tracker</a>.
+      </p>
+
+    </div>
+  </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="fixed_issues__fixed_issues_270">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Issues Fixed in <span class="keyword">Impala 2.7.0</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+
+        For the full list of Impala fixed issues in Impala 2.7.0, see
+        <a class="xref" href="https://issues.apache.org/jira/issues/?jql=%20type%20%3D%20bug%20and%20project%20%3D%20IMPALA%20AND%20resolution%20%3D%20fixed%20AND%20affectedVersion%20!%3D%20%22Impala%202.7.0%22%20AND%20fixVersion%20%3D%20%22Impala%202.7.0%22%20and%20not%20labels%20%3D%20broken-build%20order%20by%20priority%20desc" target="_blank">this report in the Impala JIRA tracker</a>.
+      </p>
+
+    </div>
+  </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="fixed_issues__fixed_issues_263">
+    <h2 class="title topictitle2" id="ariaid-title4">Issues Fixed in <span class="keyword">Impala 2.6.3</span></h2>
+    <div class="body conbody">
+      <p class="p"></p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="fixed_issues__fixed_issues_262">
+    <h2 class="title topictitle2" id="ariaid-title5">Issues Fixed in <span class="keyword">Impala 2.6.2</span></h2>
+    <div class="body conbody">
+      <p class="p"></p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="fixed_issues__fixed_issues_260">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Issues Fixed in <span class="keyword">Impala 2.6.0</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following list contains the most critical fixed issues
+        (<code class="ph codeph">priority='Blocker'</code>) from the JIRA system.
+        For the full list of fixed issues in <span class="keyword">Impala 2.6.0</span>, see
+        <a class="xref" href="https://issues.apache.org/jira/issues/?jql=%20type%20%3D%20bug%20and%20project%20%3D%20IMPALA%20AND%20resolution%20%3D%20fixed%20AND%20affectedVersion%20!%3D%20%22Impala%202.6.0%22%20AND%20fixVersion%20%3D%20%22Impala%202.6.0%22%20and%20not%20labels%20%3D%20broken-build%20order%20by%20priority%20desc" target="_blank">this report in the Impala JIRA tracker</a>.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="fixed_issues_260__IMPALA-3385">
+      <h3 class="title topictitle3" id="ariaid-title7">RuntimeState::error_log_ crashes</h3>
+      <div class="body conbody">
+      <p class="p">
+        A crash could occur, with stack trace pointing to <code class="ph codeph">impala::RuntimeState::ErrorLog</code>.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3385" target="_blank">IMPALA-3385</a></p>
+      <p class="p"><strong class="ph b">Severity:</strong> High</p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="fixed_issues_260__IMPALA-3378">
+      <h3 class="title topictitle3" id="ariaid-title8">HiveUdfCall::Open() produces unsynchronized access to JniUtil::global_refs_ vector</h3>
+      <div class="body conbody">
+      <p class="p">
+        A crash could occur because of contention between multiple calls to Java UDFs.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3378" target="_blank">IMPALA-3378</a></p>
+      <p class="p"><strong class="ph b">Severity:</strong> High</p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="fixed_issues_260__IMPALA-3379">
+      <h3 class="title topictitle3" id="ariaid-title9">HBaseTableWriter::CreatePutList() produces unsynchronized access to JniUtil::global_refs_ vector</h3>
+      <div class="body conbody">
+      <p class="p">
+        A crash could occur because of contention between multiple concurrent statements writing to HBase.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3379" target="_blank">IMPALA-3379</a></p>
+      <p class="p"><strong class="ph b">Severity:</strong> High</p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="fixed_issues_260__IMPALA-3317">
+      <h3 class="title topictitle3" id="ariaid-title10">Stress test failure: sorter.cc:745] Check failed: i == 0 (1 vs. 0) </h3>
+      <div class="body conbody">
+      <p class="p">
+        A crash or wrong results could occur if the spill-to-disk mechanism encountered a zero-length string at
+        the very end of a data block.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3317" target="_blank">IMPALA-3317</a></p>
+      <p class="p"><strong class="ph b">Severity:</strong> High</p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="fixed_issues_260__IMPALA-3311">
+      <h3 class="title topictitle3" id="ariaid-title11">String data coming out of agg can be corrupted by blocking operators</h3>
+      <div class="body conbody">
+      <p class="p">
+        If a query plan contains an aggregation node producing string values anywhere within a subplan
+        (that is, if in the SQL statement the aggregate function appears within an inline view over a collection column),
+        the results of the aggregation may be incorrect.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3311" target="_blank">IMPALA-3311</a></p>
+      <p class="p"><strong class="ph b">Severity:</strong> High</p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="fixed_issues_260__IMPALA-3269">
+      <h3 class="title topictitle3" id="ariaid-title12">CTAS with subquery throws AuthzException</h3>
+      <div class="body conbody">
+      <p class="p">
+        A <code class="ph codeph">CREATE TABLE AS SELECT</code> operation could fail with an authorization error,
+        due to a slight difference in the privilege checking for the CTAS operation.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3269" target="_blank">IMPALA-3269</a></p>
+      <p class="p"><strong class="ph b">Severity:</strong> High</p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="fixed_issues_260__IMPALA-3237">
+      <h3 class="title topictitle3" id="ariaid-title13">Crash on inserting into table with binary and parquet</h3>
+      <div class="body conbody">
+      <p class="p">
+        Impala incorrectly allowed <code class="ph codeph">BINARY</code> to be specified as a column type,
+        resulting in a crash during a write to a Parquet table with a column of that type.
+      </p>
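+      <p class="p">
+        As a hypothetical sketch (the table name is invented for illustration),
+        a sequence of this shape was accepted and could later crash the daemon:
+      </p>
+<pre class="pre codeblock"><code>
+-- BINARY was incorrectly accepted as a column type
+CREATE TABLE t1 (col1 BINARY) STORED AS PARQUET;
+-- writing to the Parquet table could then crash impalad
+INSERT INTO t1 VALUES ('abc');
+</code></pre>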
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3237" target="_blank">IMPALA-3237</a></p>
+      <p class="p"><strong class="ph b">Severity:</strong> High</p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="fixed_issues_260__IMPALA-3105">
+      <h3 class="title topictitle3" id="ariaid-title14">RowBatch::MaxTupleBufferSize() calculation incorrect, may lead to memory corruption</h3>
+      <div class="body conbody">
+      <p class="p">
+        A crash could occur while querying tables with very large rows, for example, wide tables with many
+        columns or very large string values. This problem was identified in Impala 2.3, but had low
+        reproducibility in subsequent releases. The fix ensures the memory allocation size is correct.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3105" target="_blank">IMPALA-3105</a></p>
+      <p class="p"><strong class="ph b">Severity:</strong> High</p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title15" id="fixed_issues_260__IMPALA-3494">
+      <h3 class="title topictitle3" id="ariaid-title15">Thrift buffer overflows when serialize more than 3355443200 bytes in impala</h3>
+      <div class="body conbody">
+      <p class="p">
+        A very large memory allocation within the <span class="keyword cmdname">catalogd</span> daemon could exceed an internal Thrift limit,
+        causing a crash.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3494" target="_blank">IMPALA-3494</a></p>
+      <p class="p"><strong class="ph b">Severity:</strong> High</p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title16" id="fixed_issues_260__IMPALA-3314">
+      <h3 class="title topictitle3" id="ariaid-title16">Altering table partition's storage format is not working and crashing the daemon</h3>
+      <div class="body conbody">
+      <p class="p">
+        If a partitioned table used a file format other than Avro, and the file format of an individual partition
+        was changed to Avro, subsequent queries could encounter a crash.
+      </p>
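+      <p class="p">
+        As a hypothetical sketch (names invented for illustration), the failing
+        sequence had this general shape:
+      </p>
+<pre class="pre codeblock"><code>
+-- t1 is partitioned and uses a non-Avro file format
+ALTER TABLE t1 PARTITION (p = 1) SET FILEFORMAT AVRO;
+-- subsequent queries against t1 could crash the daemon
+SELECT COUNT(*) FROM t1;
+</code></pre>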
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3314" target="_blank">IMPALA-3314</a></p>
+      <p class="p"><strong class="ph b">Severity:</strong> High</p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title17" id="fixed_issues_260__IMPALA-3798">
+      <h3 class="title topictitle3" id="ariaid-title17">Race condition may cause scanners to spin with runtime filters on Avro or Sequence files</h3>
+      <div class="body conbody">
+      <p class="p">
+        A timing problem during runtime filter processing could cause queries against Avro or SequenceFile tables
+        to hang.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3798" target="_blank">IMPALA-3798</a></p>
+      <p class="p"><strong class="ph b">Severity:</strong> High</p>
+      </div>
+    </article>
+
+  </article>
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title18" id="fixed_issues__fixed_issues_254">
+    <h2 class="title topictitle2" id="ariaid-title18">Issues Fixed in <span class="keyword">Impala 2.5.4</span></h2>
+    <div class="body conbody">
+      <p class="p"></p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title19" id="fixed_issues__fixed_issues_252">
+    <h2 class="title topictitle2" id="ariaid-title19">Issues Fixed in <span class="keyword">Impala 2.5.2</span></h2>
+    <div class="body conbody">
+      <p class="p"></p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title20" id="fixed_issues__fixed_issues_251">
+
+    <h2 class="title topictitle2" id="ariaid-title20">Issues Fixed in <span class="keyword">Impala 2.5.1</span></h2>
+
+    <div class="body conbody">
+      <p class="p"></p>
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title21" id="fixed_issues__fixed_issues_250">
+
+    <h2 class="title topictitle2" id="ariaid-title21">Issues Fixed in <span class="keyword">Impala 2.5.0</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following list contains the most critical issues (<code class="ph codeph">priority='Blocker'</code>) from the JIRA system.
+        For the full list of fixed issues in <span class="keyword">Impala 2.5</span>, see
+        <a class="xref" href="https://issues.apache.org/jira/issues/?jql=%20type%20%3D%20bug%20and%20project%20%3D%20IMPALA%20AND%20resolution%20%3D%20fixed%20AND%20affectedVersion%20!%3D%20%22Impala%202.5.0%22%20AND%20fixVersion%20%3D%20%22Impala%202.5.0%22%20and%20not%20labels%20%3D%20broken-build%20order%20by%20priority%20desc" target="_blank">this report in the Impala JIRA tracker</a>.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title22" id="fixed_issues_250__IMPALA-2683">
+      <h3 class="title topictitle3" id="ariaid-title22">Stress test hit assert in LLVM: external function could not be resolved</h3>
+      <div class="body conbody">
+<p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2683" target="_blank">IMPALA-2683</a></p>
+<p class="p">The stress test was running a build against the TPC-H, TPC-DS, and nested TPC-H queries at scale factor 3.</p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title23" id="fixed_issues_250__IMPALA-2365">
+      <h3 class="title topictitle3" id="ariaid-title23">Impalad is crashing if udf jar is not available in hdfs location for first time</h3>
+      <div class="body conbody">
+        <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2365" target="_blank">IMPALA-2365</a></p>
+        <p class="p">
+          If a UDF JAR was not available in the HDFS location specified in the <code class="ph codeph">CREATE FUNCTION</code> statement,
+          the <span class="keyword cmdname">impalad</span> daemon could crash.
+        </p>
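+        <p class="p">
+          As a hypothetical sketch (the function name, JAR path, and class are
+          invented for illustration), a statement of this shape could crash the
+          daemon if the JAR was not present at the given HDFS location:
+        </p>
+<pre class="pre codeblock"><code>
+CREATE FUNCTION my_func(STRING) RETURNS STRING
+  LOCATION '/user/impala/udfs/missing.jar'
+  SYMBOL='com.example.MyUdf';
+</code></pre>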
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title24" id="fixed_issues_250__IMPALA-2535-570">
+      <h3 class="title topictitle3" id="ariaid-title24">PAGG hits mem_limit when switching to I/O buffers</h3>
+      <div class="body conbody">
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2535" target="_blank">IMPALA-2535</a></p>
+      <p class="p">
+        A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory.
+        The cause was the internal ordering of operations that could cause a later phase of the query to
+        allocate memory required by an earlier phase of the query. The workaround was to either increase
+        or decrease the <code class="ph codeph">MEM_LIMIT</code> query option, because the issue would only occur for a specific
+        combination of memory limit and data volume.
+      </p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title25" id="fixed_issues_250__IMPALA-2643-570">
+      <h3 class="title topictitle3" id="ariaid-title25">Prevent migrating incorrectly inferred identity predicates into inline views</h3>
+      <div class="body conbody">
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2643" target="_blank">IMPALA-2643</a></p>
+      <p class="p">
+        Referring to the same column twice in a view definition could cause the view to omit
+        rows where that column contained a <code class="ph codeph">NULL</code> value. This could cause
+        incorrect results due to an inaccurate <code class="ph codeph">COUNT(*)</code> value or rows missing
+        from the result set.
+      </p>
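+      <p class="p">
+        As a hypothetical sketch (names invented for illustration), a view that
+        refers to the same column twice could exhibit the problem:
+      </p>
+<pre class="pre codeblock"><code>
+CREATE VIEW v1 AS SELECT c1, c1 AS c1_again FROM t1;
+-- could omit rows where c1 IS NULL, making the count too low
+SELECT COUNT(*) FROM v1;
+</code></pre>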
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title26" id="fixed_issues_250__IMPALA-1459-570">
+      <h3 class="title topictitle3" id="ariaid-title26">Fix migration/assignment of On-clause predicates inside inline views</h3>
+      <div class="body conbody">
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1459" target="_blank">IMPALA-1459</a></p>
+      <p class="p">
+        Some combinations of <code class="ph codeph">ON</code> clauses in join queries could result in comparisons
+        being applied at the wrong stage of query processing, leading to incorrect results.
+        Wrong predicate assignment could happen under the following conditions:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          The query includes an inline view that contains an outer join.
+        </li>
+        <li class="li">
+          That inline view is joined with another table in the enclosing query block.
+        </li>
+        <li class="li">
+          That join has an <code class="ph codeph">ON</code> clause containing a predicate that
+          only references columns originating from the outer-joined tables inside the inline view.
+        </li>
+      </ul>
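+      <p class="p">
+        As a hypothetical sketch (names invented for illustration), a query
+        matching all three conditions has this general shape:
+      </p>
+<pre class="pre codeblock"><code>
+SELECT *
+FROM (SELECT t2.c1, t2.c2
+      FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id) v
+-- the ON predicate references only columns originating from the
+-- outer-joined table (t2) inside the inline view
+JOIN t3 ON v.c1 = v.c2;
+</code></pre>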
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title27" id="fixed_issues_250__IMPALA-2093">
+      <h3 class="title topictitle3" id="ariaid-title27">Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate</h3>
+      <div class="body conbody">
+        <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2093" target="_blank">IMPALA-2093</a></p>
+        <p class="p">
+          <code class="ph codeph">IN</code> subqueries might return wrong results if the left-hand side of the <code class="ph codeph">IN</code> is a constant.
+          For example:
+        </p>
+<pre class="pre codeblock"><code>
+select * from alltypestiny t1
+  where 10 not in (select sum(int_col) from alltypestiny);
+</code></pre>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title28" id="fixed_issues_250__IMPALA-2940">
+      <h3 class="title topictitle3" id="ariaid-title28">Parquet DictDecoders accumulate throughout query</h3>
+      <div class="body conbody">
+        <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2940" target="_blank">IMPALA-2940</a></p>
+        <p class="p">
+          Parquet dictionary decoders could accumulate throughout query execution, leading to excessive memory usage, because one decoder was created per column per split.
+        </p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title29" id="fixed_issues_250__IMPALA-3056">
+      <h3 class="title topictitle3" id="ariaid-title29">Planner doesn't set the has_local_target field correctly</h3>
+      <div class="body conbody">
+<p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3056" target="_blank">IMPALA-3056</a></p>
+
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title30" id="fixed_issues_250__IMPALA-2742">
+      <h3 class="title topictitle3" id="ariaid-title30">MemPool allocation growth behavior</h3>
+      <div class="body conbody">
+        <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2742" target="_blank">IMPALA-2742</a></p>
+        <p class="p">
+          Previously, the MemPool always doubled the size of the last allocation.
+          This could lead to bad behavior if the MemPool transferred ownership of all its data
+          except the last chunk: the next allocation would then double
+          the size of that large chunk, which could be undesirable.
+        </p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title31" id="fixed_issues_250__IMPALA-3035">
+      <h3 class="title topictitle3" id="ariaid-title31">Drop partition operations don't follow the catalog's locking protocol</h3>
+      <div class="body conbody">
+        <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3035" target="_blank">IMPALA-3035</a></p>
+        <p class="p">
+          The <code class="ph codeph">CatalogOpExecutor.alterTableDropPartition()</code> function violated
+          the locking protocol used in the catalog, which requires <code class="ph codeph">catalogLock_</code>
+          to be acquired before any table-level lock. This could cause deadlocks when <code class="ph codeph">ALTER TABLE DROP PARTITION</code>
+          was executed concurrently with other DDL operations.
+        </p>
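+        <p class="p">
+          For example (hypothetical names), statements of this shape issued
+          concurrently from different sessions could deadlock:
+        </p>
+<pre class="pre codeblock"><code>
+-- session 1
+ALTER TABLE t1 DROP PARTITION (p = 1);
+-- session 2, at the same time
+ALTER TABLE t1 ADD PARTITION (p = 2);
+</code></pre>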
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title32" id="fixed_issues_250__IMPALA-2215">
+      <h3 class="title topictitle3" id="ariaid-title32">HAVING clause without aggregation not applied properly</h3>
+      <div class="body conbody">
+        <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2215" target="_blank">IMPALA-2215</a></p>
+        <p class="p">
+          A query with a <code class="ph codeph">HAVING</code> clause but no <code class="ph codeph">GROUP BY</code> clause was not being rejected,
+          despite being invalid syntax. For example:
+        </p>
+
+<pre class="pre codeblock"><code>
+select case when 1=1 then 'didit' end as c1 from (select 1 as one) a having 1!=1;
+</code></pre>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title33" id="fixed_issues_250__IMPALA-2914">
+      <h3 class="title topictitle3" id="ariaid-title33">Hit DCHECK Check failed: HasDateOrTime()</h3>
+      <div class="body conbody">
+        <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2914" target="_blank">IMPALA-2914</a></p>
+        <p class="p">
+          <code class="ph codeph">TimestampValue::ToTimestampVal()</code> requires a valid <code class="ph codeph">TimestampValue</code> as input.
+          This requirement was not enforced in some places, leading to serious errors.
+        </p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title34" id="fixed_issues_250__IMPALA-2986">
+      <h3 class="title topictitle3" id="ariaid-title34">Aggregation spill loop gives up too early leading to mem limit exceeded errors</h3>
+      <div class="body conbody">
+        <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2986" target="_blank">IMPALA-2986</a></p>
+        <p class="p">
+          An aggregation query could fail with an out-of-memory error, despite sufficient memory being reported as available.
+        </p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title35" id="fixed_issues_250__IMPALA-2592">
+      <h3 class="title topictitle3" id="ariaid-title35">DataStreamSender::Channel::CloseInternal() does not close the channel on an error.</h3>
+      <div class="body conbody">
+        <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2592" target="_blank">IMPALA-2592</a></p>
+        <p class="p">
+          Some queries did not close an internal communication channel on an error.
+          This caused the node on the other side of the channel to wait indefinitely, making the query hang.
+          For example, this issue could happen on a Kerberos-enabled system if the credential cache was outdated.
+          Although the affected query hangs, the <span class="keyword cmdname">impalad</span> daemons continue processing other queries.
+        </p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title36" id="fixed_issues_250__IMPALA-2184">
+      <h3 class="title topictitle3" id="ariaid-title36">Codegen does not catch exceptions in FROM_UNIXTIME()</h3>
+      <div class="body conbody">
+        <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2184" target="_blank">IMPALA-2184</a></p>
+        <p class="p">
+          Querying for the min or max value of a timestamp cast from a bigint via <code class="ph codeph">from_unixtime()</code>
+          failed silently and crashed instances of <span class="keyword cmdname">impalad</span> when the input included a value outside the valid range.
+        </p>
+
+        <p class="p"><strong class="ph b">Workaround:</strong> Disable native code generation with:</p>
+<pre class="pre codeblock"><code>
+SET disable_codegen=true;
+</code></pre>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title37" id="fixed_issues_250__IMPALA-2788">
+      <h3 class="title topictitle3" id="ariaid-title37">Impala returns wrong result for function 'conv(bigint, from_base, to_base)'</h3>
+      <div class="body conbody">
+        <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2788" target="_blank">IMPALA-2788</a></p>
+        <p class="p">
+          Impala returned a wrong result from the <code class="ph codeph">conv()</code> function.
+          The form <code class="ph codeph">conv(bigint, from_base, to_base)</code> returned an incorrect result,
+          while <code class="ph codeph">conv(string, from_base, to_base)</code> returned the correct value.
+          For example:
+        </p>
+
+<pre class="pre codeblock"><code>
+
+select 2061013007, conv(2061013007, 16, 10), conv('2061013007', 16, 10);
++------------+--------------------------+----------------------------+
+| 2061013007 | conv(2061013007, 16, 10) | conv('2061013007', 16, 10) |
++------------+--------------------------+----------------------------+
+| 2061013007 | 1627467783               | 139066421255               |
++------------+--------------------------+----------------------------+
+Fetched 1 row(s) in 0.65s
+
+select 2061013007, conv(cast(2061013007 as bigint), 16, 10), conv('2061013007', 16, 10);
++------------+------------------------------------------+----------------------------+
+| 2061013007 | conv(cast(2061013007 as bigint), 16, 10) | conv('2061013007', 16, 10) |
++------------+------------------------------------------+----------------------------+
+| 2061013007 | 1627467783                               | 139066421255               |
++------------+------------------------------------------+----------------------------+
+
+select 2061013007, conv(cast(2061013007 as string), 16, 10), conv('2061013007', 16, 10);
++------------+------------------------------------------+----------------------------+
+| 2061013007 | conv(cast(2061013007 as string), 16, 10) | conv('2061013007', 16, 10) |
++------------+------------------------------------------+----------------------------+
+| 2061013007 | 139066421255                             | 139066421255               |
++------------+------------------------------------------+----------------------------+
+
+select 2061013007, conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10), conv('2061013007', 16, 10);
++------------+-----------------------------------------------------------------+----------------------------+
+| 2061013007 | conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10) | conv('2061013007', 16, 10) |
++------------+-----------------------------------------------------------------+----------------------------+
+| 2061013007 | 1627467783                                                      | 139066421255               |
++------------+-----------------------------------------------------------------+----------------------------+
+
+</code></pre>
+
+        <p class="p"><strong class="ph b">Workaround:</strong>
+          Cast the value to string and use <code class="ph codeph">conv(string, from_base, to_base)</code> for conversion.
+        </p>
+      </div>
+    </article>
+
+
+
+  </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title38" id="fixed_issues__fixed_issues_241">
+
+    <h2 class="title topictitle2" id="ariaid-title38">Issues Fixed in <span class="keyword">Impala 2.4.1</span></h2>
+
+    <div class="body conbody">
+      <p class="p">
+      </p>
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title39" id="fixed_issues__fixed_issues_240">
+
+    <h2 class="title topictitle2" id="ariaid-title39">Issues Fixed in <span class="keyword">Impala 2.4.0</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The set of fixes for Impala in <span class="keyword">Impala 2.4.0</span> is the same as
+        in <span class="keyword">Impala 2.3.2</span>.
+        
+      </p>
+
+    </div>
+
+  </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title40" id="fixed_issues__fixed_issues_234">
+
+    <h2 class="title topictitle2" id="ariaid-title40">Issues Fixed in <span class="keyword">Impala 2.3.4</span></h2>
+
+    <div class="body conbody">
+      <p class="p"></p>
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title41" id="fixed_issues__fixed_issues_232">
+
+    <h2 class="title topictitle2" id="ariaid-title41">Issues Fixed in <span class="keyword">Impala 2.3.2</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        This section lists the most serious or frequently encountered customer
+        issues fixed in <span class="keyword">Impala 2.3.2</span>.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title42" id="fixed_issues_232__IMPALA-2829">
+      <h3 class="title topictitle3" id="ariaid-title42">SEGV in AnalyticEvalNode touching NULL input_stream_</h3>
+      <div class="body conbody">
+      <p class="p">
+        A query involving an analytic function could encounter a serious error.
+        This issue was encountered infrequently, depending upon specific combinations
+        of queries and data.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2829" target="_blank">IMPALA-2829</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title43" id="fixed_issues_232__IMPALA-2722">
+      <h3 class="title topictitle3" id="ariaid-title43">Free local allocations per row batch in non-partitioned AGG and HJ</h3>
+      <div class="body conbody">
+      <p class="p">
+        An outer join query could fail unexpectedly with an out-of-memory error
+        when the <span class="q">"spill to disk"</span> mechanism was turned off.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2722" target="_blank">IMPALA-2722</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title44" id="fixed_issues_232__IMPALA-2612">
+      
+      <h3 class="title topictitle3" id="ariaid-title44">Free local allocations once for every row batch when building hash tables</h3>
+      <div class="body conbody">
+      <p class="p">
+        A join query could encounter a serious error due to an internal failure to allocate memory, which
+        resulted in dereferencing a <code class="ph codeph">NULL</code> pointer.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2612" target="_blank">IMPALA-2612</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title45" id="fixed_issues_232__IMPALA-2643">
+      <h3 class="title topictitle3" id="ariaid-title45">Prevent migrating incorrectly inferred identity predicates into inline views</h3>
+      <div class="body conbody">
+      <p class="p">
+        Referring to the same column twice in a view definition could cause the view to omit
+        rows where that column contained a <code class="ph codeph">NULL</code> value. This could cause
+        incorrect results due to an inaccurate <code class="ph codeph">COUNT(*)</code> value or rows missing
+        from the result set.
+      </p>
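+      <p class="p">
+        For example, a view of this general shape (the table and column names are
+        hypothetical, for illustration only) could produce incorrect results:
+      </p>
+<pre class="pre codeblock"><code>-- Hypothetical example: c1 is referenced twice in the view definition.
+CREATE VIEW v1 AS SELECT c1 AS a, c1 AS b FROM t1;
+-- Formerly, rows where c1 was NULL could be omitted from the result.
+SELECT COUNT(*) FROM v1;</code></pre>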
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2643" target="_blank">IMPALA-2643</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title46" id="fixed_issues_232__IMPALA-2695">
+      <h3 class="title topictitle3" id="ariaid-title46">Fix GRANTs on URIs with uppercase letters</h3>
+      <div class="body conbody">
+      <p class="p">
+        A <code class="ph codeph">GRANT</code> statement for a URI could be ineffective if the URI
+        contained uppercase letters, for example in an uppercase directory name.
+        Subsequent statements, such as <code class="ph codeph">CREATE EXTERNAL TABLE</code>
+        with a <code class="ph codeph">LOCATION</code> clause, could fail with an authorization exception.
+      </p>
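+      <p class="p">
+        For example, a sequence of this general shape (the URI, role, and table
+        names are hypothetical) could formerly fail at the second statement:
+      </p>
+<pre class="pre codeblock"><code>-- Hypothetical example: note the uppercase letters in the directory name.
+GRANT ALL ON URI 'hdfs://host/user/impala/MixedCaseDir' TO ROLE analyst_role;
+-- Formerly, this statement could fail with an authorization exception.
+CREATE EXTERNAL TABLE t1 (c1 INT)
+  LOCATION 'hdfs://host/user/impala/MixedCaseDir';</code></pre>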
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2695" target="_blank">IMPALA-2695</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="IMPALA-2664-552__IMPALA-2648-552" id="fixed_issues_232__IMPALA-2664-552">
+      <h3 class="title topictitle3" id="IMPALA-2664-552__IMPALA-2648-552">Avoid sending large partition stats objects over thrift</h3>
+      <div class="body conbody">
+      <p class="p">
+        The <span class="keyword cmdname">catalogd</span> daemon could encounter a serious error
+        when loading the incremental statistics metadata for tables with large
+        numbers of partitions and columns. The problem occurred when the
+        internal representation of metadata for the table exceeded 2
+        GB, for example in a table with 20K partitions and 77 columns. The fix causes a
+        <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operation to fail if it
+        would produce metadata that exceeded the maximum size.
+      </p>
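+      <p class="p">
+        For example, on a table of that scale (the table name is hypothetical),
+        the operation now fails cleanly rather than destabilizing the
+        <span class="keyword cmdname">catalogd</span> daemon:
+      </p>
+<pre class="pre codeblock"><code>-- Hypothetical example: now fails with an error, instead of producing
+-- an oversized metadata object, if the result would exceed the limit.
+COMPUTE INCREMENTAL STATS wide_partitioned_table;</code></pre>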
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2664" target="_blank">IMPALA-2664</a>,
+        <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2648" target="_blank">IMPALA-2648</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title48" id="fixed_issues_232__IMPALA-2226">
+      <h3 class="title topictitle3" id="ariaid-title48">Throw AnalysisError if table properties are too large (for the Hive metastore)</h3>
+      <div class="body conbody">
+      <p class="p">
+        <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements could fail with
+        metastore database errors due to length limits on the <code class="ph codeph">SERDEPROPERTIES</code> and <code class="ph codeph">TBLPROPERTIES</code> clauses.
+        (The limit on key size is 256 characters, while the limit on value size is 4000 characters.) The fix makes Impala handle these error conditions
+        more cleanly, by detecting too-long values rather than passing them to the metastore database.
+      </p>
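+      <p class="p">
+        For example, a statement of this general shape (the table and property
+        names are hypothetical) is now rejected during analysis rather than
+        causing a metastore database error:
+      </p>
+<pre class="pre codeblock"><code>-- Hypothetical example: a property value longer than 4000 characters
+-- now produces an analysis error rather than a metastore error.
+ALTER TABLE t1 SET TBLPROPERTIES ('some_key' = 'some very long value');</code></pre>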
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2226" target="_blank">IMPALA-2226</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title49" id="fixed_issues_232__IMPALA-2273-552">
+      <h3 class="title topictitle3" id="ariaid-title49">Make MAX_PAGE_HEADER_SIZE configurable</h3>
+      <div class="body conbody">
+      <p class="p">
+        Impala could fail to access Parquet data files with page headers larger than 8 MB, which could
+        occur, for example, if the minimum or maximum values for a column were long strings. The
+        fix adds a configuration setting <code class="ph codeph">--max_page_header_size</code>, which you can use to
+        increase the Impala size limit to a value higher than 8 MB.
+      </p>
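+      <p class="p">
+        For example, a setting along these lines (the value shown is hypothetical;
+        the exact way the flag is passed depends on how the daemons are started)
+        raises the limit to 16 MB:
+      </p>
+<pre class="pre codeblock"><code>impalad --max_page_header_size=16777216</code></pre>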
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2273" target="_blank">IMPALA-2273</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title50" id="fixed_issues_232__IMPALA-2473">
+      <h3 class="title topictitle3" id="ariaid-title50">Reduce scanner memory usage</h3>
+      <div class="body conbody">
+      <p class="p">
+        Queries on Parquet tables could consume excessive memory (potentially multiple gigabytes) due to producing
+        large intermediate data values while evaluating groups of rows. The workaround was to reduce the size of
+        the <code class="ph codeph">NUM_SCANNER_THREADS</code> query option, the <code class="ph codeph">BATCH_SIZE</code> query option,
+        or both.
+      </p>
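+      <p class="p">
+        For example, the workaround could be applied with query option settings
+        along these lines (the values shown are hypothetical starting points):
+      </p>
+<pre class="pre codeblock"><code>-- Hypothetical example: reduce scan parallelism and row batch size
+-- to lower peak memory usage while evaluating groups of rows.
+SET NUM_SCANNER_THREADS=2;
+SET BATCH_SIZE=512;</code></pre>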
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2473" target="_blank">IMPALA-2473</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title51" id="fixed_issues_232__IMPALA-2113">
+      <h3 class="title topictitle3" id="ariaid-title51">Handle error when distinct and aggregates are used with a having clause</h3>
+      <div class="body conbody">
+      <p class="p">
+        A query that included a <code class="ph codeph">DISTINCT</code> operator and a <code class="ph codeph">HAVING</code> clause, but no
+        aggregate functions or <code class="ph codeph">GROUP BY</code>, would fail with an uninformative error message.
+      </p>
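+      <p class="p">
+        For example, a query of this general shape (the table and column names
+        are hypothetical) formerly failed with an uninformative message:
+      </p>
+<pre class="pre codeblock"><code>-- Hypothetical example: DISTINCT plus HAVING, but no aggregate
+-- functions and no GROUP BY clause.
+SELECT DISTINCT c1 FROM t1 HAVING c1 &gt; 10;</code></pre>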
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2113" target="_blank">IMPALA-2113</a></p>
+      
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title52" id="fixed_issues_232__IMPALA-2225">
+      <h3 class="title topictitle3" id="ariaid-title52">Handle error when star based select item and aggregate are incorrectly used</h3>
+      <div class="body conbody">
+      <p class="p">
+        A query that included <code class="ph codeph">*</code> in the <code class="ph codeph">SELECT</code> list, in addition to an
+        aggregate function call, would fail with an uninformative message if the query had no
+        <code class="ph codeph">GROUP BY</code> clause.
+      </p>
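+      <p class="p">
+        For example, a query of this general shape (the table and column names
+        are hypothetical) formerly failed with an uninformative message:
+      </p>
+<pre class="pre codeblock"><code>-- Hypothetical example: * combined with an aggregate function
+-- but no GROUP BY clause is not valid.
+SELECT *, COUNT(c1) FROM t1;</code></pre>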
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2225" target="_blank">IMPALA-2225</a></p>
+      
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title53" id="fixed_issues_232__IMPALA-2731-552">
+      <h3 class="title topictitle3" id="ariaid-title53">Refactor MemPool usage in HBase scan node</h3>
+      <div class="body conbody">
+      <p class="p">
+        Queries involving HBase tables used substantially more memory than in earlier Impala versions.
+        The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284.
+        The fix for this issue involves removing a separate memory work area for HBase queries
+        and reusing other memory that was already allocated.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2731" target="_blank">IMPALA-2731</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title54" id="fixed_issues_232__IMPALA-1459-552">
+      <h3 class="title topictitle3" id="ariaid-title54">Fix migration/assignment of On-clause predicates inside inline views</h3>
+      <div class="body conbody">
+      <p class="p">
+        Some combinations of <code class="ph codeph">ON</code> clauses in join queries could result in comparisons
+        being applied at the wrong stage of query processing, leading to incorrect results.
+        Wrong predicate assignment could happen under the following conditions:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          The query includes an inline view that contains an outer join.
+        </li>
+        <li class="li">
+          That inline view is joined with another table in the enclosing query block.
+        </li>
+        <li class="li">
+          That join has an <code class="ph codeph">ON</code> clause containing a predicate that
+          only references columns originating from the outer-joined tables inside the inline view.
+        </li>
+      </ul>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1459" target="_blank">IMPALA-1459</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title55" id="fixed_issues_232__IMPALA-2558">
+      <h3 class="title topictitle3" id="ariaid-title55">DCHECK in parquet scanner after block read error</h3>
+      <div class="body conbody">
+      <p class="p">
+        A debug build of Impala could encounter a serious error after encountering some kinds of I/O
+        errors for Parquet files. This issue only occurred in debug builds, not release builds.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2558" target="_blank">IMPALA-2558</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title56" id="fixed_issues_232__IMPALA-2535">
+      <h3 class="title topictitle3" id="ariaid-title56">PAGG hits mem_limit when switching to I/O buffers</h3>
+      <div class="body conbody">
+      <p class="p">
+        A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory.
+        The cause was an internal ordering of operations under which a later phase of the query
+        allocated memory required by an earlier phase. The workaround was to either increase
+        or decrease the <code class="ph codeph">MEM_LIMIT</code> query option, because the issue would only occur for a specific
+        combination of memory limit and data volume.
+      </p>
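+      <p class="p">
+        For example, the workaround could be applied with a setting along these
+        lines (the value shown is hypothetical):
+      </p>
+<pre class="pre codeblock"><code>-- Hypothetical example: adjust the limit up or down until the
+-- problematic combination of memory limit and data volume is avoided.
+SET MEM_LIMIT=2000000000;</code></pre>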
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2535" target="_blank">IMPALA-2535</a></p>
+      </div>
+    </article>
+
+    
+    
+    <article class="topic concept nested2" aria-labelledby="ariaid-title57" id="fixed_issues_232__IMPALA-2559">
+      <h3 class="title topictitle3" id="ariaid-title57">Fix check failed: sorter_runs_.back()-&gt;is_pinned_</h3>
+      <div class="body conbody">
+      <p class="p">
+        A query could fail with an internal error while calculating the memory limit.
+        This was an infrequent condition uncovered during stress testing.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2559" target="_blank">IMPALA-2559</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title58" id="fixed_issues_232__IMPALA-2614">
+      <h3 class="title topictitle3" id="ariaid-title58">Don't ignore Status returned by DataStreamRecvr::CreateMerger()</h3>
+      <div class="body conbody">
+      <p class="p">
+        A query could fail with an internal error while calculating the memory limit.
+        This was an infrequent condition uncovered during stress testing.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2614" target="_blank">IMPALA-2614</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2559" target="_blank">IMPALA-2559</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title59" id="fixed_issues_232__IMPALA-2591">
+      <h3 class="title topictitle3" id="ariaid-title59">DataStreamSender::Send() does not return an error status if SendBatch() failed</h3>
+      <div class="body conbody">
+      
+      <p class="p">
+        An error returned by the internal <code class="ph codeph">SendBatch()</code> operation
+        could be ignored rather than propagated by <code class="ph codeph">DataStreamSender::Send()</code>.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2591" target="_blank">IMPALA-2591</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title60" id="fixed_issues_232__IMPALA-2598">
+      <h3 class="title topictitle3" id="ariaid-title60">Re-enable SSL and Kerberos on server-server</h3>
+      <div class="body conbody">
+      <p class="p">
+        These fixes lift the restriction on using SSL encryption and Kerberos authentication together
+        for internal communication between Impala components.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2598" target="_blank">IMPALA-2598</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2747" target="_blank">IMPALA-2747</a></p>
+      </div>
+    </article>
+
+
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title61" id="fixed_issues__fixed_issues_231">
+
+    <h2 class="title topictitle2" id="ariaid-title61">Issues Fixed in <span class="keyword">Impala 2.3.1</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        <span class="keyword">Impala 2.3.1</span> is identical to <span class="keyword">Impala 2.3.0</span>.
+        There are no new bug fixes, new features, or incompatible changes.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title62" id="fixed_issues__fixed_issues_230">
+
+    <h2 class="title topictitle2" id="ariaid-title62">Issues Fixed in <span class="keyword">Impala 2.3.0</span></h2>
+
+    <div class="body conbody">
+      <p class="p"> This section lists the most serious or frequently encountered customer
+        issues fixed in <span class="keyword">Impala 2.3</span>. Any issues already fixed in
+        <span class="keyword">Impala 2.2</span> maintenance releases (up through <span class="keyword">Impala 2.2.8</span>) are also included.
+        Those issues are listed under the respective <span class="keyword">Impala 2.2</span> sections and are
+        not repeated here.
+      </p>
+
+
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title63" id="fixed_issues_230__serious_230">
+      <h3 class="title topictitle3" id="ariaid-title63">Fixes for Serious Errors</h3>
+      <div class="body conbody">
+      <p class="p">
+        A number of issues that could result in serious errors were resolved.
+        The most critical or commonly encountered ones are listed here.
+      </p>
+      <p class="p"><strong class="ph b">Bugs:</strong>
+
+
+
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2168" target="_blank">IMPALA-2168</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2378" target="_blank">IMPALA-2378</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2369" target="_blank">IMPALA-2369</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2357" target="_blank">IMPALA-2357</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2319" target="_blank">IMPALA-2319</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2314" target="_blank">IMPALA-2314</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2016" target="_blank">IMPALA-2016</a>
+      </p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title64" id="fixed_issues_230__correctness_230">
+      <h3 class="title topictitle3" id="ariaid-title64">Fixes for Correctness Errors</h3>
+      <div class="body conbody">
+      <p class="p">
+        A number of issues that could result in wrong results were resolved.
+        The most critical or commonly encountered ones are listed here.
+      </p>
+      <p class="p"><strong class="ph b">Bugs:</strong>
+
+
+
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2192" target="_blank">IMPALA-2192</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2440" target="_blank">IMPALA-2440</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2090" target="_blank">IMPALA-2090</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2086" target="_blank">IMPALA-2086</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1947" target="_blank">IMPALA-1947</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1917" target="_blank">IMPALA-1917</a>
+      </p>
+      </div>
+    </article>
+
+  </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title65" id="fixed_issues__fixed_issues_2210">
+
+    <h2 class="title topictitle2" id="ariaid-title65">Issues Fixed in <span class="keyword">Impala 2.2.10</span></h2>
+
+    <div class="body conbody">
+      <p class="p"></p>
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title66" id="fixed_issues__fixed_issues_229">
+
+    <h2 class="title topictitle2" id="ariaid-title66">Issues Fixed in <span class="keyword">Impala 2.2.9</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.9</span>.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title67" id="fixed_issues_229__IMPALA-1917">
+      
+      <h3 class="title topictitle3" id="ariaid-title67">Query returns an empty result if it contains a NullLiteral in an inline view</h3>
+      <div class="body conbody">
+      <p class="p">
+        If an inline view in a <code class="ph codeph">FROM</code> clause contained a <code class="ph codeph">NULL</code> literal,
+        the result set was empty.
+      </p>
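+      <p class="p">
+        For example, a query of this general shape (the table and column names
+        are hypothetical) could formerly return an empty result set:
+      </p>
+<pre class="pre codeblock"><code>-- Hypothetical example: the NULL literal in the inline view
+-- formerly caused the outer query to return no rows.
+SELECT v.c1, v.c2 FROM (SELECT c1, NULL AS c2 FROM t1) v;</code></pre>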
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1917" target="_blank">IMPALA-1917</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title68" id="fixed_issues_229__IMPALA-2731">
+      
+      <h3 class="title topictitle3" id="ariaid-title68">HBase scan node uses 2-4x memory after upgrade to Impala 2.2.8</h3>
+      <div class="body conbody">
+      <p class="p">
+        Queries involving HBase tables used substantially more memory than in earlier Impala versions.
+        The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284.
+        The fix for this issue involves removing a separate memory work area for HBase queries
+        and reusing other memory that was already allocated.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2731" target="_blank">IMPALA-2731</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title69" id="fixed_issues_229__IMPALA-1459">
+      <h3 class="title topictitle3" id="ariaid-title69">Fix migration/assignment of On-clause predicates inside inline views</h3>
+      <div class="body conbody">
+      
+      <p class="p">
+        Some combinations of <code class="ph codeph">ON</code> clauses in join queries could result in comparisons
+        being applied at the wrong stage of query processing, leading to incorrect results.
+        Wrong predicate assignment could happen under the following conditions:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          The query includes an inline view that contains an outer join.
+        </li>
+        <li class="li">
+          That inline view is joined with another table in the enclosing query block.
+        </li>
+        <li class="li">
+          That join has an <code class="ph codeph">ON</code> clause containing a predicate that
+          only references columns originating from the outer-joined tables inside the inline view.
+        </li>
+      </ul>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1459" target="_blank">IMPALA-1459</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title70" id="fixed_issues_229__IMPALA-2446">
+      <h3 class="title topictitle3" id="ariaid-title70">Fix wrong predicate assignment in outer joins</h3>
+      <div class="body conbody">
+      <p class="p">
+        The join predicate for an <code class="ph codeph">OUTER JOIN</code> clause could be applied at the wrong stage
+        of query processing, leading to incorrect results.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2446" target="_blank">IMPALA-2446</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title71" id="fixed_issues_229__IMPALA-2648">
+      <h3 class="title topictitle3" id="ariaid-title71">Avoid sending large partition stats objects over thrift</h3>
+      <div class="body conbody">
+      <p class="p"> The <span class="keyword cmdname">catalogd</span> daemon could encounter a serious error when loading the
+          incremental statistics metadata for tables with large numbers of partitions and columns.
+          The problem occurred when the internal representation of metadata for the table exceeded 2
+          GB, for example in a table with 20K partitions and 77 columns. The fix causes a
+            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operation to fail if it would produce
+          metadata that exceeded the maximum size. </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2648" target="_blank">IMPALA-2648</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2664" target="_blank">IMPALA-2664</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title72" id="fixed_issues_229__IMPALA-1675">
+      <h3 class="title topictitle3" id="ariaid-title72">Avoid overflow when adding large intervals to TIMESTAMPs</h3>
+      <div class="body conbody">
+      <p class="p"> Adding a large <code class="ph codeph">INTERVAL</code> value to, or subtracting one from, a
+            <code class="ph codeph">TIMESTAMP</code> value could produce an incorrect result, with the value
+          wrapping around instead of returning an out-of-range error. </p>
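+      <p class="p">
+        For example, an expression along these lines (the values are hypothetical)
+        could formerly wrap around silently:
+      </p>
+<pre class="pre codeblock"><code>-- Hypothetical example: an interval far outside the supported
+-- TIMESTAMP range now produces an error rather than a wrapped value.
+SELECT CAST('2015-01-01' AS TIMESTAMP) + INTERVAL 100000 YEARS;</code></pre>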
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1675" target="_blank">IMPALA-1675</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title73" id="fixed_issues_229__IMPALA-1949">
+      <h3 class="title topictitle3" id="ariaid-title73">Analysis exception when a binary operator contains an IN operator with values</h3>
+      <div class="body conbody">
+      <p class="p">
+        An <code class="ph codeph">IN</code> operator with literal values could cause a statement to fail if used
+        as the argument to a binary operator, such as an equality test for a <code class="ph codeph">BOOLEAN</code> value.
+      </p>
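+      <p class="p">
+        For example, a query of this general shape (the table and column names
+        are hypothetical) could formerly fail with an analysis exception:
+      </p>
+<pre class="pre codeblock"><code>-- Hypothetical example: an IN operator with literal values used
+-- as one side of an equality comparison on a BOOLEAN result.
+SELECT c1 FROM t1 WHERE (c2 IN (1, 2, 3)) = TRUE;</code></pre>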
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1949" target="_blank">IMPALA-1949</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title74" id="fixed_issues_229__IMPALA-2273">
+    
+      <h3 class="title topictitle3" id="ariaid-title74">Make MAX_PAGE_HEADER_SIZE configurable</h3>
+      <div class="body conbody">
+      <p class="p"> Impala could fail to access Parquet data files with page headers larger than 8 MB, which
+          could occur, for example, if the minimum or maximum values for a column were long strings.
+          The fix adds a configuration setting <code class="ph codeph">--max_page_header_size</code>, which you
+          can use to increase the Impala size limit to a value higher than 8 MB. </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2273" target="_blank">IMPALA-2273</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title75" id="fixed_issues_229__IMPALA-2357">
+      <h3 class="title topictitle3" id="ariaid-title75">Fix spilling sorts with var-len slots that are NULL or empty</h3>
+      <div class="body conbody">
+      <p class="p">
+        A query that activated the spill-to-disk mechanism could fail if it contained a sort expression
+        involving certain combinations of fixed-length or variable-length types.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2357" target="_blank">IMPALA-2357</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title76" id="fixed_issues_229__block_pin_oom">
+      <h3 class="title topictitle3" id="ariaid-title76">Work-around IMPALA-2344: Fail query with OOM in case block-&gt;Pin() fails</h3>
+      <div class="body conbody">
+      <p class="p">
+        Some queries that activated the spill-to-disk mechanism could produce a serious error
+        if there was insufficient memory to set up internal work areas. Now those queries
+        produce normal out-of-memory errors instead.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2344" target="_blank">IMPALA-2344</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title77" id="fixed_issues_229__IMPALA-2252">
+      <h3 class="title topictitle3" id="ariaid-title77">Crash (likely race) tearing down BufferedBlockMgr on query failure</h3>
+      <div class="body conbody">
+      <p class="p">
+        A serious error could occur under rare circumstances, due to a race condition while freeing memory during heavily concurrent workloads.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2252" target="_blank">IMPALA-2252</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title78" id="fixed_issues_229__IMPALA-1746">
+      <h3 class="title topictitle3" id="ariaid-title78">QueryExecState doesn't check for query cancellation or errors</h3>
+      <div class="body conbody">
+      <p class="p">
+        A call to <code class="ph codeph">SetError()</code> in a user-defined function (UDF) would not cause the query to fail as expected.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1746" target="_blank">IMPALA-1746</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title79" id="fixed_issues_229__IMPALA-2533">
+      <h3 class="title topictitle3" id="ariaid-title79">Impala throws IllegalStateException when inserting data into a partition while the select
+        subquery groups by the partition columns</h3>
+      <div class="body conbody">
+      <p class="p">
+        An <code class="ph codeph">INSERT ... SELECT</code> operation into a partitioned table could fail if the <code class="ph codeph">SELECT</code> query
+        included a <code class="ph codeph">GROUP BY</code> clause referring to the partition key columns.
+      </p>
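+      <p class="p">
+        For example, a statement of this general shape (the table and column
+        names are hypothetical) could formerly fail:
+      </p>
+<pre class="pre codeblock"><code>-- Hypothetical example: the SELECT portion groups by the
+-- partition key column of the destination table.
+INSERT INTO t1 PARTITION (year)
+  SELECT c1, year FROM t2 GROUP BY year, c1;</code></pre>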
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2533" target="_blank">IMPALA-2533</a></p>
+      </div>
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title80" id="fixed_issues__fixed_issues_228">
+
+    <h2 class="title topictitle2" id="ariaid-title80">Issues Fixed in <span class="keyword">Impala 2.2.8</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.8</span>.
+      </p>
+
+    </div>
+
+
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title81" id="fixed_issues_228__IMPALA-1136">
+      <h3 class="title topictitle3" id="ariaid-title81">Impala is unable to read Hive tables created with the "STORED AS AVRO" clause</h3>
+      <div class="body conbody">
+      <p class="p">Impala could not read Avro tables created in Hive with the <code class="ph codeph">STORED AS AVRO</code> clause.</p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1136" target="_blank">IMPALA-1136</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2161" target="_blank">IMPALA-2161</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title82" id="fixed_issues_228__IMPALA-2213">
+      <h3 class="title topictitle3" id="ariaid-title82">make Parquet scanner fail query if the file size metadata is stale</h3>
+      <div class="body conbody">
+      <p class="p">If a Parquet file in HDFS was overwritten by a smaller file, Impala could encounter a serious error.
+      Issuing an <code class="ph codeph">INVALIDATE METADATA</code> statement before a subsequent query would avoid the error.
+      The fix allows Impala to handle such inconsistencies in Parquet file length cleanly regardless of whether the
+      table metadata is up-to-date.</p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2213" target="_blank">IMPALA-2213</a></p>
+      </div>
+    </article>
+
+     <article class="topic concept nested2" aria-labelledby="ariaid-title83" id="fixed_issues_228__IMPALA-2249">
+      <h3 class="title topictitle3" id="ariaid-title83">Avoid allocating StringBuffer &gt; 1GB in ScannerContext::Stream::GetBytesInternal()</h3>
+      <div class="body conbody">
+      <p class="p">Impala could encounter a serious error when reading compressed text files larger than 1 GB. The fix causes Impala
+      to issue an error message instead in this case.</p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2249" target="_blank">IMPALA-2249</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title84" id="fixed_issues_228__IMPALA-2284">
+      <h3 class="title topictitle3" id="ariaid-title84">Disallow long (1&lt;&lt;30) strings in group_concat()</h3>
+      <div class="body conbody">
+      <p class="p">A query using the <code class="ph codeph">group_concat()</code> function could encounter a serious error if the returned string value was larger than 1 GB.
+      Now the query fails with an error message in this case.</p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2284" target="_blank">IMPALA-2284</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title85" id="fixed_issues_228__IMPALA-2270">
+      <h3 class="title topictitle3" id="ariaid-title85">avoid FnvHash64to32 with empty inputs</h3>
+      <div class="body conbody">
+      <p class="p">An edge case in the algorithm used to distribute data among nodes could result in uneven distribution of work for some queries,
+      with all data sent to the same node.</p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2270" target="_blank">IMPALA-2270</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title86" id="fixed_issues_228__IMPALA-2348">
+      <h3 class="title topictitle3" id="ariaid-title86">The catalog does not close the connection to HMS during table invalidation</h3>
+      <div class="body conbody">
+      <p class="p">A communication error could occur between Impala and the Hive metastore database, causing Impala operations that update
+      table metadata to fail.</p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2348" target="_blank">IMPALA-2348</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title87" id="fixed_issues_228__IMPALA-2364-548">
+      <h3 class="title topictitle3" id="ariaid-title87">Wrong DCHECK in PHJ::ProcessProbeBatch</h3>
+      <div class="body conbody">
+      <p class="p">Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.</p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2364" target="_blank">IMPALA-2364</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title88" id="fixed_issues_228__IMPALA-2165-548">
+      <h3 class="title topictitle3" id="ariaid-title88">Avoid cardinality 0 in scan nodes of small tables and low selectivity</h3>
+      <div class="body conbody">
+      <p class="p">Impala could generate a suboptimal query plan for some queries involving small tables.</p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2165" target="_blank">IMPALA-2165</a></p>
+      </div>
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title89" id="fixed_issues__fixed_issues_227">
+
+    <h2 class="title topictitle2" id="ariaid-title89">Issues Fixed in <span class="keyword">Impala 2.2.7</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.7</span>.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title90" id="fixed_issues_227__IMPALA-1983">
+      <h3 class="title topictitle3" id="ariaid-title90">Warn if table stats are potentially corrupt.</h3>
+      <div class="body conbody">
+      <p class="p">
+        Impala warns if it detects a discrepancy in table statistics: a table considered to have zero rows even though there are data files present.
+        In this case, Impala also skips query optimizations that are normally applied to very small tables.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1983" target="_blank">IMPALA-1983:</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title91" id="fixed_issues_227__IMPALA-2266">
+      <h3 class="title topictitle3" id="ariaid-title91">Pass correct child node in 2nd phase merge aggregation.</h3>
+      <div class="body conbody">
+      <p class="p">A query could encounter a serious error if it included a particular combination of aggregate functions and inline views.</p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2266" target="_blank">IMPALA-2266</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title92" id="fixed_issues_227__IMPALA-2216">
+      <h3 class="title topictitle3" id="ariaid-title92">Set the output smap of an EmptySetNode produced from an empty inline view.</h3>
+      <div class="body conbody">
+      <p class="p">A query could encounter a serious error if it included an inline view whose subquery had no <code class="ph codeph">FROM</code> clause.</p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2216" target="_blank">IMPALA-2216</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title93" id="fixed_issues_227__IMPALA-2203">
+      <h3 class="title topictitle3" id="ariaid-title93">Set an InsertStmt's result exprs from the source statement's result exprs.</h3>
+      <div class="body conbody">
+      <p class="p">
+      A <code class="ph codeph">CREATE TABLE AS SELECT</code> or <code class="ph codeph">INSERT ... SELECT</code> statement could produce
+      different results than a <code class="ph codeph">SELECT</code> statement, for queries including a <code class="ph codeph">FULL JOIN</code> clause
+      and literal values in the select list.
+      </p>
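+      <p class="p">
+        A hypothetical sketch of the affected query shape (table names invented for illustration):
+      </p>
+<pre class="pre codeblock"><code>-- A literal in the select list combined with a FULL JOIN,
+-- wrapped in a CTAS (or INSERT ... SELECT) statement.
+CREATE TABLE t3 AS
+  SELECT 'flag' AS tag, t1.x, t2.y
+  FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;</code></pre>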
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2203" target="_blank">IMPALA-2203</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title94" id="fixed_issues_227__IMPALA-2088">
+      <h3 class="title topictitle3" id="ariaid-title94">Fix planning of empty union operands with analytics.</h3>
+      <div class="body conbody">
+      <p class="p">
+        A query could return incorrect results if it contained a <code class="ph codeph">UNION</code> clause,
+        calls to analytic functions, and a constant expression that evaluated to <code class="ph codeph">FALSE</code>.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2088" target="_blank">IMPALA-2088</a></p>
+      </div>
+    </article>
+
+
+
+
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title95" id="fixed_issues_227__IMPALA-2089">
+      <h3 class="title topictitle3" id="ariaid-title95">Retain eq predicates bound by grouping slots with complex grouping exprs.</h3>
+      <div class="body conbody">
+      <p class="p">
+        A query containing an <code class="ph codeph">INNER JOIN</code> clause could return unwanted rows, because
+        some predicates specified in the <code class="ph codeph">ON</code> clause could be omitted from the filtering operation.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2089" target="_blank">IMPALA-2089</a></p>
+      </div>
+    </article>
+
+
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title96" id="fixed_issues_227__IMPALA-2199">
+      <h3 class="title topictitle3" id="ariaid-title96">Row count not set for empty partition when spec is used with compute incremental stats</h3>
+      <div class="body conbody">
+      <p class="p">
+        A <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement could leave the row count for an emptyp partition as -1,
+        rather than initializing the row count to 0. The missing statistic value could result in reduced query performance.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2199" target="_blank">IMPALA-2199</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title97" id="fixed_issues_227__IMPALA-1898">
+      <h3 class="title topictitle3" id="ariaid-title97">Explicit aliases + ordinals analysis bug</h3>
+      <div class="body conbody">
+      <p class="p">
+        A query could encounter a serious error if it included column aliases with the same names as table columns, and used
+        ordinal numbers in an <code class="ph codeph">ORDER BY</code> or <code class="ph codeph">GROUP BY</code> clause.
+      </p>
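+      <p class="p">
+        A hypothetical example of the problematic pattern (names invented for illustration):
+      </p>
+<pre class="pre codeblock"><code>-- The aliases reuse real column names of t1 (c1 and c2 are both
+-- columns and aliases), and the ORDER BY uses an ordinal number.
+SELECT c2 AS c1, c1 AS c2 FROM t1 ORDER BY 1;</code></pre>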
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1898" target="_blank">IMPALA-1898</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title98" id="fixed_issues_227__IMPALA-1987">
+      <h3 class="title topictitle3" id="ariaid-title98">Fix TupleIsNullPredicate to return false if no tuples are nullable.</h3>
+      <div class="body conbody">
+      <p class="p">
+        A query could return incorrect results if it included an outer join clause, inline views, and calls to functions such as <code class="ph codeph">coalesce()</code>
+        that can generate <code class="ph codeph">NULL</code> values.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1987" target="_blank">IMPALA-1987</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title99" id="fixed_issues_227__IMPALA-2178">
+      <h3 class="title topictitle3" id="ariaid-title99">fix Expr::ComputeResultsLayout() logic</h3>
+      <div class="body conbody">
+      <p class="p">
+        A query could return incorrect results if the table contained multiple <code class="ph codeph">CHAR</code> columns with length of 2 or less,
+        and the query included a <code class="ph codeph">GROUP BY</code> clause that referred to multiple such columns.
+      </p>
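+      <p class="p">
+        A hypothetical query shape that could be affected (schema invented for illustration):
+      </p>
+<pre class="pre codeblock"><code>-- Assume t1 has multiple short CHAR columns, e.g. c1 CHAR(2) and c2 CHAR(2).
+SELECT c1, c2, COUNT(*) FROM t1 GROUP BY c1, c2;</code></pre>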
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2178" target="_blank">IMPALA-2178</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title100" id="fixed_issues_227__IMPALA-1737">
+      <h3 class="title topictitle3" id="ariaid-title100">Substitute an InsertStmt's partition key exprs with the root node's smap.</h3>
+      <div class="body conbody">
+      <p class="p">
+        An <code class="ph codeph">INSERT</code> statement could encounter a serious error if the <code class="ph codeph">SELECT</code>
+        portion called an analytic function.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1737" target="_blank">IMPALA-1737</a></p>
+      </div>
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title101" id="fixed_issues__fixed_issues_225">
+
+    <h2 class="title topictitle2" id="ariaid-title101">Issues Fixed in Impala <span class="keyword">Impala 2.2.5</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.5</span>.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title102" id="fixed_issues_225__IMPALA-2048">
+      <h3 class="title topictitle3" id="ariaid-title102">Impala DML/DDL operations corrupt table metadata leading to Hive query failures</h3>
+      <div class="body conbody">
+      <p class="p">
+        When the Impala <code class="ph codeph">COMPUTE STATS</code> statement was run on a partitioned Parquet table that was created in Hive, the table subsequently became inaccessible in Hive.
+        The table was still accessible to Impala. Regaining access in Hive required a workaround of creating a new table. The error displayed in Hive was:
+      </p>
+<pre class="pre codeblock"><code>Error: Error while compiling statement: FAILED: SemanticException Class not found: org.apache.impala.hive.serde.ParquetInputFormat (state=42000,code=40000)</code></pre>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2048" target="_blank">IMPALA-2048</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title103" id="fixed_issues_225__IMPALA-1929">
+      <h3 class="title topictitle3" id="ariaid-title103">Avoiding a DCHECK of NULL hash table in spilled right joins</h3>
+      
+      <div class="body conbody">
+      <p class="p">
+        A query could encounter a serious error if it contained a <code class="ph codeph">RIGHT OUTER</code>, <code class="ph codeph">RIGHT ANTI</code>, or <code class="ph codeph">FULL OUTER</code> join clause
+        and approached the memory limit on a host so that the <span class="q">"spill to disk"</span> mechanism was activated.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1929" target="_blank">IMPALA-1929</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title104" id="fixed_issues_225__IMPALA-2136">
+      <h3 class="title topictitle3" id="ariaid-title104">Bug in PrintTColumnValue caused wrong stats for TINYINT partition cols</h3>
+      
+      <div class="body conbody">
+      <p class="p">
+        Declaring a partition key column as a <code class="ph codeph">TINYINT</code> caused problems with the <code class="ph codeph">COMPUTE STATS</code> statement.
+        The associated partitions would always have zero estimated rows, leading to potentially inefficient query plans.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2136" target="_blank">IMPALA-2136</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title105" id="fixed_issues_225__IMPALA-2018">
+      <h3 class="title topictitle3" id="ariaid-title105">Where clause does not propagate to joins inside nested views</h3>
+      
+      <div class="body conbody">
+      <p class="p">
+        A query that referred to a view, which in turn referred to another view containing a join, could return incorrect results.
+        <code class="ph codeph">WHERE</code> clauses from the outermost query were not always applied, causing the result
+        set to include additional rows that should have been filtered out.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2018" target="_blank">IMPALA-2018</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title106" id="fixed_issues_225__IMPALA-2064">
+      <h3 class="title topictitle3" id="ariaid-title106">Add effective_user() builtin</h3>
+      
+      <div class="body conbody">
+      <p class="p">
+        The <code class="ph codeph">user()</code> function returned the name of the logged-in user, which might not be the
+        same as the user name being checked for authorization if, for example, delegation was enabled.
+      </p>
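+      <p class="p">
+        For example, after the fix you can compare the two values directly:
+      </p>
+<pre class="pre codeblock"><code>-- user() returns the logged-in user; effective_user() returns the
+-- user name actually checked during authorization. The two can
+-- differ when delegation is enabled.
+SELECT user(), effective_user();</code></pre>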
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2064" target="_blank">IMPALA-2064</a></p>
+      <p class="p"><strong class="ph b">Resolution:</strong> Rather than change the behavior of the <code class="ph codeph">user()</code> function,
+      the fix introduces an additional function <code class="ph codeph">effective_user()</code> that returns the user name that is checked during authorization.</p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title107" id="fixed_issues_225__IMPALA-2125">
+      <h3 class="title topictitle3" id="ariaid-title107">Make UTC to local TimestampValue conversion faster.</h3>
+      
+      <div class="body conbody">
+      <p class="p">
+        Query performance was improved substantially for Parquet files containing <code class="ph codeph">TIMESTAMP</code>
+        data written by Hive, when the <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps=true</code> setting
+        was in effect.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2125" target="_blank">IMPALA-2125</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title108" id="fixed_issues_225__IMPALA-2065">
+      <h3 class="title topictitle3" id="ariaid-title108">Workaround IMPALA-1619 in BufferedBlockMgr::ConsumeMemory()</h3>
+      
+      <div class="body conbody">
+      <p class="p">
+        A join query could encounter a serious error if the query
+        approached the memory limit on a host so that the <span class="q">"spill to disk"</span> mechanism was activated,
+        and data volume in the join was large enough that an internal memory buffer exceeded 1 GB in size on a particular host.
+        (Exceeding this limit would only happen for huge join queries, because Impala could split this intermediate data
+        into 16 parts during the join query, and the buffer only contains compact bookkeeping data rather than the actual
+        join column data.)
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2065" target="_blank">IMPALA-2065</a></p>
+      </div>
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title109" id="fixed_issues__fixed_issues_223">
+
+    <h2 class="title topictitle2" id="ariaid-title109">Issues Fixed in <span class="keyword">Impala 2.2.3</span></h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.3</span>.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title110" id="fixed_issues_223__isilon_support">
+      <h3 class="title topictitle3" id="ariaid-title110">Enable using Isilon as the underlying filesystem.</h3>
+      <div class="body conbody">
+      <p class="p">
+        Enabling Impala to work with the Isilon filesystem involves a number of
+        fixes that improve performance and flexibility when performing I/O through remote reads.
+        See <a class="xref" href="impala_isilon.html#impala_isilon">Using Impala with Isilon Storage</a> for details on using Impala and Isilon together.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1968" target="_blank">IMPALA-1968</a>,
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1730" target="_blank">IMPALA-1730</a></p>
+      </div>
+    </article>
+
+
+
+
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title111" id="fixed_issues_223__IMPALA-1381">
+      <h3 class="title topictitle3" id="ariaid-title111">Expand set of supported timezones.</h3>
+      <div class="body conbody">
+      <p class="p">
+        The set of timezones recognized by Impala was expanded.
+        You can always find the latest list of supported timezones in the
+        Impala source code, in the file
+        <a class="xref" href="https://github.com/apache/incubator-impala/blob/master/be/src/exprs/timezone_db.cc" target="_blank">timezone_db.cc</a>.
+      </p>
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1381" target="_blank">IMPALA-1381</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title112" id="fixed_issues_223__IMPALA-1963">
+      <h3 class="title topictitle3" id="ariaid-title112">Impala Timestamp ISO-8601 Support.</h3>
+      <div class="body conbody">
+      <p class="p">
+        Impala can now process <code class="ph codeph">TIMESTAMP</code> literals including a trailing <code class="ph codeph">z</code>,
+        signifying <span class="q">"Zulu"</span> time, a synonym for UTC.
+      </p>
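+      <p class="p">
+        For example (the literal value here is illustrative, not from the original report):
+      </p>
+<pre class="pre codeblock"><code>-- The trailing z marks the value as UTC ("Zulu" time).
+SELECT CAST('2015-04-09 14:07:46.580465000z' AS TIMESTAMP);</code></pre>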
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1963" target="_blank">IMPALA-1963</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title113" id="fixed_issues_223__IMPALA-2008">
+      <h3 class="title topictitle3" id="ariaid-title113">Fix wrong warning when insert overwrite to empty table</h3>
+      <div class="body conbody">
+      <p class="p">
+        An <code class="ph codeph">INSERT OVERWRITE</code> operation would encounter an error
+        if the <code class="ph codeph">SELECT</code> portion of the statement returned zero
+        rows, such as with a <code class="ph codeph">LIMIT 0</code> clause.
+      </p>
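+      <p class="p">
+        An illustrative example of the affected pattern (table names invented):
+      </p>
+<pre class="pre codeblock"><code>-- The SELECT returns zero rows, so the INSERT OVERWRITE writes an
+-- empty result; this formerly caused the statement to fail.
+INSERT OVERWRITE TABLE t1 SELECT * FROM t2 LIMIT 0;</code></pre>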
+      <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2008" target="_blank">IMPALA-2008</a></p>
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title114" id="fixed_issues_223__IMPALA-1952">
+    
+    
+      <h3 class="title topictitle3" id="ariaid-title114">Expand parsing of decimals to include scientific notation</h3>
+      <div class="body conbody">
+      <p class="p">
+        <code class="ph codeph">DECIMAL</code> literals can now include <code class="ph codeph">e</code> scientific notati

<TRUNCATED>


[17/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_parquet.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_parquet.html b/docs/build/html/topics/impala_parquet.html
new file mode 100644
index 0000000..894c97a
--- /dev/null
+++ b/docs/build/html/topics/impala_parquet.html
@@ -0,0 +1,1392 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content=
 "Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content=
 "parquet"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the Parquet File Format with Impala Tables</title></head><body id="parquet"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using the Parquet File Format with Impala Tables</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Impala helps you to create, manage, and query Parquet tables. Parquet is a column-oriented binary file format
+      intended to be highly efficient for the types of large-scale queries that Impala is best at. Parquet is
+      especially good for queries scanning particular columns within a table, for example to query <span class="q">"wide"</span>
+      tables with many columns, or to perform aggregation operations such as <code class="ph codeph">SUM()</code> and
+      <code class="ph codeph">AVG()</code> that need to process most or all of the values from a column. Each data file contains
+      the values for a set of rows (the <span class="q">"row group"</span>). Within a data file, the values from each column are
+      organized so that they are all adjacent, enabling good compression for the values from that column. Queries
+      against a Parquet table can retrieve and analyze these values from any column quickly and with minimal I/O.
+    </p>
+
+    <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">Parquet Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+          <tr class="row">
+            <th class="entry nocellnorowborder" id="parquet__entry__1">
+              File Type
+            </th>
+            <th class="entry nocellnorowborder" id="parquet__entry__2">
+              Format
+            </th>
+            <th class="entry nocellnorowborder" id="parquet__entry__3">
+              Compression Codecs
+            </th>
+            <th class="entry nocellnorowborder" id="parquet__entry__4">
+              Impala Can CREATE?
+            </th>
+            <th class="entry nocellnorowborder" id="parquet__entry__5">
+              Impala Can INSERT?
+            </th>
+          </tr>
+        </thead><tbody class="tbody">
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="parquet__entry__1 ">
+              <a class="xref" href="impala_parquet.html#parquet">Parquet</a>
+            </td>
+            <td class="entry nocellnorowborder" headers="parquet__entry__2 ">
+              Structured
+            </td>
+            <td class="entry nocellnorowborder" headers="parquet__entry__3 ">
+              Snappy, gzip; currently Snappy by default
+            </td>
+            <td class="entry nocellnorowborder" headers="parquet__entry__4 ">
+              Yes.
+            </td>
+            <td class="entry nocellnorowborder" headers="parquet__entry__5 ">
+              Yes: <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, and query.
+            </td>
+          </tr>
+        </tbody></table>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="parquet__parquet_ddl">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Creating Parquet Tables in Impala</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        To create a table named <code class="ph codeph">PARQUET_TABLE</code> that uses the Parquet format, you would use a
+        command like the following, substituting your own table name, column names, and data types:
+      </p>
+
+<pre class="pre codeblock"><code>[impala-host:21000] &gt; create table <var class="keyword varname">parquet_table_name</var> (x INT, y STRING) STORED AS PARQUET;</code></pre>
+
+
+
+      <p class="p">
+        Or, to clone the column names and data types of an existing table:
+      </p>
+
+<pre class="pre codeblock"><code>[impala-host:21000] &gt; create table <var class="keyword varname">parquet_table_name</var> LIKE <var class="keyword varname">other_table_name</var> STORED AS PARQUET;</code></pre>
+
+      <p class="p">
+        In Impala 1.4.0 and higher, you can derive column definitions from a raw Parquet data file, even without an
+        existing Impala table. For example, you can create an external table pointing to an HDFS directory, and
+        base the column definitions on one of the files in that directory:
+      </p>
+
+<pre class="pre codeblock"><code>CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET '/user/etl/destination/datafile1.dat'
+  STORED AS PARQUET
+  LOCATION '/user/etl/destination';
+</code></pre>
+
+      <p class="p">
+        Or, you can refer to an existing data file and create a new empty table with suitable column definitions.
+        Then you can use <code class="ph codeph">INSERT</code> to create new data files or <code class="ph codeph">LOAD DATA</code> to transfer
+        existing data files into the new table.
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE columns_from_data_file LIKE PARQUET '/user/etl/destination/datafile1.dat'
+  STORED AS PARQUET;
+</code></pre>
+
+      <p class="p">
+        The default properties of the newly created table are the same as for any other <code class="ph codeph">CREATE
+        TABLE</code> statement. For example, the default file format is text; if you want the new table to use
+        the Parquet file format, include the <code class="ph codeph">STORED AS PARQUET</code> clause also.
+      </p>
+
+      <p class="p">
+        In this example, the new table is partitioned by year, month, and day. These partition key columns are not
+        part of the data file, so you specify them in the <code class="ph codeph">CREATE TABLE</code> statement:
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE columns_from_data_file LIKE PARQUET '/user/etl/destination/datafile1.dat'
+  PARTITIONED BY (year INT, month TINYINT, day TINYINT)
+  STORED AS PARQUET;
+</code></pre>
+
+      <p class="p">
+        See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for more details about the <code class="ph codeph">CREATE TABLE
+        LIKE PARQUET</code> syntax.
+      </p>
+
+      <p class="p">
+        Once you have created a table, to insert data into that table, use a command similar to the following,
+        again with your own table names:
+      </p>
+
+      
+
+<pre class="pre codeblock"><code>[impala-host:21000] &gt; insert overwrite table <var class="keyword varname">parquet_table_name</var> select * from <var class="keyword varname">other_table_name</var>;</code></pre>
+
+      <p class="p">
+        If the Parquet table has a different number of columns or different column names than the other table,
+        specify the names of columns from the other table rather than <code class="ph codeph">*</code> in the
+        <code class="ph codeph">SELECT</code> statement.
+      </p>
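+      <p class="p">
+        For example, using the hypothetical table names from the earlier examples, an insert
+        that names two hypothetical source columns <code class="ph codeph">c1</code> and
+        <code class="ph codeph">c2</code> might look like this:
+      </p>
+
+<pre class="pre codeblock"><code>[impala-host:21000] &gt; insert overwrite table <var class="keyword varname">parquet_table_name</var> select c1, c2 from <var class="keyword varname">other_table_name</var>;</code></pre>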
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="parquet__parquet_etl">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Loading Data into Parquet Tables</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Choose from the following techniques for loading data into Parquet tables, depending on whether the
+        original data is already in an Impala table, or exists as raw data files outside Impala.
+      </p>
+
+      <p class="p">
+        If you already have data in an Impala or Hive table, perhaps in a different file format or partitioning
+        scheme, you can transfer the data to a Parquet table using the Impala <code class="ph codeph">INSERT...SELECT</code>
+        syntax. You can convert, filter, repartition, and do other things to the data as part of this same
+        <code class="ph codeph">INSERT</code> statement. See <a class="xref" href="#parquet_compression">Snappy and GZip Compression for Parquet Data Files</a> for some examples showing how to
+        insert data into Parquet tables.
+      </p>
+
+      <div class="p">
+        When inserting into partitioned tables, especially using the Parquet file format, you can include a hint in
+        the <code class="ph codeph">INSERT</code> statement to fine-tune the overall performance of the operation and its
+        resource usage:
+        <ul class="ul">
+          <li class="li">
+            These hints are available in Impala 1.2.2 and higher.
+          </li>
+
+          <li class="li">
+            You would only use these hints if an <code class="ph codeph">INSERT</code> into a partitioned Parquet table was
+            failing due to capacity limits, or if such an <code class="ph codeph">INSERT</code> was succeeding but with
+            less-than-optimal performance.
+          </li>
+
+          <li class="li">
+            To use these hints, put the hint keyword <code class="ph codeph">[SHUFFLE]</code> or <code class="ph codeph">[NOSHUFFLE]</code>
+            (including the square brackets) after the <code class="ph codeph">PARTITION</code> clause, immediately before the
+            <code class="ph codeph">SELECT</code> keyword.
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">[SHUFFLE]</code> selects an execution plan that minimizes the number of files being written
+            simultaneously to HDFS, and the number of memory buffers holding data for individual partitions. Thus
+            it reduces overall resource usage for the <code class="ph codeph">INSERT</code> operation, allowing some
+            <code class="ph codeph">INSERT</code> operations to succeed that otherwise would fail. It does involve some data
+            transfer between the nodes so that the data files for a particular partition are all constructed on the
+            same node.
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">[NOSHUFFLE]</code> selects an execution plan that might be faster overall, but might also
+            produce a larger number of small data files or exceed capacity limits, causing the
+            <code class="ph codeph">INSERT</code> operation to fail. Use <code class="ph codeph">[SHUFFLE]</code> in cases where an
+            <code class="ph codeph">INSERT</code> statement fails or runs inefficiently due to all nodes attempting to construct
+            data for all partitions.
+          </li>
+
+          <li class="li">
+            Impala automatically uses the <code class="ph codeph">[SHUFFLE]</code> method if any partition key column in the
+            source table, mentioned in the <code class="ph codeph">INSERT ... SELECT</code> query, does not have column
+            statistics. In this case, only the <code class="ph codeph">[NOSHUFFLE]</code> hint would have any effect.
+          </li>
+
+          <li class="li">
+            If column statistics are available for all partition key columns in the source table mentioned in the
+            <code class="ph codeph">INSERT ... SELECT</code> query, Impala chooses whether to use the <code class="ph codeph">[SHUFFLE]</code>
+            or <code class="ph codeph">[NOSHUFFLE]</code> technique based on the estimated number of distinct values in those
+            columns and the number of nodes involved in the <code class="ph codeph">INSERT</code> operation. In this case, you
+            might need the <code class="ph codeph">[SHUFFLE]</code> or the <code class="ph codeph">[NOSHUFFLE]</code> hint to override the
+            execution plan selected by Impala.
+          </li>
+        </ul>
+      </div>
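+      <p class="p">
+        For example, the hint placement described above looks like the following sketch, which
+        assumes a hypothetical partitioned table <code class="ph codeph">sales</code> and a
+        hypothetical staging table <code class="ph codeph">staging_sales</code>:
+      </p>
+
+<pre class="pre codeblock"><code>INSERT INTO sales PARTITION (year, month) [SHUFFLE] SELECT * FROM staging_sales;</code></pre>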
+
+      <p class="p">
+        Any <code class="ph codeph">INSERT</code> statement for a Parquet table requires enough free space in the HDFS filesystem
+        to write one block. Because Parquet data files use a block size of 256 MB by default, an
+        <code class="ph codeph">INSERT</code> might fail (even for a very small amount of data) if your HDFS is running low on
+        space.
+      </p>
+
+      
+
+      <p class="p">
+        Avoid the <code class="ph codeph">INSERT...VALUES</code> syntax for Parquet tables, because
+        <code class="ph codeph">INSERT...VALUES</code> produces a separate tiny data file for each
+        <code class="ph codeph">INSERT...VALUES</code> statement, and the strength of Parquet is in its handling of data
+        (compressing, parallelizing, and so on) in <span class="ph">large</span> chunks.
+      </p>
+
+      <p class="p">
+        If you have one or more Parquet data files produced outside of Impala, you can quickly make the data
+        queryable through Impala by one of the following methods:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The <code class="ph codeph">LOAD DATA</code> statement moves a single data file or a directory full of data files into
+          the data directory for an Impala table. It does no validation or conversion of the data. The original
+          data files must be somewhere in HDFS, not the local filesystem.
+          
+        </li>
+
+        <li class="li">
+          The <code class="ph codeph">CREATE TABLE</code> statement with the <code class="ph codeph">LOCATION</code> clause creates a table
+          where the data continues to reside outside the Impala data directory. The original data files must be
+          somewhere in HDFS, not the local filesystem. For extra safety, if the data is intended to be long-lived
+          and reused by other applications, you can use the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax so that
+          the data files are not deleted by an Impala <code class="ph codeph">DROP TABLE</code> statement.
+          
+        </li>
+
+        <li class="li">
+          If the Parquet table already exists, you can copy Parquet data files directly into it, then use the
+          <code class="ph codeph">REFRESH</code> statement to make Impala recognize the newly added data. Remember to preserve
+          the block size of the Parquet data files by using the <code class="ph codeph">hadoop distcp -pb</code> command rather
+          than a <code class="ph codeph">-put</code> or <code class="ph codeph">-cp</code> operation on the Parquet files. See
+          <a class="xref" href="#parquet_compression_multiple">Example of Copying Parquet Data Files</a> for an example of this kind of operation.
+        </li>
+      </ul>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          Currently, Impala always decodes the column data in Parquet files based on the ordinal position of the
+          columns, not by looking up the position of each column based on its name. Parquet files produced outside
+          of Impala must write column data in the same order as the columns are declared in the Impala table. Any
+          optional columns that are omitted from the data files must be the rightmost columns in the Impala table
+          definition.
+        </p>
+
+        <p class="p">
+          If you created compressed Parquet files through some tool other than Impala, make sure that any
+          compression codecs are supported in Parquet by Impala. For example, Impala does not currently support LZO
+          compression in Parquet files. Also double-check that you used any recommended compatibility settings in
+          the other tool, such as <code class="ph codeph">spark.sql.parquet.binaryAsString</code> when writing Parquet files
+          through Spark.
+        </p>
+      </div>
+
+      <p class="p">
+        Recent versions of Sqoop can produce Parquet output files using the <code class="ph codeph">--as-parquetfile</code>
+        option.
+      </p>
+
+      <p class="p"> If you use Sqoop to
+        convert RDBMS data to Parquet, be careful with interpreting any
+        resulting values from <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>,
+        or <code class="ph codeph">TIMESTAMP</code> columns. The underlying values are
+        represented as the Parquet <code class="ph codeph">INT64</code> type, which is
+        represented as <code class="ph codeph">BIGINT</code> in the Impala table. The Parquet
+        values represent the time in milliseconds, while Impala interprets
+          <code class="ph codeph">BIGINT</code> as the time in seconds. Therefore, if you have
+        a <code class="ph codeph">BIGINT</code> column in a Parquet table that was imported
+        this way from Sqoop, divide the values by 1000 when interpreting as the
+          <code class="ph codeph">TIMESTAMP</code> type.</p>
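+      <p class="p">
+        For example, assuming a hypothetical Sqoop-imported table
+        <code class="ph codeph">sqoop_import</code> with such a millisecond-based
+        <code class="ph codeph">BIGINT</code> column named <code class="ph codeph">event_time</code>,
+        a query could perform the conversion like this:
+      </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(event_time / 1000 AS TIMESTAMP) FROM sqoop_import;</code></pre>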
+
+      <p class="p">
+        If the data exists outside Impala and is in some other format, combine both of the preceding techniques.
+        First, use a <code class="ph codeph">LOAD DATA</code> or <code class="ph codeph">CREATE EXTERNAL TABLE ... LOCATION</code> statement to
+        bring the data into an Impala table that uses the appropriate file format. Then, use an
+        <code class="ph codeph">INSERT...SELECT</code> statement to copy the data to the Parquet table, converting to Parquet
+        format as part of the process.
+      </p>
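+      <p class="p">
+        A sketch of that two-step sequence, assuming hypothetical table names and a
+        comma-delimited text file already in HDFS:
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE staging_text (x INT, y STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
+LOAD DATA INPATH '/user/etl/incoming/data.csv' INTO TABLE staging_text;
+CREATE TABLE parquet_dest (x INT, y STRING) STORED AS PARQUET;
+INSERT INTO parquet_dest SELECT x, y FROM staging_text;
+</code></pre>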
+
+      
+
+      <p class="p">
+        Loading data into Parquet tables is a memory-intensive operation, because the incoming data is buffered
+        until it reaches <span class="ph">one data block</span> in size, then that chunk of data is
+        organized and compressed in memory before being written out. The memory consumption can be larger when
+        inserting data into partitioned Parquet tables, because a separate data file is written for each
+        combination of partition key column values, potentially requiring several
+        <span class="ph">large</span> chunks to be manipulated in memory at once.
+      </p>
+
+      <p class="p">
+        When inserting into a partitioned Parquet table, Impala redistributes the data among the nodes to reduce
+        memory consumption. You might still need to temporarily increase the memory dedicated to Impala during the
+        insert operation, or break up the load operation into several <code class="ph codeph">INSERT</code> statements, or both.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        All the preceding techniques assume that the data you are loading matches the structure of the destination
+        table, including column order, column names, and partition layout. To transform or reorganize the data,
+        start by loading the data into a Parquet table that matches the underlying structure of the data, then use
+        one of the table-copying techniques such as <code class="ph codeph">CREATE TABLE AS SELECT</code> or <code class="ph codeph">INSERT ...
+        SELECT</code> to reorder or rename columns, divide the data among multiple partitions, and so on. For
+        example to take a single comprehensive Parquet data file and load it into a partitioned table, you would
+        use an <code class="ph codeph">INSERT ... SELECT</code> statement with dynamic partitioning to let Impala create separate
+        data files with the appropriate partition values; for an example, see
+        <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>.
+      </div>
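+      <p class="p">
+        A minimal sketch of that dynamic-partitioning technique, using hypothetical table and
+        column names (note that the partition key column goes last in the select list):
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE sales_by_year (id INT, val STRING)
+  PARTITIONED BY (year INT) STORED AS PARQUET;
+INSERT INTO sales_by_year PARTITION (year) SELECT id, val, year FROM sales_raw;
+</code></pre>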
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="parquet__parquet_performance">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala Parquet Tables</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Query performance for Parquet tables depends on the number of columns needed to process the
+        <code class="ph codeph">SELECT</code> list and <code class="ph codeph">WHERE</code> clauses of the query, the way data is divided into
+        <span class="ph">large data files with block size equal to file size</span>, the reduction in I/O
+        by reading the data for each column in compressed format, which data files can be skipped (for partitioned
+        tables), and the CPU overhead of decompressing the data for each column.
+      </p>
+
+      <div class="p">
+        For example, the following is an efficient query for a Parquet table:
+<pre class="pre codeblock"><code>select avg(income) from census_data where state = 'CA';</code></pre>
+        The query processes only 2 columns out of a large number of total columns. If the table is partitioned by
+        the <code class="ph codeph">STATE</code> column, it is even more efficient because the query only has to read and decode
+        1 column from each data file, and it can read only the data files in the partition directory for the state
+        <code class="ph codeph">'CA'</code>, skipping the data files for all the other states, which will be physically located
+        in other directories.
+      </div>
+
+      <div class="p">
+        The following is a relatively inefficient query for a Parquet table:
+<pre class="pre codeblock"><code>select * from census_data;</code></pre>
+        Impala would have to read the entire contents of each <span class="ph">large</span> data file,
+        and decompress the contents of each column for each row group, negating the I/O optimizations of the
+        column-oriented format. This query might still be faster for a Parquet table than a table with some other
+        file format, but it does not take advantage of the unique strengths of Parquet data files.
+      </div>
+
+      <p class="p">
+        Impala can optimize queries on Parquet tables, especially join queries, better when statistics are
+        available for all the tables. Issue the <code class="ph codeph">COMPUTE STATS</code> statement for each table after
+        substantial amounts of data are loaded into or appended to it. See
+        <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for details.
+      </p>
+
+      <p class="p">
+        The runtime filtering feature, available in <span class="keyword">Impala 2.5</span> and higher, works best with Parquet tables.
+        The per-row filtering aspect only applies to Parquet tables.
+        See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a> for details.
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+        For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+        Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+        in the <span class="ph filepath">core-site.xml</span> configuration file determines
+        how Impala divides the I/O work of reading the data files. This configuration
+        setting is specified in bytes. By default, this
+        value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+        as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+        Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+        to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+        Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+        to 268435456 (256 MB) to match the row group size produced by Impala.
+      </p>
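+      <p class="p">
+        For example, the 128 MB setting would appear in
+        <span class="ph filepath">core-site.xml</span> as follows:
+      </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;fs.s3a.block.size&lt;/name&gt;
+  &lt;value&gt;134217728&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>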
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="parquet_performance__parquet_partitioning">
+
+      <h3 class="title topictitle3" id="ariaid-title5">Partitioning for Parquet Tables</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          As explained in <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>, partitioning is an important
+          performance technique for Impala generally. This section explains some of the performance considerations
+          for partitioned Parquet tables.
+        </p>
+
+        <p class="p">
+          The Parquet file format is ideal for tables containing many columns, where most queries only refer to a
+          small subset of the columns. As explained in <a class="xref" href="#parquet_data_files">How Parquet Data Files Are Organized</a>, the physical layout of
+          Parquet data files lets Impala read only a small fraction of the data for many queries. The performance
+          benefits of this approach are amplified when you use Parquet tables in combination with partitioning.
+          Impala can skip the data files for certain partitions entirely, based on the comparisons in the
+          <code class="ph codeph">WHERE</code> clause that refer to the partition key columns. For example, queries on
+          partitioned tables often analyze data for time intervals based on columns such as <code class="ph codeph">YEAR</code>,
+          <code class="ph codeph">MONTH</code>, and/or <code class="ph codeph">DAY</code>, or for geographic regions. Remember that Parquet
+          data files use a <span class="ph">large</span> block size, so when deciding how finely to
+          partition the data, try to find a granularity where each partition contains
+          <span class="ph">256 MB</span> or more of data, rather than creating a large number of smaller
+          files split among many partitions.
+        </p>
+
+        <p class="p">
+          Inserting into a partitioned Parquet table can be a resource-intensive operation, because each Impala
+          node could potentially be writing a separate data file to HDFS for each combination of different values
+          for the partition key columns. The large number of simultaneous open files could exceed the HDFS
+          <span class="q">"transceivers"</span> limit. To avoid exceeding this limit, consider the following techniques:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            Load different subsets of data using separate <code class="ph codeph">INSERT</code> statements with specific values
+            for the <code class="ph codeph">PARTITION</code> clause, such as <code class="ph codeph">PARTITION (year=2010)</code>.
+          </li>
+
+          <li class="li">
+            Increase the <span class="q">"transceivers"</span> value for HDFS, sometimes spelled <span class="q">"xcievers"</span> (sic). The property
+            value in the <span class="ph filepath">hdfs-site.xml</span> configuration file is
+
+            <code class="ph codeph">dfs.datanode.max.transfer.threads</code>. For example, if you were loading 12 years of data
+            partitioned by year, month, and day, even a value of 4096 might not be high enough. This
+            <a class="xref" href="http://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/" target="_blank">blog post</a> explores the considerations for setting this value
+            higher or lower, using HBase examples for illustration.
+          </li>
+
+          <li class="li">
+            Use the <code class="ph codeph">COMPUTE STATS</code> statement to collect
+            <a class="xref" href="impala_perf_stats.html#perf_column_stats">column statistics</a> on the source table from
+            which data is being copied, so that the Impala query can estimate the number of different values in the
+            partition key columns and distribute the work accordingly.
+          </li>
+        </ul>
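+        <p class="p">
+          For example, the first technique might look like the following sketch, loading one
+          year at a time into a hypothetical partitioned table:
+        </p>
+
+<pre class="pre codeblock"><code>INSERT INTO sales_by_year PARTITION (year=2009) SELECT id, val FROM staging_data WHERE year = 2009;
+INSERT INTO sales_by_year PARTITION (year=2010) SELECT id, val FROM staging_data WHERE year = 2010;
+</code></pre>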
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="parquet__parquet_compression">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Snappy and GZip Compression for Parquet Data Files</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        When Impala writes Parquet data files using the <code class="ph codeph">INSERT</code> statement, the underlying
+        compression is controlled by the <code class="ph codeph">COMPRESSION_CODEC</code> query option. (Prior to Impala 2.0, the
+        query option name was <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code>.) The allowed values for this query option
+        are <code class="ph codeph">snappy</code> (the default), <code class="ph codeph">gzip</code>, and <code class="ph codeph">none</code>. The option
+        value is not case-sensitive. If the option is set to an unrecognized value, all kinds of queries will fail
+        due to the invalid option setting, not just queries involving Parquet tables.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="parquet_compression__parquet_snappy">
+
+      <h3 class="title topictitle3" id="ariaid-title7">Example of Parquet Table with Snappy Compression</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          
+          By default, the underlying data files for a Parquet table are compressed with Snappy. The combination of
+          fast compression and decompression makes it a good choice for many data sets. To ensure Snappy
+          compression is used, for example after experimenting with other compression codecs, set the
+          <code class="ph codeph">COMPRESSION_CODEC</code> query option to <code class="ph codeph">snappy</code> before inserting the data:
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create database parquet_compression;
+[localhost:21000] &gt; use parquet_compression;
+[localhost:21000] &gt; create table parquet_snappy like raw_text_data;
+[localhost:21000] &gt; set COMPRESSION_CODEC=snappy;
+[localhost:21000] &gt; insert into parquet_snappy select * from raw_text_data;
+Inserted 1000000000 rows in 181.98s
+</code></pre>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="parquet_compression__parquet_gzip">
+
+      <h3 class="title topictitle3" id="ariaid-title8">Example of Parquet Table with GZip Compression</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          If you need more intensive compression (at the expense of more CPU cycles for uncompressing during
+          queries), set the <code class="ph codeph">COMPRESSION_CODEC</code> query option to <code class="ph codeph">gzip</code> before
+          inserting the data:
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table parquet_gzip like raw_text_data;
+[localhost:21000] &gt; set COMPRESSION_CODEC=gzip;
+[localhost:21000] &gt; insert into parquet_gzip select * from raw_text_data;
+Inserted 1000000000 rows in 1418.24s
+</code></pre>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="parquet_compression__parquet_none">
+
+      <h3 class="title topictitle3" id="ariaid-title9">Example of Uncompressed Parquet Table</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          If your data compresses very poorly, or you want to avoid the CPU overhead of compression and
+          decompression entirely, set the <code class="ph codeph">COMPRESSION_CODEC</code> query option to <code class="ph codeph">none</code>
+          before inserting the data:
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table parquet_none like raw_text_data;
+[localhost:21000] &gt; set COMPRESSION_CODEC=none;
+[localhost:21000] &gt; insert into parquet_none select * from raw_text_data;
+Inserted 1000000000 rows in 146.90s
+</code></pre>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="parquet_compression__parquet_compression_examples">
+
+      <h3 class="title topictitle3" id="ariaid-title10">Examples of Sizes and Speeds for Compressed Parquet Tables</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Here are some examples showing differences in data sizes and query speeds for 1 billion rows of synthetic
+          data, compressed with each kind of codec. As always, run similar tests with realistic data sets of your
+          own. The actual compression ratios, and relative insert and query speeds, will vary depending on the
+          characteristics of the actual data.
+        </p>
+
+        <p class="p">
+          In this case, switching from Snappy to GZip compression shrinks the data by an additional 40% or so,
+          while switching from Snappy compression to no compression expands the data by about 40%:
+        </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -du -h /user/hive/warehouse/parquet_compression.db
+23.1 G  /user/hive/warehouse/parquet_compression.db/parquet_snappy
+13.5 G  /user/hive/warehouse/parquet_compression.db/parquet_gzip
+32.8 G  /user/hive/warehouse/parquet_compression.db/parquet_none
+</code></pre>
+
+        <p class="p">
+          Because Parquet data files are typically <span class="ph">large</span>, each directory will
+          have a different number of data files and the row groups will be arranged differently.
+        </p>
+
+        <p class="p">
+          At the same time, the less aggressive the compression, the faster the data can be decompressed. In this
+          case, using a table with a billion rows, a query that evaluates all the values for a particular column
+          runs faster with no compression than with Snappy compression, and faster with Snappy compression than
+          with Gzip compression. Query performance depends on several other factors, so as always, run your own
+          benchmarks with your own data to determine the ideal tradeoff between data size, CPU efficiency, and
+          speed of insert and query operations.
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; desc parquet_snappy;
+Query finished, fetching results ...
++-----------+---------+---------+
+| name      | type    | comment |
++-----------+---------+---------+
+| id        | int     |         |
+| val       | int     |         |
+| zfill     | string  |         |
+| name      | string  |         |
+| assertion | boolean |         |
++-----------+---------+---------+
+Returned 5 row(s) in 0.14s
+[localhost:21000] &gt; select avg(val) from parquet_snappy;
+Query finished, fetching results ...
++-----------------+
+| _c0             |
++-----------------+
+| 250000.93577915 |
++-----------------+
+Returned 1 row(s) in 4.29s
+[localhost:21000] &gt; select avg(val) from parquet_gzip;
+Query finished, fetching results ...
++-----------------+
+| _c0             |
++-----------------+
+| 250000.93577915 |
++-----------------+
+Returned 1 row(s) in 6.97s
+[localhost:21000] &gt; select avg(val) from parquet_none;
+Query finished, fetching results ...
++-----------------+
+| _c0             |
++-----------------+
+| 250000.93577915 |
++-----------------+
+Returned 1 row(s) in 3.67s
+</code></pre>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="parquet_compression__parquet_compression_multiple">
+
+      <h3 class="title topictitle3" id="ariaid-title11">Example of Copying Parquet Data Files</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Here is a final example, to illustrate how the data files using the various compression codecs are all
+          compatible with each other for read operations. The metadata about the compression format is written into
+          each data file, and can be decoded during queries regardless of the <code class="ph codeph">COMPRESSION_CODEC</code>
+          setting in effect at the time. In this example, we copy data files from the
+          <code class="ph codeph">PARQUET_SNAPPY</code>, <code class="ph codeph">PARQUET_GZIP</code>, and <code class="ph codeph">PARQUET_NONE</code> tables
+          used in the previous examples, each containing 1 billion rows, all to the data directory of a new table
+          <code class="ph codeph">PARQUET_EVERYTHING</code>. A couple of sample queries demonstrate that the new table now
+          contains 3 billion rows featuring a variety of compression codecs for the data files.
+        </p>
+
+        <p class="p">
+          First, we create the table in Impala so that there is a destination directory in HDFS to put the data
+          files:
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table parquet_everything like parquet_snappy;
+Query: create table parquet_everything like parquet_snappy
+</code></pre>
+
+        <p class="p">
+          Then in the shell, we copy the relevant data files into the data directory for this new table. Rather
+          than using <code class="ph codeph">hdfs dfs -cp</code> as with typical files, we use <code class="ph codeph">hadoop distcp -pb</code>
+  to ensure that the special <span class="ph">block size</span> of the Parquet data files is
+          preserved.
+        </p>
+
+<pre class="pre codeblock"><code>$ hadoop distcp -pb /user/hive/warehouse/parquet_compression.db/parquet_snappy \
+  /user/hive/warehouse/parquet_compression.db/parquet_everything
+...<var class="keyword varname">MapReduce output</var>...
+$ hadoop distcp -pb /user/hive/warehouse/parquet_compression.db/parquet_gzip  \
+  /user/hive/warehouse/parquet_compression.db/parquet_everything
+...<var class="keyword varname">MapReduce output</var>...
+$ hadoop distcp -pb /user/hive/warehouse/parquet_compression.db/parquet_none  \
+  /user/hive/warehouse/parquet_compression.db/parquet_everything
+...<var class="keyword varname">MapReduce output</var>...
+</code></pre>
+
+        <p class="p">
+          Back in the <span class="keyword cmdname">impala-shell</span> interpreter, we use the <code class="ph codeph">REFRESH</code> statement to
+          alert the Impala server to the new data files for this table, then we can run queries demonstrating that
+          the data files represent 3 billion rows, and the values for one of the numeric columns match what was in
+          the original smaller tables:
+        </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; refresh parquet_everything;
+Query finished, fetching results ...
+
+Returned 0 row(s) in 0.32s
+[localhost:21000] &gt; select count(*) from parquet_everything;
+Query finished, fetching results ...
++------------+
+| _c0        |
++------------+
+| 3000000000 |
++------------+
+Returned 1 row(s) in 8.18s
+[localhost:21000] &gt; select avg(val) from parquet_everything;
+Query finished, fetching results ...
++-----------------+
+| _c0             |
++-----------------+
+| 250000.93577915 |
++-----------------+
+Returned 1 row(s) in 13.35s
+</code></pre>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="parquet__parquet_complex_types">
+
+    <h2 class="title topictitle2" id="ariaid-title12">Parquet Tables for Impala Complex Types</h2>
+
+    <div class="body conbody">
+
+    <p class="p">
+      In <span class="keyword">Impala 2.3</span> and higher, Impala supports the complex types
+      <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>.
+      See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+      Because these data types are currently supported only for the Parquet file format,
+      if you plan to use them, become familiar with the performance and storage aspects
+      of Parquet first.
+    </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="parquet__parquet_interop">
+
+    <h2 class="title topictitle2" id="ariaid-title13">Exchanging Parquet Data Files with Other Hadoop Components</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can read and write Parquet data files from other Hadoop components.
+        See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+      </p>
+
+
+
+
+
+
+
+
+
+      <p class="p">
+        Previously, it was not possible to create Parquet data through Impala and reuse that table within Hive. Now
+        that Parquet support is available for Hive, reusing existing Impala Parquet data files in Hive
+        requires updating the table metadata. Use the following command if you are already running Impala 1.1.1 or
+        higher:
+      </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET FILEFORMAT PARQUET;
+</code></pre>
+
+      <p class="p">
+        If you are running a level of Impala that is older than 1.1.1, do the metadata update through Hive:
+      </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
+ALTER TABLE <var class="keyword varname">table_name</var> SET FILEFORMAT
+  INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
+  OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";
+</code></pre>
+
+      <p class="p">
+        Impala 1.1.1 and higher can reuse Parquet data files created by Hive, without any action required.
+      </p>
+
+
+
+      <p class="p">
+        Impala supports the scalar data types that you can encode in a Parquet data file, but not composite or
+        nested types such as maps or arrays. In <span class="keyword">Impala 2.2</span> and higher, Impala can query Parquet data
+        files that include composite or nested types, as long as the query only refers to columns with scalar
+        types.
+
+      </p>
+
+      <p class="p">
+        If you copy Parquet data files between nodes, or even between different directories on the same node, make
+        sure to preserve the block size by using the command <code class="ph codeph">hadoop distcp -pb</code>. To verify that the
+        block size was preserved, issue the command <code class="ph codeph">hdfs fsck -blocks
+        <var class="keyword varname">HDFS_path_of_impala_table_dir</var></code> and check that the average block size is at or
+        near <span class="ph">256 MB (or whatever other size is defined by the
+        <code class="ph codeph">PARQUET_FILE_SIZE</code> query option)</span>. (The <code class="ph codeph">hadoop distcp</code> operation
+        typically leaves some directories behind, with names matching <span class="ph filepath">_distcp_logs_*</span>, that you
+        can delete from the destination directory afterward.)
+
+
+
+        Issue the <span class="keyword cmdname">hadoop distcp</span> command with no arguments to display usage information for the
+        <span class="keyword cmdname">distcp</span> command syntax.
+      </p>
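The block-size check described above can be scripted. The following sketch (Python, with hypothetical per-file block sizes that in practice you would gather by parsing <code class="ph codeph">hdfs fsck -blocks</code> output) flags files whose average block size falls well below the target:

```python
# Sketch: flag Parquet files whose average HDFS block size is far below
# the expected Parquet block size. The input dict is hypothetical; in
# practice, collect it by parsing "hdfs fsck -blocks" output.
TARGET_BLOCK_SIZE = 256 * 1024 * 1024  # 256 MB

def undersized_files(file_block_sizes, target=TARGET_BLOCK_SIZE, tolerance=0.5):
    """Return names of files whose average block size is below
    tolerance * target, suggesting the block size was not preserved."""
    flagged = []
    for name, block_sizes in file_block_sizes.items():
        avg = sum(block_sizes) / len(block_sizes)
        if avg < target * tolerance:
            flagged.append(name)
    return flagged

sample = {
    "part-0.parq": [256 * 1024 * 1024],      # one full-size block: OK
    "part-1.parq": [64 * 1024 * 1024] * 4,   # split into small blocks: flagged
}
print(undersized_files(sample))  # ['part-1.parq']
```

A file flagged this way was probably copied without <code class="ph codeph">-pb</code> and should be re-copied with <code class="ph codeph">hadoop distcp -pb</code>.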
+
+
+
+      <p class="p">
+        Impala can query Parquet files that use the <code class="ph codeph">PLAIN</code>, <code class="ph codeph">PLAIN_DICTIONARY</code>,
+        <code class="ph codeph">BIT_PACKED</code>, and <code class="ph codeph">RLE</code> encodings.
+        Currently, Impala does not support <code class="ph codeph">RLE_DICTIONARY</code> encoding.
+        When creating files outside of Impala for use by Impala, make sure to use one of the supported encodings.
+        In particular, for MapReduce jobs, do not set <code class="ph codeph">parquet.writer.version</code>
+        (especially not to <code class="ph codeph">PARQUET_2_0</code>) in the job configuration when writing Parquet files.
+        Use the default format version, 1.0; it includes some enhancements that remain compatible with older readers.
+        Data using the 2.0 format might not be consumable by Impala, due to use of the <code class="ph codeph">RLE_DICTIONARY</code> encoding.
+      </p>
+      <div class="p">
+        To examine the internal structure and data of Parquet files, you can use the
+        <span class="keyword cmdname">parquet-tools</span> command. Make sure this
+        command is in your <code class="ph codeph">$PATH</code>. (Typically, it is symlinked from
+        <span class="ph filepath">/usr/bin</span>; sometimes, depending on your installation setup, you
+        might need to locate it under an alternative <code class="ph codeph">bin</code> directory.)
+        The arguments to this command let you perform operations such as:
+        <ul class="ul">
+          <li class="li">
+            <code class="ph codeph">cat</code>: Print a file's contents to standard out. In <span class="keyword">Impala 2.3</span> and higher, you can use
+            the <code class="ph codeph">-j</code> option to output JSON.
+          </li>
+          <li class="li">
+            <code class="ph codeph">head</code>: Print the first few records of a file to standard output.
+          </li>
+          <li class="li">
+            <code class="ph codeph">schema</code>: Print the Parquet schema for the file.
+          </li>
+          <li class="li">
+            <code class="ph codeph">meta</code>: Print the file footer metadata, including key-value properties (like Avro schema), compression ratios,
+            encodings, compression used, and row group information.
+          </li>
+          <li class="li">
+            <code class="ph codeph">dump</code>: Print all data and metadata.
+          </li>
+        </ul>
+        Use <code class="ph codeph">parquet-tools -h</code> to see usage information for all the arguments.
+        Here are some examples showing <span class="keyword cmdname">parquet-tools</span> usage:
+
+<pre class="pre codeblock"><code>
+$ # Be careful doing this for a big file! Use parquet-tools head to be safe.
+$ parquet-tools cat sample.parq
+year = 1992
+month = 1
+day = 2
+dayofweek = 4
+dep_time = 748
+crs_dep_time = 750
+arr_time = 851
+crs_arr_time = 846
+carrier = US
+flight_num = 53
+actual_elapsed_time = 63
+crs_elapsed_time = 56
+arrdelay = 5
+depdelay = -2
+origin = CMH
+dest = IND
+distance = 182
+cancelled = 0
+diverted = 0
+
+year = 1992
+month = 1
+day = 3
+...
+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+$ parquet-tools head -n 2 sample.parq
+year = 1992
+month = 1
+day = 2
+dayofweek = 4
+dep_time = 748
+crs_dep_time = 750
+arr_time = 851
+crs_arr_time = 846
+carrier = US
+flight_num = 53
+actual_elapsed_time = 63
+crs_elapsed_time = 56
+arrdelay = 5
+depdelay = -2
+origin = CMH
+dest = IND
+distance = 182
+cancelled = 0
+diverted = 0
+
+year = 1992
+month = 1
+day = 3
+...
+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+$ parquet-tools schema sample.parq
+message schema {
+  optional int32 year;
+  optional int32 month;
+  optional int32 day;
+  optional int32 dayofweek;
+  optional int32 dep_time;
+  optional int32 crs_dep_time;
+  optional int32 arr_time;
+  optional int32 crs_arr_time;
+  optional binary carrier;
+  optional int32 flight_num;
+...
+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+$ parquet-tools meta sample.parq
+creator:             impala version 2.2.0-...
+
+file schema:         schema
+-------------------------------------------------------------------
+year:                OPTIONAL INT32 R:0 D:1
+month:               OPTIONAL INT32 R:0 D:1
+day:                 OPTIONAL INT32 R:0 D:1
+dayofweek:           OPTIONAL INT32 R:0 D:1
+dep_time:            OPTIONAL INT32 R:0 D:1
+crs_dep_time:        OPTIONAL INT32 R:0 D:1
+arr_time:            OPTIONAL INT32 R:0 D:1
+crs_arr_time:        OPTIONAL INT32 R:0 D:1
+carrier:             OPTIONAL BINARY R:0 D:1
+flight_num:          OPTIONAL INT32 R:0 D:1
+...
+
+row group 1:         RC:20636601 TS:265103674
+-------------------------------------------------------------------
+year:                 INT32 SNAPPY DO:4 FPO:35 SZ:10103/49723/4.92 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+month:                INT32 SNAPPY DO:10147 FPO:10210 SZ:11380/35732/3.14 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+day:                  INT32 SNAPPY DO:21572 FPO:21714 SZ:3071658/9868452/3.21 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+dayofweek:            INT32 SNAPPY DO:3093276 FPO:3093319 SZ:2274375/5941876/2.61 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+dep_time:             INT32 SNAPPY DO:5367705 FPO:5373967 SZ:28281281/28573175/1.01 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+crs_dep_time:         INT32 SNAPPY DO:33649039 FPO:33654262 SZ:10220839/11574964/1.13 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+arr_time:             INT32 SNAPPY DO:43869935 FPO:43876489 SZ:28562410/28797767/1.01 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+crs_arr_time:         INT32 SNAPPY DO:72432398 FPO:72438151 SZ:10908972/12164626/1.12 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+carrier:              BINARY SNAPPY DO:83341427 FPO:83341558 SZ:114916/128611/1.12 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+flight_num:           INT32 SNAPPY DO:83456393 FPO:83488603 SZ:10216514/11474301/1.12 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+...
+
+</code></pre>
+      </div>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="parquet__parquet_data_files">
+
+    <h2 class="title topictitle2" id="ariaid-title14">How Parquet Data Files Are Organized</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Although Parquet is a column-oriented file format, do not expect to find one data file for each column.
+        Parquet keeps all the data for a row within the same data file, to ensure that the columns for a row are
+        always available on the same node for processing. What Parquet does is to set a large HDFS block size and a
+        matching maximum data file size, to ensure that I/O and network transfer requests apply to large batches of
+        data.
+      </p>
+
+      <p class="p">
+        Within that data file, the data for a set of rows is rearranged so that all the values from the first
+        column are organized in one contiguous block, then all the values from the second column, and so on.
+        Putting the values from the same column next to each other lets Impala use effective compression techniques
+        on the values in that column.
+      </p>
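The row-to-column rearrangement described above can be illustrated with a small sketch (Python, purely schematic; Parquet's actual on-disk layout also involves row groups, pages, and encodings):

```python
# Schematic illustration of columnar layout: rows are regrouped so that
# all the values of each column sit together, as Parquet does within
# a row group.
rows = [
    (1992, 1, "US"),
    (1992, 1, "UA"),
    (1992, 2, "US"),
]
column_names = ("year", "month", "carrier")

# Transpose the row-oriented data into one contiguous list per column.
columns = {name: [row[i] for row in rows]
           for i, name in enumerate(column_names)}

print(columns["carrier"])  # ['US', 'UA', 'US']
```

Because each column's values are contiguous, a query that reads only <code class="ph codeph">carrier</code> touches only that slice of the file, and similar adjacent values compress well.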
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          Impala <code class="ph codeph">INSERT</code> statements write Parquet data files using an HDFS block size
+          <span class="ph">that matches the data file size</span>, to ensure that each data file is
+          represented by a single HDFS block, and the entire file can be processed on a single node without
+          requiring any remote reads.
+        </p>
+
+        <p class="p">
+          If you create Parquet data files outside of Impala, such as through a MapReduce or Pig job, ensure that
+          the HDFS block size is greater than or equal to the file size, so that the <span class="q">"one file per block"</span>
+          relationship is maintained. Set the <code class="ph codeph">dfs.block.size</code> or the <code class="ph codeph">dfs.blocksize</code>
+          property large enough that each file fits within a single HDFS block, even if that size is larger than
+          the normal HDFS block size.
+        </p>
+
+        <p class="p">
+          If the block size is reset to a lower value during a file copy, you will see lower performance for
+          queries involving those files, and the <code class="ph codeph">PROFILE</code> statement will reveal that some I/O is
+          being done suboptimally, through remote reads. See
+          <a class="xref" href="impala_parquet.html#parquet_compression_multiple">Example of Copying Parquet Data Files</a> for an example showing how to preserve the
+          block size when copying Parquet data files.
+        </p>
+      </div>
+
+      <p class="p">
+        When Impala retrieves or tests the data for a particular column, it opens all the data files, but only
+        reads the portion of each file containing the values for that column. The column values are stored
+        consecutively, minimizing the I/O required to process the values within a single column. If other columns
+        are named in the <code class="ph codeph">SELECT</code> list or <code class="ph codeph">WHERE</code> clauses, the data for all columns
+        in the same row is available within that same data file.
+      </p>
+
+      <p class="p">
+        If an <code class="ph codeph">INSERT</code> statement brings in less than <span class="ph">one Parquet
+        block's worth</span> of data, the resulting data file is smaller than ideal. Thus, if you do split up an ETL
+        job to use multiple <code class="ph codeph">INSERT</code> statements, try to keep the volume of data for each
+        <code class="ph codeph">INSERT</code> statement to approximately <span class="ph">256 MB, or a multiple of
+        256 MB</span>.
+      </p>
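A rough way to size such <code class="ph codeph">INSERT</code> batches, given an estimated average row width (the numbers here are hypothetical), is:

```python
# Sketch: estimate how many rows fit in one ~256 MB INSERT batch.
# avg_row_bytes is a hypothetical estimate of the uncompressed row width;
# actual file sizes shrink further due to encoding and compression.
TARGET_INSERT_BYTES = 256 * 1024 * 1024

def rows_per_insert(avg_row_bytes, target_bytes=TARGET_INSERT_BYTES):
    return max(1, target_bytes // avg_row_bytes)

print(rows_per_insert(100))  # 2684354 rows per ~256 MB batch
```

This is only a planning aid; the final file size depends on how compressible the data turns out to be.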
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title15" id="parquet_data_files__parquet_encoding">
+
+      <h3 class="title topictitle3" id="ariaid-title15">RLE and Dictionary Encoding for Parquet Data Files</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Parquet uses some automatic compression techniques, such as run-length encoding (RLE) and dictionary
+          encoding, based on analysis of the actual data values. Once the data values are encoded in a compact
+          form, the encoded data can optionally be further compressed using a compression algorithm. Parquet data
+          files created by Impala can use Snappy, GZip, or no compression; the Parquet spec also allows LZO
+          compression, but currently Impala does not support LZO-compressed Parquet files.
+        </p>
+
+        <p class="p">
+          RLE and dictionary encoding are compression techniques that Impala applies automatically to groups of
+          Parquet data values, in addition to any Snappy or GZip compression applied to the entire data files.
+          These automatic optimizations can save you time and planning that are normally needed for a traditional
+          data warehouse. For example, dictionary encoding reduces the need to create numeric IDs as abbreviations
+          for longer string values.
+        </p>
+
+        <p class="p">
+          Run-length encoding condenses sequences of repeated data values. For example, if many consecutive rows
+          all contain the same value for a country code, those repeating values can be represented by the value
+          followed by a count of how many times it appears consecutively.
+        </p>
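The idea behind run-length encoding can be sketched in a few lines (Python, illustrative only; Parquet's actual RLE/bit-packed hybrid format is more involved):

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

# Many consecutive rows with the same country code collapse to one pair.
print(rle_encode(["US", "US", "US", "CA", "CA", "MX"]))
# [('US', 3), ('CA', 2), ('MX', 1)]
```

The savings grow with the length of the runs, which is why sorted or clustered data compresses especially well.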
+
+        <p class="p">
+          Dictionary encoding takes the different values present in a column, and represents each one in compact
+          2-byte form rather than the original value, which could be several bytes. (Additional compression is
+          applied to the compacted values, for extra space savings.) This type of encoding applies when the number
+          of different values for a column is less than 2**16 (65,536). It does not apply to columns of data type
+          <code class="ph codeph">BOOLEAN</code>, which are already very short. <code class="ph codeph">TIMESTAMP</code> columns sometimes have
+          a unique value for each row, in which case they can quickly exceed the 2**16 limit on distinct values.
+          The 2**16 limit on different values within a column is reset for each data file, so if several different
+          data files each contained 10,000 different city names, the city name column in each data file could still
+          be condensed using dictionary encoding.
+        </p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title16" id="parquet__parquet_compacting">
+
+    <h2 class="title topictitle2" id="ariaid-title16">Compacting Data Files for Parquet Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        If you reuse existing table structures or ETL processes for Parquet tables, you might encounter a <span class="q">"many
+        small files"</span> situation, which is suboptimal for query efficiency. For example, statements like these
+        might produce inefficiently organized data files:
+      </p>
+
+<pre class="pre codeblock"><code>-- In an N-node cluster, each node produces a data file
+-- for the INSERT operation. If you have less than
+-- N GB of data to copy, some files are likely to be
+-- much smaller than the <span class="ph">default Parquet</span> block size.
+insert into parquet_table select * from text_table;
+
+-- Even if this operation involves an overall large amount of data,
+-- when split up by year/month/day, each partition might only
+-- receive a small amount of data. Then the data files for
+-- the partition might be divided between the N nodes in the cluster.
+-- A multi-gigabyte copy operation might produce files of only
+-- a few MB each.
+insert into partitioned_parquet_table partition (year, month, day)
+  select year, month, day, url, referer, user_agent, http_code, response_time
+  from web_stats;
+</code></pre>
+
+      <p class="p">
+        Here are techniques to help you produce large data files in Parquet <code class="ph codeph">INSERT</code> operations, and
+        to compact existing too-small data files:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            When inserting into a partitioned Parquet table, use statically partitioned <code class="ph codeph">INSERT</code>
+            statements where the partition key values are specified as constant values. Ideally, use a separate
+            <code class="ph codeph">INSERT</code> statement for each partition.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+        You might set the <code class="ph codeph">NUM_NODES</code> option to 1 briefly, during <code class="ph codeph">INSERT</code> or
+        <code class="ph codeph">CREATE TABLE AS SELECT</code> statements. Normally, those statements produce one or more data
+        files per data node. If the write operation involves small amounts of data, a Parquet table, and/or a
+        partitioned table, the default behavior could produce many small files when intuitively you might expect
+        only a single output file. <code class="ph codeph">SET NUM_NODES=1</code> turns off the <span class="q">"distributed"</span> aspect of the
+        write operation, making it more likely to produce only one or a few data files.
+      </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Be prepared to reduce the number of partition key columns from what you are used to with traditional
+            analytic database systems.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Do not expect Impala-written Parquet files to fill up the entire Parquet block size. Impala estimates
+            on the conservative side when figuring out how much data to write to each Parquet file. Typically, the amount
+            of uncompressed data in memory is substantially reduced on disk by the compression and encoding
+            techniques in the Parquet file format.
+
+            The final data file size varies depending on the compressibility of the data. Therefore, it is not an
+            indication of a problem if <span class="ph">256 MB</span> of text data is turned into 2
+            Parquet data files, each less than <span class="ph">256 MB</span>.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you accidentally end up with a table with many small data files, consider using one or more of the
+            preceding techniques and copying all the data into a new Parquet table, either through <code class="ph codeph">CREATE
+            TABLE AS SELECT</code> or <code class="ph codeph">INSERT ... SELECT</code> statements.
+          </p>
+
+          <p class="p">
+            To avoid rewriting queries to change table names, you can adopt a convention of always running
+            important queries against a view. Changing the view definition immediately switches any subsequent
+            queries to use the new underlying tables:
+          </p>
+<pre class="pre codeblock"><code>create view production_table as select * from table_with_many_small_files;
+-- CTAS or INSERT...SELECT all the data into a more efficient layout...
+alter view production_table as select * from table_with_few_big_files;
+select * from production_table where c1 = 100 and c2 &lt; 50 and ...;
+</code></pre>
+        </li>
+      </ul>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title17" id="parquet__parquet_schema_evolution">
+
+    <h2 class="title topictitle2" id="ariaid-title17">Schema Evolution for Parquet Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Schema evolution refers to using the statement <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> to change
+        the names, data types, or number of columns in a table. You can perform schema evolution for Parquet tables
+        as follows:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            The Impala <code class="ph codeph">ALTER TABLE</code> statement never changes any data files in the tables. From the
+            Impala side, schema evolution involves interpreting the same data files in terms of a new table
+            definition. Some types of schema changes make sense and are represented correctly. Other types of
+            changes cannot be represented in a sensible way, and produce special result values or conversion errors
+            during queries.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">INSERT</code> statement always creates data using the latest table definition. You might
+            end up with data files with different numbers of columns or internal data representations if you do a
+            sequence of <code class="ph codeph">INSERT</code> and <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> statements.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you use <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> to define additional columns at the end,
+            when the original data files are used in a query, these final columns are considered to be all
+            <code class="ph codeph">NULL</code> values.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you use <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> to define fewer columns than before, when
+            the original data files are used in a query, the unused columns still present in the data file are
+            ignored.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Parquet represents the <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, and <code class="ph codeph">INT</code>
+            types the same internally, all stored in 32-bit integers.
+          </p>
+          <ul class="ul">
+            <li class="li">
+              That means it is easy to promote a <code class="ph codeph">TINYINT</code> column to <code class="ph codeph">SMALLINT</code> or
+              <code class="ph codeph">INT</code>, or a <code class="ph codeph">SMALLINT</code> column to <code class="ph codeph">INT</code>. The numbers are
+              represented exactly the same in the data file, and the columns being promoted would not contain any
+              out-of-range values.
+            </li>
+
+            <li class="li">
+              <p class="p">
+                If you change any of these column types to a smaller type, any values that are out-of-range for the
+                new type are returned incorrectly, typically as negative numbers.
+              </p>
+            </li>
+
+            <li class="li">
+              <p class="p">
+                You cannot change a <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, or <code class="ph codeph">INT</code>
+                column to <code class="ph codeph">BIGINT</code>, or the other way around. Although the <code class="ph codeph">ALTER
+                TABLE</code> succeeds, any attempt to query those columns results in conversion errors.
+              </p>
+            </li>
+
+            <li class="li">
+              <p class="p">
+                Any other type conversion for columns produces a conversion error during queries. For example,
+                <code class="ph codeph">INT</code> to <code class="ph codeph">STRING</code>, <code class="ph codeph">FLOAT</code> to <code class="ph codeph">DOUBLE</code>,
+                <code class="ph codeph">TIMESTAMP</code> to <code class="ph codeph">STRING</code>, <code class="ph codeph">DECIMAL(9,0)</code> to
+                <code class="ph codeph">DECIMAL(5,2)</code>, and so on.
+              </p>
+            </li>
+          </ul>
+        </li>
+      </ul>
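The out-of-range behavior described above comes from reinterpreting the same stored bits at a narrower width. A small sketch of the effect (Python; the truncation rule shown is standard two's complement, though Impala's exact results for such mismatches can vary by version):

```python
def reinterpret(value, bits):
    """Reinterpret an integer's low-order bits as a signed value of
    the given width (two's complement)."""
    mask = (1 << bits) - 1
    v = value & mask
    if v >= 1 << (bits - 1):
        v -= 1 << bits
    return v

# 40000 fits in INT but is out of range for SMALLINT (max 32767):
print(reinterpret(40000, 16))  # -25536
# In-range values survive the narrowing unchanged:
print(reinterpret(123, 16))    # 123
```

This is why narrowing a column type is safe only when you know every stored value fits in the new range.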
+
+      <div class="p">
+        You might find that you have Parquet files where the columns do not line up in the same
+        order as in your Impala table. For example, you might have a Parquet file that was part of
+        a table with columns <code class="ph codeph">C1,C2,C3,C4</code>, and now you want to reuse the same
+        Parquet file in a table with columns <code class="ph codeph">C4,C2</code>. By default, Impala expects the
+        columns in the data file to appear in the same order as the columns defined for the table,
+        making it impractical to do some kinds of file reuse or schema evolution. In <span class="keyword">Impala 2.6</span>
+        and higher, the query option <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION=name</code> lets Impala
+        resolve columns by name, and therefore handle out-of-order or extra columns in the data file.
+        For example:
+
+<pre class="pre codeblock"><code>
+create database schema_evolution;
+use schema_evolution;
+create table t1 (c1 int, c2 boolean, c3 string, c4 timestamp)
+  stored as parquet;
+insert into t1 values
+  (1, true, 'yes', now()),
+  (2, false, 'no', now() + interval 1 day);
+
+select * from t1;
++----+-------+-----+-------------------------------+
+| c1 | c2    | c3  | c4                            |
++----+-------+-----+-------------------------------+
+| 1  | true  | yes | 2016-06-28 14:53:26.554369000 |
+| 2  | false | no  | 2016-06-29 14:53:26.554369000 |
++----+-------+-----+-------------------------------+
+
+desc formatted t1;
+...
+| Location:   | /user/hive/warehouse/schema_evolution.db/t1 |
+...
+
+-- Create T2 with a subset of T1's columns, in a different order.
+create table t2 (c4 timestamp, c2 boolean) stored as parquet;
+
+-- Make T2 have the same data file as in T1, including 2
+-- unused columns and column order different than T2 expects.
+load data inpath '/user/hive/warehouse/schema_evolution.db/t1'
+  into table t2;
++----------------------------------------------------------+
+| summary                                                  |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+
+-- 'position' is the default setting.
+-- Impala cannot read the Parquet file if the column order does not match.
+set PARQUET_FALLBACK_SCHEMA_RESOLUTION=position;
+PARQUET_FALLBACK_SCHEMA_RESOLUTION set to position
+
+select * from t2;
+WARNINGS:
+File 'schema_evolution.db/t2/45331705_data.0.parq'
+has an incompatible Parquet schema for column 'schema_evolution.t2.c4'.
+Column type: TIMESTAMP, Parquet schema: optional int32 c1 [i:0 d:1 r:0]
+
+File 'schema_evolution.db/t2/45331705_data.0.parq'
+has an incompatible Parquet schema for column 'schema_evolution.t2.c4'.
+Column type: TIMESTAMP, Parquet schema: optional int32 c1 [i:0 d:1 r:0]
+
+-- With the 'name' setting, Impala can read the Parquet data files
+-- despite mismatching column order.
+set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;
+PARQUET_FALLBACK_SCHEMA_RESOLUTION set to name
+
+select * from t2;
++-------------------------------+-------+
+| c4                            | c2    |
++-------------------------------+-------+
+| 2016-06-28 14:53:26.554369000 | true  |
+| 2016-06-29 14:53:26.554369000 | false |
++-------------------------------+-------+
+
+</code></pre>
+
+        See <a class="xref" href="impala_parquet_fallback_schema_resolution.html#parquet_fallback_schema_resolution">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</a>
+        for more details.
+      </div>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title18" id="parquet__parquet_data_types">
+
+    <h2 class="title topictitle2" id="ariaid-title18">Data Type Considerations for Parquet Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Parquet format defines a set of data types whose names differ from the names of the corresponding
+        Impala data types. If you are preparing Parquet files using other Hadoop components such as Pig or
+        MapReduce, you might need to work with the type names defined by Parquet. The following figure lists the
+        Parquet-defined types and the equivalent types in Impala.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Primitive types:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>BINARY -&gt; STRING
+BOOLEAN -&gt; BOOLEAN
+DOUBLE -&gt; DOUBLE
+FLOAT -&gt; FLOAT
+INT32 -&gt; INT
+INT64 -&gt; BIGINT
+INT96 -&gt; TIMESTAMP
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Logical types:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>BINARY + OriginalType UTF8 -&gt; STRING
+BINARY + OriginalType DECIMAL -&gt; DECIMAL
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Complex types:</strong>
+      </p>
+
+      <p class="p">
+        For the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code>)
+        available in <span class="keyword">Impala 2.3</span> and higher, Impala only supports queries
+        against those types in Parquet tables.
+      </p>
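+
+      <p class="p">
+        For example, a query against an <code class="ph codeph">ARRAY</code> column in a Parquet table
+        uses join notation to unnest the collection. (The table and column names here are
+        hypothetical, for illustration only.)
+      </p>
+
+<pre class="pre codeblock"><code>create table parquet_complex (id bigint, tags array&lt;string&gt;)
+  stored as parquet;
+
+-- Each ARRAY element is exposed through the ITEM pseudocolumn.
+select c.id, t.item from parquet_complex c, c.tags t;
+</code></pre>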
+
+    </div>
+
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_parquet_annotate_strings_utf8.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_parquet_annotate_strings_utf8.html b/docs/build/html/topics/impala_parquet_annotate_strings_utf8.html
new file mode 100644
index 0000000..6f6ed71
--- /dev/null
+++ b/docs/build/html/topics/impala_parquet_annotate_strings_utf8.html
@@ -0,0 +1,54 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_annotate_strings_utf8"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</title></head><body id="parquet_annotate_strings_utf8"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Causes Impala <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements
+      to write Parquet files that use the UTF-8 annotation for <code class="ph codeph">STRING</code> columns.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+    <p class="p">
+      By default, Impala represents a <code class="ph codeph">STRING</code> column in Parquet as an unannotated binary field.
+    </p>
+    <p class="p">
+      Impala always uses the UTF-8 annotation when writing <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+      columns to Parquet files. An alternative to using the query option is to cast <code class="ph codeph">STRING</code>
+      values to <code class="ph codeph">VARCHAR</code>.
+    </p>
+    <p class="p">
+      This option helps make Impala-written data more interoperable with other data processing engines.
+      Impala itself currently does not support all operations on UTF-8 data.
+      Although data processed by Impala is typically represented in ASCII, it is valid to designate the
+      data as UTF-8 when storing on disk, because ASCII is a subset of UTF-8.
+    </p>
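+    <p class="p">
+      For example, either of the following approaches produces Parquet string data that carries
+      the UTF-8 annotation. (The table and column names are hypothetical.)
+    </p>
+<pre class="pre codeblock"><code>-- Approach 1: enable the query option before writing the data.
+set PARQUET_ANNOTATE_STRINGS_UTF8=true;
+create table annotated_parquet stored as parquet as
+  select s from text_table;
+
+-- Approach 2: cast STRING to VARCHAR, which always gets the annotation.
+create table annotated_parquet2 stored as parquet as
+  select cast(s as varchar(255)) as s from text_table;
+</code></pre>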
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value is interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_parquet_compression_codec.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_parquet_compression_codec.html b/docs/build/html/topics/impala_parquet_compression_codec.html
new file mode 100644
index 0000000..34ae693
--- /dev/null
+++ b/docs/build/html/topics/impala_parquet_compression_codec.html
@@ -0,0 +1,17 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_compression_codec"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_COMPRESSION_CODEC Query Option</title></head><body id="parquet_compression_codec"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">PARQUET_COMPRESSION_CODEC Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Deprecated. Use <code class="ph codeph">COMPRESSION_CODEC</code> in Impala 2.0 and later. See
+      <a class="xref" href="impala_compression_codec.html#compression_codec">COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</a> for details.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_parquet_fallback_schema_resolution.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_parquet_fallback_schema_resolution.html b/docs/build/html/topics/impala_parquet_fallback_schema_resolution.html
new file mode 100644
index 0000000..91abf35
--- /dev/null
+++ b/docs/build/html/topics/impala_parquet_fallback_schema_resolution.html
@@ -0,0 +1,46 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_fallback_schema_resolution"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</title></head><body id="parquet_fallback_schema_resolution"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Allows Impala to look up columns within Parquet files by column name, rather than column order,
+      when necessary.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+    <p class="p">
+      By default, Impala looks up columns within a Parquet file based on
+      the order of columns in the table.
+      The <code class="ph codeph">name</code> setting for this option enables behavior for Impala queries
+      similar to the Hive setting <code class="ph codeph">parquet.column.index.access=false</code>.
+      It also allows Impala to query Parquet files created by Hive with the
+      <code class="ph codeph">parquet.column.index.access=false</code> setting in effect.
+    </p>
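+    <p class="p">
+      For example, for a table whose Parquet data files were written with a different column
+      order than the table definition (the table name is illustrative):
+    </p>
+<pre class="pre codeblock"><code>set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;
+-- Columns are now matched by name rather than by position.
+select * from t2;
+</code></pre>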
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> integer or string.
+      Allowed values are 0 or <code class="ph codeph">position</code> (default), 1 or <code class="ph codeph">name</code>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_parquet.html#parquet_schema_evolution">Schema Evolution for Parquet Tables</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_parquet_file_size.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_parquet_file_size.html b/docs/build/html/topics/impala_parquet_file_size.html
new file mode 100644
index 0000000..695c557
--- /dev/null
+++ b/docs/build/html/topics/impala_parquet_file_size.html
@@ -0,0 +1,93 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_file_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_FILE_SIZE Query Option</title></head><body id="parquet_file_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">PARQUET_FILE_SIZE Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Specifies the maximum size of each Parquet data file produced by Impala <code class="ph codeph">INSERT</code> statements.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      Specify the size in bytes, or with a trailing <code class="ph codeph">m</code> or <code class="ph codeph">g</code> character to indicate
+      megabytes or gigabytes. For example:
+    </p>
+
+<pre class="pre codeblock"><code>-- 128 megabytes.
+set PARQUET_FILE_SIZE=134217728;
+INSERT OVERWRITE parquet_table SELECT * FROM text_table;
+
+-- 512 megabytes.
+set PARQUET_FILE_SIZE=512m;
+INSERT OVERWRITE parquet_table SELECT * FROM text_table;
+
+-- 1 gigabyte.
+set PARQUET_FILE_SIZE=1g;
+INSERT OVERWRITE parquet_table SELECT * FROM text_table;
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      With tables that are small or finely partitioned, the default Parquet block size (formerly 1 GB, now 256 MB
+      in Impala 2.0 and later) could be much larger than needed for each data file. For <code class="ph codeph">INSERT</code>
+      operations into such tables, you can increase parallelism by specifying a smaller
+      <code class="ph codeph">PARQUET_FILE_SIZE</code> value, resulting in more HDFS blocks that can be processed by different
+      nodes.
+
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric, with optional unit specifier
+    </p>
+
+    <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+    <p class="p">
+      Currently, the maximum value for this setting is 1 gigabyte (<code class="ph codeph">1g</code>).
+      Setting a value higher than 1 gigabyte could result in errors during
+      an <code class="ph codeph">INSERT</code> operation.
+    </p>
+    </div>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0 (produces files with a target size of 256 MB; files might be larger for very wide tables)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Isilon considerations:</strong>
+      </p>
+    <div class="p">
+        Because the EMC Isilon storage devices use a global value for the block size
+        rather than a configurable value for each file, the <code class="ph codeph">PARQUET_FILE_SIZE</code>
+        query option has no effect when Impala inserts data into a table or partition
+        residing on Isilon storage. Use the <code class="ph codeph">isi</code> command to set the
+        default block size globally on the Isilon device. For example, to set the
+        Isilon default block size to 256 MB, the recommended size for Parquet
+        data files for Impala, issue the following command:
+<pre class="pre codeblock"><code>isi hdfs settings modify --default-block-size=256MB</code></pre>
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      For information about the Parquet file format, and how the number and size of data files affects query
+      performance, see <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>.
+    </p>
+
+
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file



[09/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_scalability.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_scalability.html b/docs/build/html/topics/impala_scalability.html
new file mode 100644
index 0000000..a850d35
--- /dev/null
+++ b/docs/build/html/topics/impala_scalability.html
@@ -0,0 +1,711 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.
 8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="scalability"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Scalability Considerations for Impala</title></head><body id="scalability"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Scalability Considerations for Impala</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      This section explains how the size of your cluster and the volume of data influences SQL performance and
+      schema design for Impala tables. Typically, adding more cluster capacity reduces problems due to memory
+      limits or disk throughput. On the other hand, larger clusters are more likely to have other kinds of
+      scalability issues, such as a single slow node that causes performance problems for queries.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+    <p class="p">
+        A good source of tips related to scalability and performance tuning is the
+        <a class="xref" href="http://www.slideshare.net/cloudera/the-impala-cookbook-42530186" target="_blank">Impala Cookbook</a>
+        presentation. These slides are updated periodically as new features come out and new benchmarks are performed.
+      </p>
+
+  </div>
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="scalability__scalability_catalog">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Impact of Many Tables or Partitions on Impala Catalog Performance and Memory Usage</h2>
+
+    <div class="body conbody">
+
+      
+
+      <p class="p">
+        Because Hadoop I/O is optimized for reading and writing large files, Impala is optimized for tables
+        containing relatively few, large data files. Schemas containing thousands of tables, or tables containing
+        thousands of partitions, can encounter performance issues during startup or during DDL operations such as
+        <code class="ph codeph">ALTER TABLE</code> statements.
+      </p>
+
+      <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+      <p class="p">
+        Because of a change in the default heap size for the <span class="keyword cmdname">catalogd</span> daemon in
+        <span class="keyword">Impala 2.5</span> and higher, the following procedure to increase the <span class="keyword cmdname">catalogd</span>
+        memory limit might be required following an upgrade to <span class="keyword">Impala 2.5</span> even if not
+        needed previously.
+      </p>
+      </div>
+
+      <div class="p">
+        For schemas with large numbers of tables, partitions, and data files, the <span class="keyword cmdname">catalogd</span>
+        daemon might encounter an out-of-memory error. To increase the memory limit for the
+        <span class="keyword cmdname">catalogd</span> daemon:
+
+        <ol class="ol">
+          <li class="li">
+            <p class="p">
+              Check current memory usage for the <span class="keyword cmdname">catalogd</span> daemon by running the
+              following commands on the host where that daemon runs on your cluster:
+            </p>
+  <pre class="pre codeblock"><code>
+  jcmd <var class="keyword varname">catalogd_pid</var> VM.flags
+  jmap -heap <var class="keyword varname">catalogd_pid</var>
+  </code></pre>
+          </li>
+          <li class="li">
+            <p class="p">
+              Decide on a large enough value for the <span class="keyword cmdname">catalogd</span> heap.
+              You express it as an environment variable value as follows:
+            </p>
+  <pre class="pre codeblock"><code>
+  JAVA_TOOL_OPTIONS="-Xmx8g"
+  </code></pre>
+          </li>
+          <li class="li">
+            <p class="p">
+              On systems not using cluster management software, put this environment variable setting into the
+              startup script for the <span class="keyword cmdname">catalogd</span> daemon, then restart the <span class="keyword cmdname">catalogd</span>
+              daemon.
+            </p>
+          </li>
+          <li class="li">
+            <p class="p">
+              Use the same <span class="keyword cmdname">jcmd</span> and <span class="keyword cmdname">jmap</span> commands as earlier to
+              verify that the new settings are in effect.
+            </p>
+          </li>
+        </ol>
+      </div>
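+
+      <p class="p">
+        For example, on a system where the <span class="keyword cmdname">catalogd</span> startup script
+        sources an environment file (the exact path is installation-specific), the setting might
+        look like the following. The 8 GB heap size is illustrative only:
+      </p>
+<pre class="pre codeblock"><code># Added to the catalogd startup environment, then catalogd is restarted.
+export JAVA_TOOL_OPTIONS="-Xmx8g"
+</code></pre>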
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="scalability__statestore_scalability">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Scalability Considerations for the Impala Statestore</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Before <span class="keyword">Impala 2.1</span>, the statestore sent only one kind of message to its subscribers. This message contained all
+        updates for any topics that a subscriber had subscribed to. It also served to let subscribers know that the
+        statestore had not failed, and conversely the statestore used the success of sending a heartbeat to a
+        subscriber to decide whether or not the subscriber had failed.
+      </p>
+
+      <p class="p">
+        Combining topic updates and failure detection in a single message led to bottlenecks in clusters with large
+        numbers of tables, partitions, and HDFS data blocks. When the statestore was overloaded with metadata
+        updates to transmit, heartbeat messages were sent less frequently, sometimes causing subscribers to time
+        out their connection with the statestore. Increasing the subscriber timeout and decreasing the frequency of
+        statestore heartbeats worked around the problem, but reduced responsiveness when the statestore failed or
+        restarted.
+      </p>
+
+      <p class="p">
+        As of <span class="keyword">Impala 2.1</span>, the statestore now sends topic updates and heartbeats in separate messages. This allows the
+        statestore to send and receive a steady stream of lightweight heartbeats, and removes the requirement to
+        send topic updates according to a fixed schedule, reducing statestore network overhead.
+      </p>
+
+      <p class="p">
+        The statestore now has the following relevant configuration flags for the <span class="keyword cmdname">statestored</span>
+        daemon:
+      </p>
+
+      <dl class="dl">
+        
+
+          <dt class="dt dlterm" id="statestore_scalability__statestore_num_update_threads">
+            <code class="ph codeph">-statestore_num_update_threads</code>
+          </dt>
+
+          <dd class="dd">
+            The number of threads inside the statestore dedicated to sending topic updates. You should not
+            typically need to change this value.
+            <p class="p">
+              <strong class="ph b">Default:</strong> 10
+            </p>
+          </dd>
+
+        
+
+        
+
+          <dt class="dt dlterm" id="statestore_scalability__statestore_update_frequency_ms">
+            <code class="ph codeph">-statestore_update_frequency_ms</code>
+          </dt>
+
+          <dd class="dd">
+            The frequency, in milliseconds, with which the statestore tries to send topic updates to each
+            subscriber. This is a best-effort value; if the statestore is unable to meet this frequency, it sends
+            topic updates as fast as it can. You should not typically need to change this value.
+            <p class="p">
+              <strong class="ph b">Default:</strong> 2000
+            </p>
+          </dd>
+
+        
+
+        
+
+          <dt class="dt dlterm" id="statestore_scalability__statestore_num_heartbeat_threads">
+            <code class="ph codeph">-statestore_num_heartbeat_threads</code>
+          </dt>
+
+          <dd class="dd">
+            The number of threads inside the statestore dedicated to sending heartbeats. You should not typically
+            need to change this value.
+            <p class="p">
+              <strong class="ph b">Default:</strong> 10
+            </p>
+          </dd>
+
+        
+
+        
+
+          <dt class="dt dlterm" id="statestore_scalability__statestore_heartbeat_frequency_ms">
+            <code class="ph codeph">-statestore_heartbeat_frequency_ms</code>
+          </dt>
+
+          <dd class="dd">
+            The frequency, in milliseconds, with which the statestore tries to send heartbeats to each subscriber.
+            This value should be good for large catalogs and clusters up to approximately 150 nodes. Beyond that,
+            you might need to increase this value to make the interval longer between heartbeat messages.
+            <p class="p">
+              <strong class="ph b">Default:</strong> 1000 (one heartbeat message every second)
+            </p>
+          </dd>
+
+        
+      </dl>
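+
+      <p class="p">
+        For example, to try a longer heartbeat interval on a very large cluster, you might add
+        the flag to the <span class="keyword cmdname">statestored</span> startup options
+        (the value shown is illustrative only):
+      </p>
+<pre class="pre codeblock"><code>statestored -statestore_heartbeat_frequency_ms=2000
+</code></pre>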
+
+      <p class="p">
+        If it takes a very long time for a cluster to start up, and <span class="keyword cmdname">impala-shell</span> consistently
+        displays <code class="ph codeph">This Impala daemon is not ready to accept user requests</code>, the statestore might be
+        taking too long to send the entire catalog topic to the cluster. In this case, consider adding
+        <code class="ph codeph">--load_catalog_in_background=false</code> to your catalog service configuration. This setting
+        stops the catalog service from loading the entire catalog into memory at cluster startup. Instead, metadata for
+        each table is loaded when the table is accessed for the first time.
+      </p>
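+      <p class="p">
+        For example, the flag is added to the <span class="keyword cmdname">catalogd</span> startup
+        options (where these options are specified is deployment-specific):
+      </p>
+<pre class="pre codeblock"><code>catalogd --load_catalog_in_background=false
+</code></pre>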
+    </div>
+  </article>
+
+  
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="scalability__spill_to_disk">
+
+    <h2 class="title topictitle2" id="ariaid-title4">SQL Operations that Spill to Disk</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Certain memory-intensive operations write temporary data to disk (known as <dfn class="term">spilling</dfn> to disk)
+        when Impala is close to exceeding its memory limit on a particular host.
+      </p>
+
+      <p class="p">
+        The result is a query that completes successfully, rather than failing with an out-of-memory error. The
+        tradeoff is decreased performance due to the extra disk I/O to write the temporary data and read it back
+        in. The slowdown could potentially be significant. Thus, while this feature improves reliability,
+        you should optimize your queries, system parameters, and hardware configuration to make this spilling a rare occurrence.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">What kinds of queries might spill to disk:</strong>
+      </p>
+
+      <p class="p">
+        Several SQL clauses and constructs require memory allocations that could activate the spilling mechanism:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            When a query uses a <code class="ph codeph">GROUP BY</code> clause for columns
+            with millions or billions of distinct values, Impala keeps a
+            similar number of temporary results in memory, to accumulate the
+            aggregate results for each value in the group.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            When large tables are joined together, Impala keeps the values of
+            the join columns from one table in memory, to compare them to
+            incoming values from the other table.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            When a large result set is sorted by the <code class="ph codeph">ORDER BY</code>
+            clause, each node sorts its portion of the result set in memory.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">UNION</code> operators
+            build in-memory data structures to represent all values found so
+            far, to eliminate duplicates as the query progresses.
+          </p>
+        </li>
+        
+      </ul>
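+
+      <p class="p">
+        For example, a query combining several of these constructs is a candidate for spilling
+        when memory is constrained. (The table name and memory limit are hypothetical.)
+      </p>
+<pre class="pre codeblock"><code>set MEM_LIMIT=1g;
+-- GROUP BY on a high-cardinality column, DISTINCT, and ORDER BY
+-- on the result can each require substantial memory.
+select c1, count(distinct c2) as n
+  from big_table
+  group by c1
+  order by n desc;
+</code></pre>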
+
+      <p class="p">
+        When the spill-to-disk feature is activated for a join node within a query, Impala does not
+        produce any runtime filters for that join operation on that host. Other join nodes within
+        the query are not affected.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">How Impala handles scratch disk space for spilling:</strong>
+      </p>
+
+      <p class="p">
+        By default, intermediate files used during large sort, join, aggregation, or analytic function operations
+        are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span>. These files are removed when the
+        operation finishes. (Multiple concurrent queries can perform operations that use the <span class="q">"spill to disk"</span>
+        technique, without any name conflicts for these temporary files.) You can specify a different location by
+        starting the <span class="keyword cmdname">impalad</span> daemon with the
+        <code class="ph codeph">--scratch_dirs="<var class="keyword varname">path_to_directory</var>"</code> configuration option.
+        You can specify a single directory, or a comma-separated list of directories. The scratch directories must
+        be on the local filesystem, not in HDFS. You might specify different directory paths for different hosts,
+        depending on the capacity and speed
+        of the available storage devices. In <span class="keyword">Impala 2.3</span> or higher, Impala successfully
+        starts (with a warning written to the log) if it cannot create or read and write files
+        in one of the scratch directories. If there is less than 1 GB free on the filesystem where that directory resides,
+        Impala still runs, but writes a warning message to its log. If Impala encounters an error reading or writing
+        files in a scratch directory during a query, Impala logs the error and the query fails.
+      </p>
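+
+      <p class="p">
+        For example, you might start the daemon with scratch space spread across two local
+        directories (the paths shown here are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>impalad --scratch_dirs="/data1/impala-scratch,/data2/impala-scratch" ...</code></pre>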
+
+      <p class="p">
+        <strong class="ph b">Memory usage for SQL operators:</strong>
+      </p>
+
+      <p class="p">
+        The infrastructure of the spilling feature affects the way the affected SQL operators, such as
+        <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">DISTINCT</code>, and joins, use memory.
+        On each host that participates in the query, each such operator in a query accumulates memory
+        while building the data structure to process the aggregation or join operation. The amount
+        of memory used depends on the portion of the data being handled by that host, and thus might
+        be different from one host to another. When the amount of memory being used for the operator
+        on a particular host reaches a threshold amount, Impala reserves an additional memory buffer
+        to use as a work area in case that operator causes the query to exceed the memory limit for
+        that host. After allocating the memory buffer, the memory used by that operator remains
+        essentially stable or grows only slowly, until the point where the memory limit is reached
+        and the query begins writing temporary data to disk.
+      </p>
+
+      <p class="p">
+        Prior to Impala 2.2, the extra memory buffer for an operator that might spill to disk
+        was allocated when the data structure used by the applicable SQL operator reached 16 MB in size,
+        and the memory buffer itself was 512 MB. In Impala 2.2, these values are halved: the threshold value
+        is 8 MB and the memory buffer is 256 MB. <span class="ph">In <span class="keyword">Impala 2.3</span> and higher, the memory for the buffer
+        is allocated in pieces, only as needed, to avoid sudden large jumps in memory usage.</span> A query that uses
+        multiple such operators might allocate multiple such memory buffers, as the size of the data structure
+        for each operator crosses the threshold on a particular host.
+      </p>
+
+      <p class="p">
+        Therefore, a query that processes a relatively small amount of data on each host would likely
+        never reach the threshold for any operator, and would never allocate any extra memory buffers. A query
+        that processes millions of groups, distinct values, join keys, and so on might cross the threshold,
+        causing its memory requirement to rise suddenly and then flatten out. The larger the cluster, the less data is
+        processed on any particular host, thus reducing the chance of requiring the extra memory allocation.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> This feature was added to the <code class="ph codeph">ORDER BY</code> clause in Impala 1.4.
+        This feature was extended to cover join queries, aggregation functions, and analytic
+        functions in Impala 2.0. The size of the memory work area required by
+        each operator that spills was reduced from 512 megabytes to 256 megabytes in Impala 2.2.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Avoiding queries that spill to disk:</strong>
+      </p>
+
+      <p class="p">
+        Because the extra I/O can impose significant performance overhead on these types of queries, try to avoid
+        this situation by using the following steps:
+      </p>
+
+      <ol class="ol">
+        <li class="li">
+          Detect how often queries spill to disk, and how much temporary data is written. Refer to the following
+          sources:
+          <ul class="ul">
+            <li class="li">
+              The output of the <code class="ph codeph">PROFILE</code> command in the <span class="keyword cmdname">impala-shell</span>
+              interpreter. This data shows the memory usage for each host and in total across the cluster. The
+              <code class="ph codeph">BlockMgr.BytesWritten</code> counter reports how much data was written to disk during the
+              query.
+            </li>
+
+            <li class="li">
+              The <span class="ph uicontrol">Queries</span> tab in the Impala debug web user interface. Select the query to
+              examine and click the corresponding <span class="ph uicontrol">Profile</span> link. This data breaks down the
+              memory usage for a single host within the cluster, the host whose web interface you are connected to.
+            </li>
+          </ul>
+        </li>
+
+        <li class="li">
+          Use one or more techniques to reduce the possibility of the queries spilling to disk:
+          <ul class="ul">
+            <li class="li">
+              Increase the Impala memory limit if practical, for example, if you can increase the available memory
+              by more than the amount of temporary data written to disk on a particular node. Remember that in
+              Impala 2.0 and later, you can issue <code class="ph codeph">SET MEM_LIMIT</code> as a SQL statement, which lets you
+              fine-tune the memory usage for queries from JDBC and ODBC applications.
+            </li>
+
+            <li class="li">
+              Increase the number of nodes in the cluster, to increase the aggregate memory available to Impala and
+              reduce the amount of memory required on each node.
+            </li>
+
+            <li class="li">
+              Increase the overall memory capacity of each DataNode at the hardware level.
+            </li>
+
+            <li class="li">
+              On a cluster with resources shared between Impala and other Hadoop components, use resource
+              management features to allocate more memory for Impala. See
+              <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a> for details.
+            </li>
+
+            <li class="li">
+              If the memory pressure is due to running many concurrent queries rather than a few memory-intensive
+              ones, consider using the Impala admission control feature to lower the limit on the number of
+              concurrent queries. By spacing out the most resource-intensive queries, you can avoid spikes in
+              memory usage and improve overall response times. See
+              <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details.
+            </li>
+
+            <li class="li">
+              Tune the queries with the highest memory requirements, using one or more of the following techniques:
+              <ul class="ul">
+                <li class="li">
+                  Run the <code class="ph codeph">COMPUTE STATS</code> statement for all tables involved in large-scale joins and
+                  aggregation queries.
+                </li>
+
+                <li class="li">
+                  Minimize your use of <code class="ph codeph">STRING</code> columns in join columns. Prefer numeric values
+                  instead.
+                </li>
+
+                <li class="li">
+                  Examine the <code class="ph codeph">EXPLAIN</code> plan to understand the execution strategy being used for the
+                  most resource-intensive queries. See <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for
+                  details.
+                </li>
+
+                <li class="li">
+                  If Impala still chooses a suboptimal execution strategy even with statistics available, or if it
+                  is impractical to keep the statistics up to date for huge or rapidly changing tables, add hints
+                  to the most resource-intensive queries to select the right execution strategy. See
+                  <a class="xref" href="impala_hints.html#hints">Query Hints in Impala SELECT Statements</a> for details.
+                </li>
+              </ul>
+            </li>
+
+            <li class="li">
+              If your queries experience substantial performance overhead due to spilling, enable the
+              <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> query option. This option prevents queries whose memory usage
+              is likely to be exorbitant from spilling to disk. See
+              <a class="xref" href="impala_disable_unsafe_spills.html#disable_unsafe_spills">DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</a> for details. As you tune
+              problematic queries using the preceding steps, fewer and fewer will be cancelled by this option
+              setting.
+            </li>
+          </ul>
+        </li>
+      </ol>
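+
+      <p class="p">
+        For example, to raise the per-query memory limit within a session, whether from
+        <span class="keyword cmdname">impala-shell</span> or a JDBC or ODBC application
+        (the value shown is purely illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>SET MEM_LIMIT=2g;</code></pre>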
+
+      <p class="p">
+        <strong class="ph b">Testing performance implications of spilling to disk:</strong>
+      </p>
+
+      <p class="p">
+        To artificially provoke spilling, to test this feature and understand the performance implications, use a
+        test environment with a memory limit of at least 2 GB. Issue the <code class="ph codeph">SET</code> command with no
+        arguments to check the current setting for the <code class="ph codeph">MEM_LIMIT</code> query option. Set the query
+        option <code class="ph codeph">DISABLE_UNSAFE_SPILLS=true</code>. This option limits the spill-to-disk feature to prevent
+        runaway disk usage from queries that are known in advance to be suboptimal. Within
+        <span class="keyword cmdname">impala-shell</span>, run a query that you expect to be memory-intensive, based on the criteria
+        explained earlier. A self-join of a large table is a good candidate:
+      </p>
+
+<pre class="pre codeblock"><code>select count(*) from big_table a join big_table b using (column_with_many_values);
+</code></pre>
+
+      <p class="p">
+        Issue the <code class="ph codeph">PROFILE</code> command to get a detailed breakdown of the memory usage on each node
+        during the query. The crucial part of the profile output concerning memory is the <code class="ph codeph">BlockMgr</code>
+        portion. For example, this profile shows that the query did not quite exceed the memory limit.
+      </p>
+
+<pre class="pre codeblock"><code>BlockMgr:
+   - BlockWritesIssued: 1
+   - BlockWritesOutstanding: 0
+   - BlocksCreated: 24
+   - BlocksRecycled: 1
+   - BufferedPins: 0
+   - MaxBlockSize: 8.00 MB (8388608)
+   <strong class="ph b">- MemoryLimit: 200.00 MB (209715200)</strong>
+   <strong class="ph b">- PeakMemoryUsage: 192.22 MB (201555968)</strong>
+   - TotalBufferWaitTime: 0ns
+   - TotalEncryptionTime: 0ns
+   - TotalIntegrityCheckTime: 0ns
+   - TotalReadBlockTime: 0ns
+</code></pre>
+
+      <p class="p">
+        In this case, because the memory limit was already below any recommended value, the volume of
+        data for the query was increased rather than reducing the memory limit any further.
+      </p>
+
+      <p class="p">
+        Set the <code class="ph codeph">MEM_LIMIT</code> query option to a value that is smaller than the peak memory usage
+        reported in the profile output. Do not specify a memory limit lower than about 300 MB, because with such a
+        low limit, queries could fail to start for other reasons. Now try the memory-intensive query again.
+      </p>
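+
+      <p class="p">
+        For example, assuming the profile reported a peak memory usage of roughly 400 MB,
+        you might set a limit slightly below that value (the number here is hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>SET MEM_LIMIT=350m;</code></pre>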
+
+      <p class="p">
+        Check if the query fails with a message like the following:
+      </p>
+
+<pre class="pre codeblock"><code>WARNINGS: Spilling has been disabled for plans that do not have stats and are not hinted
+to prevent potentially bad plans from using too many cluster resources. Compute stats on
+these tables, hint the plan or disable this behavior via query options to enable spilling.
+</code></pre>
+
+      <p class="p">
+        If so, the query could have consumed substantial temporary disk space, slowing down so much that it would
+        not complete in any reasonable time. Rather than rely on the spill-to-disk feature in this case, issue the
+        <code class="ph codeph">COMPUTE STATS</code> statement for the table or tables in your sample query. Then run the query
+        again, check the peak memory usage again in the <code class="ph codeph">PROFILE</code> output, and adjust the memory
+        limit again if necessary to be lower than the peak memory usage.
+      </p>
+
+      <p class="p">
+        At this point, you have a query that is memory-intensive, but Impala can optimize it efficiently so that
+        the memory usage is not exorbitant. You have set an artificial constraint through the
+        <code class="ph codeph">MEM_LIMIT</code> option so that the query would normally fail with an out-of-memory error. But
+        the automatic spill-to-disk feature means that the query should actually succeed, at the expense of some
+        extra disk I/O to read and write temporary work data.
+      </p>
+
+      <p class="p">
+        Try the query again, and confirm that it succeeds. Examine the <code class="ph codeph">PROFILE</code> output again. This
+        time, look for lines of this form:
+      </p>
+
+<pre class="pre codeblock"><code>- SpilledPartitions: <var class="keyword varname">N</var>
+</code></pre>
+
+      <p class="p">
+        If you see any such lines with <var class="keyword varname">N</var> greater than 0, that indicates the query would have
+        failed in Impala releases prior to 2.0, but now it succeeded because of the spill-to-disk feature. Examine
+        the total time taken by the <code class="ph codeph">AGGREGATION_NODE</code> or other query fragments containing non-zero
+        <code class="ph codeph">SpilledPartitions</code> values. Compare the times to similar fragments that did not spill, for
+        example in the <code class="ph codeph">PROFILE</code> output when the same query is run with a higher memory limit. This
+        gives you an idea of the performance penalty of the spill operation for a particular query with a
+        particular memory limit. If you make the memory limit just a little lower than the peak memory usage, the
+        query only needs to write a small amount of temporary data to disk. The lower you set the memory limit, the
+        more temporary data is written and the slower the query becomes.
+      </p>
+
+      <p class="p">
+        Now repeat this procedure for actual queries used in your environment. Use the
+        <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> setting to identify cases where queries used more memory than
+        necessary due to lack of statistics on the relevant tables and columns, and issue <code class="ph codeph">COMPUTE
+        STATS</code> where necessary.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">When to use DISABLE_UNSAFE_SPILLS:</strong>
+      </p>
+
+      <p class="p">
+        You might wonder, why not leave <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> turned on all the time. Whether and
+        how frequently to use this option depends on your system environment and workload.
+      </p>
+
+      <p class="p">
+        <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> is suitable for an environment with ad hoc queries whose performance
+        characteristics and memory usage are not known in advance. It prevents <span class="q">"worst-case scenario"</span> queries
+        that use large amounts of memory unnecessarily. Thus, you might turn this option on within a session while
+        developing new SQL code, even though it is turned off for existing applications.
+      </p>
+
+      <p class="p">
+        Organizations where table and column statistics are generally up-to-date might leave this option turned on
+        all the time, again to avoid worst-case scenarios for untested queries or if a problem in the ETL pipeline
+        results in a table with no statistics. Turning on <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> lets you <span class="q">"fail
+        fast"</span> in this case and immediately gather statistics or tune the problematic queries.
+      </p>
+
+      <p class="p">
+        Some organizations might leave this option turned off. For example, you might have tables large enough that
+        the <code class="ph codeph">COMPUTE STATS</code> takes substantial time to run, making it impractical to re-run after
+        loading new data. If you have examined the <code class="ph codeph">EXPLAIN</code> plans of your queries and know that
+        they are operating efficiently, you might leave <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> turned off. In that
+        case, you know that any queries that spill will not go overboard with their memory consumption.
+      </p>
+
+    </div>
+  </article>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title5" id="scalability__complex_query">
+<h2 class="title topictitle2" id="ariaid-title5">Limits on Query Size and Complexity</h2>
+<div class="body conbody">
+<p class="p">
+There are hardcoded limits on the maximum size and complexity of queries.
+Currently, the maximum number of expressions in a query is 2000.
+You might exceed the limits with large or deeply nested queries
+produced by business intelligence tools or other query generators.
+</p>
+<p class="p">
+If you have the ability to customize such queries or the query generation
+logic that produces them, replace sequences of repetitive expressions
+with single operators such as <code class="ph codeph">IN</code> or <code class="ph codeph">BETWEEN</code>
+that can represent multiple values or ranges.
+For example, instead of a large number of <code class="ph codeph">OR</code> clauses:
+</p>
+<pre class="pre codeblock"><code>WHERE val = 1 OR val = 2 OR val = 6 OR val = 100 ...
+</code></pre>
+<p class="p">
+use a single <code class="ph codeph">IN</code> clause:
+</p>
+<pre class="pre codeblock"><code>WHERE val IN (1,2,6,100,...)</code></pre>
+</div>
+</article>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title6" id="scalability__scalability_io">
+<h2 class="title topictitle2" id="ariaid-title6">Scalability Considerations for Impala I/O</h2>
+<div class="body conbody">
+<p class="p">
+Impala parallelizes its I/O operations aggressively,
+therefore the more disks you can attach to each host, the better.
+Impala retrieves data from disk so quickly, using
+bulk read operations on large blocks, that most queries
+are CPU-bound rather than I/O-bound.
+</p>
+<p class="p">
+Because the kind of sequential scanning typically done by
+Impala queries does not benefit much from the random-access
+capabilities of SSDs, spinning disks typically provide
+the most cost-effective kind of storage for Impala data,
+with little or no performance penalty as compared to SSDs.
+</p>
+<p class="p">
+Resource management features such as YARN, Llama, and admission control
+typically constrain the amount of memory, CPU, or overall number of
+queries in a high-concurrency environment.
+Currently, there is no throttling mechanism for Impala I/O.
+</p>
+</div>
+</article>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title7" id="scalability__big_tables">
+<h2 class="title topictitle2" id="ariaid-title7">Scalability Considerations for Table Layout</h2>
+<div class="body conbody">
+<p class="p">
+Due to the overhead of retrieving and updating table metadata
+in the metastore database, try to limit the number of columns
+in a table to a maximum of approximately 2000.
+Although Impala can handle wider tables than this, the metastore overhead
+can become significant, leading to query performance that is slower
+than expected based on the actual data volume.
+</p>
+<p class="p">
+To minimize overhead related to the metastore database and Impala query planning,
+try to limit the number of partitions for any partitioned table to a few tens of thousands.
+</p>
+</div>
+</article>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title8" id="scalability__kerberos_overhead_cluster_size">
+<h2 class="title topictitle2" id="ariaid-title8">Kerberos-Related Network Overhead for Large Clusters</h2>
+<div class="body conbody">
+<p class="p">
+When Impala starts up, or after each <code class="ph codeph">kinit</code> refresh, Impala sends a number of
+simultaneous requests to the KDC. For a cluster with 100 hosts, the KDC might be able to process
+all the requests within roughly 5 seconds. For a cluster with 1000 hosts, the time to process
+the requests would be roughly 500 seconds. Impala also makes a number of DNS requests at the same
+time as these Kerberos-related requests.
+</p>
+<p class="p">
+While these authentication requests are being processed, any submitted Impala queries will fail.
+During this period, the KDC and DNS may be slow to respond to requests from components other than Impala,
+so other secure services might be affected temporarily.
+</p>
+
+<p class="p">
+  To reduce the frequency of the <code class="ph codeph">kinit</code> renewal that initiates
+  a new set of authentication requests, increase the <code class="ph codeph">kerberos_reinit_interval</code>
+  configuration setting for the <span class="keyword cmdname">impalad</span> daemons. Currently, the default is 60 minutes.
+  Consider using a higher value such as 360 (6 hours).
+</p>
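+
+<p class="p">
+  For example, you might start each <span class="keyword cmdname">impalad</span> daemon
+  with a flag such as the following (the value is in minutes and purely illustrative):
+</p>
+
+<pre class="pre codeblock"><code>impalad --kerberos_reinit_interval=360 ...</code></pre>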
+
+</div>
+</article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="scalability__scalability_hotspots">
+    <h2 class="title topictitle2" id="ariaid-title9">Avoiding CPU Hotspots for HDFS Cached Data</h2>
+    <div class="body conbody">
+      <p class="p">
+        You can use the HDFS caching feature, described in <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a>,
+        with Impala to reduce I/O and memory-to-memory copying for frequently accessed tables or partitions.
+      </p>
+      <p class="p">
+        In the early days of this feature, you might have found that enabling HDFS caching
+        resulted in little or no performance improvement, because it could result in
+        <span class="q">"hotspots"</span>: instead of the I/O to read the table data being parallelized across
+        the cluster, the I/O was reduced but the CPU load to process the data blocks
+        might be concentrated on a single host.
+      </p>
+      <p class="p">
+        To avoid hotspots, include the <code class="ph codeph">WITH REPLICATION</code> clause with the
+        <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements for tables that use HDFS caching.
+        This clause allows more than one host to cache the relevant data blocks, so the CPU load
+        can be shared, reducing the load on any one host.
+        See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>
+        for details.
+      </p>
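+      <p class="p">
+        For example, a table could be cached with multiple replicas as follows
+        (the table, cache pool name, and replication value here are hypothetical):
+      </p>
+<pre class="pre codeblock"><code>CREATE TABLE busy_table (id BIGINT, val STRING)
+  CACHED IN 'four_gig_pool' WITH REPLICATION = 3;</code></pre>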
+      <p class="p">
+        Hotspots with high CPU load for HDFS cached data could still arise in some cases, due to
+        the way that Impala schedules the work of processing data blocks on different hosts.
+        In <span class="keyword">Impala 2.5</span> and higher, scheduling improvements mean that the work for
+        HDFS cached data is divided better among all the hosts that have cached replicas
+        for a particular data block. When more than one host has a cached replica for a data block,
+        Impala assigns the work of processing that block to whichever host has done the least work
+        (in terms of number of bytes read) for the current query. If hotspots persist even with this
+        load-based scheduling algorithm, you can enable the query option <code class="ph codeph">SCHEDULE_RANDOM_REPLICA=TRUE</code>
+        to further distribute the CPU load. This setting causes Impala to randomly pick a host to process a cached
+        data block if the scheduling algorithm encounters a tie when deciding which host has done the
+        least work.
+      </p>
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_scan_node_codegen_threshold.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_scan_node_codegen_threshold.html b/docs/build/html/topics/impala_scan_node_codegen_threshold.html
new file mode 100644
index 0000000..2e71e50
--- /dev/null
+++ b/docs/build/html/topics/impala_scan_node_codegen_threshold.html
@@ -0,0 +1,69 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="scan_node_codegen_threshold"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SCAN_NODE_CODEGEN_THRESHOLD Query Option (Impala 2.5 or higher only)</title></head><body id="scan_node_codegen_threshold"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">SCAN_NODE_CODEGEN_THRESHOLD Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">SCAN_NODE_CODEGEN_THRESHOLD</code> query option
+      adjusts the aggressiveness of the code generation optimization process
+      when performing I/O read operations. It can help to work around performance problems
+      for queries where the table is small and the <code class="ph codeph">WHERE</code> clause is complicated.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> integer
+      </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 1800000 (1.8 million)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      This query option is intended mainly for the case where a query with a very complicated
+      <code class="ph codeph">WHERE</code> clause, such as an <code class="ph codeph">IN</code> operator with thousands
+      of entries, is run against a small table, especially a small table using Parquet format.
+      The code generation phase can become the dominant factor in the query response time,
+      making the query take several seconds even though there is relatively little work to do.
+      In this case, increase the value of this option to a much larger amount, anything up to
+      the maximum for a 32-bit integer.
+    </p>
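+
+    <p class="p">
+      For example, to effectively skip the code generation phase for scan nodes in the
+      current session, you could raise the threshold well beyond any realistic row count
+      (the value shown is purely illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>SET SCAN_NODE_CODEGEN_THRESHOLD=1000000000;</code></pre>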
+
+    <p class="p">
+      Because this option only affects the code generation phase for the portion of the
+      query that performs I/O (the <dfn class="term">scan nodes</dfn> within the query plan), it
+      lets you continue to keep code generation enabled for other queries, and other parts
+      of the same query, that can benefit from it. In contrast, the
+      <code class="ph codeph">DISABLE_CODEGEN</code> query option turns off code generation entirely.
+    </p>
+
+    <p class="p">
+      Because of the way the work for queries is divided internally, this option might not
+      affect code generation for all kinds of queries. If a plan fragment contains a scan
+      node and some other kind of plan node, code generation still occurs regardless of
+      this option setting.
+    </p>
+
+    <p class="p">
+      To use this option effectively, you should be familiar with reading query profile output
+      to determine the proportion of time spent in the code generation phase, and whether
+      code generation is enabled or not for specific plan fragments.
+    </p>
+
+
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_schedule_random_replica.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_schedule_random_replica.html b/docs/build/html/topics/impala_schedule_random_replica.html
new file mode 100644
index 0000000..9826960
--- /dev/null
+++ b/docs/build/html/topics/impala_schedule_random_replica.html
@@ -0,0 +1,83 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="schedule_random_replica"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</title></head><body id="schedule_random_replica"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">SCHEDULE_RANDOM_REPLICA Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option fine-tunes the algorithm for deciding which host
+      processes each HDFS data block. It only applies to tables and partitions that are not enabled
+      for the HDFS caching feature.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      In the presence of HDFS cached replicas, Impala randomizes
+      which host processes each cached data block.
+      To ensure that HDFS data blocks are cached on more
+      than one host, use the <code class="ph codeph">WITH REPLICATION</code> clause along with
+      the <code class="ph codeph">CACHED IN</code> clause in a
+      <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statement.
+      Specify a replication value greater than or equal to the HDFS block replication factor.
+    </p>
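+    <p class="p">
+      For example, the following sketch caches a table's data blocks on multiple hosts. The
+      pool name <code class="ph codeph">pool1</code>, the table names, and the replication factor
+      are illustrative only:
+    </p>
+
+<pre class="pre codeblock"><code>-- Illustrative: cache the table's data blocks on 3 hosts.
+CREATE TABLE t1 (x INT, s STRING)
+  CACHED IN 'pool1' WITH REPLICATION = 3;
+
+-- Or change the caching of an existing table.
+ALTER TABLE t2 SET CACHED IN 'pool1' WITH REPLICATION = 3;
+</code></pre>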
+
+    <p class="p">
+      The <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option applies to tables and partitions
+      that <em class="ph i">do not</em> use HDFS caching.
+      By default, Impala estimates how much work each host has done for
+      the query, and selects the host that has the lowest workload.
+      This algorithm is intended to reduce CPU hotspots arising when the
+      same host is selected to process multiple data blocks, but hotspots
+      might still arise for some combinations of queries and data layout.
+      When the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> option is enabled,
+      Impala further randomizes the scheduling algorithm for non-HDFS cached blocks,
+      which can further reduce the chance of CPU hotspots.
+    </p>
+
+    <p class="p">
+      This query option works in conjunction with the work scheduling improvements
+      in <span class="keyword">Impala 2.5</span> and higher. The scheduling improvements
+      distribute the processing for cached HDFS data blocks to minimize hotspots:
+      if a data block is cached on more than one host, Impala chooses which host
+      to process each block based on which host has read the fewest bytes during
+      the current query. Enable the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> setting if CPU hotspots
+      still persist because of cases where hosts are <span class="q">"tied"</span> in terms of
+      the amount of work done; by default, Impala picks the first eligible host
+      in this case.
+    </p>
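+    <p class="p">
+      For example, to try this option within an <span class="keyword cmdname">impala-shell</span> session:
+    </p>
+
+<pre class="pre codeblock"><code>-- Randomize scheduling among hosts that are "tied" in work done.
+SET SCHEDULE_RANDOM_REPLICA=true;
+
+-- Revert to the default deterministic choice.
+SET SCHEDULE_RANDOM_REPLICA=false;
+</code></pre>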
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a>,
+      <a class="xref" href="impala_scalability.html#scalability_hotspots">Avoiding CPU Hotspots for HDFS Cached Data</a>
+      , <a class="xref" href="impala_replica_preference.html#replica_preference">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_schema_design.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_schema_design.html b/docs/build/html/topics/impala_schema_design.html
new file mode 100644
index 0000000..6825c5d
--- /dev/null
+++ b/docs/build/html/topics/impala_schema_design.html
@@ -0,0 +1,184 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_planning.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="schema_design"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Guidelines for Designing Impala Schemas</title></head><body id="schema_design"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Guidelines for Designing Impala Schemas</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The guidelines in this topic help you to construct an optimized and scalable schema, one that integrates well
+      with your existing data management processes. Use these guidelines as a checklist when doing any
+      proof-of-concept work, porting exercise, or before deploying to production.
+    </p>
+
+    <p class="p">
+      If you are adapting an existing database or Hive schema for use with Impala, read the guidelines in this
+      section and then see <a class="xref" href="impala_porting.html#porting">Porting SQL from Other Database Systems to Impala</a> for specific porting and compatibility tips.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+    <section class="section" id="schema_design__schema_design_text_vs_binary"><h2 class="title sectiontitle">Prefer binary file formats over text-based formats.</h2>
+
+      
+
+      <p class="p">
+        To save space and improve memory usage and query performance, use binary file formats for any large or
+        intensively queried tables. Parquet file format is the most efficient for data warehouse-style analytic
+        queries. Avro is the other binary file format that Impala supports; you might already use it as part of
+        a Hadoop ETL pipeline.
+      </p>
+
+      <p class="p">
+        Although Impala can create and query tables with the RCFile and SequenceFile file formats, such tables are
+        relatively bulky due to the text-based nature of those formats, and are not optimized for data
+        warehouse-style queries due to their row-oriented layout. Impala does not support <code class="ph codeph">INSERT</code>
+        operations for tables with these file formats.
+      </p>
+
+      <p class="p">
+        Guidelines:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          For an efficient and scalable format for large, performance-critical tables, use the Parquet file format.
+        </li>
+
+        <li class="li">
+          To deliver intermediate data during the ETL process, in a format that can also be used by other Hadoop
+          components, Avro is a reasonable choice.
+        </li>
+
+        <li class="li">
+          For convenient import of raw data, use a text table instead of RCFile or SequenceFile, and convert to
+          Parquet in a later stage of the ETL process.
+        </li>
+      </ul>
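+      <p class="p">
+        For example, the text-then-Parquet workflow from the last guideline might look like the
+        following sketch (the table and column names are illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>-- Staging table for convenient import of raw comma-separated data.
+CREATE TABLE csv_staging (id BIGINT, name STRING, amount DOUBLE)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
+
+-- Later ETL stage: convert to Parquet for efficient querying.
+CREATE TABLE sales STORED AS PARQUET
+  AS SELECT * FROM csv_staging;
+</code></pre>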
+    </section>
+
+    <section class="section" id="schema_design__schema_design_compression"><h2 class="title sectiontitle">Use Snappy compression where practical.</h2>
+
+      
+
+      <p class="p">
+        Snappy compression involves low CPU overhead to decompress, while still providing substantial space
+        savings. In cases where you have a choice of compression codecs, such as with the Parquet and Avro file
+        formats, use Snappy compression unless you find a compelling reason to use a different codec.
+      </p>
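+      <p class="p">
+        For example, when writing Parquet files from Impala, the codec is controlled by the
+        <code class="ph codeph">COMPRESSION_CODEC</code> query option (Snappy is the default,
+        shown here for illustration; the table names are illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>-- Snappy is already the default for Impala-written Parquet files.
+SET COMPRESSION_CODEC=snappy;
+INSERT INTO parquet_table SELECT * FROM text_table;
+</code></pre>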
+    </section>
+
+    <section class="section" id="schema_design__schema_design_numeric_types"><h2 class="title sectiontitle">Prefer numeric types over strings.</h2>
+
+      
+
+      <p class="p">
+        If you have numeric values that you could treat as either strings or numbers (such as
+        <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, and <code class="ph codeph">DAY</code> for partition key columns), define
+        them as the smallest applicable integer types. For example, <code class="ph codeph">YEAR</code> can be
+        <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">MONTH</code> and <code class="ph codeph">DAY</code> can be <code class="ph codeph">TINYINT</code>.
+        Although you might not see any difference in the way partitioned tables or text files are laid out on disk,
+        using numeric types will save space in binary formats such as Parquet, and in memory when doing queries,
+        particularly resource-intensive queries such as joins.
+      </p>
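+      <p class="p">
+        For example (the table and column names are illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>-- Smallest applicable integer types for date-based partition keys.
+CREATE TABLE events (event_id BIGINT, details STRING)
+  PARTITIONED BY (year SMALLINT, month TINYINT, day TINYINT)
+  STORED AS PARQUET;
+</code></pre>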
+    </section>
+
+
+
+    <section class="section" id="schema_design__schema_design_partitioning"><h2 class="title sectiontitle">Partition, but do not over-partition.</h2>
+
+      
+
+      <p class="p">
+        Partitioning is an important aspect of performance tuning for Impala. Follow the procedures in
+        <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> to set up partitioning for your biggest, most
+        intensively queried tables.
+      </p>
+
+      <p class="p">
+        If you are moving to Impala from a traditional database system, or just getting started in the Big Data
+        field, you might not have enough data volume to take advantage of Impala parallel queries with your
+        existing partitioning scheme. For example, if you have only a few tens of megabytes of data per day,
+        partitioning by <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, and <code class="ph codeph">DAY</code> columns might be
+        too granular. Most of your cluster might be sitting idle during queries that target a single day, or each
+        node might have very little work to do. Consider reducing the number of partition key columns so that each
+        partition directory contains several gigabytes worth of data.
+      </p>
+
+      <p class="p">
+        For example, consider a Parquet table where each data file is 1 HDFS block, with a maximum block size of 1
+        GB. (In Impala 2.0 and later, the default Parquet block size is reduced to 256 MB. For this exercise, let's
+        assume you have bumped the size back up to 1 GB by setting the query option
+        <code class="ph codeph">PARQUET_FILE_SIZE=1g</code>.) If you have a 10-node cluster, you need 10 data files (up to 10 GB)
+        to give each node some work to do for a query. But each core on each machine can process a separate data
+        block in parallel. With 16-core machines on a 10-node cluster, a query could process up to 160 GB fully in
+        parallel. If there are only a few data files per partition, not only are most cluster nodes sitting idle
+        during queries, but so are most cores on those machines.
+      </p>
+
+      <p class="p">
+        You can reduce the Parquet block size to as low as 128 MB or 64 MB to increase the number of files per
+        partition and improve parallelism. But also consider reducing the level of partitioning so that analytic
+        queries have enough data to work with.
+      </p>
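+      <p class="p">
+        For example, you can adjust the Parquet file size through a query option before writing
+        the data (the table names are illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>-- Write larger (1 GB) Parquet files, as in the sizing exercise above...
+SET PARQUET_FILE_SIZE=1g;
+INSERT OVERWRITE big_table SELECT * FROM staging;
+
+-- ...or smaller (128 MB) ones to increase the number of files per partition.
+SET PARQUET_FILE_SIZE=128m;
+INSERT OVERWRITE big_table SELECT * FROM staging;
+</code></pre>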
+    </section>
+
+    <section class="section" id="schema_design__schema_design_compute_stats"><h2 class="title sectiontitle">Always compute stats after loading data.</h2>
+
+      
+
+      <p class="p">
+        Impala makes extensive use of statistics about data in the overall table and in each column, to help plan
+        resource-intensive operations such as join queries and inserting into partitioned Parquet tables. Because
+        this information is only available after data is loaded, run the <code class="ph codeph">COMPUTE STATS</code> statement
+        on a table after loading or replacing data in a table or partition.
+      </p>
+
+      <p class="p">
+        Having accurate statistics can make the difference between a successful operation and one that fails due to
+        an out-of-memory error or a timeout. When you encounter performance or capacity issues, always use the
+        <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW COLUMN STATS</code> statements to check whether
+        the statistics are present and up-to-date for all tables in the query.
+      </p>
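+      <p class="p">
+        For example (the table name is illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>-- After loading or replacing data in the table:
+COMPUTE STATS sales;
+
+-- Verify that the statistics are present and current:
+SHOW TABLE STATS sales;
+SHOW COLUMN STATS sales;
+</code></pre>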
+
+      <p class="p">
+        When doing a join query, Impala consults the statistics for each joined table to determine their relative
+        sizes and to estimate the number of rows produced in each join stage. When doing an <code class="ph codeph">INSERT</code>
+        into a Parquet table, Impala consults the statistics for the source table to determine how to distribute
+        the work of constructing the data files for each partition.
+      </p>
+
+      <p class="p">
+        See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for the syntax of the <code class="ph codeph">COMPUTE
+        STATS</code> statement, and <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for all the performance
+        considerations for table and column statistics.
+      </p>
+    </section>
+
+    <section class="section" id="schema_design__schema_design_explain"><h2 class="title sectiontitle">Verify sensible execution plans with EXPLAIN and SUMMARY.</h2>
+
+      
+
+      <p class="p">
+        Before executing a resource-intensive query, use the <code class="ph codeph">EXPLAIN</code> statement to get an overview
+        of how Impala intends to parallelize the query and distribute the work. If you see that the query plan is
+        inefficient, you can take tuning steps such as changing file formats, using partitioned tables, running the
+        <code class="ph codeph">COMPUTE STATS</code> statement, or adding query hints. For information about all of these
+        techniques, see <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a>.
+      </p>
+
+      <p class="p">
+        After you run a query, you can see performance-related information about how it actually ran by issuing the
+        <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span>. Prior to Impala 1.4, you would use
+        the <code class="ph codeph">PROFILE</code> command, but its highly technical output was only useful for the most
+        experienced users. <code class="ph codeph">SUMMARY</code>, new in Impala 1.4, summarizes the most useful information for
+        all stages of execution, aggregated across all nodes rather than splitting out figures for each node.
+      </p>
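+      <p class="p">
+        For example, in <span class="keyword cmdname">impala-shell</span> (the query and table
+        names are illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>-- Check the plan before running a resource-intensive query.
+EXPLAIN SELECT c.region, SUM(o.total)
+  FROM orders o JOIN customers c ON o.cust_id = c.cust_id
+  GROUP BY c.region;
+
+-- Run the query, then examine per-stage timings and row counts.
+SELECT c.region, SUM(o.total)
+  FROM orders o JOIN customers c ON o.cust_id = c.cust_id
+  GROUP BY c.region;
+SUMMARY;
+</code></pre>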
+    </section>
+
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_planning.html">Planning for Impala Deployment</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_schema_objects.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_schema_objects.html b/docs/build/html/topics/impala_schema_objects.html
new file mode 100644
index 0000000..b8ea7cd
--- /dev/null
+++ b/docs/build/html/topics/impala_schema_objects.html
@@ -0,0 +1,48 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aliases.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_databases.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions_overview.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_identifiers.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_tables.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_views.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name=
 "DC.Format" content="XHTML"><meta name="DC.Identifier" content="schema_objects"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Schema Objects and Object Names</title></head><body id="schema_objects"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Schema Objects and Object Names</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      With Impala, you work with schema objects that are familiar to database users: primarily databases, tables, views,
+      and functions. The SQL syntax to work with these objects is explained in
+      <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a>. This section explains the conceptual knowledge you need to
+      work with these objects and the various ways to specify their names.
+    </p>
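+    <p class="p">
+      For example, a table can be referenced by its unqualified name within the current database,
+      or by a fully qualified <code class="ph codeph">db_name.table_name</code> from any database
+      (the names here are illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>-- Unqualified name resolves within the current database.
+USE analytics;
+SELECT COUNT(*) FROM web_logs;
+
+-- Fully qualified name works regardless of the current database.
+SELECT COUNT(*) FROM analytics.web_logs;
+</code></pre>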
+
+    <p class="p">
+      Within a table, partitions can also be considered a kind of object. Partitioning is an important subject for
+      Impala, with its own documentation section covering use cases and performance considerations. See
+      <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for details.
+    </p>
+
+    <p class="p">
+      Impala does not have a counterpart of the <span class="q">"tablespace"</span> notion from some database systems. By default,
+      all the data files for a database, table, or partition are located within nested folders within the HDFS file
+      system. You can also specify a particular HDFS location for a given Impala table or partition. The raw data
+      for these objects is represented as a collection of data files, providing the flexibility to load data by
+      simply moving files into the expected HDFS location.
+    </p>
+
+    <p class="p">
+      Information about the schema objects is held in the
+      <a class="xref" href="impala_hadoop.html#intro_metastore">metastore</a> database. This database is shared between
+      Impala and Hive, allowing each to create, drop, and query each other's databases, tables, and so on. When
+      Impala makes a change to schema objects through a <code class="ph codeph">CREATE</code>, <code class="ph codeph">ALTER</code>,
+      <code class="ph codeph">DROP</code>, <code class="ph codeph">INSERT</code>, or <code class="ph codeph">LOAD DATA</code> statement, it broadcasts those
+      changes to all nodes in the cluster through the <a class="xref" href="impala_components.html#intro_catalogd">catalog
+      service</a>. When you make such changes through Hive or directly through manipulating HDFS files, you use
+      the <a class="xref" href="impala_refresh.html#refresh">REFRESH</a> or
+      <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA</a> statements on the
+      Impala side to recognize the newly loaded data, new tables, and so on.
+    </p>
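+    <p class="p">
+      For example (the database and table names are illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>-- After adding data files to an existing table through Hive or HDFS:
+REFRESH db1.t1;
+
+-- After creating a new table through Hive:
+INVALIDATE METADATA db1.t2;
+</code></pre>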
+
+    <p class="p toc"></p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_aliases.html">Overview of Impala Aliases</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_databases.html">Overview of Impala Databases</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_functions_overview.html">Overview of Impala Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_identifiers.html">Overview of Impala Identifiers</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_tables.html">Overview of Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_views.html">Overview of Impala Views</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Refer
 ence</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_scratch_limit.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_scratch_limit.html b/docs/build/html/topics/impala_scratch_limit.html
new file mode 100644
index 0000000..98bac93
--- /dev/null
+++ b/docs/build/html/topics/impala_scratch_limit.html
@@ -0,0 +1,77 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="scratch_limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SCRATCH_LIMIT Query Option</title></head><body id="scratch_limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">SCRATCH_LIMIT Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Specifies the maximum amount of disk storage, in bytes, that any Impala query can consume
+      on any host using the <span class="q">"spill to disk"</span> mechanism that handles queries that exceed
+      the memory limit.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      Specify the size in bytes, or with a trailing <code class="ph codeph">m</code> or <code class="ph codeph">g</code> character to indicate
+      megabytes or gigabytes. For example:
+    </p>
+
+
+<pre class="pre codeblock"><code>-- 128 megabytes.
+set SCRATCH_LIMIT=134217728;
+
+-- 512 megabytes.
+set SCRATCH_LIMIT=512m;
+
+-- 1 gigabyte.
+set SCRATCH_LIMIT=1g;
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      A value of zero turns off the spill to disk feature for queries
+      in the current session, causing them to fail immediately if they
+      exceed the memory limit.
+    </p>
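+    <p class="p">
+      For example:
+    </p>
+
+<pre class="pre codeblock"><code>-- Disable spill to disk for this session; queries that exceed
+-- the memory limit fail immediately.
+SET SCRATCH_LIMIT=0;
+
+-- Restore the default (unlimited scratch space).
+SET SCRATCH_LIMIT=-1;
+</code></pre>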
+
+    <p class="p">
+      The amount of memory used per host for a query is limited by the
+      <code class="ph codeph">MEM_LIMIT</code> query option.
+    </p>
+
+    <p class="p">
+      The more Impala daemon hosts in the cluster, the less memory is used on each host,
+      and therefore less scratch space is required for queries that
+      exceed the memory limit.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric, with optional unit specifier
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> -1 (amount of spill space is unlimited)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_scalability.html#spill_to_disk">SQL Operations that Spill to Disk</a>,
+      <a class="xref" href="impala_mem_limit.html#mem_limit">MEM_LIMIT Query Option</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_security.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_security.html b/docs/build/html/topics/impala_security.html
new file mode 100644
index 0000000..45d9923
--- /dev/null
+++ b/docs/build/html/topics/impala_security.html
@@ -0,0 +1,99 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_guidelines.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_files.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_install.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_metastore.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_webui.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ssl.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authorization.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="DC.Relation" scheme="URI" content="../topics/i
 mpala_auditing.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_lineage.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Security</title></head><body id="security"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Impala Security</span></h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Impala includes a fine-grained authorization framework for Hadoop, based on Apache Sentry.
+      Sentry authorization was added in Impala 1.1.0. Together with the Kerberos
+      authentication framework, Sentry takes Hadoop security to a new level needed for the requirements of
+      highly regulated industries such as healthcare, financial services, and government. Impala also includes
+      an auditing capability which was added in Impala 1.1.1; Impala generates the audit data which can be
+      consumed, filtered, and visualized by cluster-management components focused on governance.
+    </p>
+
+    <p class="p">
+      The Impala security features have several objectives. At the most basic level, security prevents
+      accidents or mistakes that could disrupt application processing, delete or corrupt data, or reveal data to
+      unauthorized users. More advanced security features and practices can harden the system against malicious
+      users trying to gain unauthorized access or perform other disallowed operations. The auditing feature
+      provides a way to confirm that no unauthorized access occurred, and detect whether any such attempts were
+      made. This is a critical set of features for production deployments in large organizations that handle
+      important or sensitive data. It sets the stage for multi-tenancy, where multiple applications run
+      concurrently and are prevented from interfering with each other.
+    </p>
+
+    <p class="p">
+      The material in this section presumes that you are already familiar with administering secure Linux systems.
+      That is, you should know the general security practices for Linux and Hadoop, and their associated commands
+      and configuration files. For example, you should know how to create Linux users and groups, manage Linux
+      group membership, set Linux and HDFS file permissions and ownership, and designate the default permissions
+      and ownership for new files. You should be familiar with the configuration of the nodes in your Hadoop
+      cluster, and know how to apply configuration changes or run a set of commands across all the nodes.
+    </p>
+
+    <p class="p">
+      The security features are divided into these broad categories:
+    </p>
+
+    <dl class="dl">
+      
+
+        <dt class="dt dlterm">
+          authorization
+        </dt>
+
+        <dd class="dd">
+          Which users are allowed to access which resources, and what operations are they allowed to perform?
+          Impala relies on the open source Sentry project for authorization. By default (when authorization is not
+          enabled), Impala does all read and write operations with the privileges of the <code class="ph codeph">impala</code>
+          user, which is suitable for a development/test environment but not for a secure production environment.
+          When authorization is enabled, Impala uses the OS user ID of the user who runs
+          <span class="keyword cmdname">impala-shell</span> or another client program, and associates various privileges with each
+          user. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details about setting up and managing
+          authorization.
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm">
+          authentication
+        </dt>
+
+        <dd class="dd">
+          How does Impala verify the identity of the user to confirm that they really are allowed to exercise the
+          privileges assigned to that user? Impala relies on the Kerberos subsystem for authentication. See
+          <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for details about setting up and managing authentication.
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm">
+          auditing
+        </dt>
+
+        <dd class="dd">
+          What operations were attempted, and did they succeed or not? This feature provides a way to look back and
+          diagnose whether attempts were made to perform unauthorized operations. You use this information to track
+          down suspicious activity, and to see where changes are needed in authorization policies. The audit data
+          produced by this feature can be collected and presented in a user-friendly form by cluster-management
+          software. See <a class="xref" href="impala_auditing.html#auditing">Auditing Impala Operations</a> for details about setting up and managing
+          auditing.
+        </dd>
+
+      
+    </dl>
+
+    <p class="p toc"></p>
+
+    
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_security_guidelines.html">Security Guidelines for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_security_files.html">Securing Impala Data and Log Files</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_security_install.html">Installation Considerations for Impala Security</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_security_metastore.html">Securing the Hive Metastore Database</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_security_webui.html">Securing the Impala Web User Interface</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_ssl.html">Configuring TLS/SSL for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_authorization.html
 ">Enabling Sentry Authorization for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_authentication.html">Impala Authentication</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_auditing.html">Auditing Impala Operations</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_lineage.html">Viewing Lineage Information for Impala Data</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_security_files.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_security_files.html b/docs/build/html/topics/impala_security_files.html
new file mode 100644
index 0000000..e980e60
--- /dev/null
+++ b/docs/build/html/topics/impala_security_files.html
@@ -0,0 +1,58 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="secure_files"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Securing Impala Data and Log Files</title></head><body id="secure_files"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Securing Impala Data and Log Files</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      One aspect of security is to protect files from unauthorized access at the filesystem level. For example, if
+      you store sensitive data in HDFS, you specify permissions on the associated files and directories in HDFS to
+      restrict read and write permissions to the appropriate users and groups.
+    </p>
+
+    <p class="p">
+      If you issue queries containing sensitive values in the <code class="ph codeph">WHERE</code> clause, such as financial
+      account numbers, those values are stored in Impala log files in the Linux filesystem and you must secure
+      those files also. For the locations of Impala log files, see <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>.
+    </p>
+
+    <p class="p">
+      All Impala read and write operations are performed under the filesystem privileges of the
+      <code class="ph codeph">impala</code> user. The <code class="ph codeph">impala</code> user must be able to read all directories and data
+      files that you query, and write into all the directories and data files for <code class="ph codeph">INSERT</code> and
+      <code class="ph codeph">LOAD DATA</code> statements. At a minimum, make sure the <code class="ph codeph">impala</code> user is in the
+      <code class="ph codeph">hive</code> group so that it can access files and directories shared between Impala and Hive. See
+      <a class="xref" href="impala_prereqs.html#prereqs_account">User Account Requirements</a> for more details.
+    </p>
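+
+    <p class="p">
+      As an illustration only (the group name, user name, and warehouse path below are examples,
+      not requirements of your deployment), you might verify and tighten permissions along these lines:
+    </p>
+
+<pre class="pre codeblock"><code># Confirm that the impala user belongs to the hive group.
+id impala
+
+# Restrict a shared warehouse directory to the hive group (example path).
+hdfs dfs -chown -R hive:hive /user/hive/warehouse
+hdfs dfs -chmod -R 770 /user/hive/warehouse
+
+# Verify the resulting ownership and permissions.
+hdfs dfs -ls /user/hive/warehouse
+</code></pre>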
+
+    <p class="p">
+      Setting file permissions is necessary for Impala to function correctly, but is not an effective security
+      practice by itself:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+      <p class="p">
+        The way to ensure that only authorized users can submit requests for databases and tables they are allowed
+        to access is to set up Sentry authorization, as explained in
+        <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>. With authorization enabled, the checking of the user
+        ID and group is done by Impala, and unauthorized access is blocked by Impala itself. The actual low-level
+        read and write requests are still done by the <code class="ph codeph">impala</code> user, so you must have appropriate
+        file and directory permissions for that user ID.
+      </p>
+      </li>
+
+      <li class="li">
+      <p class="p">
+        You must also set up Kerberos authentication, as described in <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a>,
+        so that users can only connect from trusted hosts. With Kerberos enabled, if someone connects a new host to
+        the network and creates user IDs that match your privileged IDs, they will be blocked from connecting to
+        Impala at all from that host.
+      </p>
+      </li>
+    </ul>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[15/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_perf_hdfs_caching.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_perf_hdfs_caching.html b/docs/build/html/topics/impala_perf_hdfs_caching.html
new file mode 100644
index 0000000..9de003e
--- /dev/null
+++ b/docs/build/html/topics/impala_perf_hdfs_caching.html
@@ -0,0 +1,578 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hdfs_caching"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using HDFS Caching with Impala (Impala 2.1 or higher only)</title></head><body id="hdfs_caching"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using HDFS Caching with Impala (<span class="keyword">Impala 2.1</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      HDFS caching provides performance and scalability benefits in production environments where Impala queries
+      and other Hadoop jobs operate on quantities of data much larger than the physical RAM on the DataNodes,
+      making it impractical to rely on the Linux OS cache, which only keeps the most recently used data in memory.
+      Data read from the HDFS cache avoids the overhead of checksumming and memory-to-memory copying involved when
+      using data from the Linux OS cache.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        On a small or lightly loaded cluster, HDFS caching might not produce any speedup. It might even lead to
+        slower queries, if I/O read operations that were performed in parallel across the entire cluster are replaced by in-memory
+        operations on a smaller number of hosts. The hosts where the HDFS blocks are cached can become
+        bottlenecks: they experience high CPU load while processing the cached data blocks, while other hosts remain idle.
+        Therefore, always compare performance with and without this feature enabled, using a realistic workload.
+      </p>
+      <p class="p">
+        In <span class="keyword">Impala 2.2</span> and higher, you can spread the CPU load more evenly by specifying the <code class="ph codeph">WITH REPLICATION</code>
+        clause of the <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+        This clause lets you control the replication factor for
+        HDFS caching for a specific table or partition. By default, each cached block is
+        only present on a single host, which can lead to CPU contention if the same host
+        processes each cached block. Increasing the replication factor lets Impala choose
+        different hosts to process different cached blocks, to better distribute the CPU load.
+        Always use a <code class="ph codeph">WITH REPLICATION</code> setting of at least 3, and adjust upward
+        if necessary to match the replication factor for the underlying HDFS data files.
+      </p>
+      <p class="p">
+        In <span class="keyword">Impala 2.5</span> and higher, Impala automatically randomizes which host processes
+        a cached HDFS block, to avoid CPU hotspots. For tables where HDFS caching is not applied,
+        Impala chooses which host processes a data block using an algorithm that estimates
+        the load on each host. If CPU hotspots still arise during queries,
+        you can enable additional randomization for the scheduling algorithm for non-HDFS cached data
+        by setting the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option.
+      </p>
+    </div>
+
+    <p class="p toc inpage"></p>
+
+
+
+    <p class="p">
+      For background information about how to set up and manage HDFS caching for a <span class="keyword"></span> cluster, see
+      <span class="xref">the documentation for your Apache Hadoop distribution</span>.
+    </p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="hdfs_caching__hdfs_caching_overview">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Overview of HDFS Caching for Impala</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        In <span class="keyword">Impala 1.4</span> and higher, Impala can use the HDFS caching feature to make more effective use of RAM, so that
+        repeated queries can take advantage of data <span class="q">"pinned"</span> in memory regardless of how much data is
+        processed overall. The HDFS caching feature lets you designate a subset of frequently accessed data to be
+        pinned permanently in memory, remaining in the cache across multiple queries and never being evicted. This
+        technique is suitable for tables or partitions that are frequently accessed and are small enough to fit
+        entirely within the HDFS memory cache. For example, you might designate several dimension tables to be
+        pinned in the cache, to speed up many different join queries that reference them. Or in a partitioned
+        table, you might pin a partition holding data from the most recent time period because that data will be
+        queried intensively; then when the next set of data arrives, you could unpin the previous partition and pin
+        the partition holding the new data.
+      </p>
+
+      <p class="p">
+        Because this Impala performance feature relies on HDFS infrastructure, it only applies to Impala tables
+        that use HDFS data files. HDFS caching for Impala does not apply to HBase tables, S3 tables,
+        Kudu tables,
+        or Isilon tables.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="hdfs_caching__hdfs_caching_prereqs">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Setting Up HDFS Caching for Impala</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        To use HDFS caching with Impala, first set up that feature for your <span class="keyword"></span> cluster:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+          Decide how much memory to devote to the HDFS cache on each host. Remember that the total memory available
+          for cached data is the sum of the cache sizes on all the hosts. By default, any data block is only cached on one
+          host, although you can cache a block across multiple hosts by increasing the replication factor.
+          
+          </p>
+        </li>
+
+        <li class="li">
+          <div class="p">
+          Issue <span class="keyword cmdname">hdfs cacheadmin</span> commands to set up one or more cache pools, owned by the same
+          user as the <span class="keyword cmdname">impalad</span> daemon (typically <code class="ph codeph">impala</code>). For example:
+<pre class="pre codeblock"><code>hdfs cacheadmin -addPool four_gig_pool -owner impala -limit 4000000000
+</code></pre>
+          For details about the <span class="keyword cmdname">hdfs cacheadmin</span> command, see
+          <span class="xref">the documentation for your Apache Hadoop distribution</span>.
+          </div>
+        </li>
+      </ul>
+
+      <p class="p">
+        Once HDFS caching is enabled and one or more pools are available, see
+        <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching_ddl">Enabling HDFS Caching for Impala Tables and Partitions</a> for how to choose which Impala data to load
+        into the HDFS cache. On the Impala side, you specify the cache pool name defined by the <code class="ph codeph">hdfs
+        cacheadmin</code> command in the Impala DDL statements that enable HDFS caching for a table or partition,
+        such as <code class="ph codeph">CREATE TABLE ... CACHED IN <var class="keyword varname">pool</var></code> or <code class="ph codeph">ALTER TABLE ... SET
+        CACHED IN <var class="keyword varname">pool</var></code>.
+      </p>
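+
+      <p class="p">
+        For example, assuming a pool named <code class="ph codeph">four_gig_pool</code> like the one
+        created above (the table names here are illustrative), the corresponding DDL might look like:
+      </p>
+
+<pre class="pre codeblock"><code>-- Cache a new table from the moment it is created.
+create table t1 (x int, s string) cached in 'four_gig_pool';
+
+-- Enable caching for an existing table.
+alter table t2 set cached in 'four_gig_pool';
+</code></pre>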
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="hdfs_caching__hdfs_caching_ddl">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Enabling HDFS Caching for Impala Tables and Partitions</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Begin by choosing which tables or partitions to cache. For example, these might be lookup tables that are
+        accessed by many different join queries, or partitions corresponding to the most recent time period that
+        are analyzed by different reports or ad hoc queries.
+      </p>
+
+      <p class="p">
+        In your SQL statements, you specify logical divisions such as tables and partitions to be cached. Impala
+        translates these requests into HDFS-level directives that apply to particular directories and files. For
+        example, given a partitioned table <code class="ph codeph">CENSUS</code> with a partition key column
+        <code class="ph codeph">YEAR</code>, you could choose to cache all or part of the data as follows:
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.2</span> and higher, the optional <code class="ph codeph">WITH REPLICATION</code> clause
+        for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> lets you specify
+        a <dfn class="term">replication factor</dfn>, the number of hosts on which to cache the same data blocks.
+        When the cache replication factor is greater than 1 and Impala processes a cached data block, it randomly
+        selects a host that has a cached copy of that data block. This optimization avoids excessive CPU
+        usage on a single host when the same cached data block is processed multiple times.
+        Where practical, specify a value greater than or equal to the HDFS block replication factor.
+      </p>
+
+<pre class="pre codeblock"><code>-- Cache the entire table (all partitions).
+alter table census set cached in '<var class="keyword varname">pool_name</var>';
+
+-- Remove the entire table from the cache.
+alter table census set uncached;
+
+-- Cache a portion of the table (a single partition).
+-- If the table is partitioned by multiple columns (such as year, month, day),
+-- the ALTER TABLE command must specify values for all those columns.
+alter table census partition (year=1960) set cached in '<var class="keyword varname">pool_name</var>';
+
+<span class="ph">-- Cache the data from one partition on up to 4 hosts, to minimize CPU load on any
+-- single host when the same data block is processed multiple times.
+alter table census partition (year=1970)
+  set cached in '<var class="keyword varname">pool_name</var>' with replication = 4;</span>
+
+-- At each stage, check the volume of cached data.
+-- For large tables or partitions, the background loading might take some time,
+-- so you might have to wait and reissue the statement until all the data
+-- has finished being loaded into the cache.
+show table stats census;
++-------+-------+--------+------+--------------+--------+
+| year  | #Rows | #Files | Size | Bytes Cached | Format |
++-------+-------+--------+------+--------------+--------+
+| 1900  | -1    | 1      | 11B  | NOT CACHED   | TEXT   |
+| 1940  | -1    | 1      | 11B  | NOT CACHED   | TEXT   |
+| 1960  | -1    | 1      | 11B  | 11B          | TEXT   |
+| 1970  | -1    | 1      | 11B  | NOT CACHED   | TEXT   |
+| Total | -1    | 4      | 44B  | 11B          |        |
++-------+-------+--------+------+--------------+--------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">CREATE TABLE considerations:</strong>
+      </p>
+
+      <p class="p">
+        The HDFS caching feature affects the Impala <code class="ph codeph">CREATE TABLE</code> statement as follows:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+        <p class="p">
+          You can put a <code class="ph codeph">CACHED IN '<var class="keyword varname">pool_name</var>'</code> clause
+          <span class="ph">and optionally a <code class="ph codeph">WITH REPLICATION = <var class="keyword varname">number_of_hosts</var></code> clause</span>
+          at the end of a
+          <code class="ph codeph">CREATE TABLE</code> statement to automatically cache the entire contents of the table,
+          including any partitions added later. The <var class="keyword varname">pool_name</var> is a pool that you previously set
+          up with the <span class="keyword cmdname">hdfs cacheadmin</span> command.
+        </p>
+        </li>
+
+        <li class="li">
+        <p class="p">
+          Once a table is designated for HDFS caching through the <code class="ph codeph">CREATE TABLE</code> statement, if new
+          partitions are added later through <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code> statements, the data in
+          those new partitions is automatically cached in the same pool.
+        </p>
+        </li>
+
+        <li class="li">
+        <p class="p">
+          If you want to perform repetitive queries on a subset of data from a large table, and it is not practical
+          to designate the entire table or specific partitions for HDFS caching, you can create a new cached table
+          with just a subset of the data by using <code class="ph codeph">CREATE TABLE ... CACHED IN '<var class="keyword varname">pool_name</var>'
+          AS SELECT ... WHERE ...</code>. When you are finished with generating reports from this subset of data,
+          drop the table and both the data files and the data cached in RAM are automatically deleted.
+        </p>
+        </li>
+      </ul>
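+
+      <p class="p">
+        The last technique can be sketched as follows (the table, column, and pool names are illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>-- Cache only a recent subset of a large table for repeated reporting queries.
+create table recent_sales cached in '<var class="keyword varname">pool_name</var>'
+  as select * from sales where sale_date &gt;= '2017-01-01';
+
+-- When the reports are finished, dropping the table also removes
+-- its data files and the associated cached data.
+drop table recent_sales;
+</code></pre>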
+
+      <p class="p">
+        See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for the full syntax.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Other memory considerations:</strong>
+      </p>
+
+      <p class="p">
+        Certain DDL operations, such as <code class="ph codeph">ALTER TABLE ... SET LOCATION</code>, are blocked while the
+        underlying HDFS directories contain cached files. You must uncache the files first, before changing the
+        location, dropping the table, and so on.
+      </p>
+
+      <p class="p">
+        When data is requested to be pinned in memory, that process happens in the background without blocking
+        access to the data while the caching is in progress. Loading the data from disk could take some time.
+        Impala reads each HDFS data block from memory if it has been pinned already, or from disk if it has not
+        been pinned yet. When files are added to a table or partition whose contents are cached, Impala
+        automatically detects those changes and performs a <code class="ph codeph">REFRESH</code> automatically once the relevant
+        data is cached.
+      </p>
+
+      <p class="p">
+        The amount of data that you can pin on each node through the HDFS caching mechanism is subject to a quota
+        that is enforced by the underlying HDFS service. Before requesting to pin an Impala table or partition in
+        memory, check that its size does not exceed this quota.
+      </p>
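+
+      <p class="p">
+        For example, you might compare the pool limit against the size of the data before issuing the caching
+        DDL (the pool name and path here are illustrative):
+      </p>
+
+<pre class="pre codeblock"><code># Check the configured limit and current usage for the cache pool.
+hdfs cacheadmin -listPools -stats four_gig_pool
+
+# Check the total size of the data you intend to pin (example path).
+hdfs dfs -du -s -h /user/hive/warehouse/census
+</code></pre>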
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        Because the HDFS cache consists of combined memory from all the DataNodes in the cluster, cached tables or
+        partitions can be bigger than the amount of HDFS cache memory on any single host.
+      </div>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="hdfs_caching__hdfs_caching_etl">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Loading and Removing Data with HDFS Caching Enabled</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        When HDFS caching is enabled, extra processing happens in the background when you add or remove data
+        through statements such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">DROP TABLE</code>.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Inserting or loading data:</strong>
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          When Impala performs an <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> or
+          <code class="ph codeph"><a class="xref" href="impala_load_data.html#load_data">LOAD DATA</a></code> statement for a table or
+          partition that is cached, the new data files are automatically cached and Impala recognizes that fact
+          automatically.
+        </li>
+
+        <li class="li">
+          If you perform an <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> through Hive, as always, Impala
+          only recognizes the new data files after a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+          statement in Impala.
+        </li>
+
+        <li class="li">
+          If the cache pool is entirely full, or becomes full before all the requested data can be cached, the
+          Impala DDL statement returns an error. This is to avoid situations where only some of the requested data
+          could be cached.
+        </li>
+
+        <li class="li">
+          When HDFS caching is enabled for a table or partition, new data files are cached automatically when they
+          are added to the appropriate directory in HDFS, without the need for a <code class="ph codeph">REFRESH</code> statement
+          in Impala. Impala automatically performs a <code class="ph codeph">REFRESH</code> once the new data is loaded into the
+          HDFS cache.
+        </li>
+      </ul>
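+
+      <p class="p">
+        A typical sequence, sketched with illustrative table names:
+      </p>
+
+<pre class="pre codeblock"><code>-- Data added through Impala into a cached table or partition is cached automatically.
+insert into census partition (year=1980) select * from staging_1980;
+
+-- Data added through Hive still requires a REFRESH before Impala
+-- recognizes the new files.
+refresh census;
+</code></pre>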
+
+      <p class="p">
+        <strong class="ph b">Dropping tables, partitions, or cache pools:</strong>
+      </p>
+
+      <p class="p">
+        The HDFS caching feature interacts with the Impala
+        <code class="ph codeph"><a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE</a></code> and
+        <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE ... DROP PARTITION</a></code>
+        statements as follows:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          When you issue a <code class="ph codeph">DROP TABLE</code> for a table that is entirely cached, or has some partitions
+          cached, the <code class="ph codeph">DROP TABLE</code> succeeds and all the cache directives Impala submitted for that
+          table are removed from the HDFS cache system.
+        </li>
+
+        <li class="li">
+          The same applies to <code class="ph codeph">ALTER TABLE ... DROP PARTITION</code>. The operation succeeds and any cache
+          directives are removed.
+        </li>
+
+        <li class="li">
+          As always, the underlying data files are removed if the dropped table is an internal table, or the
+          dropped partition is in its default location underneath an internal table. The data files are left alone
+          if the dropped table is an external table, or if the dropped partition is in a non-default location.
+        </li>
+
+        <li class="li">
+          If you designated the data files as cached through the <span class="keyword cmdname">hdfs cacheadmin</span> command, and
+          the data files are left behind as described in the previous item, the data files remain cached. Impala
+          only removes the cache directives submitted by Impala through the <code class="ph codeph">CREATE TABLE</code> or
+          <code class="ph codeph">ALTER TABLE</code> statements. It is OK to have multiple redundant cache directives pertaining
+          to the same files; the directives all have unique IDs and owners so that the system can tell them apart.
+        </li>
+
+        <li class="li">
+          If you drop an HDFS cache pool through the <span class="keyword cmdname">hdfs cacheadmin</span> command, all the Impala
+          data files are preserved, just no longer cached. After a subsequent <code class="ph codeph">REFRESH</code>,
+          <code class="ph codeph">SHOW TABLE STATS</code> reports 0 bytes cached for each associated Impala table or partition.
+        </li>
+      </ul>
+
+      <p class="p">
+        <strong class="ph b">Relocating a table or partition:</strong>
+      </p>
+
+      <p class="p">
+        The HDFS caching feature interacts with the Impala
+        <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE ... SET LOCATION</a></code>
+        statement as follows:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          If you have designated a table or partition as cached through the <code class="ph codeph">CREATE TABLE</code> or
+          <code class="ph codeph">ALTER TABLE</code> statements, subsequent attempts to relocate the table or partition through
+          an <code class="ph codeph">ALTER TABLE ... SET LOCATION</code> statement will fail. You must issue an <code class="ph codeph">ALTER
+          TABLE ... SET UNCACHED</code> statement for the table or partition first. Otherwise, Impala would lose
+          track of some cached data files and have no way to uncache them later.
+        </li>
+      </ul>
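+
+      <p class="p">
+        That is, relocating a cached partition involves a sequence like the following (the table name and
+        path are illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>-- This statement fails while the partition is cached:
+-- alter table census partition (year=1960) set location '/user/impala/census_1960';
+
+-- Uncache first, then relocate.
+alter table census partition (year=1960) set uncached;
+alter table census partition (year=1960) set location '/user/impala/census_1960';
+</code></pre>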
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="hdfs_caching__hdfs_caching_admin">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Administration for HDFS Caching with Impala</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Here are the guidelines and steps to check or change the status of HDFS caching for Impala data:
+      </p>
+
+      <p class="p">
+        <strong class="ph b">hdfs cacheadmin command:</strong>
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          If you drop a cache pool with the <span class="keyword cmdname">hdfs cacheadmin</span> command, Impala queries against the
+          associated data files will still work, by falling back to reading the files from disk. After performing a
+          <code class="ph codeph">REFRESH</code> on the table, Impala reports the number of bytes cached as 0 for all associated
+          tables and partitions.
+        </li>
+
+        <li class="li">
+          You might use <span class="keyword cmdname">hdfs cacheadmin</span> to get a list of existing cache pools, or detailed
+          information about the pools, as follows:
+<pre class="pre codeblock"><code>hdfs cacheadmin -listDirectives         # Basic info
+Found 122 entries
+  ID POOL       REPL EXPIRY  PATH
+ 123 testPool      1 never   /user/hive/warehouse/tpcds.store_sales
+ 124 testPool      1 never   /user/hive/warehouse/tpcds.store_sales/ss_date=1998-01-15
+ 125 testPool      1 never   /user/hive/warehouse/tpcds.store_sales/ss_date=1998-02-01
+...
+
+hdfs cacheadmin -listDirectives -stats  # More details
+Found 122 entries
+  ID POOL       REPL EXPIRY  PATH                                                        BYTES_NEEDED  BYTES_CACHED  FILES_NEEDED  FILES_CACHED
+ 123 testPool      1 never   /user/hive/warehouse/tpcds.store_sales                                 0             0             0             0
+ 124 testPool      1 never   /user/hive/warehouse/tpcds.store_sales/ss_date=1998-01-15         143169        143169             1             1
+ 125 testPool      1 never   /user/hive/warehouse/tpcds.store_sales/ss_date=1998-02-01         112447        112447             1             1
+...
+</code></pre>
+        </li>
+      </ul>
+
+      <p class="p">
+        <strong class="ph b">Impala SHOW statement:</strong>
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          For each table or partition, the <code class="ph codeph">SHOW TABLE STATS</code> or <code class="ph codeph">SHOW PARTITIONS</code>
+          statement displays the number of bytes currently cached by the HDFS caching feature. If there are no
+          cache directives in place for that table or partition, the result set displays <code class="ph codeph">NOT
+          CACHED</code>. A value of 0, or a smaller number than the overall size of the table or partition,
+          indicates that the cache request has been submitted but the data has not been entirely loaded into memory
+          yet. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+        </li>
+      </ul>
+
+      <p class="p">
+        <strong class="ph b">Impala memory limits:</strong>
+      </p>
+
+      <p class="p">
+        The Impala HDFS caching feature interacts with the Impala memory limits as follows:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The maximum size of each HDFS cache pool is specified externally to Impala, through the <span class="keyword cmdname">hdfs
+          cacheadmin</span> command.
+        </li>
+
+        <li class="li">
+          All the memory used for HDFS caching is separate from the <span class="keyword cmdname">impalad</span> daemon address space
+          and does not count towards the limits of the <code class="ph codeph">--mem_limit</code> startup option,
+          <code class="ph codeph">MEM_LIMIT</code> query option, or further limits imposed through YARN resource management or
+          the Linux <code class="ph codeph">cgroups</code> mechanism.
+        </li>
+
+        <li class="li">
+          Because accessing HDFS cached data avoids a memory-to-memory copy operation, queries involving cached
+          data require less memory on the Impala side than the equivalent queries on uncached data. In addition to
+          any performance benefits in a single-user environment, the reduced memory helps to improve scalability
+          under high-concurrency workloads.
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="hdfs_caching__hdfs_caching_performance">
+
+    <h2 class="title topictitle2" id="ariaid-title7">Performance Considerations for HDFS Caching with Impala</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        In Impala 1.4.0 and higher, Impala supports efficient reads from data that is pinned in memory through HDFS
+        caching. Impala takes advantage of the HDFS API and reads the data from memory rather than from disk
+        whether the data files are pinned using Impala DDL statements or through the command-line mechanism where
+        you specify HDFS paths.
+      </p>
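+
+      <p class="p">
+        As a quick illustration of the DDL route in <span class="keyword cmdname">impala-shell</span>
+        (the <code class="ph codeph">census</code> table and <code class="ph codeph">pool_name</code> are placeholders,
+        not required names):
+      </p>
+
+<pre class="pre codeblock"><code>-- Pin the data files of an entire table in the HDFS cache.
+alter table census set cached in 'pool_name';
+-- Caching can also be applied to a single partition, or removed again.
+alter table census partition (year=1960) set cached in 'pool_name';
+alter table census set uncached;
+</code></pre>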
+
+      <p class="p">
+        When you examine the output of the <span class="keyword cmdname">impala-shell</span> <span class="keyword cmdname">SUMMARY</span> command, or
+        look in the metrics report for the <span class="keyword cmdname">impalad</span> daemon, you see how many bytes are read from
+        the HDFS cache. For example, this excerpt from a query profile illustrates that all the data read during a
+        particular phase of the query came from the HDFS cache, because the <code class="ph codeph">BytesRead</code> and
+        <code class="ph codeph">BytesReadDataNodeCache</code> values are identical.
+      </p>
+
+<pre class="pre codeblock"><code>HDFS_SCAN_NODE (id=0):(Total: 11s114ms, non-child: 11s114ms, % non-child: 100.00%)
+        - AverageHdfsReadThreadConcurrency: 0.00
+        - AverageScannerThreadConcurrency: 32.75
+<strong class="ph b">        - BytesRead: 10.47 GB (11240756479)
+        - BytesReadDataNodeCache: 10.47 GB (11240756479)</strong>
+        - BytesReadLocal: 10.47 GB (11240756479)
+        - BytesReadShortCircuit: 10.47 GB (11240756479)
+        - DecompressionTime: 27s572ms
+</code></pre>
+
+      <p class="p">
+        For queries involving smaller amounts of data, or in single-user workloads, you might not notice a
+        significant difference in query response time with or without HDFS caching. Even with HDFS caching turned
+        off, the data for the query might still be in the Linux OS buffer cache. The benefits become clearer as
+        data volume increases, and especially as the system processes more concurrent queries. HDFS caching
+        improves the scalability of the overall system. That is, it prevents query performance from declining when
+        the workload outstrips the capacity of the Linux OS cache.
+      </p>
+
+      <p class="p">
+        Due to a limitation of HDFS, zero-copy reads are not supported with
+        encryption. Where practical, avoid HDFS caching for Impala data
+        files in encryption zones. The queries fall back to the normal read
+        path during query execution, which might cause some performance overhead.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">SELECT considerations:</strong>
+      </p>
+
+      <p class="p">
+        The Impala HDFS caching feature interacts with the
+        <code class="ph codeph"><a class="xref" href="impala_select.html#select">SELECT</a></code> statement and query performance as
+        follows:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Impala automatically reads from memory any data that has been designated as cached and actually loaded
+          into the HDFS cache. (It could take some time after the initial request to fully populate the cache for a
+          large table or one with many partitions.) The speedup comes from two aspects: reading from RAM instead
+          of disk, and accessing the data straight from the cache area instead of copying from one RAM area to
+          another. This second aspect yields further performance improvement over the standard OS caching
+          mechanism, which still results in memory-to-memory copying of cached data.
+        </li>
+
+        <li class="li">
+          For small amounts of data, the query speedup might not be noticeable in terms of wall clock time. The
+          performance might be roughly the same with HDFS caching turned on or off, due to recently used data being
+          held in the Linux OS cache. The difference is more pronounced with:
+          <ul class="ul">
+            <li class="li">
+              Data volumes (for all queries running concurrently) that exceed the size of the Linux OS cache.
+            </li>
+
+            <li class="li">
+              A busy cluster running many concurrent queries, where the reduction in memory-to-memory copying and
+              overall memory usage during queries results in greater scalability and throughput.
+            </li>
+
+            <li class="li">
+              Thus, to really exercise and benchmark this feature in a development environment, you might need to
+              simulate realistic workloads and concurrent queries that match your production environment.
+            </li>
+
+            <li class="li">
+              One way to simulate a heavy workload on a lightly loaded system is to flush the OS buffer cache (on
+              each DataNode) between iterations of queries against the same tables or partitions:
+<pre class="pre codeblock"><code>$ sync
+$ echo 1 &gt; /proc/sys/vm/drop_caches
+</code></pre>
+            </li>
+          </ul>
+        </li>
+
+        <li class="li">
+          Impala queries take advantage of HDFS cached data regardless of whether the cache directive was issued by
+          Impala or externally through the <span class="keyword cmdname">hdfs cacheadmin</span> command, for example for an external
+          table where the cached data files might be accessed by several different Hadoop components.
+        </li>
+
+        <li class="li">
+          If your query returns a large result set, the time reported for the query could be dominated by the time
+          needed to print the results on the screen. To measure the time for the underlying query processing, query
+          the <code class="ph codeph">COUNT()</code> of the big result set, which does all the same processing but only prints a
+          single line to the screen.
+        </li>
+      </ul>
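+
+      <p class="p">
+        For example, to factor out the printing overhead when timing a query
+        (the table and column names here are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>-- Timing dominated by printing millions of result rows to the screen:
+select * from big_table where val &gt; 100;
+-- Roughly the same scan and filter work, but prints only a single line:
+select count(*) from big_table where val &gt; 100;
+</code></pre>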
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_perf_joins.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_perf_joins.html b/docs/build/html/topics/impala_perf_joins.html
new file mode 100644
index 0000000..064a8c5
--- /dev/null
+++ b/docs/build/html/topics/impala_perf_joins.html
@@ -0,0 +1,493 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_joins"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Performance Considerations for Join Queries</title></head><body id="perf_joins"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Performance Considerations for Join Queries</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Queries involving join operations often require more tuning than queries that refer to only one table. The
+      maximum size of the result set from a join query is the product of the number of rows in all the joined
+      tables. When joining several tables with millions or billions of rows, any missed opportunity to filter the
+      result set, or other inefficiency in the query, could lead to an operation that does not finish in a
+      practical time and has to be cancelled.
+    </p>
+
+    <p class="p">
+      The simplest technique for tuning an Impala join query is to collect statistics on each table involved in the
+      join using the <code class="ph codeph"><a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS</a></code>
+      statement, and then let Impala automatically optimize the query based on the size of each table, number of
+      distinct values of each column, and so on. The <code class="ph codeph">COMPUTE STATS</code> statement and the join
+      optimization are new features introduced in Impala 1.2.2. For accurate statistics about each table, issue the
+      <code class="ph codeph">COMPUTE STATS</code> statement after loading the data into that table, and again if the amount of
+      data changes substantially due to an <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, adding a partition,
+      and so on.
+    </p>
+
+    <p class="p">
+      If statistics are not available for all the tables in the join query, or if Impala chooses a join order that
+      is not the most efficient, you can override the automatic join order optimization by specifying the
+      <code class="ph codeph">STRAIGHT_JOIN</code> keyword immediately after the <code class="ph codeph">SELECT</code> keyword. In this case,
+      Impala uses the order the tables appear in the query to guide how the joins are processed.
+    </p>
+
+    <p class="p">
+      When you use the <code class="ph codeph">STRAIGHT_JOIN</code> technique, you must order the tables in the join query
+      manually instead of relying on the Impala optimizer. The optimizer uses sophisticated techniques to estimate
+      the size of the result set at each stage of the join. For manual ordering, use this heuristic approach to
+      start with, and then experiment to fine-tune the order:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Specify the largest table first. This table is read from disk by each Impala node and so its size is not
+        significant in terms of memory usage during the query.
+      </li>
+
+      <li class="li">
+        Next, specify the smallest table. The contents of the second, third, and subsequent tables are all transmitted
+        across the network. You want to minimize the size of the result set from each subsequent stage of the join
+        query. The most likely approach involves joining a small table first, so that the result set remains small
+        even as subsequent larger tables are processed.
+      </li>
+
+      <li class="li">
+        Join the next smallest table, then the next smallest, and so on.
+      </li>
+
+      <li class="li">
+        For example, if you had tables <code class="ph codeph">BIG</code>, <code class="ph codeph">MEDIUM</code>, <code class="ph codeph">SMALL</code>, and
+        <code class="ph codeph">TINY</code>, the logical join order to try would be <code class="ph codeph">BIG</code>, <code class="ph codeph">TINY</code>,
+        <code class="ph codeph">SMALL</code>, <code class="ph codeph">MEDIUM</code>.
+      </li>
+    </ul>
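+
+    <p class="p">
+      Under that heuristic, a manually ordered query might look like the following sketch.
+      (The <code class="ph codeph">id</code> join columns are assumptions for illustration.)
+    </p>
+
+<pre class="pre codeblock"><code>select straight_join count(*)
+  from big
+  join tiny   on big.id = tiny.id
+  join small  on big.id = small.id
+  join medium on big.id = medium.id;
+</code></pre>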
+
+    <p class="p">
+      The terms <span class="q">"largest"</span> and <span class="q">"smallest"</span> refer to the size of the intermediate result set based on the
+      number of rows and columns from each table that are part of the result set. For example, if you join one
+      table <code class="ph codeph">sales</code> with another table <code class="ph codeph">customers</code>, a query might find results from
+      100 different customers who made a total of 5000 purchases. In that case, you would specify <code class="ph codeph">SELECT
+      ... FROM sales JOIN customers ...</code>, putting <code class="ph codeph">customers</code> on the right side because it
+      is smaller in the context of this query.
+    </p>
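+
+    <p class="p">
+      A sketch of that query shape, with hypothetical column names:
+    </p>
+
+<pre class="pre codeblock"><code>-- customers goes on the right because it contributes the smaller result set here.
+select s.id, s.amount, c.name
+  from sales s join customers c
+  where s.customer_id = c.cust_id;
+</code></pre>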
+
+    <p class="p">
+      The Impala query planner chooses between different techniques for performing join queries, depending on the
+      absolute and relative sizes of the tables. <strong class="ph b">Broadcast joins</strong> are the default, where the right-hand table
+      is considered to be smaller than the left-hand table, and its contents are sent to all the other nodes
+      involved in the query. The alternative technique is known as a <strong class="ph b">partitioned join</strong> (not related to a
+      partitioned table), which is more suitable for large tables of roughly equal size. With this technique,
+      portions of each table are sent to appropriate other nodes where those subsets of rows can be processed in
+      parallel. The choice of broadcast or partitioned join also depends on statistics being available for all
+      tables in the join, gathered by the <code class="ph codeph">COMPUTE STATS</code> statement.
+    </p>
+
+    <p class="p">
+      To see which join strategy is used for a particular query, issue an <code class="ph codeph">EXPLAIN</code> statement for
+      the query. If you find that a query uses a broadcast join when you know through benchmarking that a
+      partitioned join would be more efficient, or vice versa, add a hint to the query to specify the precise join
+      mechanism to use. See <a class="xref" href="impala_hints.html#hints">Query Hints in Impala SELECT Statements</a> for details.
+    </p>
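+
+    <p class="p">
+      For instance, to force a particular join mechanism (the table names here are placeholders),
+      you could use either hint style:
+    </p>
+
+<pre class="pre codeblock"><code>-- Force a partitioned (shuffle) join using the comment-style hint.
+select t1.x from t1 join /* +shuffle */ t2 on t1.id = t2.id;
+-- Force a broadcast join using the square-bracket hint style.
+select t1.x from t1 join [broadcast] t2 on t1.id = t2.id;
+</code></pre>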
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="perf_joins__joins_no_stats">
+
+    <h2 class="title topictitle2" id="ariaid-title2">How Joins Are Processed when Statistics Are Unavailable</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        If table or column statistics are not available for some tables in a join, Impala still reorders the tables
+        using the information that is available. Tables with statistics are placed on the left side of the join
+        order, in descending order of cost based on overall size and cardinality. Tables without statistics are
+        treated as zero-size, that is, they are always placed on the right side of the join order.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="perf_joins__straight_join">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Overriding Join Reordering with STRAIGHT_JOIN</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        If an Impala join query is inefficient because of outdated statistics or unexpected data distribution, you
+        can keep Impala from reordering the joined tables by using the <code class="ph codeph">STRAIGHT_JOIN</code> keyword
+        immediately after the <code class="ph codeph">SELECT</code> keyword. The <code class="ph codeph">STRAIGHT_JOIN</code> keyword turns off
+        the reordering of join clauses that Impala does internally, and produces a plan that relies on the join
+        clauses being ordered optimally in the query text. In this case, rewrite the query so that the largest
+        table is on the left, followed by the next largest, and so on until the smallest table is on the right.
+      </p>
+
+      <p class="p">
+        In this example, the subselect from the <code class="ph codeph">BIG</code> table produces a very small result set, but
+        the table might still be treated as if it were the biggest and placed first in the join order. Specifying
+        <code class="ph codeph">STRAIGHT_JOIN</code> immediately after <code class="ph codeph">SELECT</code> prevents Impala from reordering
+        the joins, keeping the subselect as the rightmost table in the join order.
+      </p>
+
+<pre class="pre codeblock"><code>select straight_join x from medium join small join (select * from big where c1 &lt; 10) as big
+  where medium.id = small.id and small.id = big.id;</code></pre>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="perf_joins__perf_joins_examples">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Examples of Join Order Optimization</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Here are examples showing joins between tables with 1 billion, 200 million, and 1 million rows. (In this
+        case, the tables are unpartitioned and using Parquet format.) The smaller tables contain subsets of data
+        from the largest one, for convenience of joining on the unique <code class="ph codeph">ID</code> column. The smallest
+        table only contains a subset of columns from the others.
+      </p>
+
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table big stored as parquet as select * from raw_data;
++----------------------------+
+| summary                    |
++----------------------------+
+| Inserted 1000000000 row(s) |
++----------------------------+
+Returned 1 row(s) in 671.56s
+[localhost:21000] &gt; desc big;
++-----------+---------+---------+
+| name      | type    | comment |
++-----------+---------+---------+
+| id        | int     |         |
+| val       | int     |         |
+| zfill     | string  |         |
+| name      | string  |         |
+| assertion | boolean |         |
++-----------+---------+---------+
+Returned 5 row(s) in 0.01s
+[localhost:21000] &gt; create table medium stored as parquet as select * from big limit 200 * floor(1e6);
++---------------------------+
+| summary                   |
++---------------------------+
+| Inserted 200000000 row(s) |
++---------------------------+
+Returned 1 row(s) in 138.31s
+[localhost:21000] &gt; create table small stored as parquet as select id,val,name from big where assertion = true limit 1 * floor(1e6);
++-------------------------+
+| summary                 |
++-------------------------+
+| Inserted 1000000 row(s) |
++-------------------------+
+Returned 1 row(s) in 6.32s</code></pre>
+
+      <p class="p">
+        For any kind of performance experimentation, use the <code class="ph codeph">EXPLAIN</code> statement to see how any
+        expensive query will be performed without actually running it, and enable verbose <code class="ph codeph">EXPLAIN</code>
+        plans containing more performance-oriented detail. In the following output, the most interesting plan lines are highlighted in bold,
+        showing that without statistics for the joined tables, Impala cannot make a good estimate of the number of
+        rows involved at each stage of processing, and is likely to stick with the <code class="ph codeph">BROADCAST</code> join
+        mechanism that sends a complete copy of one of the tables to each node.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set explain_level=verbose;
+EXPLAIN_LEVEL set to verbose
+[localhost:21000] &gt; explain select count(*) from big join medium where big.id = medium.id;
++----------------------------------------------------------+
+| Explain String                                           |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=2.10GB VCores=2  |
+|                                                          |
+| PLAN FRAGMENT 0                                          |
+|   PARTITION: UNPARTITIONED                               |
+|                                                          |
+|   6:AGGREGATE (merge finalize)                           |
+|   |  output: SUM(COUNT(*))                               |
+|   |  cardinality: 1                                      |
+|   |  per-host memory: unavailable                        |
+|   |  tuple ids: 2                                        |
+|   |                                                      |
+|   5:EXCHANGE                                             |
+|      cardinality: 1                                      |
+|      per-host memory: unavailable                        |
+|      tuple ids: 2                                        |
+|                                                          |
+| PLAN FRAGMENT 1                                          |
+|   PARTITION: RANDOM                                      |
+|                                                          |
+|   STREAM DATA SINK                                       |
+|     EXCHANGE ID: 5                                       |
+|     UNPARTITIONED                                        |
+|                                                          |
+|   3:AGGREGATE                                            |
+|   |  output: COUNT(*)                                    |
+|   |  cardinality: 1                                      |
+|   |  per-host memory: 10.00MB                            |
+|   |  tuple ids: 2                                        |
+|   |                                                      |
+|   2:HASH JOIN                                            |
+<strong class="ph b">|   |  join op: INNER JOIN (BROADCAST)                     |</strong>
+|   |  hash predicates:                                    |
+|   |    big.id = medium.id                                |
+<strong class="ph b">|   |  cardinality: unavailable                            |</strong>
+|   |  per-host memory: 2.00GB                             |
+|   |  tuple ids: 0 1                                      |
+|   |                                                      |
+|   |----4:EXCHANGE                                        |
+|   |       cardinality: unavailable                       |
+|   |       per-host memory: 0B                            |
+|   |       tuple ids: 1                                   |
+|   |                                                      |
+|   0:SCAN HDFS                                            |
+<strong class="ph b">|      table=join_order.big #partitions=1/1 size=23.12GB   |
+|      table stats: unavailable                            |
+|      column stats: unavailable                           |
+|      cardinality: unavailable                            |</strong>
+|      per-host memory: 88.00MB                            |
+|      tuple ids: 0                                        |
+|                                                          |
+| PLAN FRAGMENT 2                                          |
+|   PARTITION: RANDOM                                      |
+|                                                          |
+|   STREAM DATA SINK                                       |
+|     EXCHANGE ID: 4                                       |
+|     UNPARTITIONED                                        |
+|                                                          |
+|   1:SCAN HDFS                                            |
+<strong class="ph b">|      table=join_order.medium #partitions=1/1 size=4.62GB |
+|      table stats: unavailable                            |
+|      column stats: unavailable                           |
+|      cardinality: unavailable                            |</strong>
+|      per-host memory: 88.00MB                            |
+|      tuple ids: 1                                        |
++----------------------------------------------------------+
+Returned 64 row(s) in 0.04s</code></pre>
+
+      <p class="p">
+        Gathering statistics for all the tables is straightforward, one <code class="ph codeph">COMPUTE STATS</code> statement
+        per table:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats small;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 3 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 4.26s
+[localhost:21000] &gt; compute stats medium;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 5 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 42.11s
+[localhost:21000] &gt; compute stats big;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 5 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 165.44s</code></pre>
+
+      <p class="p">
+        With statistics in place, Impala can choose a more effective join order rather than following the
+        left-to-right sequence of tables in the query, and can choose <code class="ph codeph">BROADCAST</code> or
+        <code class="ph codeph">PARTITIONED</code> join strategies based on the overall sizes and numbers of rows in the tables:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; explain select count(*) from medium join big where big.id = medium.id;
+Query: explain select count(*) from medium join big where big.id = medium.id
++-----------------------------------------------------------+
+| Explain String                                            |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=937.23MB VCores=2 |
+|                                                           |
+| PLAN FRAGMENT 0                                           |
+|   PARTITION: UNPARTITIONED                                |
+|                                                           |
+|   6:AGGREGATE (merge finalize)                            |
+|   |  output: SUM(COUNT(*))                                |
+|   |  cardinality: 1                                       |
+|   |  per-host memory: unavailable                         |
+|   |  tuple ids: 2                                         |
+|   |                                                       |
+|   5:EXCHANGE                                              |
+|      cardinality: 1                                       |
+|      per-host memory: unavailable                         |
+|      tuple ids: 2                                         |
+|                                                           |
+| PLAN FRAGMENT 1                                           |
+|   PARTITION: RANDOM                                       |
+|                                                           |
+|   STREAM DATA SINK                                        |
+|     EXCHANGE ID: 5                                        |
+|     UNPARTITIONED                                         |
+|                                                           |
+|   3:AGGREGATE                                             |
+|   |  output: COUNT(*)                                     |
+|   |  cardinality: 1                                       |
+|   |  per-host memory: 10.00MB                             |
+|   |  tuple ids: 2                                         |
+|   |                                                       |
+|   2:HASH JOIN                                             |
+|   |  join op: INNER JOIN (BROADCAST)                      |
+|   |  hash predicates:                                     |
+|   |    big.id = medium.id                                 |
+|   |  cardinality: 1443004441                              |
+|   |  per-host memory: 839.23MB                            |
+|   |  tuple ids: 1 0                                       |
+|   |                                                       |
+|   |----4:EXCHANGE                                         |
+|   |       cardinality: 200000000                          |
+|   |       per-host memory: 0B                             |
+|   |       tuple ids: 0                                    |
+|   |                                                       |
+|   1:SCAN HDFS                                             |
+|      table=join_order.big #partitions=1/1 size=23.12GB    |
+|      table stats: 1000000000 rows total                   |
+|      column stats: all                                    |
+|      cardinality: 1000000000                              |
+|      per-host memory: 88.00MB                             |
+|      tuple ids: 1                                         |
+|                                                           |
+| PLAN FRAGMENT 2                                           |
+|   PARTITION: RANDOM                                       |
+|                                                           |
+|   STREAM DATA SINK                                        |
+|     EXCHANGE ID: 4                                        |
+|     UNPARTITIONED                                         |
+|                                                           |
+|   0:SCAN HDFS                                             |
+|      table=join_order.medium #partitions=1/1 size=4.62GB  |
+|      table stats: 200000000 rows total                    |
+|      column stats: all                                    |
+|      cardinality: 200000000                               |
+|      per-host memory: 88.00MB                             |
+|      tuple ids: 0                                         |
++-----------------------------------------------------------+
+Returned 64 row(s) in 0.04s
+
+[localhost:21000] &gt; explain select count(*) from small join big where big.id = small.id;
+Query: explain select count(*) from small join big where big.id = small.id
++-----------------------------------------------------------+
+| Explain String                                            |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=101.15MB VCores=2 |
+|                                                           |
+| PLAN FRAGMENT 0                                           |
+|   PARTITION: UNPARTITIONED                                |
+|                                                           |
+|   6:AGGREGATE (merge finalize)                            |
+|   |  output: SUM(COUNT(*))                                |
+|   |  cardinality: 1                                       |
+|   |  per-host memory: unavailable                         |
+|   |  tuple ids: 2                                         |
+|   |                                                       |
+|   5:EXCHANGE                                              |
+|      cardinality: 1                                       |
+|      per-host memory: unavailable                         |
+|      tuple ids: 2                                         |
+|                                                           |
+| PLAN FRAGMENT 1                                           |
+|   PARTITION: RANDOM                                       |
+|                                                           |
+|   STREAM DATA SINK                                        |
+|     EXCHANGE ID: 5                                        |
+|     UNPARTITIONED                                         |
+|                                                           |
+|   3:AGGREGATE                                             |
+|   |  output: COUNT(*)                                     |
+|   |  cardinality: 1                                       |
+|   |  per-host memory: 10.00MB                             |
+|   |  tuple ids: 2                                         |
+|   |                                                       |
+|   2:HASH JOIN                                             |
+|   |  join op: INNER JOIN (BROADCAST)                      |
+|   |  hash predicates:                                     |
+|   |    big.id = small.id                                  |
+|   |  cardinality: 1000000000                              |
+|   |  per-host memory: 3.15MB                              |
+|   |  tuple ids: 1 0                                       |
+|   |                                                       |
+|   |----4:EXCHANGE                                         |
+|   |       cardinality: 1000000                            |
+|   |       per-host memory: 0B                             |
+|   |       tuple ids: 0                                    |
+|   |                                                       |
+|   1:SCAN HDFS                                             |
+|      table=join_order.big #partitions=1/1 size=23.12GB    |
+|      table stats: 1000000000 rows total                   |
+|      column stats: all                                    |
+|      cardinality: 1000000000                              |
+|      per-host memory: 88.00MB                             |
+|      tuple ids: 1                                         |
+|                                                           |
+| PLAN FRAGMENT 2                                           |
+|   PARTITION: RANDOM                                       |
+|                                                           |
+|   STREAM DATA SINK                                        |
+|     EXCHANGE ID: 4                                        |
+|     UNPARTITIONED                                         |
+|                                                           |
+|   0:SCAN HDFS                                             |
+|      table=join_order.small #partitions=1/1 size=17.93MB  |
+|      table stats: 1000000 rows total                      |
+|      column stats: all                                    |
+|      cardinality: 1000000                                 |
+|      per-host memory: 32.00MB                             |
+|      tuple ids: 0                                         |
++-----------------------------------------------------------+
+Returned 64 row(s) in 0.03s</code></pre>
+
+      <p class="p">
+        When queries like these are actually run, the execution times are relatively consistent regardless of the
+        table order in the query text. Here are examples using both the unique <code class="ph codeph">ID</code> column and the
+        <code class="ph codeph">VAL</code> column containing duplicate values:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select count(*) from big join small on (big.id = small.id);
+Query: select count(*) from big join small on (big.id = small.id)
++----------+
+| count(*) |
++----------+
+| 1000000  |
++----------+
+Returned 1 row(s) in 21.68s
+[localhost:21000] &gt; select count(*) from small join big on (big.id = small.id);
+Query: select count(*) from small join big on (big.id = small.id)
++----------+
+| count(*) |
++----------+
+| 1000000  |
++----------+
+Returned 1 row(s) in 20.45s
+
+[localhost:21000] &gt; select count(*) from big join small on (big.val = small.val);
++------------+
+| count(*)   |
++------------+
+| 2000948962 |
++------------+
+Returned 1 row(s) in 108.85s
+[localhost:21000] &gt; select count(*) from small join big on (big.val = small.val);
++------------+
+| count(*)   |
++------------+
+| 2000948962 |
++------------+
+Returned 1 row(s) in 100.76s</code></pre>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        When examining the performance of join queries and the effectiveness of the join order optimization, make
+        sure the query involves enough data and cluster resources to see a difference depending on the query plan.
+        For example, a single data file of just a few megabytes will reside in a single HDFS block and be processed
+        on a single node. Likewise, if you use a single-node or two-node cluster, there might not be much
+        difference in efficiency between the broadcast and partitioned join strategies.
+      </div>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_perf_resources.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_perf_resources.html b/docs/build/html/topics/impala_perf_resources.html
new file mode 100644
index 0000000..ab0fadb
--- /dev/null
+++ b/docs/build/html/topics/impala_perf_resources.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mem_limits"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Controlling Impala Resource Usage</title></head><body id="mem_limits"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Controlling Impala Resource Usage</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Sometimes, balancing raw query performance against scalability requires limiting the amount of resources,
+      such as memory or CPU, used by a single query or group of queries. Impala can use several mechanisms that
+      help to smooth out the load during heavy concurrent usage, resulting in faster overall query times and
+      sharing of resources across Impala queries, MapReduce jobs, and other kinds of workloads across a
+      cluster:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        The Impala admission control feature uses a fast, distributed mechanism to hold back queries that exceed
+        limits on the number of concurrent queries or the amount of memory used. The queries are queued, and
+        executed as other queries finish and resources become available. You can control the concurrency limits,
+        and specify different limits for different groups of users to divide cluster resources according to the
+        priorities of different classes of users. This feature is new in Impala 1.3.
+        See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details.
+      </li>
+
+      <li class="li">
+        <p class="p">
+          You can restrict the amount of memory Impala reserves during query execution by specifying the
+          <code class="ph codeph">-mem_limit</code> option for the <code class="ph codeph">impalad</code> daemon. See
+          <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details. This limit applies only to the
+          memory that is directly consumed by queries; Impala reserves additional memory at startup, for example to
+          hold cached metadata.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          For production deployments, implement resource isolation using your cluster management
+          tool.
+        </p>
+      </li>
+    </ul>
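+
+    <p class="p">
+      As an illustration, a memory limit can be supplied when the <span class="keyword cmdname">impalad</span>
+      daemon is started, either as a percentage of physical memory or as an absolute value. (The values shown
+      here are hypothetical; choose limits appropriate for your hardware and workload.)
+    </p>
+
+<pre class="pre codeblock"><code>$ # Cap memory for query execution at 70% of physical RAM on this host.
+$ impalad -mem_limit=70%
+
+$ # Or cap it at an absolute value, here 4 GB.
+$ impalad -mem_limit=4gb</code></pre>
+
+    <p class="p">
+      Because the limit applies to each daemon separately, set the same value on every host so that a
+      distributed query encounters consistent limits across the cluster.
+    </p>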
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_perf_skew.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_perf_skew.html b/docs/build/html/topics/impala_perf_skew.html
new file mode 100644
index 0000000..cb4726e
--- /dev/null
+++ b/docs/build/html/topics/impala_perf_skew.html
@@ -0,0 +1,139 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_skew"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Detecting and Correcting HDFS Block Skew Conditions</title></head><body id="perf_skew"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Detecting and Correcting HDFS Block Skew Conditions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      For best performance of Impala parallel queries, the work is divided equally across hosts in the cluster, and
+      all hosts take approximately equal time to finish their work. If one host takes substantially longer than
+      others, the extra time needed for the slow host can become the dominant factor in query performance.
+      Therefore, one of the first steps in performance tuning for Impala is to detect and correct such conditions.
+    </p>
+
+    <p class="p">
+      The main cause of uneven performance that you can correct within Impala is <dfn class="term">skew</dfn> in the number of
+      HDFS data blocks processed by each host, where some hosts process substantially more data blocks than others.
+      This condition can occur because of uneven distribution of the data values themselves, for example causing
+      certain data files or partitions to be large while others are very small. (Although it is possible to have
+      unevenly distributed data without any problems with the distribution of HDFS blocks.) Block skew could also
+      be due to the underlying block allocation policies within HDFS, the replication factor of the data files, and
+      the way that Impala chooses the host to process each data block.
+    </p>
+
+    <p class="p">
+      The most convenient way to detect block skew, or slow-host issues in general, is to examine the <span class="q">"executive
+      summary"</span> information from the query profile after running a query:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          In <span class="keyword cmdname">impala-shell</span>, issue the <code class="ph codeph">SUMMARY</code> command immediately after the
+          query is complete, to see just the summary information. If you detect issues involving skew, you might
+          switch to issuing the <code class="ph codeph">PROFILE</code> command, which displays the summary information followed
+          by a detailed performance analysis.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          In the Impala debug web UI, click on the <span class="ph uicontrol">Profile</span> link associated with the query after it is
+          complete. The executive summary information is displayed early in the profile output.
+        </p>
+      </li>
+    </ul>
+
+    <p class="p">
+      For each phase of the query, you see an <span class="ph uicontrol">Avg Time</span> and a <span class="ph uicontrol">Max Time</span>
+      value, along with <span class="ph uicontrol">#Hosts</span> indicating how many hosts are involved in that query phase.
+      For all the phases with <span class="ph uicontrol">#Hosts</span> greater than one, look for cases where the maximum time
+      is substantially greater than the average time. Focus on the phases that took the longest, for example, those
+      taking multiple seconds rather than milliseconds or microseconds.
+    </p>
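+
+    <p class="p">
+      For example, a skewed scan phase might look like the following in the executive summary. (This is a
+      simplified, hypothetical sketch; the actual <code class="ph codeph">SUMMARY</code> output includes
+      additional columns such as row counts and memory estimates.) Here the
+      <span class="ph uicontrol">Max Time</span> for the scan phase is far higher than the
+      <span class="ph uicontrol">Avg Time</span>, suggesting that one of the 4 hosts processed substantially
+      more data than the others:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; summary;
++--------------+--------+----------+----------+
+| Operator     | #Hosts | Avg Time | Max Time |
++--------------+--------+----------+----------+
+| 03:AGGREGATE | 1      | 1.03ms   | 1.03ms   |
+| 02:EXCHANGE  | 1      | 11.01ms  | 11.01ms  |
+| 01:AGGREGATE | 4      | 30.22ms  | 39.12ms  |
+| 00:SCAN HDFS | 4      | 3.81s    | 12.06s   |
++--------------+--------+----------+----------+</code></pre>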
+
+    <p class="p">
+      If you detect that some hosts take longer than others, first rule out non-Impala causes. One reason that some
+      hosts could be slower than others is if those hosts have less capacity than the others, or if they are
+      substantially busier due to unevenly distributed non-Impala workloads:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          For clusters running Impala, keep the relative capacities of all hosts roughly equal. Any cost savings
+          from including some underpowered hosts in the cluster will likely be outweighed by poor or uneven
+          performance, and the time spent diagnosing performance issues.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          If non-Impala workloads cause slowdowns on some hosts but not others, use the appropriate load-balancing
+          techniques for the non-Impala components to smooth out the load across the cluster.
+        </p>
+      </li>
+    </ul>
+
+    <p class="p">
+      If the hosts on your cluster are evenly powered and evenly loaded, examine the detailed profile output to
+      determine which host is taking longer than others for the query phase in question. Examine how many bytes are
+      processed during that phase on that host, how much memory is used, and how many bytes are transmitted across
+      the network.
+    </p>
+
+    <p class="p">
+      The most common symptom is a higher number of bytes read on one host than others, due to one host being
+      requested to process a higher number of HDFS data blocks. This condition is more likely to occur when the
+      number of blocks accessed by the query is relatively small. For example, if you have a 10-node cluster and
+      the query processes 10 HDFS blocks, each node might not process exactly one block. If one node sits idle
+      while another node processes two blocks, the query could take twice as long as if the data were perfectly
+      distributed.
+    </p>
+
+    <p class="p">
+      Possible solutions in this case include:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          If the query is artificially small, perhaps for benchmarking purposes, scale it up to process a larger
+          data set. For example, if some nodes read 10 HDFS data blocks while others read 11, the overall effect of
+          the uneven distribution is much lower than when some nodes did twice as much work as others. As a
+          guideline, aim for a <span class="q">"sweet spot"</span> where each node reads 2 GB or more from HDFS per query. Queries
+          that process lower volumes than that could experience inconsistent performance that smooths out as
+          queries become more data-intensive.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          If the query processes only a few large blocks, so that many nodes sit idle and cannot help to
+          parallelize the query, consider reducing the overall block size. For example, you might adjust the
+          <code class="ph codeph">PARQUET_FILE_SIZE</code> query option before copying or converting data into a Parquet table.
+          Or you might adjust the granularity of data files produced earlier in the ETL pipeline by non-Impala
+          components. In Impala 2.0 and later, the default Parquet block size is 256 MB, reduced from 1 GB, to
+          improve parallelism for common cluster sizes and data volumes.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          Reduce the amount of compression applied to the data. For text data files, the highest degree of
+          compression (gzip) produces unsplittable files that are more difficult for Impala to process in parallel,
+          and require extra memory during processing to hold the compressed and uncompressed data simultaneously.
+          For binary formats such as Parquet and Avro, compression can result in fewer data blocks overall, but
+          remember that when queries process relatively few blocks, there is less opportunity for parallel
+          execution and many nodes in the cluster might sit idle. Note that when Impala writes Parquet data with
+          the query option <code class="ph codeph">COMPRESSION_CODEC=NONE</code> enabled, the data is still typically compact due
+          to the encoding schemes used by Parquet, independent of the final compression step.
+        </p>
+      </li>
+    </ul>
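+
+    <p class="p">
+      For instance, to spread data across more HDFS blocks, you might lower the Parquet block size before
+      rewriting a table. (The table names and the size value here are hypothetical; choose a size appropriate
+      for your data volume and cluster.)
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set PARQUET_FILE_SIZE=128m;
+PARQUET_FILE_SIZE set to 128m
+[localhost:21000] &gt; insert overwrite table parquet_table select * from text_table;</code></pre>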
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[34/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_explain.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_explain.html b/docs/build/html/topics/impala_explain.html
new file mode 100644
index 0000000..473a94d
--- /dev/null
+++ b/docs/build/html/topics/impala_explain.html
@@ -0,0 +1,291 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="explain"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>EXPLAIN Statement</title></head><body id="explain"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">EXPLAIN Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Returns the execution plan for a statement, showing the low-level mechanisms that Impala will use to read the
+      data, divide the work among nodes in the cluster, and transmit intermediate and final results across the
+      network. Use <code class="ph codeph">explain</code> followed by a complete <code class="ph codeph">SELECT</code> query. For example:
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>EXPLAIN { <var class="keyword varname">select_query</var> | <var class="keyword varname">ctas_stmt</var> | <var class="keyword varname">insert_stmt</var> }
+</code></pre>
+
+    <p class="p">
+      The <var class="keyword varname">select_query</var> is a <code class="ph codeph">SELECT</code> statement, optionally prefixed by a
+      <code class="ph codeph">WITH</code> clause. See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for details.
+    </p>
+
+    <p class="p">
+      The <var class="keyword varname">insert_stmt</var> is an <code class="ph codeph">INSERT</code> statement that inserts into or overwrites an
+      existing table. It can use either the <code class="ph codeph">INSERT ... SELECT</code> or <code class="ph codeph">INSERT ...
+      VALUES</code> syntax. See <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for details.
+    </p>
+
+    <p class="p">
+      The <var class="keyword varname">ctas_stmt</var> is a <code class="ph codeph">CREATE TABLE</code> statement using the <code class="ph codeph">AS
+      SELECT</code> clause, typically abbreviated as a <span class="q">"CTAS"</span> operation. See
+      <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for details.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      You can interpret the output to judge whether the query is performing efficiently, and adjust the query
+      and/or the schema if not. For example, you might change the tests in the <code class="ph codeph">WHERE</code> clause, add
+      hints to make join operations more efficient, introduce subqueries, change the order of tables in a join, add
+      or change partitioning for a table, collect column statistics and/or table statistics in Hive, or any other
+      performance tuning steps.
+    </p>
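+
+    <p class="p">
+      For example, if the plan shows a broadcast join where you expect a partitioned one to perform better,
+      you can add a hint and re-check the plan before running the query. (The table and column names here
+      are hypothetical.)
+    </p>
+
+<pre class="pre codeblock"><code>explain select t1.x, t2.y from t1 join [shuffle] t2 on (t1.id = t2.id);</code></pre>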
+
+    <p class="p">
+      The <code class="ph codeph">EXPLAIN</code> output reminds you if table or column statistics are missing from any table
+      involved in the query. These statistics are important for optimizing queries involving large tables or
+      multi-table joins. See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for how to gather statistics,
+      and <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for how to use this information for query tuning.
+    </p>
+
+    <div class="p">
+        Read the <code class="ph codeph">EXPLAIN</code> plan from bottom to top:
+        <ul class="ul">
+          <li class="li">
+            The last part of the plan shows the low-level details such as the expected amount of data that will be
+            read, where you can judge the effectiveness of your partitioning strategy and estimate how long it will
+            take to scan a table based on total data size and the size of the cluster.
+          </li>
+
+          <li class="li">
+            As you work your way up, next you see the operations that will be parallelized and performed on each
+            Impala node.
+          </li>
+
+          <li class="li">
+            At the higher levels, you see how data flows when intermediate result sets are combined and transmitted
+            from one node to another.
+          </li>
+
+          <li class="li">
+            See <a class="xref" href="../shared/../topics/impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a> for details about the
+            <code class="ph codeph">EXPLAIN_LEVEL</code> query option, which lets you customize how much detail to show in the
+            <code class="ph codeph">EXPLAIN</code> plan depending on whether you are doing high-level or low-level tuning,
+            dealing with logical or physical aspects of the query.
+          </li>
+        </ul>
+      </div>
+
+    <p class="p">
+      If you come from a traditional database background and are not familiar with data warehousing, keep in mind
+      that Impala is optimized for full table scans across very large tables. The structure and distribution of
+      this data is typically not suitable for the kind of indexing and single-row lookups that are common in OLTP
+      environments. Seeing a query scan entirely through a large table is common, and not necessarily an indication of
+      an inefficient query. Of course, if you can reduce the volume of scanned data by orders of magnitude, for
+      example by using a query that affects only certain partitions within a partitioned table, then you might be
+      able to optimize a query so that it executes in seconds rather than minutes.
+    </p>
+
+    <p class="p">
+      For more information and examples to help you interpret <code class="ph codeph">EXPLAIN</code> output, see
+      <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a>.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Extended EXPLAIN output:</strong>
+    </p>
+
+    <p class="p">
+      For performance tuning of complex queries, and capacity planning (such as using the admission control and
+      resource management features), you can enable more detailed and informative output for the
+      <code class="ph codeph">EXPLAIN</code> statement. In the <span class="keyword cmdname">impala-shell</span> interpreter, issue the command
+      <code class="ph codeph">SET EXPLAIN_LEVEL=<var class="keyword varname">level</var></code>, where <var class="keyword varname">level</var> is an integer
+      from 0 to 3 or corresponding mnemonic values <code class="ph codeph">minimal</code>, <code class="ph codeph">standard</code>,
+      <code class="ph codeph">extended</code>, or <code class="ph codeph">verbose</code>.
+    </p>
+
+    <p class="p">
+      When extended <code class="ph codeph">EXPLAIN</code> output is enabled, <code class="ph codeph">EXPLAIN</code> statements print
+      information about estimated memory requirements, minimum number of virtual cores, and so on.
+      
+    </p>
+
+    <p class="p">
+      See <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a> for details and examples.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      This example shows how the standard <code class="ph codeph">EXPLAIN</code> output moves from the lowest (physical) level to
+      the higher (logical) levels. The query begins by scanning a certain amount of data; each node performs an
+      aggregation operation (evaluating <code class="ph codeph">COUNT(*)</code>) on some subset of data that is local to that
+      node; the intermediate results are transmitted back to the coordinator node (labelled here as the
+      <code class="ph codeph">EXCHANGE</code> node); lastly, the intermediate results are summed to display the final result.
+    </p>
+
+<pre class="pre codeblock" id="explain__explain_plan_simple"><code>[impalad-host:21000] &gt; explain select count(*) from customer_address;
++----------------------------------------------------------+
+| Explain String                                           |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=42.00MB VCores=1 |
+|                                                          |
+| 03:AGGREGATE [MERGE FINALIZE]                            |
+| |  output: sum(count(*))                                 |
+| |                                                        |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED]                    |
+| |                                                        |
+| 01:AGGREGATE                                             |
+| |  output: count(*)                                      |
+| |                                                        |
+| 00:SCAN HDFS [default.customer_address]                  |
+|    partitions=1/1 size=5.25MB                            |
++----------------------------------------------------------+
+</code></pre>
+
+    <p class="p">
+      These examples show how the extended <code class="ph codeph">EXPLAIN</code> output becomes more accurate and informative as
+      statistics are gathered by the <code class="ph codeph">COMPUTE STATS</code> statement. Initially, much of the information
+      about data size and distribution is marked <span class="q">"unavailable"</span>. Impala can determine the raw data size, but
+      not the number of rows or number of distinct values for each column without additional analysis. The
+      <code class="ph codeph">COMPUTE STATS</code> statement performs this analysis, so a subsequent <code class="ph codeph">EXPLAIN</code>
+      statement has additional information to use in deciding how to optimize the distributed query.
+    </p>
+
+    
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set explain_level=extended;
+EXPLAIN_LEVEL set to extended
+[localhost:21000] &gt; explain select x from t1;
++----------------------------------------------------------+
+| Explain String                                           |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=32.00MB VCores=1 |
+|                                                          |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED]                    |
+| |  hosts=1 per-host-mem=unavailable                      |
+<strong class="ph b">| |  tuple-ids=0 row-size=4B cardinality=unavailable       |</strong>
+| |                                                        |
+| 00:SCAN HDFS [default.t1, PARTITION=RANDOM]              |
+|    partitions=1/1 size=36B                               |
+<strong class="ph b">|    table stats: unavailable                              |</strong>
+<strong class="ph b">|    column stats: unavailable                             |</strong>
+|    hosts=1 per-host-mem=32.00MB                          |
+<strong class="ph b">|    tuple-ids=0 row-size=4B cardinality=unavailable       |</strong>
++----------------------------------------------------------+
+</code></pre>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats t1;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+[localhost:21000] &gt; explain select x from t1;
++----------------------------------------------------------+
+| Explain String                                           |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=64.00MB VCores=1 |
+|                                                          |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED]                    |
+| |  hosts=1 per-host-mem=unavailable                      |
+| |  tuple-ids=0 row-size=4B cardinality=0                 |
+| |                                                        |
+| 00:SCAN HDFS [default.t1, PARTITION=RANDOM]              |
+|    partitions=1/1 size=36B                               |
+<strong class="ph b">|    table stats: 0 rows total                             |</strong>
+<strong class="ph b">|    column stats: all                                     |</strong>
+|    hosts=1 per-host-mem=64.00MB                          |
+<strong class="ph b">|    tuple-ids=0 row-size=4B cardinality=0                 |</strong>
++----------------------------------------------------------+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+    <p class="p">
+        If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+        identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+        other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, must have read
+      and execute permissions for all applicable directories in all source tables
+      for the query that is being explained.
+      (A <code class="ph codeph">SELECT</code> operation could read files from several HDFS directories
+      if the source table is partitioned.)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+      The <code class="ph codeph">EXPLAIN</code> statement displays equivalent plan
+      information for queries against Kudu tables as for queries
+      against HDFS-based tables.
+    </p>
+
+    <div class="p">
+      To see which predicates Impala can <span class="q">"push down"</span> to Kudu for
+      efficient evaluation, without transmitting unnecessary rows back
+      to Impala, look for the <code class="ph codeph">kudu predicates</code> item in
+      the scan phase of the query. The label <code class="ph codeph">kudu predicates</code>
+      indicates a condition that can be evaluated efficiently on the Kudu
+      side. The label <code class="ph codeph">predicates</code> in a <code class="ph codeph">SCAN KUDU</code>
+      node indicates a condition that is evaluated by Impala.
+      For example, in a table with primary key column <code class="ph codeph">X</code>
+      and non-primary key column <code class="ph codeph">Y</code>, you can see that
+      some operators in the <code class="ph codeph">WHERE</code> clause are evaluated
+      immediately by Kudu and others are evaluated later by Impala:
+<pre class="pre codeblock"><code>
+EXPLAIN SELECT x,y from kudu_table WHERE
+  x = 1 AND x NOT IN (2,3) AND y = 1
+  AND x IS NOT NULL AND x &gt; 0;
++----------------
+| Explain String
++----------------
+...
+| 00:SCAN KUDU [default.kudu_table]
+|    predicates: x IS NOT NULL, x NOT IN (2, 3)
+|    kudu predicates: x = 1, x &gt; 0, y = 1
+</code></pre>
+      Only binary predicates and <code class="ph codeph">IN</code> predicates whose
+      literal values exactly match the column types in the Kudu table, without
+      requiring any casting, can be pushed to Kudu.
+    </div>
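+    <p class="p">
+      For example (a hypothetical sketch; the table and column names are
+      illustrative), a comparison whose literal does not match the column type
+      would be expected to appear under <code class="ph codeph">predicates</code>
+      rather than <code class="ph codeph">kudu predicates</code>, because the
+      implicit cast prevents the push-down:
+    </p>
+<pre class="pre codeblock"><code>
+-- Suppose X is an INT column and Y is a non-primary-key INT column.
+-- Y = 2 matches the column type, so it can be evaluated by Kudu;
+-- X = 1.5 requires casting X to a floating-point type, so Impala
+-- evaluates that predicate itself.
+EXPLAIN SELECT x, y FROM kudu_table WHERE x = 1.5 AND y = 2;
+</code></pre>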
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_select.html#select">SELECT Statement</a>,
+      <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>,
+      <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+      <a class="xref" href="impala_explain_plan.html#explain_plan">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_explain_level.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_explain_level.html b/docs/build/html/topics/impala_explain_level.html
new file mode 100644
index 0000000..c9f527b
--- /dev/null
+++ b/docs/build/html/topics/impala_explain_level.html
@@ -0,0 +1,342 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="explain_level"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>EXPLAIN_LEVEL Query Option</title></head><body id="explain_level"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">EXPLAIN_LEVEL Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Controls the amount of detail provided in the output of the <code class="ph codeph">EXPLAIN</code> statement. The basic
+      output can help you identify high-level performance issues such as scanning a higher volume of data or more
+      partitions than you expect. The higher levels of detail show how intermediate results flow between nodes and
+      how different SQL operations such as <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP BY</code>, joins, and
+      <code class="ph codeph">WHERE</code> clauses are implemented within a distributed query.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code> or <code class="ph codeph">INT</code>
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> <code class="ph codeph">1</code>
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Arguments:</strong>
+    </p>
+
+    <p class="p">
+      The allowed range of numeric values for this option is 0 to 3:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <code class="ph codeph">0</code> or <code class="ph codeph">MINIMAL</code>: A barebones list, one line per operation. Primarily useful
+        for checking the join order in very long queries where the regular <code class="ph codeph">EXPLAIN</code> output is too
+        long to read easily.
+      </li>
+
+      <li class="li">
+        <code class="ph codeph">1</code> or <code class="ph codeph">STANDARD</code>: The default level of detail, showing the logical way that
+        work is split up for the distributed query.
+      </li>
+
+      <li class="li">
+        <code class="ph codeph">2</code> or <code class="ph codeph">EXTENDED</code>: Includes additional detail about how the query planner
+        uses statistics in its decision-making process, to understand how a query could be tuned by gathering
+        statistics, using query hints, adding or removing predicates, and so on.
+      </li>
+
+      <li class="li">
+        <code class="ph codeph">3</code> or <code class="ph codeph">VERBOSE</code>: The maximum level of detail, showing how work is split up
+        within each node into <span class="q">"query fragments"</span> that are connected in a pipeline. This extra detail is
+        primarily useful for low-level performance testing and tuning within Impala itself, rather than for
+        rewriting the SQL code at the user level.
+      </li>
+    </ul>
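+    <p class="p">
+      The numeric values and the mnemonics are interchangeable. For example, the
+      following two statements both request the <code class="ph codeph">EXTENDED</code>
+      level of detail:
+    </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set explain_level=2;
+[localhost:21000] &gt; set explain_level=extended;
+</code></pre>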
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Prior to Impala 1.3, the allowed argument range for <code class="ph codeph">EXPLAIN_LEVEL</code> was 0 to 1: level 0 had
+      the mnemonic <code class="ph codeph">NORMAL</code>, and level 1 was <code class="ph codeph">VERBOSE</code>. In Impala 1.3 and higher,
+      <code class="ph codeph">NORMAL</code> is not a valid mnemonic value, and <code class="ph codeph">VERBOSE</code> still applies to the
+      highest level of detail but now corresponds to level 3. You might need to adjust the values if you have any
+      older <code class="ph codeph">impala-shell</code> script files that set the <code class="ph codeph">EXPLAIN_LEVEL</code> query option.
+    </div>
+
+    <p class="p">
+      Changing the value of this option controls the amount of detail in the output of the <code class="ph codeph">EXPLAIN</code>
+      statement. The extended information from level 2 or 3 is especially useful during performance tuning, when
+      you need to confirm whether the work for the query is distributed the way you expect, particularly for the
+      most resource-intensive operations such as join queries against large tables, queries against tables with
+      large numbers of partitions, and insert operations for Parquet tables. The extended information also helps to
+      check estimated resource usage when you use the admission control or resource management features explained
+      in <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a>. See
+      <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> for the syntax of the <code class="ph codeph">EXPLAIN</code> statement, and
+      <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for details about how to use the extended information.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      As always, read the <code class="ph codeph">EXPLAIN</code> output from bottom to top. The lowest lines represent the
+      initial work of the query (scanning data files), the lines in the middle represent calculations done on each
+      node and how intermediate results are transmitted from one node to another, and the topmost lines represent
+      the final results being sent back to the coordinator node.
+    </p>
+
+    <p class="p">
+      The numbers in the left column are generated internally during the initial planning phase and do not
+      represent the actual order of operations, so it is not significant if they appear out of order in the
+      <code class="ph codeph">EXPLAIN</code> output.
+    </p>
+
+    <p class="p">
+      At all <code class="ph codeph">EXPLAIN</code> levels, the plan contains a warning if any tables in the query are missing
+      statistics. Use the <code class="ph codeph">COMPUTE STATS</code> statement to gather statistics for each table and suppress
+      this warning. See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details about how the statistics help
+      query performance.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">PROFILE</code> command in <span class="keyword cmdname">impala-shell</span> always starts with an explain plan
+      showing full detail, the same as with <code class="ph codeph">EXPLAIN_LEVEL=3</code>. <span class="ph">After the explain
+      plan comes the executive summary, the same output as produced by the <code class="ph codeph">SUMMARY</code> command in
+      <span class="keyword cmdname">impala-shell</span>.</span>
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      These examples use a trivial, empty table to illustrate how the essential aspects of query planning are shown
+      in <code class="ph codeph">EXPLAIN</code> output:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table t1 (x int, s string);
+[localhost:21000] &gt; set explain_level=1;
+[localhost:21000] &gt; explain select count(*) from t1;
++------------------------------------------------------------------------+
+| Explain String                                                         |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=10.00MB VCores=1               |
+| WARNING: The following tables are missing relevant table and/or column |
+|   statistics.                                                          |
+| explain_plan.t1                                                        |
+|                                                                        |
+| 03:AGGREGATE [MERGE FINALIZE]                                          |
+| |  output: sum(count(*))                                               |
+| |                                                                      |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED]                                  |
+| |                                                                      |
+| 01:AGGREGATE                                                           |
+| |  output: count(*)                                                    |
+| |                                                                      |
+| 00:SCAN HDFS [explain_plan.t1]                                         |
+|    partitions=1/1 size=0B                                              |
++------------------------------------------------------------------------+
+[localhost:21000] &gt; explain select * from t1;
++------------------------------------------------------------------------+
+| Explain String                                                         |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
+| WARNING: The following tables are missing relevant table and/or column |
+|   statistics.                                                          |
+| explain_plan.t1                                                        |
+|                                                                        |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED]                                  |
+| |                                                                      |
+| 00:SCAN HDFS [explain_plan.t1]                                         |
+|    partitions=1/1 size=0B                                              |
++------------------------------------------------------------------------+
+[localhost:21000] &gt; set explain_level=2;
+[localhost:21000] &gt; explain select * from t1;
++------------------------------------------------------------------------+
+| Explain String                                                         |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
+| WARNING: The following tables are missing relevant table and/or column |
+|   statistics.                                                          |
+| explain_plan.t1                                                        |
+|                                                                        |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED]                                  |
+| |  hosts=0 per-host-mem=unavailable                                    |
+| |  tuple-ids=0 row-size=19B cardinality=unavailable                    |
+| |                                                                      |
+| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM]                       |
+|    partitions=1/1 size=0B                                              |
+|    table stats: unavailable                                            |
+|    column stats: unavailable                                           |
+|    hosts=0 per-host-mem=0B                                             |
+|    tuple-ids=0 row-size=19B cardinality=unavailable                    |
++------------------------------------------------------------------------+
+[localhost:21000] &gt; set explain_level=3;
+[localhost:21000] &gt; explain select * from t1;
++------------------------------------------------------------------------+
+| Explain String                                                         |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
+<strong class="ph b">| WARNING: The following tables are missing relevant table and/or column |</strong>
+<strong class="ph b">|   statistics.                                                          |</strong>
+<strong class="ph b">| explain_plan.t1                                                        |</strong>
+|                                                                        |
+| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED]                            |
+|   01:EXCHANGE [PARTITION=UNPARTITIONED]                                |
+|      hosts=0 per-host-mem=unavailable                                  |
+|      tuple-ids=0 row-size=19B cardinality=unavailable                  |
+|                                                                        |
+| F00:PLAN FRAGMENT [PARTITION=RANDOM]                                   |
+|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] |
+|   00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM]                     |
+|      partitions=1/1 size=0B                                            |
+<strong class="ph b">|      table stats: unavailable                                          |</strong>
+<strong class="ph b">|      column stats: unavailable                                         |</strong>
+|      hosts=0 per-host-mem=0B                                           |
+|      tuple-ids=0 row-size=19B cardinality=unavailable                  |
++------------------------------------------------------------------------+
+</code></pre>
+
+    <p class="p">
+      As the warning message demonstrates, most of the information that Impala needs for efficient query
+      planning, and that you need to understand the performance characteristics of the query, only becomes
+      available after you run the <code class="ph codeph">COMPUTE STATS</code> statement for the table:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats t1;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+[localhost:21000] &gt; explain select * from t1;
++------------------------------------------------------------------------+
+| Explain String                                                         |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
+|                                                                        |
+| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED]                            |
+|   01:EXCHANGE [PARTITION=UNPARTITIONED]                                |
+|      hosts=0 per-host-mem=unavailable                                  |
+|      tuple-ids=0 row-size=20B cardinality=0                            |
+|                                                                        |
+| F00:PLAN FRAGMENT [PARTITION=RANDOM]                                   |
+|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] |
+|   00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM]                     |
+|      partitions=1/1 size=0B                                            |
+<strong class="ph b">|      table stats: 0 rows total                                         |</strong>
+<strong class="ph b">|      column stats: all                                                 |</strong>
+|      hosts=0 per-host-mem=0B                                           |
+|      tuple-ids=0 row-size=20B cardinality=0                            |
++------------------------------------------------------------------------+
+</code></pre>
+
+    <p class="p">
+      Joins and other complicated, multi-part queries are the ones where you most commonly need to examine the
+      <code class="ph codeph">EXPLAIN</code> output and customize the amount of detail in the output. This example shows the
+      default <code class="ph codeph">EXPLAIN</code> output for a three-way join query, then the equivalent output with a
+      <code class="ph codeph">[SHUFFLE]</code> hint to change the join mechanism between the first two tables from a broadcast
+      join to a shuffle join.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set explain_level=1;
+[localhost:21000] &gt; explain select one.*, two.*, three.* from t1 one, t1 two, t1 three where one.x = two.x and two.x = three.x;
++---------------------------------------------------------+
+| Explain String                                          |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
+|                                                         |
+| 07:EXCHANGE [PARTITION=UNPARTITIONED]                   |
+| |                                                       |
+<strong class="ph b">| 04:HASH JOIN [INNER JOIN, BROADCAST]                    |</strong>
+| |  hash predicates: two.x = three.x                     |
+| |                                                       |
+<strong class="ph b">| |--06:EXCHANGE [BROADCAST]                              |</strong>
+| |  |                                                    |
+| |  02:SCAN HDFS [explain_plan.t1 three]                 |
+| |     partitions=1/1 size=0B                            |
+| |                                                       |
+<strong class="ph b">| 03:HASH JOIN [INNER JOIN, BROADCAST]                    |</strong>
+| |  hash predicates: one.x = two.x                       |
+| |                                                       |
+<strong class="ph b">| |--05:EXCHANGE [BROADCAST]                              |</strong>
+| |  |                                                    |
+| |  01:SCAN HDFS [explain_plan.t1 two]                   |
+| |     partitions=1/1 size=0B                            |
+| |                                                       |
+| 00:SCAN HDFS [explain_plan.t1 one]                      |
+|    partitions=1/1 size=0B                               |
++---------------------------------------------------------+
+[localhost:21000] &gt; explain select one.*, two.*, three.*
+                  &gt; from t1 one join [shuffle] t1 two join t1 three
+                  &gt; where one.x = two.x and two.x = three.x;
++---------------------------------------------------------+
+| Explain String                                          |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
+|                                                         |
+| 08:EXCHANGE [PARTITION=UNPARTITIONED]                   |
+| |                                                       |
+<strong class="ph b">| 04:HASH JOIN [INNER JOIN, BROADCAST]                    |</strong>
+| |  hash predicates: two.x = three.x                     |
+| |                                                       |
+<strong class="ph b">| |--07:EXCHANGE [BROADCAST]                              |</strong>
+| |  |                                                    |
+| |  02:SCAN HDFS [explain_plan.t1 three]                 |
+| |     partitions=1/1 size=0B                            |
+| |                                                       |
+<strong class="ph b">| 03:HASH JOIN [INNER JOIN, PARTITIONED]                  |</strong>
+| |  hash predicates: one.x = two.x                       |
+| |                                                       |
+<strong class="ph b">| |--06:EXCHANGE [PARTITION=HASH(two.x)]                  |</strong>
+| |  |                                                    |
+| |  01:SCAN HDFS [explain_plan.t1 two]                   |
+| |     partitions=1/1 size=0B                            |
+| |                                                       |
+<strong class="ph b">| 05:EXCHANGE [PARTITION=HASH(one.x)]                     |</strong>
+| |                                                       |
+| 00:SCAN HDFS [explain_plan.t1 one]                      |
+|    partitions=1/1 size=0B                               |
++---------------------------------------------------------+
+</code></pre>
+
+    <p class="p">
+      For a join involving many different tables, the default <code class="ph codeph">EXPLAIN</code> output might stretch over
+      several pages, and the only details you care about might be the join order and the mechanism (broadcast or
+      shuffle) for joining each pair of tables. In that case, you might set <code class="ph codeph">EXPLAIN_LEVEL</code> to its
+      lowest value of 0, to focus on just the join order and join mechanism for each stage. The following example
+      shows how the rows from the first and second joined tables are hashed and divided among the nodes of the
+      cluster for further filtering; then the entire contents of the third table are broadcast to all nodes for the
+      final stage of join processing.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set explain_level=0;
+[localhost:21000] &gt; explain select one.*, two.*, three.*
+                  &gt; from t1 one join [shuffle] t1 two join t1 three
+                  &gt; where one.x = two.x and two.x = three.x;
++---------------------------------------------------------+
+| Explain String                                          |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
+|                                                         |
+| 08:EXCHANGE [PARTITION=UNPARTITIONED]                   |
+<strong class="ph b">| 04:HASH JOIN [INNER JOIN, BROADCAST]                    |</strong>
+<strong class="ph b">| |--07:EXCHANGE [BROADCAST]                              |</strong>
+| |  02:SCAN HDFS [explain_plan.t1 three]                 |
+<strong class="ph b">| 03:HASH JOIN [INNER JOIN, PARTITIONED]                  |</strong>
+<strong class="ph b">| |--06:EXCHANGE [PARTITION=HASH(two.x)]                  |</strong>
+| |  01:SCAN HDFS [explain_plan.t1 two]                   |
+<strong class="ph b">| 05:EXCHANGE [PARTITION=HASH(one.x)]                     |</strong>
+| 00:SCAN HDFS [explain_plan.t1 one]                      |
++---------------------------------------------------------+
+</code></pre>
+
+
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_explain_plan.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_explain_plan.html b/docs/build/html/topics/impala_explain_plan.html
new file mode 100644
index 0000000..bcd0855
--- /dev/null
+++ b/docs/build/html/topics/impala_explain_plan.html
@@ -0,0 +1,592 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="explain_plan"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</title>
 </head><body id="explain_plan"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      To understand the high-level performance considerations for Impala queries, read the output of the
+      <code class="ph codeph">EXPLAIN</code> statement for the query. You can get the <code class="ph codeph">EXPLAIN</code> plan without
+      actually running the query itself.
+    </p>
+
+    <p class="p">
+      For an overview of the physical performance characteristics for a query, issue the <code class="ph codeph">SUMMARY</code>
+      statement in <span class="keyword cmdname">impala-shell</span> immediately after executing a query. This condensed information
+      shows which phases of execution took the most time, and how the estimates for memory usage and number of rows
+      at each phase compare to the actual values.
+    </p>
+
+    <p class="p">
+      To understand the detailed performance characteristics for a query, issue the <code class="ph codeph">PROFILE</code>
+      statement in <span class="keyword cmdname">impala-shell</span> immediately after executing a query. This low-level information
+      includes physical details about memory, CPU, I/O, and network usage, and thus is only available after the
+      query is actually run.
+    </p>
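+    <p class="p">
+      These commands build on each other: run the query first, then
+      <code class="ph codeph">SUMMARY</code> for the condensed view, then
+      <code class="ph codeph">PROFILE</code> for full detail. A minimal
+      <span class="keyword cmdname">impala-shell</span> session (the query itself
+      is illustrative):
+    </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select count(*) from customer_address;
+[localhost:21000] &gt; summary;
+[localhost:21000] &gt; profile;
+</code></pre>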
+
+    <p class="p toc inpage"></p>
+
+    <p class="p">
+      Also, see <a class="xref" href="impala_hbase.html#hbase_performance">Performance Considerations for the Impala-HBase Integration</a>
+      and <a class="xref" href="impala_s3.html#s3_performance">Understanding and Tuning Impala Query Performance for S3 Data</a>
+      for examples of interpreting
+      <code class="ph codeph">EXPLAIN</code> plans for queries against HBase tables
+      <span class="ph">and data stored in the Amazon Simple Storage Service (S3)</span>.
+    </p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="explain_plan__perf_explain">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Using the EXPLAIN Plan for Performance Tuning</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph"><a class="xref" href="impala_explain.html#explain">EXPLAIN</a></code> statement gives you an outline
+        of the logical steps that a query will perform, such as how the work will be distributed among the nodes
+        and how intermediate results will be combined to produce the final result set. You can see these details
+        before actually running the query. You can use this information to check that the query will not operate in
+        some very unexpected or inefficient way.
+      </p>
+
+
+
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt; explain select count(*) from customer_address;
++----------------------------------------------------------+
+| Explain String                                           |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=42.00MB VCores=1 |
+|                                                          |
+| 03:AGGREGATE [MERGE FINALIZE]                            |
+| |  output: sum(count(*))                                 |
+| |                                                        |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED]                    |
+| |                                                        |
+| 01:AGGREGATE                                             |
+| |  output: count(*)                                      |
+| |                                                        |
+| 00:SCAN HDFS [default.customer_address]                  |
+|    partitions=1/1 size=5.25MB                            |
++----------------------------------------------------------+
+</code></pre>
+
+      <div class="p">
+        Read the <code class="ph codeph">EXPLAIN</code> plan from bottom to top:
+        <ul class="ul">
+          <li class="li">
+            The last part of the plan shows low-level details, such as the expected amount of data to be read,
+            from which you can judge the effectiveness of your partitioning strategy and estimate how long it will
+            take to scan a table based on total data size and the size of the cluster.
+          </li>
+
+          <li class="li">
+            As you work your way up, you see the operations that are parallelized and performed on each
+            Impala node.
+          </li>
+
+          <li class="li">
+            At the higher levels, you see how data flows when intermediate result sets are combined and transmitted
+            from one node to another.
+          </li>
+
+          <li class="li">
+            See <a class="xref" href="../shared/../topics/impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a> for details about the
+            <code class="ph codeph">EXPLAIN_LEVEL</code> query option, which lets you customize how much detail to show in the
+            <code class="ph codeph">EXPLAIN</code> plan depending on whether you are doing high-level or low-level tuning,
+            dealing with logical or physical aspects of the query.
+          </li>
+        </ul>
+      </div>
+
+      <p class="p">
+        The <code class="ph codeph">EXPLAIN</code> plan is also printed at the beginning of the query profile report described in
+        <a class="xref" href="#perf_profile">Using the Query Profile for Performance Tuning</a>, for convenience in examining both the logical and physical aspects of the
+        query side-by-side.
+      </p>
+
+      <p class="p">
+        The amount of detail displayed in the <code class="ph codeph">EXPLAIN</code> output is controlled by the
+        <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL</a> query option. You typically
+        increase this setting from <code class="ph codeph">normal</code> to <code class="ph codeph">verbose</code> (or from <code class="ph codeph">0</code>
+        to <code class="ph codeph">1</code>) when double-checking the presence of table and column statistics during performance
+        tuning, or when estimating query resource usage in conjunction with the resource management features.
+      </p>
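+
+      <p class="p">
+        For example, you might raise the detail level in <span class="keyword cmdname">impala-shell</span> before
+        re-issuing an <code class="ph codeph">EXPLAIN</code> statement. (This sketch reuses the sample table from the
+        earlier example.)
+      </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt; set explain_level=verbose;
+[impalad-host:21000] &gt; explain select count(*) from customer_address;
+</code></pre>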
+
+      
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="explain_plan__perf_summary">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Using the SUMMARY Report for Performance Tuning</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph"><a class="xref" href="impala_shell_commands.html#shell_commands">SUMMARY</a></code> command within
+        the <span class="keyword cmdname">impala-shell</span> interpreter gives you an easy-to-digest overview of the timings for the
+        different phases of execution for a query. As with the <code class="ph codeph">EXPLAIN</code> plan, it makes
+        potential performance bottlenecks easy to spot. As with the <code class="ph codeph">PROFILE</code> output, it
+        is available after the query runs, and so displays actual timing numbers.
+      </p>
+
+      <p class="p">
+        The <code class="ph codeph">SUMMARY</code> report is also printed at the beginning of the query profile report described
+        in <a class="xref" href="#perf_profile">Using the Query Profile for Performance Tuning</a>, for convenience in examining high-level and low-level aspects of the query
+        side-by-side.
+      </p>
+
+      <p class="p">
+        For example, here is a query involving an aggregate function, on a single-node VM. The different stages of
+        the query and their timings are shown (rolled up for all nodes), along with estimated and actual values
+        used in planning the query. In this case, the <code class="ph codeph">AVG()</code> function is computed for a subset of
+        data on each node (stage 01) and then the aggregated results from all nodes are combined at the end (stage
+        03). You can see which stages took the most time, and whether any estimates were substantially different
+        than the actual data distribution. (When examining the time values, be sure to consider the suffixes such
+        as <code class="ph codeph">us</code> for microseconds and <code class="ph codeph">ms</code> for milliseconds, rather than just looking
+        for the largest numbers.)
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select avg(ss_sales_price) from store_sales where ss_coupon_amt = 0;
++---------------------+
+| avg(ss_sales_price) |
++---------------------+
+| 37.80770926328327   |
++---------------------+
+[localhost:21000] &gt; summary;
++--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
+| Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail          |
++--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
+| 03:AGGREGATE | 1      | 1.03ms   | 1.03ms   | 1     | 1          | 48.00 KB | -1 B          | MERGE FINALIZE  |
+| 02:EXCHANGE  | 1      | 0ns      | 0ns      | 1     | 1          | 0 B      | -1 B          | UNPARTITIONED   |
+| 01:AGGREGATE | 1      | 30.79ms  | 30.79ms  | 1     | 1          | 80.00 KB | 10.00 MB      |                 |
+| 00:SCAN HDFS | 1      | 5.45s    | 5.45s    | 2.21M | -1         | 64.05 MB | 432.00 MB     | tpc.store_sales |
++--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
+</code></pre>
+
+      <p class="p">
+        Notice how the longest phase of the query, the initial scan, is measured in seconds (s), while later
+        phases working on smaller intermediate results are measured in milliseconds (ms) or even nanoseconds (ns).
+      </p>
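+
+      <p class="p">
+        An <code class="ph codeph">Est. #Rows</code> value of <code class="ph codeph">-1</code>, as shown for the scan in this
+        example, indicates that table statistics were not available when the query was planned. If you see such a
+        value, consider gathering statistics for the table before doing further tuning, for example:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats store_sales;
+</code></pre>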
+
+      <p class="p">
+        Here is an example from a more complicated query, as it would appear in the <code class="ph codeph">PROFILE</code>
+        output:
+      </p>
+
+<pre class="pre codeblock"><code>Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
+------------------------------------------------------------------------------------------------------------------------
+09:MERGING-EXCHANGE        1   79.738us   79.738us        5           5         0        -1.00 B  UNPARTITIONED
+05:TOP-N                   3   84.693us   88.810us        5           5  12.00 KB       120.00 B
+04:AGGREGATE               3    5.263ms    6.432ms        5           5  44.00 KB       10.00 MB  MERGE FINALIZE
+08:AGGREGATE               3   16.659ms   27.444ms   52.52K     600.12K   3.20 MB       15.11 MB  MERGE
+07:EXCHANGE                3    2.644ms      5.1ms   52.52K     600.12K         0              0  HASH(o_orderpriority)
+03:AGGREGATE               3  342.913ms  966.291ms   52.52K     600.12K  10.80 MB       15.11 MB
+02:HASH JOIN               3    2s165ms    2s171ms  144.87K     600.12K  13.63 MB      941.01 KB  INNER JOIN, BROADCAST
+|--06:EXCHANGE             3    8.296ms    8.692ms   57.22K      15.00K         0              0  BROADCAST
+|  01:SCAN HDFS            2    1s412ms    1s978ms   57.22K      15.00K  24.21 MB      176.00 MB  tpch.orders o
+00:SCAN HDFS               3    8s032ms    8s558ms    3.79M     600.12K  32.29 MB      264.00 MB  tpch.lineitem l
+</code></pre>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="explain_plan__perf_profile">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Using the Query Profile for Performance Tuning</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph">PROFILE</code> statement, available in the <span class="keyword cmdname">impala-shell</span> interpreter,
+        produces a detailed low-level report showing how the most recent query was executed. Unlike the
+        <code class="ph codeph">EXPLAIN</code> plan described in <a class="xref" href="#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a>, this information is only available
+        after the query has finished. It shows physical details such as the number of bytes read, maximum memory
+        usage, and so on for each node. You can use this information to determine whether the query is I/O-bound
+        or CPU-bound, whether a network condition is imposing a bottleneck, whether a slowdown affects some nodes
+        but not others, and whether recommended configuration settings, such as short-circuit local reads, are in
+        effect.
+      </p>
+
+      <p class="p">
+        By default, time values in the profile output reflect the wall-clock time taken by an operation.
+        For values denoting system time or user time, the type of time being measured is reflected in the metric
+        name, such as <code class="ph codeph">ScannerThreadsSysTime</code> or <code class="ph codeph">ScannerThreadsUserTime</code>.
+        For example, a multi-threaded I/O operation might show a small figure for wall-clock time,
+        while the corresponding system time is larger, representing the sum of the CPU time taken by each thread.
+        Or a wall-clock time figure might be larger because it counts time spent waiting, while
+        the corresponding system and user time figures only measure the time while the operation
+        is actively using CPU cycles.
+      </p>
+
+      <p class="p">
+        The <a class="xref" href="impala_explain_plan.html#perf_explain"><code class="ph codeph">EXPLAIN</code> plan</a> is also printed
+        at the beginning of the query profile report, for convenience in examining both the logical and physical
+        aspects of the query side-by-side. The
+        <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL</a> query option also controls the
+        verbosity of the <code class="ph codeph">EXPLAIN</code> output printed by the <code class="ph codeph">PROFILE</code> command.
+      </p>
+
+      
+
+      <p class="p">
+        Here is an example of a query profile, from a relatively straightforward query on a single-node
+        pseudo-distributed cluster, chosen to keep the output reasonably brief.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; profile;
+Query Runtime Profile:
+Query (id=6540a03d4bee0691:4963d6269b210ebd):
+  Summary:
+    Session ID: ea4a197f1c7bf858:c74e66f72e3a33ba
+    Session Type: BEESWAX
+    Start Time: 2013-12-02 17:10:30.263067000
+    End Time: 2013-12-02 17:10:50.932044000
+    Query Type: QUERY
+    Query State: FINISHED
+    Query Status: OK
+    Impala Version: impalad version 1.2.1 RELEASE (build edb5af1bcad63d410bc5d47cc203df3a880e9324)
+    User: doc_demo
+    Network Address: 127.0.0.1:49161
+    Default Db: stats_testing
+    Sql Statement: select t1.s, t2.s from t1 join t2 on (t1.id = t2.parent)
+    Plan:
+----------------
+Estimated Per-Host Requirements: Memory=2.09GB VCores=2
+
+PLAN FRAGMENT 0
+  PARTITION: UNPARTITIONED
+
+  4:EXCHANGE
+     cardinality: unavailable
+     per-host memory: unavailable
+     tuple ids: 0 1
+
+PLAN FRAGMENT 1
+  PARTITION: RANDOM
+
+  STREAM DATA SINK
+    EXCHANGE ID: 4
+    UNPARTITIONED
+
+  2:HASH JOIN
+  |  join op: INNER JOIN (BROADCAST)
+  |  hash predicates:
+  |    t1.id = t2.parent
+  |  cardinality: unavailable
+  |  per-host memory: 2.00GB
+  |  tuple ids: 0 1
+  |
+  |----3:EXCHANGE
+  |       cardinality: unavailable
+  |       per-host memory: 0B
+  |       tuple ids: 1
+  |
+  0:SCAN HDFS
+     table=stats_testing.t1 #partitions=1/1 size=33B
+     table stats: unavailable
+     column stats: unavailable
+     cardinality: unavailable
+     per-host memory: 32.00MB
+     tuple ids: 0
+
+PLAN FRAGMENT 2
+  PARTITION: RANDOM
+
+  STREAM DATA SINK
+    EXCHANGE ID: 3
+    UNPARTITIONED
+
+  1:SCAN HDFS
+     table=stats_testing.t2 #partitions=1/1 size=960.00KB
+     table stats: unavailable
+     column stats: unavailable
+     cardinality: unavailable
+     per-host memory: 96.00MB
+     tuple ids: 1
+----------------
+    Query Timeline: 20s670ms
+       - Start execution: 2.559ms (2.559ms)
+       - Planning finished: 23.587ms (21.27ms)
+       - Rows available: 666.199ms (642.612ms)
+       - First row fetched: 668.919ms (2.719ms)
+       - Unregister query: 20s668ms (20s000ms)
+  ImpalaServer:
+     - ClientFetchWaitTimer: 19s637ms
+     - RowMaterializationTimer: 167.121ms
+  Execution Profile 6540a03d4bee0691:4963d6269b210ebd:(Active: 837.815ms, % non-child: 0.00%)
+    Per Node Peak Memory Usage: impala-1.example.com:22000(7.42 MB)
+     - FinalizationTimer: 0ns
+    Coordinator Fragment:(Active: 195.198ms, % non-child: 0.00%)
+      MemoryUsage(500.0ms): 16.00 KB, 7.42 MB, 7.33 MB, 7.10 MB, 6.94 MB, 6.71 MB, 6.56 MB, 6.40 MB, 6.17 MB, 6.02 MB, 5.79 MB, 5.63 MB, 5.48 MB, 5.25 MB, 5.09 MB, 4.86 MB, 4.71 MB, 4.47 MB, 4.32 MB, 4.09 MB, 3.93 MB, 3.78 MB, 3.55 MB, 3.39 MB, 3.16 MB, 3.01 MB, 2.78 MB, 2.62 MB, 2.39 MB, 2.24 MB, 2.08 MB, 1.85 MB, 1.70 MB, 1.54 MB, 1.31 MB, 1.16 MB, 948.00 KB, 790.00 KB, 553.00 KB, 395.00 KB, 237.00 KB
+      ThreadUsage(500.0ms): 1
+       - AverageThreadTokens: 1.00
+       - PeakMemoryUsage: 7.42 MB
+       - PrepareTime: 36.144us
+       - RowsProduced: 98.30K (98304)
+       - TotalCpuTime: 20s449ms
+       - TotalNetworkWaitTime: 191.630ms
+       - TotalStorageWaitTime: 0ns
+      CodeGen:(Active: 150.679ms, % non-child: 77.19%)
+         - CodegenTime: 0ns
+         - CompileTime: 139.503ms
+         - LoadTime: 10.7ms
+         - ModuleFileSize: 95.27 KB
+      EXCHANGE_NODE (id=4):(Active: 194.858ms, % non-child: 99.83%)
+         - BytesReceived: 2.33 MB
+         - ConvertRowBatchTime: 2.732ms
+         - DataArrivalWaitTime: 191.118ms
+         - DeserializeRowBatchTimer: 14.943ms
+         - FirstBatchArrivalWaitTime: 191.117ms
+         - PeakMemoryUsage: 7.41 MB
+         - RowsReturned: 98.30K (98304)
+         - RowsReturnedRate: 504.49 K/sec
+         - SendersBlockedTimer: 0ns
+         - SendersBlockedTotalTimer(*): 0ns
+    Averaged Fragment 1:(Active: 442.360ms, % non-child: 0.00%)
+      split sizes:  min: 33.00 B, max: 33.00 B, avg: 33.00 B, stddev: 0.00
+      completion times: min:443.720ms  max:443.720ms  mean: 443.720ms  stddev:0ns
+      execution rates: min:74.00 B/sec  max:74.00 B/sec  mean:74.00 B/sec  stddev:0.00 /sec
+      num instances: 1
+       - AverageThreadTokens: 1.00
+       - PeakMemoryUsage: 6.06 MB
+       - PrepareTime: 7.291ms
+       - RowsProduced: 98.30K (98304)
+       - TotalCpuTime: 784.259ms
+       - TotalNetworkWaitTime: 388.818ms
+       - TotalStorageWaitTime: 3.934ms
+      CodeGen:(Active: 312.862ms, % non-child: 70.73%)
+         - CodegenTime: 2.669ms
+         - CompileTime: 302.467ms
+         - LoadTime: 9.231ms
+         - ModuleFileSize: 95.27 KB
+      DataStreamSender (dst_id=4):(Active: 80.63ms, % non-child: 18.10%)
+         - BytesSent: 2.33 MB
+         - NetworkThroughput(*): 35.89 MB/sec
+         - OverallThroughput: 29.06 MB/sec
+         - PeakMemoryUsage: 5.33 KB
+         - SerializeBatchTime: 26.487ms
+         - ThriftTransmitTime(*): 64.814ms
+         - UncompressedRowBatchSize: 6.66 MB
+      HASH_JOIN_NODE (id=2):(Active: 362.25ms, % non-child: 3.92%)
+         - BuildBuckets: 1.02K (1024)
+         - BuildRows: 98.30K (98304)
+         - BuildTime: 12.622ms
+         - LoadFactor: 0.00
+         - PeakMemoryUsage: 6.02 MB
+         - ProbeRows: 3
+         - ProbeTime: 3.579ms
+         - RowsReturned: 98.30K (98304)
+         - RowsReturnedRate: 271.54 K/sec
+        EXCHANGE_NODE (id=3):(Active: 344.680ms, % non-child: 77.92%)
+           - BytesReceived: 1.15 MB
+           - ConvertRowBatchTime: 2.792ms
+           - DataArrivalWaitTime: 339.936ms
+           - DeserializeRowBatchTimer: 9.910ms
+           - FirstBatchArrivalWaitTime: 199.474ms
+           - PeakMemoryUsage: 156.00 KB
+           - RowsReturned: 98.30K (98304)
+           - RowsReturnedRate: 285.20 K/sec
+           - SendersBlockedTimer: 0ns
+           - SendersBlockedTotalTimer(*): 0ns
+      HDFS_SCAN_NODE (id=0):(Active: 13.616us, % non-child: 0.00%)
+         - AverageHdfsReadThreadConcurrency: 0.00
+         - AverageScannerThreadConcurrency: 0.00
+         - BytesRead: 33.00 B
+         - BytesReadLocal: 33.00 B
+         - BytesReadShortCircuit: 33.00 B
+         - NumDisksAccessed: 1
+         - NumScannerThreadsStarted: 1
+         - PeakMemoryUsage: 46.00 KB
+         - PerReadThreadRawHdfsThroughput: 287.52 KB/sec
+         - RowsRead: 3
+         - RowsReturned: 3
+         - RowsReturnedRate: 220.33 K/sec
+         - ScanRangesComplete: 1
+         - ScannerThreadsInvoluntaryContextSwitches: 26
+         - ScannerThreadsTotalWallClockTime: 55.199ms
+           - DelimiterParseTime: 2.463us
+           - MaterializeTupleTime(*): 1.226us
+           - ScannerThreadsSysTime: 0ns
+           - ScannerThreadsUserTime: 42.993ms
+         - ScannerThreadsVoluntaryContextSwitches: 1
+         - TotalRawHdfsReadTime(*): 112.86us
+         - TotalReadThroughput: 0.00 /sec
+    Averaged Fragment 2:(Active: 190.120ms, % non-child: 0.00%)
+      split sizes:  min: 960.00 KB, max: 960.00 KB, avg: 960.00 KB, stddev: 0.00
+      completion times: min:191.736ms  max:191.736ms  mean: 191.736ms  stddev:0ns
+      execution rates: min:4.89 MB/sec  max:4.89 MB/sec  mean:4.89 MB/sec  stddev:0.00 /sec
+      num instances: 1
+       - AverageThreadTokens: 0.00
+       - PeakMemoryUsage: 906.33 KB
+       - PrepareTime: 3.67ms
+       - RowsProduced: 98.30K (98304)
+       - TotalCpuTime: 403.351ms
+       - TotalNetworkWaitTime: 34.999ms
+       - TotalStorageWaitTime: 108.675ms
+      CodeGen:(Active: 162.57ms, % non-child: 85.24%)
+         - CodegenTime: 3.133ms
+         - CompileTime: 148.316ms
+         - LoadTime: 12.317ms
+         - ModuleFileSize: 95.27 KB
+      DataStreamSender (dst_id=3):(Active: 70.620ms, % non-child: 37.14%)
+         - BytesSent: 1.15 MB
+         - NetworkThroughput(*): 23.30 MB/sec
+         - OverallThroughput: 16.23 MB/sec
+         - PeakMemoryUsage: 5.33 KB
+         - SerializeBatchTime: 22.69ms
+         - ThriftTransmitTime(*): 49.178ms
+         - UncompressedRowBatchSize: 3.28 MB
+      HDFS_SCAN_NODE (id=1):(Active: 118.839ms, % non-child: 62.51%)
+         - AverageHdfsReadThreadConcurrency: 0.00
+         - AverageScannerThreadConcurrency: 0.00
+         - BytesRead: 960.00 KB
+         - BytesReadLocal: 960.00 KB
+         - BytesReadShortCircuit: 960.00 KB
+         - NumDisksAccessed: 1
+         - NumScannerThreadsStarted: 1
+         - PeakMemoryUsage: 869.00 KB
+         - PerReadThreadRawHdfsThroughput: 130.21 MB/sec
+         - RowsRead: 98.30K (98304)
+         - RowsReturned: 98.30K (98304)
+         - RowsReturnedRate: 827.20 K/sec
+         - ScanRangesComplete: 15
+         - ScannerThreadsInvoluntaryContextSwitches: 34
+         - ScannerThreadsTotalWallClockTime: 189.774ms
+           - DelimiterParseTime: 15.703ms
+           - MaterializeTupleTime(*): 3.419ms
+           - ScannerThreadsSysTime: 1.999ms
+           - ScannerThreadsUserTime: 44.993ms
+         - ScannerThreadsVoluntaryContextSwitches: 118
+         - TotalRawHdfsReadTime(*): 7.199ms
+         - TotalReadThroughput: 0.00 /sec
+    Fragment 1:
+      Instance 6540a03d4bee0691:4963d6269b210ebf (host=impala-1.example.com:22000):(Active: 442.360ms, % non-child: 0.00%)
+        Hdfs split stats (&lt;volume id&gt;:&lt;# splits&gt;/&lt;split lengths&gt;): 0:1/33.00 B
+        MemoryUsage(500.0ms): 69.33 KB
+        ThreadUsage(500.0ms): 1
+         - AverageThreadTokens: 1.00
+         - PeakMemoryUsage: 6.06 MB
+         - PrepareTime: 7.291ms
+         - RowsProduced: 98.30K (98304)
+         - TotalCpuTime: 784.259ms
+         - TotalNetworkWaitTime: 388.818ms
+         - TotalStorageWaitTime: 3.934ms
+        CodeGen:(Active: 312.862ms, % non-child: 70.73%)
+           - CodegenTime: 2.669ms
+           - CompileTime: 302.467ms
+           - LoadTime: 9.231ms
+           - ModuleFileSize: 95.27 KB
+        DataStreamSender (dst_id=4):(Active: 80.63ms, % non-child: 18.10%)
+           - BytesSent: 2.33 MB
+           - NetworkThroughput(*): 35.89 MB/sec
+           - OverallThroughput: 29.06 MB/sec
+           - PeakMemoryUsage: 5.33 KB
+           - SerializeBatchTime: 26.487ms
+           - ThriftTransmitTime(*): 64.814ms
+           - UncompressedRowBatchSize: 6.66 MB
+        HASH_JOIN_NODE (id=2):(Active: 362.25ms, % non-child: 3.92%)
+          ExecOption: Build Side Codegen Enabled, Probe Side Codegen Enabled, Hash Table Built Asynchronously
+           - BuildBuckets: 1.02K (1024)
+           - BuildRows: 98.30K (98304)
+           - BuildTime: 12.622ms
+           - LoadFactor: 0.00
+           - PeakMemoryUsage: 6.02 MB
+           - ProbeRows: 3
+           - ProbeTime: 3.579ms
+           - RowsReturned: 98.30K (98304)
+           - RowsReturnedRate: 271.54 K/sec
+          EXCHANGE_NODE (id=3):(Active: 344.680ms, % non-child: 77.92%)
+             - BytesReceived: 1.15 MB
+             - ConvertRowBatchTime: 2.792ms
+             - DataArrivalWaitTime: 339.936ms
+             - DeserializeRowBatchTimer: 9.910ms
+             - FirstBatchArrivalWaitTime: 199.474ms
+             - PeakMemoryUsage: 156.00 KB
+             - RowsReturned: 98.30K (98304)
+             - RowsReturnedRate: 285.20 K/sec
+             - SendersBlockedTimer: 0ns
+             - SendersBlockedTotalTimer(*): 0ns
+        HDFS_SCAN_NODE (id=0):(Active: 13.616us, % non-child: 0.00%)
+          Hdfs split stats (&lt;volume id&gt;:&lt;# splits&gt;/&lt;split lengths&gt;): 0:1/33.00 B
+          Hdfs Read Thread Concurrency Bucket: 0:0% 1:0%
+          File Formats: TEXT/NONE:1
+          ExecOption: Codegen enabled: 1 out of 1
+           - AverageHdfsReadThreadConcurrency: 0.00
+           - AverageScannerThreadConcurrency: 0.00
+           - BytesRead: 33.00 B
+           - BytesReadLocal: 33.00 B
+           - BytesReadShortCircuit: 33.00 B
+           - NumDisksAccessed: 1
+           - NumScannerThreadsStarted: 1
+           - PeakMemoryUsage: 46.00 KB
+           - PerReadThreadRawHdfsThroughput: 287.52 KB/sec
+           - RowsRead: 3
+           - RowsReturned: 3
+           - RowsReturnedRate: 220.33 K/sec
+           - ScanRangesComplete: 1
+           - ScannerThreadsInvoluntaryContextSwitches: 26
+           - ScannerThreadsTotalWallClockTime: 55.199ms
+             - DelimiterParseTime: 2.463us
+             - MaterializeTupleTime(*): 1.226us
+             - ScannerThreadsSysTime: 0ns
+             - ScannerThreadsUserTime: 42.993ms
+           - ScannerThreadsVoluntaryContextSwitches: 1
+           - TotalRawHdfsReadTime(*): 112.86us
+           - TotalReadThroughput: 0.00 /sec
+    Fragment 2:
+      Instance 6540a03d4bee0691:4963d6269b210ec0 (host=impala-1.example.com:22000):(Active: 190.120ms, % non-child: 0.00%)
+        Hdfs split stats (&lt;volume id&gt;:&lt;# splits&gt;/&lt;split lengths&gt;): 0:15/960.00 KB
+         - AverageThreadTokens: 0.00
+         - PeakMemoryUsage: 906.33 KB
+         - PrepareTime: 3.67ms
+         - RowsProduced: 98.30K (98304)
+         - TotalCpuTime: 403.351ms
+         - TotalNetworkWaitTime: 34.999ms
+         - TotalStorageWaitTime: 108.675ms
+        CodeGen:(Active: 162.57ms, % non-child: 85.24%)
+           - CodegenTime: 3.133ms
+           - CompileTime: 148.316ms
+           - LoadTime: 12.317ms
+           - ModuleFileSize: 95.27 KB
+        DataStreamSender (dst_id=3):(Active: 70.620ms, % non-child: 37.14%)
+           - BytesSent: 1.15 MB
+           - NetworkThroughput(*): 23.30 MB/sec
+           - OverallThroughput: 16.23 MB/sec
+           - PeakMemoryUsage: 5.33 KB
+           - SerializeBatchTime: 22.69ms
+           - ThriftTransmitTime(*): 49.178ms
+           - UncompressedRowBatchSize: 3.28 MB
+        HDFS_SCAN_NODE (id=1):(Active: 118.839ms, % non-child: 62.51%)
+          Hdfs split stats (&lt;volume id&gt;:&lt;# splits&gt;/&lt;split lengths&gt;): 0:15/960.00 KB
+          Hdfs Read Thread Concurrency Bucket: 0:0% 1:0%
+          File Formats: TEXT/NONE:15
+          ExecOption: Codegen enabled: 15 out of 15
+           - AverageHdfsReadThreadConcurrency: 0.00
+           - AverageScannerThreadConcurrency: 0.00
+           - BytesRead: 960.00 KB
+           - BytesReadLocal: 960.00 KB
+           - BytesReadShortCircuit: 960.00 KB
+           - NumDisksAccessed: 1
+           - NumScannerThreadsStarted: 1
+           - PeakMemoryUsage: 869.00 KB
+           - PerReadThreadRawHdfsThroughput: 130.21 MB/sec
+           - RowsRead: 98.30K (98304)
+           - RowsReturned: 98.30K (98304)
+           - RowsReturnedRate: 827.20 K/sec
+           - ScanRangesComplete: 15
+           - ScannerThreadsInvoluntaryContextSwitches: 34
+           - ScannerThreadsTotalWallClockTime: 189.774ms
+             - DelimiterParseTime: 15.703ms
+             - MaterializeTupleTime(*): 3.419ms
+             - ScannerThreadsSysTime: 1.999ms
+             - ScannerThreadsUserTime: 44.993ms
+           - ScannerThreadsVoluntaryContextSwitches: 118
+           - TotalRawHdfsReadTime(*): 7.199ms
+           - TotalReadThroughput: 0.00 /sec</code></pre>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_faq.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_faq.html b/docs/build/html/topics/impala_faq.html
new file mode 100644
index 0000000..b85bb8a
--- /dev/null
+++ b/docs/build/html/topics/impala_faq.html
@@ -0,0 +1,21 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="faq"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Frequently Asked Questions</title></head><body id="faq"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Frequently Asked Questions</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      This section lists frequently asked questions for Apache Impala (incubating),
+      the interactive SQL engine for Hadoop.
+    </p>
+
+    <p class="p">
+      This section is under construction.
+    </p>
+
+  </div>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_file_formats.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_file_formats.html b/docs/build/html/topics/impala_file_formats.html
new file mode 100644
index 0000000..d9ccbca
--- /dev/null
+++ b/docs/build/html/topics/impala_file_formats.html
@@ -0,0 +1,236 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_txtfile.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_avro.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_rcfile.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_seqfile.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="file_formats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>How Impala Works with Hado
 op File Formats</title></head><body id="file_formats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">How Impala Works with Hadoop File Formats</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      
+      Impala supports several familiar file formats used in Apache Hadoop. Impala can load and query data files
+      produced by other Hadoop components such as Pig or MapReduce, and data files produced by Impala can be used
+      by other components also. The following sections discuss the procedures, limitations, and performance
+      considerations for using each file format with Impala.
+    </p>
+
+    <p class="p">
+      The file format used for an Impala table has significant performance consequences. Some file formats include
+      compression support that affects the size of data on disk and, consequently, the amount of I/O and CPU
+      resources required to deserialize it. Because querying often begins with moving and decompressing data, the
+      I/O and CPU resources required can be a limiting factor in query performance. Compressing the data reduces
+      the total number of bytes transferred from disk to memory, and therefore the transfer time, at the cost of
+      the CPU work needed to decompress the content.
+    </p>
+
+    <p class="p">
+      Impala can query files encoded with most of the popular file formats and compression codecs used in Hadoop.
+      Impala can create and insert data into tables that use some file formats but not others; for file formats
+      that Impala cannot write to, create the table in Hive, issue the <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+      statement in <code class="ph codeph">impala-shell</code>, and query the table through Impala. File formats can be
+      structured, in which case they may include metadata and built-in compression. Supported formats include:
+    </p>
+
+    <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">File Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+          <tr class="row">
+            <th class="entry nocellnorowborder" id="file_formats__entry__1">
+              File Type
+            </th>
+            <th class="entry nocellnorowborder" id="file_formats__entry__2">
+              Format
+            </th>
+            <th class="entry nocellnorowborder" id="file_formats__entry__3">
+              Compression Codecs
+            </th>
+            <th class="entry nocellnorowborder" id="file_formats__entry__4">
+              Impala Can CREATE?
+            </th>
+            <th class="entry nocellnorowborder" id="file_formats__entry__5">
+              Impala Can INSERT?
+            </th>
+          </tr>
+        </thead><tbody class="tbody">
+          <tr class="row" id="file_formats__parquet_support">
+            <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+              <a class="xref" href="impala_parquet.html#parquet">Parquet</a>
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+              Structured
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+              Snappy, gzip; currently Snappy by default
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">
+              Yes.
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+              Yes: <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, and query.
+            </td>
+          </tr>
+          <tr class="row" id="file_formats__txtfile_support">
+            <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+              <a class="xref" href="impala_txtfile.html#txtfile">Text</a>
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+              Unstructured
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+              LZO, gzip, bzip2, Snappy
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">
+              Yes. For <code class="ph codeph">CREATE TABLE</code> with no <code class="ph codeph">STORED AS</code> clause, the default file
+              format is uncompressed text, with values separated by ASCII <code class="ph codeph">0x01</code> characters
+              (typically represented as Ctrl-A).
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+              Yes: <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, and query.
+              If LZO compression is used, you must create the table and load data in Hive. If other kinds of
+              compression are used, you must load data through <code class="ph codeph">LOAD DATA</code>, Hive, or manually in
+              HDFS.
+
+
+            </td>
+          </tr>
+          <tr class="row" id="file_formats__avro_support">
+            <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+              <a class="xref" href="impala_avro.html#avro">Avro</a>
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+              Structured
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+              Snappy, gzip, deflate, bzip2
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">
+              Yes, in Impala 1.4.0 and higher. Before that, create the table using Hive.
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+              No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+              <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+            </td>
+
+          </tr>
+          <tr class="row" id="file_formats__rcfile_support">
+            <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+              <a class="xref" href="impala_rcfile.html#rcfile">RCFile</a>
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+              Structured
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+              Snappy, gzip, deflate, bzip2
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">
+              Yes.
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+              No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+              <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+            </td>
+
+          </tr>
+          <tr class="row" id="file_formats__sequencefile_support">
+            <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+              <a class="xref" href="impala_seqfile.html#seqfile">SequenceFile</a>
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+              Structured
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+              Snappy, gzip, deflate, bzip2
+            </td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">Yes.</td>
+            <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+              No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+              <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+            </td>
+
+          </tr>
+        </tbody></table>
+
+    <p class="p">
+      Impala can only query the file formats listed in the preceding table.
+      In particular, Impala does not support the ORC file format.
+    </p>
+
+    <p class="p">
+      Impala supports the following compression codecs:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Snappy. Recommended for its effective balance between compression ratio and decompression speed. Snappy
+        compression is very fast, but gzip provides greater space savings. Supported for text files in Impala 2.0
+        and higher.
+
+      </li>
+
+      <li class="li">
+        Gzip. Recommended when achieving the highest level of compression (and therefore greatest disk-space
+        savings) is desired. Supported for text files in Impala 2.0 and higher.
+      </li>
+
+      <li class="li">
+        Deflate. Not supported for text files.
+      </li>
+
+      <li class="li">
+        Bzip2. Supported for text files in Impala 2.0 and higher.
+
+      </li>
+
+      <li class="li">
+        <p class="p"> LZO, for text files only. Impala can query
+          LZO-compressed text tables, but currently cannot create them or insert
+          data into them; perform these operations in Hive. </p>
+      </li>
+    </ul>
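+    <p class="p">
+      As an illustration of the Snappy-versus-gzip tradeoff, the following sketch (table names are
+      hypothetical) uses the <code class="ph codeph">COMPRESSION_CODEC</code> query option in
+      <code class="ph codeph">impala-shell</code> to choose the codec for Parquet files written by
+      <code class="ph codeph">INSERT</code>:
+    </p>
+
+<pre class="pre codeblock"><code>-- Default is Snappy: fast decompression, moderate space savings.
+SET COMPRESSION_CODEC=snappy;
+INSERT INTO parquet_snappy_table SELECT * FROM text_table;
+
+-- Gzip: greatest space savings, more CPU work to decompress.
+SET COMPRESSION_CODEC=gzip;
+INSERT INTO parquet_gzip_table SELECT * FROM text_table;
+</code></pre>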
+  </div>
+
+  <nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_txtfile.html">Using Text Data Files with Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet.html">Using the Parquet File Format with Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_avro.html">Using the Avro File Format with Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_rcfile.html">Using the RCFile File Format with Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_seqfile.html">Using the SequenceFile File Format with Impala Tables</a></strong><br></li></ul></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="file_formats__file_format_choosing">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Choosing the File Format for a Table</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Different file formats and compression codecs work better for different data sets. While Impala typically
+        provides performance gains regardless of file format, choosing the proper format for your data can yield
+        further performance improvements. Use the following considerations to decide which combination of file
+        format and compression to use for a particular table:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          If you are working with existing files that are already in a supported file format, use the same format
+          for the Impala table where practical. If the original format does not yield acceptable query performance
+          or resource usage, consider creating a new Impala table with different file format or compression
+          characteristics, and doing a one-time conversion by copying the data to the new table using the
+          <code class="ph codeph">INSERT</code> statement. Depending on the file format, you might run the
+          <code class="ph codeph">INSERT</code> statement in <code class="ph codeph">impala-shell</code> or in Hive.
+        </li>
+
+        <li class="li">
+          Text files are convenient to produce through many different tools, and are human-readable for ease of
+          verification and debugging. Those characteristics are why text is the default format for an Impala
+          <code class="ph codeph">CREATE TABLE</code> statement. When performance and resource usage are the primary
+          considerations, use one of the other file formats and consider using compression. A typical workflow
+          might involve bringing data into an Impala table by copying CSV or TSV files into the appropriate data
+          directory, and then using the <code class="ph codeph">INSERT ... SELECT</code> syntax to copy the data into a table
+          using a different, more compact file format.
+        </li>
+
+        <li class="li">
+          If your architecture involves storing data to be queried in memory, do not compress the data. There are no
+          I/O savings since the data does not need to be moved from disk, but there is a CPU cost to decompress the
+          data.
+        </li>
+      </ul>
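+      <p class="p">
+        The one-time conversion described above might look like the following sketch (all table and
+        column names are illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>-- Text table whose data directory holds the original CSV files.
+CREATE TABLE csv_staging (id BIGINT, name STRING, amount DOUBLE)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
+
+-- One-time copy into a more compact format for ongoing queries.
+CREATE TABLE sales_parquet LIKE csv_staging STORED AS PARQUET;
+INSERT INTO sales_parquet SELECT * FROM csv_staging;
+</code></pre>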
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file



[04/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_struct.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_struct.html b/docs/build/html/topics/impala_struct.html
new file mode 100644
index 0000000..c796fe9
--- /dev/null
+++ b/docs/build/html/topics/impala_struct.html
@@ -0,0 +1,500 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="struct"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>STRUCT Complex Type (Impala 2.3 or higher only)</title></head><body id="struct"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">STRUCT Complex Type (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A complex data type, representing multiple fields of a single item. Frequently used as the element type of an <code class="ph codeph">ARRAY</code>
+      or the <code class="ph codeph">VALUE</code> part of a <code class="ph codeph">MAP</code>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> STRUCT &lt; <var class="keyword varname">name</var> : <var class="keyword varname">type</var> [COMMENT '<var class="keyword varname">comment_string</var>'], ... &gt;
+
+type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword varname">complex_type</var>
+</code></pre>
+
+    <p class="p">
+      The names and number of fields within the <code class="ph codeph">STRUCT</code> are fixed. Each field can be a different type. A field within a
+      <code class="ph codeph">STRUCT</code> can also be another <code class="ph codeph">STRUCT</code>, or an <code class="ph codeph">ARRAY</code> or a <code class="ph codeph">MAP</code>, allowing
+      you to create nested data structures with a maximum nesting depth of 100.
+    </p>
+
+    <p class="p">
+      A <code class="ph codeph">STRUCT</code> can be the top-level type for a column, or can itself be an item within an <code class="ph codeph">ARRAY</code> or the
+      value part of the key-value pair in a <code class="ph codeph">MAP</code>.
+    </p>
+
+    <p class="p">
+      When a <code class="ph codeph">STRUCT</code> is used as an <code class="ph codeph">ARRAY</code> element or a <code class="ph codeph">MAP</code> value, you use a join clause to
+      bring the <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code> elements into the result set, and then refer to
+      <code class="ph codeph"><var class="keyword varname">array_name</var>.ITEM.<var class="keyword varname">field</var></code> or
+      <code class="ph codeph"><var class="keyword varname">map_name</var>.VALUE.<var class="keyword varname">field</var></code>. In the case of a <code class="ph codeph">STRUCT</code> directly inside
+      an <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code>, you can omit the <code class="ph codeph">.ITEM</code> and <code class="ph codeph">.VALUE</code> pseudocolumns
+      and refer directly to <code class="ph codeph"><var class="keyword varname">array_name</var>.<var class="keyword varname">field</var></code> or
+      <code class="ph codeph"><var class="keyword varname">map_name</var>.<var class="keyword varname">field</var></code>.
+    </p>
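+    <p class="p">
+      For instance, assuming a table <code class="ph codeph">t1</code> with an <code class="ph codeph">ARRAY</code> column
+      <code class="ph codeph">a1</code> whose elements are <code class="ph codeph">STRUCT</code> values containing a field
+      <code class="ph codeph">f1</code> (names are illustrative only), the following two queries are equivalent:
+    </p>
+
+<pre class="pre codeblock"><code>-- Spelling out the ITEM pseudocolumn of the ARRAY.
+SELECT t1.id, a1.ITEM.f1 FROM t1, t1.a1;
+
+-- Equivalent shorthand: ITEM can be omitted for a STRUCT directly inside an ARRAY.
+SELECT t1.id, a1.f1 FROM t1, t1.a1;
+</code></pre>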
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+        Because complex types are often used in combination
+        (for example, an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+        elements), if you are unfamiliar with the Impala complex types,
+        start with <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for
+        background information and usage examples.
+      </p>
+
+    <p class="p">
+      A <code class="ph codeph">STRUCT</code> is similar conceptually to a table row: it contains a fixed number of named fields, each with a predefined
+      type. To represent the data from two related tables while using complex types to minimize repetition, the typical approach is an
+      <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> elements.
+    </p>
+
+    <p class="p">
+      Because a <code class="ph codeph">STRUCT</code> has a fixed number of named fields, it typically does not make sense to have a
+      <code class="ph codeph">STRUCT</code> as the type of a table column. In such a case, you could just make each field of the <code class="ph codeph">STRUCT</code>
+      into a separate column of the table. The <code class="ph codeph">STRUCT</code> type is most useful as an item of an <code class="ph codeph">ARRAY</code> or the
+      value part of the key-value pair in a <code class="ph codeph">MAP</code>. A nested type column with a <code class="ph codeph">STRUCT</code> at the lowest level
+      lets you associate a variable number of row-like objects with each row of the table.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">STRUCT</code> type is straightforward to reference within a query. You do not need to include the
+      <code class="ph codeph">STRUCT</code> column in a join clause or give it a table alias, as is required for the <code class="ph codeph">ARRAY</code> and
+      <code class="ph codeph">MAP</code> types. You refer to the individual fields using dot notation, such as
+      <code class="ph codeph"><var class="keyword varname">struct_column_name</var>.<var class="keyword varname">field_name</var></code>, without any pseudocolumn such as
+      <code class="ph codeph">ITEM</code> or <code class="ph codeph">VALUE</code>.
+    </p>
+
+    <p class="p">
+        You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+        to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+        column and visualize its structure as if it were a table.
+        For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+        <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+        If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+        and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+        you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+        An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+        <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+        A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+        representing a column in the table.
+        A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+        <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong>
+      </p>
+
+    <p class="p">
+      Within the Parquet data file, the values for each <code class="ph codeph">STRUCT</code> field are stored adjacent to each other, so that they can be
+      encoded and compressed using all the Parquet techniques for storing sets of similar or repeated values. The adjacency applies even
+      when the <code class="ph codeph">STRUCT</code> values are part of an <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code>. During a query, Impala avoids
+      unnecessary I/O by reading only the portions of the Parquet data file containing the requested <code class="ph codeph">STRUCT</code> fields.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Columns with this data type can only be used in tables or partitions with the Parquet file format.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Columns with this data type cannot be used as partition key columns in a partitioned table.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">COMPUTE STATS</code> statement does not produce any statistics for columns of this data type.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p" id="struct__d6e2889">
+            The maximum length of the column definition for any complex type, including declarations for any nested types,
+            is 4000 characters.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations and Restrictions for Complex Types</a> for a full list of limitations
+            and associated guidelines about complex type columns.
+          </p>
+        </li>
+      </ul>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+        <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Many of the complex type examples refer to tables
+      such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+      adapted from the tables used in the TPC-H benchmark.
+      See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+      for the table definitions.
+      </div>
+
+    <p class="p">
+      The following example shows a table with various kinds of <code class="ph codeph">STRUCT</code> columns, both at the top level and nested within
+      other complex types. Practice the <code class="ph codeph">CREATE TABLE</code> and query notation for complex type columns using empty tables, until
+      you can visualize a complex data structure and construct corresponding SQL statements reliably.
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE struct_demo
+(
+  id BIGINT,
+  name STRING,
+
+-- A STRUCT as a top-level column. Demonstrates how the table ID column
+-- and the ID field within the STRUCT can coexist without a name conflict.
+  employee_info STRUCT &lt; employer: STRING, id: BIGINT, address: STRING &gt;,
+
+-- A STRUCT as the element type of an ARRAY.
+  places_lived ARRAY &lt; STRUCT &lt;street: STRING, city: STRING, country: STRING &gt;&gt;,
+
+-- A STRUCT as the value portion of the key-value pairs in a MAP.
+  memorable_moments MAP &lt; STRING, STRUCT &lt; year: INT, place: STRING, details: STRING &gt;&gt;,
+
+-- A STRUCT where one of the fields is another STRUCT.
+  current_address STRUCT &lt; street_address: STRUCT &lt;street_number: INT, street_name: STRING, street_type: STRING&gt;, country: STRING, postal_code: STRING &gt;
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+    <p class="p">
+      The following example shows how to examine the structure of a table containing one or more <code class="ph codeph">STRUCT</code> columns by using
+      the <code class="ph codeph">DESCRIBE</code> statement. You can visualize each <code class="ph codeph">STRUCT</code> as its own table, with columns named the same
+      as each field of the <code class="ph codeph">STRUCT</code>. If the <code class="ph codeph">STRUCT</code> is nested inside another complex type, such as
+      <code class="ph codeph">ARRAY</code>, you can extend the qualified name passed to <code class="ph codeph">DESCRIBE</code> until the output shows just the
+      <code class="ph codeph">STRUCT</code> fields.
+    </p>
+
+<pre class="pre codeblock"><code>DESCRIBE struct_demo;
++-------------------+--------------------------+
+| name              | type                     |
++-------------------+--------------------------+
+| id                | bigint                   |
+| name              | string                   |
+| employee_info     | struct&lt;                  |
+|                   |   employer:string,       |
+|                   |   id:bigint,             |
+|                   |   address:string         |
+|                   | &gt;                        |
+| places_lived      | array&lt;struct&lt;            |
+|                   |   street:string,         |
+|                   |   city:string,           |
+|                   |   country:string         |
+|                   | &gt;&gt;                       |
+| memorable_moments | map&lt;string,struct&lt;       |
+|                   |   year:int,              |
+|                   |   place:string,          |
+|                   |   details:string         |
+|                   | &gt;&gt;                       |
+| current_address   | struct&lt;                  |
+|                   |   street_address:struct&lt; |
+|                   |     street_number:int,   |
+|                   |     street_name:string,  |
+|                   |     street_type:string   |
+|                   |   &gt;,                     |
+|                   |   country:string,        |
+|                   |   postal_code:string     |
+|                   | &gt;                        |
++-------------------+--------------------------+
+
+</code></pre>
+
+    <p class="p">
+      The top-level column <code class="ph codeph">EMPLOYEE_INFO</code> is a <code class="ph codeph">STRUCT</code>. Describing
+      <code class="ph codeph"><var class="keyword varname">table_name</var>.<var class="keyword varname">struct_name</var></code> displays the fields of the <code class="ph codeph">STRUCT</code> as if
+      they were columns of a table:
+    </p>
+
+<pre class="pre codeblock"><code>DESCRIBE struct_demo.employee_info;
++----------+--------+
+| name     | type   |
++----------+--------+
+| employer | string |
+| id       | bigint |
+| address  | string |
++----------+--------+
+
+</code></pre>
+
+    <p class="p">
+      Because <code class="ph codeph">PLACES_LIVED</code> is a <code class="ph codeph">STRUCT</code> inside an <code class="ph codeph">ARRAY</code>, the initial
+      <code class="ph codeph">DESCRIBE</code> shows the structure of the <code class="ph codeph">ARRAY</code>:
+    </p>
+
+<pre class="pre codeblock"><code>DESCRIBE struct_demo.places_lived;
++------+------------------+
+| name | type             |
++------+------------------+
+| item | struct&lt;          |
+|      |   street:string, |
+|      |   city:string,   |
+|      |   country:string |
+|      | &gt;                |
+| pos  | bigint           |
++------+------------------+
+
+</code></pre>
+
+    <p class="p">
+      Ask for the details of the <code class="ph codeph">ITEM</code> field of the <code class="ph codeph">ARRAY</code> to see just the layout of the
+      <code class="ph codeph">STRUCT</code>:
+    </p>
+
+<pre class="pre codeblock"><code>DESCRIBE struct_demo.places_lived.item;
++---------+--------+
+| name    | type   |
++---------+--------+
+| street  | string |
+| city    | string |
+| country | string |
++---------+--------+
+
+</code></pre>
+
+    <p class="p">
+      Likewise, <code class="ph codeph">MEMORABLE_MOMENTS</code> has a <code class="ph codeph">STRUCT</code> inside a <code class="ph codeph">MAP</code>, which requires an extra
+      level of qualified name to see just the <code class="ph codeph">STRUCT</code> part:
+    </p>
+
+<pre class="pre codeblock"><code>DESCRIBE struct_demo.memorable_moments;
++-------+------------------+
+| name  | type             |
++-------+------------------+
+| key   | string           |
+| value | struct&lt;          |
+|       |   year:int,      |
+|       |   place:string,  |
+|       |   details:string |
+|       | &gt;                |
++-------+------------------+
+
+</code></pre>
+
+    <p class="p">
+      For a <code class="ph codeph">MAP</code>, ask to see the <code class="ph codeph">VALUE</code> field to see the corresponding <code class="ph codeph">STRUCT</code> fields in a
+      table-like structure:
+    </p>
+
+<pre class="pre codeblock"><code>DESCRIBE struct_demo.memorable_moments.value;
++---------+--------+
+| name    | type   |
++---------+--------+
+| year    | int    |
+| place   | string |
+| details | string |
++---------+--------+
+
+</code></pre>
+
+    <p class="p">
+      For a <code class="ph codeph">STRUCT</code> inside a <code class="ph codeph">STRUCT</code>, you can see the fields of the outer <code class="ph codeph">STRUCT</code>:
+    </p>
+
+<pre class="pre codeblock"><code>DESCRIBE struct_demo.current_address;
++----------------+-----------------------+
+| name           | type                  |
++----------------+-----------------------+
+| street_address | struct&lt;               |
+|                |   street_number:int,  |
+|                |   street_name:string, |
+|                |   street_type:string  |
+|                | &gt;                     |
+| country        | string                |
+| postal_code    | string                |
++----------------+-----------------------+
+
+</code></pre>
+
+    <p class="p">
+      Then you can use a further qualified name to see just the fields of the inner <code class="ph codeph">STRUCT</code>:
+    </p>
+
+<pre class="pre codeblock"><code>DESCRIBE struct_demo.current_address.street_address;
++---------------+--------+
+| name          | type   |
++---------------+--------+
+| street_number | int    |
+| street_name   | string |
+| street_type   | string |
++---------------+--------+
+
+</code></pre>
+
+    <p class="p">
+      The following examples show how to query a table containing <code class="ph codeph">STRUCT</code> columns.
+      The <code class="ph codeph">DESCRIBE</code> output is repeated, this time including the comment column, for
+      reference while reading the queries. The queries demonstrate referring to <code class="ph codeph">STRUCT</code>
+      fields with dot notation, with and without table aliases, and in combination with the
+      <code class="ph codeph">POS</code>, <code class="ph codeph">KEY</code>, and <code class="ph codeph">VALUE</code>
+      pseudocolumns of the enclosing <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> columns.
+    </p>
+
+
+
+<pre class="pre codeblock"><code>DESCRIBE struct_demo;
++-------------------+--------------------------+---------+
+| name              | type                     | comment |
++-------------------+--------------------------+---------+
+| id                | bigint                   |         |
+| name              | string                   |         |
+| employee_info     | struct&lt;                  |         |
+|                   |   employer:string,       |         |
+|                   |   id:bigint,             |         |
+|                   |   address:string         |         |
+|                   | &gt;                        |         |
+| places_lived      | array&lt;struct&lt;            |         |
+|                   |   street:string,         |         |
+|                   |   city:string,           |         |
+|                   |   country:string         |         |
+|                   | &gt;&gt;                       |         |
+| memorable_moments | map&lt;string,struct&lt;       |         |
+|                   |   year:int,              |         |
+|                   |   place:string,          |         |
+|                   |   details:string         |         |
+|                   | &gt;&gt;                       |         |
+| current_address   | struct&lt;                  |         |
+|                   |   street_address:struct&lt; |         |
+|                   |     street_number:int,   |         |
+|                   |     street_name:string,  |         |
+|                   |     street_type:string   |         |
+|                   |   &gt;,                     |         |
+|                   |   country:string,        |         |
+|                   |   postal_code:string     |         |
+|                   | &gt;                        |         |
++-------------------+--------------------------+---------+
+
+SELECT id, employee_info.id FROM struct_demo;
+
+SELECT id, employee_info.id AS employee_id FROM struct_demo;
+
+SELECT id, employee_info.id AS employee_id, employee_info.employer
+  FROM struct_demo;
+
+SELECT id, name, street, city, country
+  FROM struct_demo, struct_demo.places_lived;
+
+SELECT id, name, places_lived.pos, places_lived.street, places_lived.city, places_lived.country
+  FROM struct_demo, struct_demo.places_lived;
+
+SELECT id, name, pl.pos, pl.street, pl.city, pl.country
+  FROM struct_demo, struct_demo.places_lived AS pl;
+
+SELECT id, name, pos, street, city, country
+  FROM struct_demo, struct_demo.places_lived;
+
+SELECT id, name, memorable_moments.key,
+  memorable_moments.value.year,
+  memorable_moments.value.place,
+  memorable_moments.value.details
+FROM struct_demo, struct_demo.memorable_moments
+WHERE memorable_moments.key IN ('Birthday','Anniversary','Graduation');
+
+SELECT id, name, mm.key, mm.value.year, mm.value.place, mm.value.details
+  FROM struct_demo, struct_demo.memorable_moments AS mm
+WHERE mm.key IN ('Birthday','Anniversary','Graduation');
+
+SELECT id, name, memorable_moments.key, memorable_moments.value.year,
+  memorable_moments.value.place, memorable_moments.value.details
+FROM struct_demo, struct_demo.memorable_moments
+WHERE key IN ('Birthday','Anniversary','Graduation');
+
+SELECT id, name, key, value.year, value.place, value.details
+  FROM struct_demo, struct_demo.memorable_moments
+WHERE key IN ('Birthday','Anniversary','Graduation');
+
+SELECT id, name, key, year, place, details
+  FROM struct_demo, struct_demo.memorable_moments
+WHERE key IN ('Birthday','Anniversary','Graduation');
+
+SELECT id, name,
+  current_address.street_address.street_number,
+  current_address.street_address.street_name,
+  current_address.street_address.street_type,
+  current_address.country,
+  current_address.postal_code
+FROM struct_demo;
+
+</code></pre>
+
+    <p class="p">
+      For example, this table uses a struct that encodes several data values for each phone number associated with a person. Each person can
+      have a variable-length array of associated phone numbers, and queries can refer to the <code class="ph codeph">category</code> field to
+      locate specific kinds of phone numbers, such as home, work, or mobile.
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE contact_info_many_structs
+(
+  id BIGINT, name STRING,
+  phone_numbers ARRAY &lt; STRUCT &lt;category:STRING, country_code:STRING, area_code:SMALLINT, full_number:STRING, mobile:BOOLEAN, carrier:STRING &gt; &gt;
+) STORED AS PARQUET;
+
+</code></pre>
+
+    <p class="p">
+      Because structs are naturally suited to composite values where the fields have different data types, you might use them to decompose
+      things such as addresses:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE contact_info_detailed_address
+(
+  id BIGINT, name STRING,
+  address STRUCT &lt; house_number:INT, street:STRING, street_type:STRING, apartment:STRING, city:STRING, region:STRING, country:STRING &gt;
+);
+
+</code></pre>
+
+    <p class="p">
+      In a big data context, splitting out data fields such as the number part of the address and the street name could let you do analysis
+      on each field independently. For example, you could find which streets have the widest range of address numbers, what the statistical
+      properties of the street names are, which areas have a higher proportion of <span class="q">"Roads"</span>, <span class="q">"Courts"</span>, or <span class="q">"Boulevards"</span>, and so on.
+    </p>
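+    <p class="p">
+      As a sketch of that kind of analysis (assuming the <code class="ph codeph">contact_info_detailed_address</code> table
+      above has been populated), a query could tally how often each street type occurs:
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical example: count addresses by street type.
+SELECT address.street_type, count(*) AS how_many
+  FROM contact_info_detailed_address
+GROUP BY address.street_type
+ORDER BY how_many DESC;
+</code></pre>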
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a>, <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>,
+
+      <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a>
+    </p>
+
+  </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_subqueries.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_subqueries.html b/docs/build/html/topics/impala_subqueries.html
new file mode 100644
index 0000000..2be2880
--- /dev/null
+++ b/docs/build/html/topics/impala_subqueries.html
@@ -0,0 +1,316 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="subqueries"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Subqueries in Impala SELECT Statements</title></head><body id="subqueries"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Subqueries in Impala SELECT Statements</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      A <dfn class="term">subquery</dfn> is a query that is nested within another query. Subqueries let queries on one table
+      dynamically adapt based on the contents of another table. This technique provides great flexibility and
+      expressive power for SQL queries.
+    </p>
+
+    <p class="p">
+      A subquery can return a result set for use in the <code class="ph codeph">FROM</code> or <code class="ph codeph">WITH</code> clauses, or
+      with operators such as <code class="ph codeph">IN</code> or <code class="ph codeph">EXISTS</code>.
+    </p>
+
+    <p class="p">
+      A <dfn class="term">scalar subquery</dfn> produces a result set with a single row containing a single column, typically
+      produced by an aggregation function such as <code class="ph codeph">MAX()</code> or <code class="ph codeph">SUM()</code>. This single
+      result value can be substituted in scalar contexts such as arguments to comparison operators. If the result
+      set is empty, the value of the scalar subquery is <code class="ph codeph">NULL</code>. For example, the following query
+      finds the maximum value of <code class="ph codeph">T2.Y</code> and then substitutes that value into the
+      <code class="ph codeph">WHERE</code> clause of the outer block that queries <code class="ph codeph">T1</code>:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT x FROM t1 WHERE x &gt; (SELECT MAX(y) FROM t2);
+</code></pre>
+
+    <p class="p">
+      <dfn class="term">Uncorrelated subqueries</dfn> do not refer to any tables from the outer block of the query. The same
+      value or set of values produced by the subquery is used when evaluating each row from the outer query block.
+      In this example, the subquery returns an arbitrary number of values from <code class="ph codeph">T2.Y</code>, and each
+      value of <code class="ph codeph">T1.X</code> is tested for membership in that same set of values:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT x FROM t1 WHERE x IN (SELECT y FROM t2);
+</code></pre>
+
+    <p class="p">
+      <dfn class="term">Correlated subqueries</dfn> compare one or more values from the outer query block to values referenced
+      in the <code class="ph codeph">WHERE</code> clause of the subquery. Each row evaluated by the outer <code class="ph codeph">WHERE</code>
+      clause can be evaluated using a different set of values. These kinds of subqueries are restricted in the
+      kinds of comparisons they can do between columns of the inner and outer tables. (See the following
+      <strong class="ph b">Restrictions</strong> item.)
+    </p>
+
+    <p class="p">
+      For example, the following query finds all the employees with salaries that are higher than average for their
+      department. The subquery potentially computes a different <code class="ph codeph">AVG()</code> value for each employee.
+    </p>
+
+
+
+<pre class="pre codeblock"><code>SELECT employee_name, employee_id FROM employees one WHERE
+  salary &gt; (SELECT avg(salary) FROM employees two WHERE one.dept_id = two.dept_id);
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      <strong class="ph b">Subquery in the <code class="ph codeph">FROM</code> clause:</strong>
+    </p>
+
+<pre class="pre codeblock"><code>SELECT <var class="keyword varname">select_list</var> FROM <var class="keyword varname">table_ref</var> [, <var class="keyword varname">table_ref</var> ...]
+
+<var class="keyword varname">table_ref</var> ::= <var class="keyword varname">table_name</var> | (<var class="keyword varname">select_statement</var>)
+</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Subqueries in <code class="ph codeph">WHERE</code> clause:</strong>
+    </p>
+
+<pre class="pre codeblock"><code>WHERE <var class="keyword varname">value</var> <var class="keyword varname">comparison_operator</var> (<var class="keyword varname">scalar_select_statement</var>)
+WHERE <var class="keyword varname">value</var> [NOT] IN (<var class="keyword varname">select_statement</var>)
+WHERE [NOT] EXISTS (<var class="keyword varname">correlated_select_statement</var>)
+WHERE NOT EXISTS (<var class="keyword varname">correlated_select_statement</var>)
+</code></pre>
+
+    <p class="p">
+      <code class="ph codeph">comparison_operator</code> is a numeric comparison such as <code class="ph codeph">=</code>,
+      <code class="ph codeph">&lt;=</code>, <code class="ph codeph">!=</code>, and so on, or a string comparison operator such as
+      <code class="ph codeph">LIKE</code> or <code class="ph codeph">REGEXP</code>.
+    </p>
+
+    <p class="p">
+      Although you can use non-equality comparison operators such as <code class="ph codeph">&lt;</code> or
+      <code class="ph codeph">&gt;=</code>, the subquery must include at least one equality comparison between the columns of the
+      inner and outer query blocks.
+    </p>
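+    <p class="p">
+      For example, the following sketch (reusing the hypothetical <code class="ph codeph">EMPLOYEES</code> table from the
+      earlier example) combines a non-equality test in the outer block with the required equality comparison inside the
+      subquery:
+    </p>
+
+<pre class="pre codeblock"><code>-- Allowed: the correlated subquery contains the equality comparison
+-- one.dept_id = two.dept_id, even though the outer test uses &gt;=.
+SELECT employee_name FROM employees one
+  WHERE salary &gt;= (SELECT max(salary) FROM employees two WHERE one.dept_id = two.dept_id);
+</code></pre>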
+
+    <p class="p">
+      All syntax is available for both correlated and uncorrelated queries, except that the <code class="ph codeph">NOT
+      EXISTS</code> clause cannot be used with an uncorrelated subquery.
+    </p>
+
+    <p class="p">
+      Impala subqueries can be nested arbitrarily deep.
+    </p>
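+    <p class="p">
+      For example, this sketch (using the same generic tables as the earlier examples) nests one subquery inside
+      another:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT x FROM t1
+  WHERE x IN (SELECT y FROM t2
+    WHERE z &gt; (SELECT max(z) FROM t3));
+</code></pre>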
+
+    <p class="p">
+      <strong class="ph b">Standards compliance:</strong> Introduced in
+      <a class="xref" href="http://en.wikipedia.org/wiki/SQL:1999" target="_blank">SQL:1999</a>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      This example illustrates how subqueries can be used in the <code class="ph codeph">FROM</code> clause to organize the table
+      names, column names, and column values by producing intermediate result sets, especially for join queries.
+    </p>
+
+<pre class="pre codeblock"><code>SELECT avg(t1.x), max(t2.y) FROM
+  (SELECT id, cast(a AS DECIMAL(10,5)) AS x FROM raw_data WHERE a BETWEEN 0 AND 100) AS t1
+  JOIN
+  (SELECT id, length(s) AS y FROM raw_data WHERE s LIKE 'A%') AS t2
+  USING (id);
+</code></pre>
+
+    <p class="p">
+      These examples show how a query can test for the existence of values in a separate table using the
+      <code class="ph codeph">EXISTS</code> operator with a subquery.
+    </p>
+
+    <p class="p">
+      The following examples show how a value can be compared against a set of values returned by a subquery.
+    </p>
+
+<pre class="pre codeblock"><code>SELECT count(x) FROM t1 WHERE EXISTS(SELECT 1 FROM t2 WHERE t1.x = t2.y * 10);
+
+SELECT x FROM t1 WHERE x IN (SELECT y FROM t2 WHERE state = 'CA');
+</code></pre>
+
+    <p class="p">
+      The following examples demonstrate scalar subqueries. When a subquery is known to return a single value, you
+      can substitute it where you would normally put a constant value.
+    </p>
+
+<pre class="pre codeblock"><code>SELECT x FROM t1 WHERE y = (SELECT max(z) FROM t2);
+SELECT x FROM t1 WHERE y &gt; (SELECT count(z) FROM t2);
+</code></pre>
+
+
+
+
+
+
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      If the same table is referenced in both the outer and inner query blocks, construct a table alias in the
+      outer query block and use a fully qualified name to distinguish the inner and outer table references:
+    </p>
+
+
+
+<pre class="pre codeblock"><code>SELECT * FROM t1 one WHERE id IN (SELECT parent FROM t1 two WHERE one.parent = two.id);
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong>
+      </p>
+
+    <p class="p">
+      Internally, subqueries involving <code class="ph codeph">IN</code>, <code class="ph codeph">NOT IN</code>, <code class="ph codeph">EXISTS</code>, or
+      <code class="ph codeph">NOT EXISTS</code> clauses are rewritten into join queries. Depending on the syntax, the subquery
+      might be rewritten to an outer join, semi join, cross join, or anti join.
+    </p>
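+    <p class="p">
+      For example, an <code class="ph codeph">IN</code> subquery such as the first query below behaves much like the
+      <code class="ph codeph">LEFT SEMI JOIN</code> shown after it. (This is a conceptual sketch, not the literal rewritten
+      plan.)
+    </p>
+
+<pre class="pre codeblock"><code>-- Subquery form.
+SELECT x FROM t1 WHERE x IN (SELECT y FROM t2);
+-- Roughly equivalent join form.
+SELECT t1.x FROM t1 LEFT SEMI JOIN t2 ON t1.x = t2.y;
+</code></pre>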
+
+    <p class="p">
+      A query is processed differently depending on whether the subquery calls any aggregation functions. There are
+      correlated and uncorrelated forms, with and without calls to aggregation functions. Each of these four
+      categories is rewritten differently.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong>
+      </p>
+
+    <p class="p">
+      Because queries that include correlated and uncorrelated subqueries in the <code class="ph codeph">WHERE</code> clause are
+      rewritten into join queries, to achieve best performance, follow the same guidelines for running the
+      <code class="ph codeph">COMPUTE STATS</code> statement as you do for tables involved in regular join queries. Run the
+      <code class="ph codeph">COMPUTE STATS</code> statement for each associated table after loading or substantially changing
+      the data in that table. See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Added in:</strong> Subqueries are substantially enhanced starting in Impala 2.0. Now,
+      they can be used in the <code class="ph codeph">WHERE</code> clause, in combination with clauses such as
+      <code class="ph codeph">EXISTS</code> and <code class="ph codeph">IN</code>, rather than just in the <code class="ph codeph">FROM</code> clause.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+      The initial Impala support for nested subqueries addresses the most common use cases. Some restrictions
+      remain:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          Although you can use subqueries in a query involving <code class="ph codeph">UNION</code> or <code class="ph codeph">UNION ALL</code>
+          in Impala 2.1.0 and higher, currently you cannot construct a union of two subqueries (for example, in the
+          argument of an <code class="ph codeph">IN</code> or <code class="ph codeph">EXISTS</code> operator).
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          Subqueries returning scalar values cannot be used with the operators <code class="ph codeph">ANY</code> or
+          <code class="ph codeph">ALL</code>. (Impala does not currently have a <code class="ph codeph">SOME</code> operator, but if it did,
+          the same restriction would apply.)
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          For the <code class="ph codeph">EXISTS</code> and <code class="ph codeph">NOT EXISTS</code> clauses, any subquery comparing values
+          from the outer query block to another table must use at least one equality comparison, not exclusively
+          other kinds of comparisons such as less than, greater than, <code class="ph codeph">BETWEEN</code>, or
+          <code class="ph codeph">!=</code>.
+        </p>
+      </li>
+
+      <li class="li">
+
+        <p class="p">
+          Currently, a scalar subquery cannot be used as the first or second argument to the
+          <code class="ph codeph">BETWEEN</code> operator.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          A subquery cannot be used inside an <code class="ph codeph">OR</code> conjunction. Expressions inside a subquery, for
+          example in the <code class="ph codeph">WHERE</code> clause, can use <code class="ph codeph">OR</code> conjunctions; the restriction
+          only applies to parts of the query <span class="q">"above"</span> the subquery.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          Scalar subqueries are only supported in numeric contexts. You cannot use a scalar subquery as an argument
+          to the <code class="ph codeph">LIKE</code>, <code class="ph codeph">REGEXP</code>, or <code class="ph codeph">RLIKE</code> operators, or compare it
+          to a value of a non-numeric type such as <code class="ph codeph">TIMESTAMP</code> or <code class="ph codeph">BOOLEAN</code>.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+
+          You cannot use subqueries with the <code class="ph codeph">CASE</code> function to generate the comparison value, the
+          values to be compared against, or the return value.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          A subquery is not allowed in the filter condition for the <code class="ph codeph">HAVING</code> clause. (Strictly
+          speaking, a subquery cannot appear anywhere outside the <code class="ph codeph">WITH</code>, <code class="ph codeph">FROM</code>, and
+          <code class="ph codeph">WHERE</code> clauses.)
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          You must use a fully qualified name
+          (<code class="ph codeph"><var class="keyword varname">table_name</var>.<var class="keyword varname">column_name</var></code> or
+          <code class="ph codeph"><var class="keyword varname">database_name</var>.<var class="keyword varname">table_name</var>.<var class="keyword varname">column_name</var></code>)
+          when referring to any column from the outer query block within a subquery.
+        </p>
+      </li>
+    </ul>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      For the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>)
+      available in <span class="keyword">Impala 2.3</span> and higher, the join queries that <span class="q">"unpack"</span> complex type
+      columns often use correlated subqueries in the <code class="ph codeph">FROM</code> clause.
+      For example, if the first table in the join clause is <code class="ph codeph">CUSTOMER</code>, the second
+      join clause might have a subquery that selects from the column <code class="ph codeph">CUSTOMER.C_ORDERS</code>,
+      which is an <code class="ph codeph">ARRAY</code>. The subquery re-evaluates the <code class="ph codeph">ARRAY</code> elements
+      corresponding to each row from the <code class="ph codeph">CUSTOMER</code> table.
+      See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details and examples of
+      using subqueries with complex types.
+    </p>
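+    <p class="p">
+      A sketch of that pattern, assuming a <code class="ph codeph">CUSTOMER</code> table based on the TPC-H schema with
+      <code class="ph codeph">C_ORDERS</code> as an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+      order details:
+    </p>
+
+<pre class="pre codeblock"><code>-- Each CUSTOMER row is joined with the elements of its own C_ORDERS array.
+SELECT c.c_name, o.o_orderkey, o.o_totalprice
+  FROM customer c, c.c_orders o
+WHERE o.o_totalprice &gt; 100000;
+</code></pre>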
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_operators.html#exists">EXISTS Operator</a>, <a class="xref" href="impala_operators.html#in">IN Operator</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_sum.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_sum.html b/docs/build/html/topics/impala_sum.html
new file mode 100644
index 0000000..95ac90e
--- /dev/null
+++ b/docs/build/html/topics/impala_sum.html
@@ -0,0 +1,333 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="sum"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SUM Function</title></head><body id="sum"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">SUM Function</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      An aggregate function that returns the sum of a set of numbers. Its single argument can be a numeric column, or
+      the numeric result of a function or expression applied to the column value. Rows with a <code class="ph codeph">NULL</code>
+      value for the specified column are ignored. If the table is empty, or all the values supplied to
+      <code class="ph codeph">SUM</code> are <code class="ph codeph">NULL</code>, <code class="ph codeph">SUM</code> returns <code class="ph codeph">NULL</code>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>SUM([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre>
+
+    <p class="p">
+      When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of
+      grouping values.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Return type:</strong> <code class="ph codeph">BIGINT</code> for integer arguments, <code class="ph codeph">DOUBLE</code> for floating-point
+      arguments
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+        To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        in an aggregation function, you unpack the individual elements using join notation in the query,
+        and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+        See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+      </p>
+
+    <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric aggregates such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">N_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+  from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name      | item.n_nationkey |
++-------------+------------------+
+| AFRICA      | 0                |
+| AFRICA      | 5                |
+| AFRICA      | 14               |
+| AFRICA      | 15               |
+| AFRICA      | 16               |
+| AMERICA     | 1                |
+| AMERICA     | 2                |
+| AMERICA     | 3                |
+| AMERICA     | 17               |
+| AMERICA     | 24               |
+| ASIA        | 8                |
+| ASIA        | 9                |
+| ASIA        | 12               |
+| ASIA        | 18               |
+| ASIA        | 21               |
+| EUROPE      | 6                |
+| EUROPE      | 7                |
+| EUROPE      | 19               |
+| EUROPE      | 22               |
+| EUROPE      | 23               |
+| MIDDLE EAST | 4                |
+| MIDDLE EAST | 10               |
+| MIDDLE EAST | 11               |
+| MIDDLE EAST | 13               |
+| MIDDLE EAST | 20               |
++-------------+------------------+
+
+select
+  r_name,
+  count(r_nations.item.n_nationkey) as count,
+  sum(r_nations.item.n_nationkey) as sum,
+  avg(r_nations.item.n_nationkey) as avg,
+  min(r_nations.item.n_name) as minimum,
+  max(r_nations.item.n_name) as maximum,
+  ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+  region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name      | count | sum | avg  | minimum   | maximum        | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA      | 5     | 50  | 10   | ALGERIA   | MOZAMBIQUE     | 5             |
+| AMERICA     | 5     | 47  | 9.4  | ARGENTINA | UNITED STATES  | 5             |
+| ASIA        | 5     | 68  | 13.6 | CHINA     | VIETNAM        | 5             |
+| EUROPE      | 5     | 77  | 15.4 | FRANCE    | UNITED KINGDOM | 5             |
+| MIDDLE EAST | 5     | 58  | 11.6 | EGYPT     | SAUDI ARABIA   | 5             |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example shows how to use <code class="ph codeph">SUM()</code> to compute the total for all the values in the
+      table, a subset of values, or the sum for each combination of values in the <code class="ph codeph">GROUP BY</code> clause:
+    </p>
+
+<pre class="pre codeblock"><code>-- Total all the values for this column in the table.
+select sum(c1) from t1;
+-- Find the total for this column from a subset of the table.
+select sum(c1) from t1 where month = 'January' and year = '2013';
+-- Find the total from a set of numeric function results.
+select sum(length(s)) from t1;
+-- Often used with functions that return predefined values to compute a score.
+select sum(case when grade = 'A' then 1.0 when grade = 'B' then 0.75 else 0) as class_honors from test_scores;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, sum(purchase_price) from store_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select sum(distinct x) from t1;
+</code></pre>
+
+    <div class="p">
+      The following examples show how to use <code class="ph codeph">SUM()</code> in an analytic context. They use a table
+      containing integers from 1 to 10. Notice how the <code class="ph codeph">SUM()</code> is reported for each input value, as
+      opposed to the <code class="ph codeph">GROUP BY</code> clause which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, sum(x) <strong class="ph b">over (partition by property)</strong> as sum from int_t where property in ('odd','even');
++----+----------+-----+
+| x  | property | sum |
++----+----------+-----+
+| 2  | even     | 30  |
+| 4  | even     | 30  |
+| 6  | even     | 30  |
+| 8  | even     | 30  |
+| 10 | even     | 30  |
+| 1  | odd      | 25  |
+| 3  | odd      | 25  |
+| 5  | odd      | 25  |
+| 7  | odd      | 25  |
+| 9  | odd      | 25  |
++----+----------+-----+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">SUM()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to produce a running total of all the even values,
+then a running total of all the odd values. The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+therefore all of these examples produce the same results:
+<pre class="pre codeblock"><code>select x, property,
+  sum(x) over (partition by property <strong class="ph b">order by x</strong>) as 'cumulative total'
+  from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative total |
++----+----------+------------------+
+| 2  | even     | 2                |
+| 4  | even     | 6                |
+| 6  | even     | 12               |
+| 8  | even     | 20               |
+| 10 | even     | 30               |
+| 1  | odd      | 1                |
+| 3  | odd      | 4                |
+| 5  | odd      | 9                |
+| 7  | odd      | 16               |
+| 9  | odd      | 25               |
++----+----------+------------------+
+
+select x, property,
+  sum(x) over
+  (
+    partition by property
+    <strong class="ph b">order by x</strong>
+    <strong class="ph b">range between unbounded preceding and current row</strong>
+  ) as 'cumulative total'
+from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative total |
++----+----------+------------------+
+| 2  | even     | 2                |
+| 4  | even     | 6                |
+| 6  | even     | 12               |
+| 8  | even     | 20               |
+| 10 | even     | 30               |
+| 1  | odd      | 1                |
+| 3  | odd      | 4                |
+| 5  | odd      | 9                |
+| 7  | odd      | 16               |
+| 9  | odd      | 25               |
++----+----------+------------------+
+
+select x, property,
+  sum(x) over
+  (
+    partition by property
+    <strong class="ph b">order by x</strong>
+    <strong class="ph b">rows between unbounded preceding and current row</strong>
+  ) as 'cumulative total'
+  from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative total |
++----+----------+------------------+
+| 2  | even     | 2                |
+| 4  | even     | 6                |
+| 6  | even     | 12               |
+| 8  | even     | 20               |
+| 10 | even     | 30               |
+| 1  | odd      | 1                |
+| 3  | odd      | 4                |
+| 5  | odd      | 9                |
+| 7  | odd      | 16               |
+| 9  | odd      | 25               |
++----+----------+------------------+
+</code></pre>
+
+Changing the direction of the <code class="ph codeph">ORDER BY</code> clause causes the intermediate
+results of the cumulative total to be calculated in a different order:
+
+<pre class="pre codeblock"><code>select x, property,
+  sum(x) over (partition by property <strong class="ph b">order by x desc</strong>) as 'cumulative total'
+  from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative total |
++----+----------+------------------+
+| 10 | even     | 10               |
+| 8  | even     | 18               |
+| 6  | even     | 24               |
+| 4  | even     | 28               |
+| 2  | even     | 30               |
+| 9  | odd      | 9                |
+| 7  | odd      | 16               |
+| 5  | odd      | 21               |
+| 3  | odd      | 24               |
+| 1  | odd      | 25               |
++----+----------+------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a moving total that takes into account 1 row before
+and 1 row after the current row, within the same partition (all the even values or all the odd values).
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code>
+clause:
+<pre class="pre codeblock"><code>select x, property,
+  sum(x) over
+  (
+    partition by property
+    <strong class="ph b">order by x</strong>
+    <strong class="ph b">rows between 1 preceding and 1 following</strong>
+  ) as 'moving total'
+  from int_t where property in ('odd','even');
++----+----------+--------------+
+| x  | property | moving total |
++----+----------+--------------+
+| 2  | even     | 6            |
+| 4  | even     | 12           |
+| 6  | even     | 18           |
+| 8  | even     | 24           |
+| 10 | even     | 18           |
+| 1  | odd      | 4            |
+| 3  | odd      | 9            |
+| 5  | odd      | 15           |
+| 7  | odd      | 21           |
+| 9  | odd      | 16           |
++----+----------+--------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+  sum(x) over
+  (
+    partition by property
+    <strong class="ph b">order by x</strong>
+    <strong class="ph b">range between 1 preceding and 1 following</strong>
+  ) as 'moving total'
+from int_t where property in ('odd','even');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+
+
+    <p class="p">
+        Due to the way arithmetic on <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns uses
+        high-performance hardware instructions, and distributed queries can perform these operations in different
+        order for each query, results can vary slightly for aggregate function calls such as <code class="ph codeph">SUM()</code>
+        and <code class="ph codeph">AVG()</code> for <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns, particularly on
+        large data sets where millions or billions of values are summed or averaged. For perfect consistency and
+        repeatability, use the <code class="ph codeph">DECIMAL</code> data type for such operations instead of
+        <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+      </p>
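+    <p class="p">
+      For example, to get repeatable totals you could cast the column to <code class="ph codeph">DECIMAL</code>
+      before aggregating. (This is a sketch assuming a hypothetical table <code class="ph codeph">metrics</code>
+      with a <code class="ph codeph">DOUBLE</code> column <code class="ph codeph">val</code>.)
+    </p>
+
+    <pre class="pre codeblock"><code>-- SUM() over DOUBLE can vary slightly between runs on large data sets.
+select sum(val) from metrics;
+
+-- Casting to DECIMAL produces exact, repeatable results.
+select sum(cast(val as decimal(18,4))) from metrics;
+</code></pre>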
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_support_start_over.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_support_start_over.html b/docs/build/html/topics/impala_support_start_over.html
new file mode 100644
index 0000000..f813773
--- /dev/null
+++ b/docs/build/html/topics/impala_support_start_over.html
@@ -0,0 +1,30 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="support_start_over"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SUPPORT_START_OVER Query Option</title></head><body id="support_start_over"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">SUPPORT_START_OVER Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Leave this setting at its default value.
+      It is a read-only setting, tested by some client applications such as Hue.
+    </p>
+    <p class="p">
+      If you accidentally change it through <span class="keyword cmdname">impala-shell</span>,
+      subsequent queries encounter errors until you undo the change
+      by issuing <code class="ph codeph">UNSET support_start_over</code>.
+    </p>
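+    <p class="p">
+      For example, a session that accidentally changes the option and then restores it
+      might look like this (a sketch; the exact error text can differ):
+    </p>
+
+    <pre class="pre codeblock"><code>[localhost:21000] &gt; set support_start_over=true;
+-- Subsequent queries now encounter errors.
+[localhost:21000] &gt; unset support_start_over;
+-- Queries work normally again.
+</code></pre>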
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code>
+      </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_sync_ddl.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_sync_ddl.html b/docs/build/html/topics/impala_sync_ddl.html
new file mode 100644
index 0000000..b19e266
--- /dev/null
+++ b/docs/build/html/topics/impala_sync_ddl.html
@@ -0,0 +1,55 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="sync_ddl"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SYNC_DDL Query Option</title></head><body id="sync_ddl"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">SYNC_DDL Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      When enabled, causes any DDL operation such as <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code>
+      to return only when the changes have been propagated to all other Impala nodes in the cluster by the Impala
+      catalog service. That way, if you issue a subsequent <code class="ph codeph">CONNECT</code> statement in
+      <span class="keyword cmdname">impala-shell</span> to connect to a different node in the cluster, you can be sure that the other
+      node will already recognize any added or changed tables. (The catalog service broadcasts the
+      DDL changes to all nodes automatically, but without this option there could be a period of inconsistency if
+      you quickly switched to another node, such as by issuing a subsequent query through a load-balancing proxy.)
+    </p>
+
+    <p class="p">
+      Although <code class="ph codeph">INSERT</code> is classified as a DML statement, when the <code class="ph codeph">SYNC_DDL</code> option
+      is enabled, <code class="ph codeph">INSERT</code> statements also delay their completion until all the underlying data and
+      metadata changes are propagated to all Impala nodes. Internally, Impala inserts have similarities with DDL
+      statements in traditional database systems, because they create metadata needed to track HDFS block locations
+      for new files and they potentially add new partitions to partitioned tables.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Because this option can introduce a delay after each write operation, if you are running a sequence of
+      <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">ALTER TABLE</code>,
+      <code class="ph codeph">INSERT</code>, and similar statements within a setup script, to minimize the overall delay you can
+      enable the <code class="ph codeph">SYNC_DDL</code> query option only near the end, before the final DDL statement.
+    </div>
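+    <p class="p">
+      For example, a setup script might enable the option only before the final statement
+      (a sketch using hypothetical database and table names):
+    </p>
+
+    <pre class="pre codeblock"><code>create database sales_db;
+use sales_db;
+create table t1 (x int);
+insert into t1 values (1), (2), (3);
+-- Enable SYNC_DDL only before the last DDL statement,
+-- so only that statement waits for metadata propagation.
+set sync_ddl=1;
+create table t2 (y int);
+</code></pre>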
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+
+    
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_ddl.html#ddl">DDL Statements</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[51/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Add Impala docs from branch master, commit hash
68f32e52bc42bef578330a4fe0edc5b292891eea.
This is the last commit made by JRussell in the
cleanup project. Removed both HTML files from
the shared folder (ImpalaVariables.html and
impala_common.html)

Change-Id: Ibbf0818c4a7fe1e251e2f36da75cc7c3dd16dead
Reviewed-on: http://gerrit.cloudera.org:8080/6604
Reviewed-by: Michael Brown <mi...@cloudera.com>
Reviewed-by: Jim Apple <jb...@apache.org>
Tested-by: Jim Apple <jb...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/75c46918
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/75c46918
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/75c46918

Branch: refs/heads/asf-site
Commit: 75c469182bf781341af0b3fdbb52517a88c9f2a0
Parents: d96cd39
Author: Laurel Hale <la...@cloudera.com>
Authored: Mon Apr 10 16:02:29 2017 -0700
Committer: Jim Apple <jb...@apache.org>
Committed: Wed Apr 12 18:24:01 2017 +0000

----------------------------------------------------------------------
 docs/build/html/commonltr.css                   |  555 ++
 docs/build/html/commonrtl.css                   |  592 ++
 docs/build/html/images/impala_arch.jpeg         |  Bin 0 -> 41900 bytes
 docs/build/html/index.html                      |    3 +
 .../impala_abort_on_default_limit_exceeded.html |   24 +
 .../html/topics/impala_abort_on_error.html      |   42 +
 docs/build/html/topics/impala_admin.html        |   52 +
 docs/build/html/topics/impala_admission.html    |  838 +++
 .../html/topics/impala_aggregate_functions.html |   34 +
 docs/build/html/topics/impala_aliases.html      |   85 +
 .../impala_allow_unsupported_formats.html       |   24 +
 docs/build/html/topics/impala_alter_table.html  | 1033 +++
 docs/build/html/topics/impala_alter_view.html   |  139 +
 .../html/topics/impala_analytic_functions.html  | 1785 ++++++
 .../html/topics/impala_appx_count_distinct.html |   82 +
 docs/build/html/topics/impala_appx_median.html  |  127 +
 docs/build/html/topics/impala_array.html        |  321 +
 docs/build/html/topics/impala_auditing.html     |  222 +
 .../html/topics/impala_authentication.html      |   37 +
 .../build/html/topics/impala_authorization.html | 1177 ++++
 docs/build/html/topics/impala_avg.html          |  318 +
 docs/build/html/topics/impala_avro.html         |  565 ++
 docs/build/html/topics/impala_batch_size.html   |   29 +
 docs/build/html/topics/impala_bigint.html       |  136 +
 .../build/html/topics/impala_bit_functions.html |  848 +++
 docs/build/html/topics/impala_boolean.html      |  170 +
 docs/build/html/topics/impala_breakpad.html     |  223 +
 docs/build/html/topics/impala_char.html         |  305 +
 .../html/topics/impala_cluster_sizing.html      |  318 +
 docs/build/html/topics/impala_comments.html     |   46 +
 .../build/html/topics/impala_complex_types.html | 2606 ++++++++
 docs/build/html/topics/impala_components.html   |  192 +
 .../html/topics/impala_compression_codec.html   |   92 +
 .../build/html/topics/impala_compute_stats.html |  558 ++
 docs/build/html/topics/impala_concepts.html     |   48 +
 .../topics/impala_conditional_functions.html    |  517 ++
 docs/build/html/topics/impala_config.html       |   48 +
 .../html/topics/impala_config_options.html      |  361 ++
 .../html/topics/impala_config_performance.html  |  149 +
 docs/build/html/topics/impala_connecting.html   |  190 +
 .../topics/impala_conversion_functions.html     |  288 +
 docs/build/html/topics/impala_count.html        |  353 ++
 .../html/topics/impala_create_database.html     |  209 +
 .../html/topics/impala_create_function.html     |  502 ++
 docs/build/html/topics/impala_create_role.html  |   70 +
 docs/build/html/topics/impala_create_table.html | 1250 ++++
 docs/build/html/topics/impala_create_view.html  |  194 +
 docs/build/html/topics/impala_databases.html    |   62 +
 docs/build/html/topics/impala_datatypes.html    |   33 +
 .../html/topics/impala_datetime_functions.html  | 2657 ++++++++
 docs/build/html/topics/impala_ddl.html          |  141 +
 docs/build/html/topics/impala_debug_action.html |   24 +
 docs/build/html/topics/impala_decimal.html      |  826 +++
 .../topics/impala_default_order_by_limit.html   |   33 +
 docs/build/html/topics/impala_delegation.html   |   70 +
 docs/build/html/topics/impala_delete.html       |  177 +
 docs/build/html/topics/impala_describe.html     |  802 +++
 docs/build/html/topics/impala_development.html  |  197 +
 .../html/topics/impala_disable_codegen.html     |   36 +
 .../impala_disable_row_runtime_filtering.html   |   72 +
 ...mpala_disable_streaming_preaggregations.html |   50 +
 .../topics/impala_disable_unsafe_spills.html    |   50 +
 docs/build/html/topics/impala_disk_space.html   |  133 +
 docs/build/html/topics/impala_distinct.html     |   81 +
 docs/build/html/topics/impala_dml.html          |   82 +
 docs/build/html/topics/impala_double.html       |  144 +
 .../build/html/topics/impala_drop_database.html |  193 +
 .../build/html/topics/impala_drop_function.html |  136 +
 docs/build/html/topics/impala_drop_role.html    |   71 +
 docs/build/html/topics/impala_drop_stats.html   |  285 +
 docs/build/html/topics/impala_drop_table.html   |  192 +
 docs/build/html/topics/impala_drop_view.html    |   80 +
 .../impala_exec_single_node_rows_threshold.html |   89 +
 docs/build/html/topics/impala_explain.html      |  291 +
 .../build/html/topics/impala_explain_level.html |  342 +
 docs/build/html/topics/impala_explain_plan.html |  592 ++
 docs/build/html/topics/impala_faq.html          |   21 +
 docs/build/html/topics/impala_file_formats.html |  236 +
 docs/build/html/topics/impala_fixed_issues.html | 5889 ++++++++++++++++++
 docs/build/html/topics/impala_float.html        |  136 +
 docs/build/html/topics/impala_functions.html    |  162 +
 .../html/topics/impala_functions_overview.html  |  109 +
 docs/build/html/topics/impala_grant.html        |  137 +
 docs/build/html/topics/impala_group_by.html     |  140 +
 docs/build/html/topics/impala_group_concat.html |  137 +
 docs/build/html/topics/impala_hadoop.html       |  138 +
 docs/build/html/topics/impala_having.html       |   39 +
 docs/build/html/topics/impala_hbase.html        |  763 +++
 .../html/topics/impala_hbase_cache_blocks.html  |   36 +
 .../build/html/topics/impala_hbase_caching.html |   36 +
 docs/build/html/topics/impala_hints.html        |  306 +
 docs/build/html/topics/impala_identifiers.html  |  110 +
 docs/build/html/topics/impala_impala_shell.html |   87 +
 .../topics/impala_incompatible_changes.html     | 1443 +++++
 docs/build/html/topics/impala_insert.html       |  798 +++
 docs/build/html/topics/impala_install.html      |  126 +
 docs/build/html/topics/impala_int.html          |  119 +
 docs/build/html/topics/impala_intro.html        |  198 +
 .../html/topics/impala_invalidate_metadata.html |  294 +
 docs/build/html/topics/impala_isilon.html       |   89 +
 docs/build/html/topics/impala_jdbc.html         |  326 +
 docs/build/html/topics/impala_joins.html        |  531 ++
 docs/build/html/topics/impala_kerberos.html     |  342 +
 docs/build/html/topics/impala_known_issues.html | 1712 +++++
 docs/build/html/topics/impala_kudu.html         | 1329 ++++
 docs/build/html/topics/impala_langref.html      |   66 +
 docs/build/html/topics/impala_langref_sql.html  |   28 +
 .../html/topics/impala_langref_unsupported.html |  329 +
 docs/build/html/topics/impala_ldap.html         |  294 +
 docs/build/html/topics/impala_limit.html        |  168 +
 docs/build/html/topics/impala_lineage.html      |   91 +
 docs/build/html/topics/impala_literals.html     |  427 ++
 .../build/html/topics/impala_live_progress.html |  131 +
 docs/build/html/topics/impala_live_summary.html |  177 +
 docs/build/html/topics/impala_load_data.html    |  306 +
 docs/build/html/topics/impala_logging.html      |  416 ++
 docs/build/html/topics/impala_map.html          |  331 +
 .../html/topics/impala_math_functions.html      | 1498 +++++
 docs/build/html/topics/impala_max.html          |  298 +
 docs/build/html/topics/impala_max_errors.html   |   40 +
 .../html/topics/impala_max_io_buffers.html      |   23 +
 .../topics/impala_max_num_runtime_filters.html  |   65 +
 .../topics/impala_max_scan_range_length.html    |   47 +
 docs/build/html/topics/impala_mem_limit.html    |  206 +
 docs/build/html/topics/impala_min.html          |  297 +
 .../html/topics/impala_misc_functions.html      |  175 +
 .../html/topics/impala_mixed_security.html      |   26 +
 docs/build/html/topics/impala_mt_dop.html       |  190 +
 docs/build/html/topics/impala_ndv.html          |  226 +
 docs/build/html/topics/impala_new_features.html | 3712 +++++++++++
 docs/build/html/topics/impala_num_nodes.html    |   61 +
 .../html/topics/impala_num_scanner_threads.html |   27 +
 docs/build/html/topics/impala_odbc.html         |   24 +
 docs/build/html/topics/impala_offset.html       |   67 +
 docs/build/html/topics/impala_operators.html    | 1937 ++++++
 .../impala_optimize_partition_key_scans.html    |  188 +
 docs/build/html/topics/impala_order_by.html     |  407 ++
 docs/build/html/topics/impala_parquet.html      | 1392 +++++
 .../impala_parquet_annotate_strings_utf8.html   |   54 +
 .../impala_parquet_compression_codec.html       |   17 +
 ...pala_parquet_fallback_schema_resolution.html |   46 +
 .../html/topics/impala_parquet_file_size.html   |   93 +
 docs/build/html/topics/impala_partitioning.html |  653 ++
 .../html/topics/impala_perf_benchmarking.html   |   27 +
 .../build/html/topics/impala_perf_cookbook.html |  256 +
 .../html/topics/impala_perf_hdfs_caching.html   |  578 ++
 docs/build/html/topics/impala_perf_joins.html   |  493 ++
 .../html/topics/impala_perf_resources.html      |   47 +
 docs/build/html/topics/impala_perf_skew.html    |  139 +
 docs/build/html/topics/impala_perf_stats.html   |  996 +++
 docs/build/html/topics/impala_perf_testing.html |  152 +
 docs/build/html/topics/impala_performance.html  |  116 +
 docs/build/html/topics/impala_planning.html     |   20 +
 docs/build/html/topics/impala_porting.html      |  603 ++
 docs/build/html/topics/impala_ports.html        |  421 ++
 .../build/html/topics/impala_prefetch_mode.html |   47 +
 docs/build/html/topics/impala_prereqs.html      |  275 +
 docs/build/html/topics/impala_processes.html    |  115 +
 docs/build/html/topics/impala_proxy.html        |  396 ++
 .../build/html/topics/impala_query_options.html |   49 +
 .../html/topics/impala_query_timeout_s.html     |   62 +
 docs/build/html/topics/impala_rcfile.html       |  246 +
 docs/build/html/topics/impala_real.html         |   39 +
 docs/build/html/topics/impala_refresh.html      |  387 ++
 .../build/html/topics/impala_release_notes.html |   26 +
 docs/build/html/topics/impala_relnotes.html     |   26 +
 .../html/topics/impala_replica_preference.html  |   45 +
 docs/build/html/topics/impala_request_pool.html |   35 +
 .../impala_reservation_request_timeout.html     |   21 +
 .../html/topics/impala_reserved_words.html      |  357 ++
 .../html/topics/impala_resource_management.html |   97 +
 docs/build/html/topics/impala_revoke.html       |  117 +
 .../impala_runtime_bloom_filter_size.html       |   94 +
 .../topics/impala_runtime_filter_max_size.html  |   55 +
 .../topics/impala_runtime_filter_min_size.html  |   55 +
 .../html/topics/impala_runtime_filter_mode.html |   75 +
 .../impala_runtime_filter_wait_time_ms.html     |   51 +
 .../html/topics/impala_runtime_filtering.html   |  521 ++
 docs/build/html/topics/impala_s3.html           |  775 +++
 .../topics/impala_s3_skip_insert_staging.html   |   78 +
 docs/build/html/topics/impala_scalability.html  |  711 +++
 .../impala_scan_node_codegen_threshold.html     |   69 +
 .../topics/impala_schedule_random_replica.html  |   83 +
 .../build/html/topics/impala_schema_design.html |  184 +
 .../html/topics/impala_schema_objects.html      |   48 +
 .../build/html/topics/impala_scratch_limit.html |   77 +
 docs/build/html/topics/impala_security.html     |   99 +
 .../html/topics/impala_security_files.html      |   58 +
 .../html/topics/impala_security_guidelines.html |   99 +
 .../html/topics/impala_security_install.html    |   17 +
 .../html/topics/impala_security_metastore.html  |   30 +
 .../html/topics/impala_security_webui.html      |   57 +
 docs/build/html/topics/impala_select.html       |  227 +
 docs/build/html/topics/impala_seqfile.html      |  240 +
 docs/build/html/topics/impala_set.html          |  200 +
 .../html/topics/impala_shell_commands.html      |  392 ++
 .../build/html/topics/impala_shell_options.html |  564 ++
 .../topics/impala_shell_running_commands.html   |  257 +
 docs/build/html/topics/impala_show.html         | 1525 +++++
 docs/build/html/topics/impala_smallint.html     |  125 +
 docs/build/html/topics/impala_ssl.html          |  119 +
 docs/build/html/topics/impala_stddev.html       |  121 +
 docs/build/html/topics/impala_string.html       |  197 +
 .../html/topics/impala_string_functions.html    | 1036 +++
 docs/build/html/topics/impala_struct.html       |  500 ++
 docs/build/html/topics/impala_subqueries.html   |  316 +
 docs/build/html/topics/impala_sum.html          |  333 +
 .../html/topics/impala_support_start_over.html  |   30 +
 docs/build/html/topics/impala_sync_ddl.html     |   55 +
 docs/build/html/topics/impala_tables.html       |  446 ++
 docs/build/html/topics/impala_timeouts.html     |  168 +
 docs/build/html/topics/impala_timestamp.html    |  514 ++
 docs/build/html/topics/impala_tinyint.html      |  131 +
 .../html/topics/impala_troubleshooting.html     |  370 ++
 .../html/topics/impala_truncate_table.html      |  200 +
 docs/build/html/topics/impala_tutorial.html     | 2270 +++++++
 docs/build/html/topics/impala_txtfile.html      |  770 +++
 docs/build/html/topics/impala_udf.html          | 1585 +++++
 docs/build/html/topics/impala_union.html        |  146 +
 docs/build/html/topics/impala_update.html       |  169 +
 docs/build/html/topics/impala_upgrading.html    |  115 +
 docs/build/html/topics/impala_upsert.html       |  113 +
 docs/build/html/topics/impala_use.html          |   84 +
 docs/build/html/topics/impala_v_cpu_cores.html  |   21 +
 docs/build/html/topics/impala_varchar.html      |  254 +
 docs/build/html/topics/impala_variance.html     |  132 +
 docs/build/html/topics/impala_views.html        |  284 +
 docs/build/html/topics/impala_webui.html        |  311 +
 docs/build/html/topics/impala_with.html         |   63 +
 docs/build/impala.pdf                           |  Bin 0 -> 3653059 bytes
 230 files changed, 79827 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/commonltr.css
----------------------------------------------------------------------
diff --git a/docs/build/html/commonltr.css b/docs/build/html/commonltr.css
new file mode 100644
index 0000000..f0738c6
--- /dev/null
+++ b/docs/build/html/commonltr.css
@@ -0,0 +1,555 @@
+/*!
+ * This file is part of the DITA Open Toolkit project. See the accompanying LICENSE.md file for applicable licenses.
+ */
+/*
+ | (c) Copyright IBM Corp. 2004, 2005 All Rights Reserved.
+ */
+.codeblock {
+  font-family: monospace;
+}
+
+.codeph {
+  font-family: monospace;
+}
+
+.kwd {
+  font-weight: bold;
+}
+
+.parmname {
+  font-weight: bold;
+}
+
+.var {
+  font-style: italic;
+}
+
+.filepath {
+  font-family: monospace;
+}
+
+div.tasklabel {
+  margin-top: 1em;
+  margin-bottom: 1em;
+}
+
+h2.tasklabel,
+h3.tasklabel,
+h4.tasklabel,
+h5.tasklabel,
+h6.tasklabel {
+  font-size: 100%;
+}
+
+.screen {
+  padding: 5px 5px 5px 5px;
+  border: outset;
+  background-color: #CCCCCC;
+  margin-top: 2px;
+  margin-bottom: 2px;
+  white-space: pre;
+}
+
+.wintitle {
+  font-weight: bold;
+}
+
+.numcharref {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.parameterentity {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.textentity {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlatt {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlelement {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlnsname {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlpi {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.frame-top {
+  border-top: solid 1px;
+  border-right: 0;
+  border-bottom: 0;
+  border-left: 0;
+}
+
+.frame-bottom {
+  border-top: 0;
+  border-right: 0;
+  border-bottom: solid 1px;
+  border-left: 0;
+}
+
+.frame-topbot {
+  border-top: solid 1px;
+  border-right: 0;
+  border-bottom: solid 1px;
+  border-left: 0;
+}
+
+.frame-all {
+  border: solid 1px;
+}
+
+.frame-sides {
+  border-top: 0;
+  border-left: solid 1px;
+  border-right: solid 1px;
+  border-bottom: 0;
+}
+
+.frame-none {
+  border: 0;
+}
+
+.scale-50 {
+  font-size: 50%;
+}
+
+.scale-60 {
+  font-size: 60%;
+}
+
+.scale-70 {
+  font-size: 70%;
+}
+
+.scale-80 {
+  font-size: 80%;
+}
+
+.scale-90 {
+  font-size: 90%;
+}
+
+.scale-100 {
+  font-size: 100%;
+}
+
+.scale-110 {
+  font-size: 110%;
+}
+
+.scale-120 {
+  font-size: 120%;
+}
+
+.scale-140 {
+  font-size: 140%;
+}
+
+.scale-160 {
+  font-size: 160%;
+}
+
+.scale-180 {
+  font-size: 180%;
+}
+
+.scale-200 {
+  font-size: 200%;
+}
+
+.expanse-page, .expanse-spread {
+  width: 100%;
+}
+
+.fig {
+  /* Default of italics to set apart figure captions */
+  /* Use @frame to create frames on figures */
+}
+.figcap {
+  font-style: italic;
+}
+.figdesc {
+  font-style: normal;
+}
+.figborder {
+  border-color: Silver;
+  border-style: solid;
+  border-width: 2px;
+  margin-top: 1em;
+  padding-left: 3px;
+  padding-right: 3px;
+}
+.figsides {
+  border-color: Silver;
+  border-left: 2px solid;
+  border-right: 2px solid;
+  margin-top: 1em;
+  padding-left: 3px;
+  padding-right: 3px;
+}
+.figtop {
+  border-color: Silver;
+  border-top: 2px solid;
+  margin-top: 1em;
+}
+.figbottom {
+  border-bottom: 2px solid;
+  border-color: Silver;
+}
+.figtopbot {
+  border-bottom: 2px solid;
+  border-color: Silver;
+  border-top: 2px solid;
+  margin-top: 1em;
+}
+
+/* Align images based on @align on topic/image */
+div.imageleft {
+  text-align: left;
+}
+
+div.imagecenter {
+  text-align: center;
+}
+
+div.imageright {
+  text-align: right;
+}
+
+div.imagejustify {
+  text-align: justify;
+}
+
+/* Set heading sizes, getting smaller for deeper nesting */
+.topictitle1 {
+  font-size: 1.34em;
+  margin-bottom: 0.1em;
+  margin-top: 0;
+}
+
+.topictitle2 {
+  font-size: 1.17em;
+  margin-bottom: 0.45em;
+  margin-top: 1pc;
+}
+
+.topictitle3 {
+  font-size: 1.17em;
+  font-weight: bold;
+  margin-bottom: 0.17em;
+  margin-top: 1pc;
+}
+
+.topictitle4 {
+  font-size: 1.17em;
+  font-weight: bold;
+  margin-top: 0.83em;
+}
+
+.topictitle5 {
+  font-size: 1.17em;
+  font-weight: bold;
+}
+
+.topictitle6 {
+  font-size: 1.17em;
+  font-style: italic;
+}
+
+.sectiontitle {
+  color: #000;
+  font-size: 1.17em;
+  font-weight: bold;
+  margin-bottom: 0;
+  margin-top: 1em;
+}
+
+.section {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.example {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+/* Most link groups are created with <div>. Ensure they have space before and after. */
+.ullinks {
+  list-style-type: none;
+}
+
+.ulchildlink {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.olchildlink {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.linklist {
+  margin-bottom: 1em;
+}
+
+.linklistwithchild {
+  margin-bottom: 1em;
+  margin-left: 1.5em;
+}
+
+.sublinklist {
+  margin-bottom: 1em;
+  margin-left: 1.5em;
+}
+
+.relconcepts {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.reltasks {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.relref {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.relinfo {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.breadcrumb {
+  font-size: smaller;
+  margin-bottom: 1em;
+}
+
+/* Simple lists do not get a bullet */
+ul.simple {
+  list-style-type: none;
+}
+
+/* Default of bold for definition list terms */
+.dlterm {
+  font-weight: bold;
+}
+
+/* Use CSS to expand lists with @compact="no" */
+.dltermexpand {
+  font-weight: bold;
+  margin-top: 1em;
+}
+
+*[compact="yes"] > li {
+  margin-top: 0;
+}
+
+*[compact="no"] > li {
+  margin-top: 0.53em;
+}
+
+.liexpand {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.sliexpand {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.dlexpand {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.ddexpand {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.stepexpand {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.substepexpand {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+dt.prereq {
+  margin-left: 20px;
+}
+
+/* All note formats have the same default presentation */
+.note {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+.note .notetitle, .note .notelisttitle,
+.note .note__title {
+  font-weight: bold;
+}
+
+/* Various basic phrase styles */
+.bold {
+  font-weight: bold;
+}
+
+.bolditalic {
+  font-style: italic;
+  font-weight: bold;
+}
+
+.italic {
+  font-style: italic;
+}
+
+.underlined {
+  text-decoration: underline;
+}
+
+.uicontrol {
+  font-weight: bold;
+}
+
+.defkwd {
+  font-weight: bold;
+  text-decoration: underline;
+}
+
+.shortcut {
+  text-decoration: underline;
+}
+
+table {
+  border-collapse: collapse;
+}
+
+table .desc {
+  display: block;
+  font-style: italic;
+}
+
+.cellrowborder {
+  border-bottom: solid 1px;
+  border-left: 0;
+  border-right: solid 1px;
+  border-top: 0;
+}
+
+.row-nocellborder {
+  border-bottom: solid 1px;
+  border-left: 0;
+  border-top: 0;
+}
+
+.cell-norowborder {
+  border-left: 0;
+  border-right: solid 1px;
+  border-top: 0;
+}
+
+.nocellnorowborder {
+  border: 0;
+}
+
+.firstcol {
+  font-weight: bold;
+}
+
+.table--pgwide-1 {
+  width: 100%;
+}
+
+.align-left {
+  text-align: left;
+}
+
+.align-right {
+  text-align: right;
+}
+
+.align-center {
+  text-align: center;
+}
+
+.align-justify {
+  text-align: justify;
+}
+
+.align-char {
+  text-align: char;
+}
+
+.valign-top {
+  vertical-align: top;
+}
+
+.valign-bottom {
+  vertical-align: bottom;
+}
+
+.valign-middle {
+  vertical-align: middle;
+}
+
+.colsep-0 {
+  border-right: 0;
+}
+
+.colsep-1 {
+  border-right: 1px solid;
+}
+
+.rowsep-0 {
+  border-bottom: 0;
+}
+
+.rowsep-1 {
+  border-bottom: 1px solid;
+}
+
+.stentry {
+  border-right: 1px solid;
+  border-bottom: 1px solid;
+}
+
+.stentry:last-child {
+  border-right: 0;
+}
+
+.strow:last-child .stentry {
+  border-bottom: 0;
+}
+
+/* Add space for top level topics */
+.nested0 {
+  margin-top: 1em;
+}
+
+/* div with class=p is used for paragraphs that contain blocks, to keep the XHTML valid */
+.p {
+  margin-top: 1em;
+}

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/commonrtl.css
----------------------------------------------------------------------
diff --git a/docs/build/html/commonrtl.css b/docs/build/html/commonrtl.css
new file mode 100644
index 0000000..99acb72
--- /dev/null
+++ b/docs/build/html/commonrtl.css
@@ -0,0 +1,592 @@
+/*!
+ * This file is part of the DITA Open Toolkit project. See the accompanying LICENSE.md file for applicable licenses.
+ */
+/*
+ | (c) Copyright IBM Corp. 2004, 2005 All Rights Reserved.
+ */
+.codeblock {
+  font-family: monospace;
+}
+
+.codeph {
+  font-family: monospace;
+}
+
+.kwd {
+  font-weight: bold;
+}
+
+.parmname {
+  font-weight: bold;
+}
+
+.var {
+  font-style: italic;
+}
+
+.filepath {
+  font-family: monospace;
+}
+
+div.tasklabel {
+  margin-top: 1em;
+  margin-bottom: 1em;
+}
+
+h2.tasklabel,
+h3.tasklabel,
+h4.tasklabel,
+h5.tasklabel,
+h6.tasklabel {
+  font-size: 100%;
+}
+
+.screen {
+  padding: 5px 5px 5px 5px;
+  border: outset;
+  background-color: #CCCCCC;
+  margin-top: 2px;
+  margin-bottom: 2px;
+  white-space: pre;
+}
+
+.wintitle {
+  font-weight: bold;
+}
+
+.numcharref {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.parameterentity {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.textentity {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlatt {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlelement {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlnsname {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlpi {
+  color: #663399;
+  font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.frame-top {
+  border-top: solid 1px;
+  border-right: 0;
+  border-bottom: 0;
+  border-left: 0;
+}
+
+.frame-bottom {
+  border-top: 0;
+  border-right: 0;
+  border-bottom: solid 1px;
+  border-left: 0;
+}
+
+.frame-topbot {
+  border-top: solid 1px;
+  border-right: 0;
+  border-bottom: solid 1px;
+  border-left: 0;
+}
+
+.frame-all {
+  border: solid 1px;
+}
+
+.frame-sides {
+  border-top: 0;
+  border-left: solid 1px;
+  border-right: solid 1px;
+  border-bottom: 0;
+}
+
+.frame-none {
+  border: 0;
+}
+
+.scale-50 {
+  font-size: 50%;
+}
+
+.scale-60 {
+  font-size: 60%;
+}
+
+.scale-70 {
+  font-size: 70%;
+}
+
+.scale-80 {
+  font-size: 80%;
+}
+
+.scale-90 {
+  font-size: 90%;
+}
+
+.scale-100 {
+  font-size: 100%;
+}
+
+.scale-110 {
+  font-size: 110%;
+}
+
+.scale-120 {
+  font-size: 120%;
+}
+
+.scale-140 {
+  font-size: 140%;
+}
+
+.scale-160 {
+  font-size: 160%;
+}
+
+.scale-180 {
+  font-size: 180%;
+}
+
+.scale-200 {
+  font-size: 200%;
+}
+
+.expanse-page, .expanse-spread {
+  width: 100%;
+}
+
+.fig {
+  /* Default of italics to set apart figure captions */
+  /* Use @frame to create frames on figures */
+}
+.figcap {
+  font-style: italic;
+}
+.figdesc {
+  font-style: normal;
+}
+.figborder {
+  border-color: Silver;
+  border-style: solid;
+  border-width: 2px;
+  margin-top: 1em;
+  padding-left: 3px;
+  padding-right: 3px;
+}
+.figsides {
+  border-color: Silver;
+  border-left: 2px solid;
+  border-right: 2px solid;
+  margin-top: 1em;
+  padding-left: 3px;
+  padding-right: 3px;
+}
+.figtop {
+  border-color: Silver;
+  border-top: 2px solid;
+  margin-top: 1em;
+}
+.figbottom {
+  border-bottom: 2px solid;
+  border-color: Silver;
+}
+.figtopbot {
+  border-bottom: 2px solid;
+  border-color: Silver;
+  border-top: 2px solid;
+  margin-top: 1em;
+}
+
+/* Align images based on @align on topic/image */
+div.imageleft {
+  text-align: left;
+}
+
+div.imagecenter {
+  text-align: center;
+}
+
+div.imageright {
+  text-align: right;
+}
+
+div.imagejustify {
+  text-align: justify;
+}
+
+/* Set heading sizes, getting smaller for deeper nesting */
+.topictitle1 {
+  font-size: 1.34em;
+  margin-bottom: 0.1em;
+  margin-top: 0;
+}
+
+.topictitle2 {
+  font-size: 1.17em;
+  margin-bottom: 0.45em;
+  margin-top: 1pc;
+}
+
+.topictitle3 {
+  font-size: 1.17em;
+  font-weight: bold;
+  margin-bottom: 0.17em;
+  margin-top: 1pc;
+}
+
+.topictitle4 {
+  font-size: 1.17em;
+  font-weight: bold;
+  margin-top: 0.83em;
+}
+
+.topictitle5 {
+  font-size: 1.17em;
+  font-weight: bold;
+}
+
+.topictitle6 {
+  font-size: 1.17em;
+  font-style: italic;
+}
+
+.sectiontitle {
+  color: #000;
+  font-size: 1.17em;
+  font-weight: bold;
+  margin-bottom: 0;
+  margin-top: 1em;
+}
+
+.section {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.example {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+/* Most link groups are created with <div>. Ensure they have space before and after. */
+.ullinks {
+  list-style-type: none;
+}
+
+.ulchildlink {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.olchildlink {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.linklist {
+  margin-bottom: 1em;
+}
+
+.linklistwithchild {
+  margin-bottom: 1em;
+  margin-left: 1.5em;
+}
+
+.sublinklist {
+  margin-bottom: 1em;
+  margin-left: 1.5em;
+}
+
+.relconcepts {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.reltasks {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.relref {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.relinfo {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.breadcrumb {
+  font-size: smaller;
+  margin-bottom: 1em;
+}
+
+/* Simple lists do not get a bullet */
+ul.simple {
+  list-style-type: none;
+}
+
+/* Default of bold for definition list terms */
+.dlterm {
+  font-weight: bold;
+}
+
+/* Use CSS to expand lists with @compact="no" */
+.dltermexpand {
+  font-weight: bold;
+  margin-top: 1em;
+}
+
+*[compact="yes"] > li {
+  margin-top: 0;
+}
+
+*[compact="no"] > li {
+  margin-top: 0.53em;
+}
+
+.liexpand {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.sliexpand {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.dlexpand {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.ddexpand {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.stepexpand {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+.substepexpand {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+
+dt.prereq {
+  margin-left: 20px;
+}
+
+/* All note formats have the same default presentation */
+.note {
+  margin-bottom: 1em;
+  margin-top: 1em;
+}
+.note .notetitle, .note .notelisttitle,
+.note .note__title {
+  font-weight: bold;
+}
+
+/* Various basic phrase styles */
+.bold {
+  font-weight: bold;
+}
+
+.bolditalic {
+  font-style: italic;
+  font-weight: bold;
+}
+
+.italic {
+  font-style: italic;
+}
+
+.underlined {
+  text-decoration: underline;
+}
+
+.uicontrol {
+  font-weight: bold;
+}
+
+.defkwd {
+  font-weight: bold;
+  text-decoration: underline;
+}
+
+.shortcut {
+  text-decoration: underline;
+}
+
+table {
+  border-collapse: collapse;
+}
+
+table .desc {
+  display: block;
+  font-style: italic;
+}
+
+.cellrowborder {
+  border-bottom: solid 1px;
+  border-left: 0;
+  border-right: solid 1px;
+  border-top: 0;
+}
+
+.row-nocellborder {
+  border-bottom: solid 1px;
+  border-left: 0;
+  border-top: 0;
+}
+
+.cell-norowborder {
+  border-left: 0;
+  border-right: solid 1px;
+  border-top: 0;
+}
+
+.nocellnorowborder {
+  border: 0;
+}
+
+.firstcol {
+  font-weight: bold;
+}
+
+.table--pgwide-1 {
+  width: 100%;
+}
+
+.align-left {
+  text-align: left;
+}
+
+.align-right {
+  text-align: right;
+}
+
+.align-center {
+  text-align: center;
+}
+
+.align-justify {
+  text-align: justify;
+}
+
+.align-char {
+  text-align: char;
+}
+
+.valign-top {
+  vertical-align: top;
+}
+
+.valign-bottom {
+  vertical-align: bottom;
+}
+
+.valign-middle {
+  vertical-align: middle;
+}
+
+.colsep-0 {
+  border-right: 0;
+}
+
+.colsep-1 {
+  border-right: 1px solid;
+}
+
+.rowsep-0 {
+  border-bottom: 0;
+}
+
+.rowsep-1 {
+  border-bottom: 1px solid;
+}
+
+.stentry {
+  border-right: 1px solid;
+  border-bottom: 1px solid;
+}
+
+.stentry:last-child {
+  border-right: 0;
+}
+
+.strow:last-child .stentry {
+  border-bottom: 0;
+}
+
+/* Add space for top level topics */
+.nested0 {
+  margin-top: 1em;
+}
+
+/* div with class=p is used for paragraphs that contain blocks, to keep the XHTML valid */
+.p {
+  margin-top: 1em;
+}
+
+.linklist {
+  margin-bottom: 1em;
+}
+
+.linklistwithchild {
+  margin-right: 1.5em;
+  margin-top: 1em;
+}
+
+.sublinklist {
+  margin-right: 1.5em;
+  margin-top: 1em;
+}
+
+dt.prereq {
+  margin-right: 20px;
+}
+
+.cellrowborder {
+  border-left: solid 1px;
+  border-right: none;
+}
+
+.row-nocellborder {
+  border-left: hidden;
+  border-right: none;
+}
+
+.cell-norowborder {
+  border-left: solid 1px;
+  border-right: none;
+}
+
+.nocellnorowborder {
+  border-left: hidden;
+}

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/images/impala_arch.jpeg
----------------------------------------------------------------------
diff --git a/docs/build/html/images/impala_arch.jpeg b/docs/build/html/images/impala_arch.jpeg
new file mode 100644
index 0000000..8289469
Binary files /dev/null and b/docs/build/html/images/impala_arch.jpeg differ

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/index.html
----------------------------------------------------------------------
diff --git a/docs/build/html/index.html b/docs/build/html/index.html
new file mode 100644
index 0000000..faad535
--- /dev/null
+++ b/docs/build/html/index.html
@@ -0,0 +1,3 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="map"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala"><link rel="stylesheet" type="text/css" href="commonltr.css"><title>Apache Impala (incubating) Guide</title></head><body id="impala"><h1 class="title topictitle1">Apache Impala (incubating) Guide</h1><nav><ul class="map"><li class="topicref"><a href="topics/impala_intro.html">Introducing Apache Impala (incubating)</a></li><li class="topicref"><a href="topics/impala_concepts.html">Concepts and Architecture</a><ul><li class="topicref"><a href="topics/impala_components.html">Components</a></li><li class="topicref"><a href="topics/impala_development.html">Developing Applications</a></li><li class="topicref"><a href="topics/impala_hadoop.html">Role in the Hadoop Ecosystem</a></li></ul></li><li class="topicref"><a href="topics/impala_planning.html">Deployment Planning</a><ul><li class="topicref"><a href="topics/impala_prereqs.html#prereqs">Requirements</a></li><li class="topicref"><a href="topics/impala_cluster_sizing.html">Cluster Sizing</a></li><li class="topicref"><a href="topics/impala_schema_design.html">Designing Schemas</a></li></ul></li><li class="topicref"><a href="topics/impala_install.html#install">Installing Impala</a></li><li class="topicref"><a href="topics/impala_config.html">Managing Impala</a><ul><li class="topicref"><a href="topics/impala_config_performance.html">Post-Installation Configuration for Impala</a></li><li class="topicref"><a href="topics/impala_odbc.html">Configuring Impala to Work with ODBC</a></li><li class="topicref"><a href="topics/impala_jdbc.html">Configuring Impala to Work with JDBC</a></li></ul></li><li class="topicref"><a href="topics/impala_upgrading.html">Upgrading Impala</a></li><li class="topicref"><a href="topics/impala_processes.html">Starting Impala</a><ul><li class="topicref"><a href="topics/impala_config_options.html">Modifying Impala Startup Options</a></li></ul></li><li class="topicref"><a href="topics/impala_tutorial.html">Tutorials</a></li><li class="topicref"><a href="topics/impala_admin.html">Administration</a><ul><li class="topicref"><a href="topics/impala_admission.html">Admission Control and Query Queuing</a></li><li class="topicref"><a href="topics/impala_resource_management.html">Resource Management for Impala</a></li><li class="topicref"><a href="topics/impala_timeouts.html">Setting Timeouts</a></li><li class="topicref"><a href="topics/impala_proxy.html">Load-Balancing Proxy for HA</a></li><li class="topicref"><a href="topics/impala_disk_space.html">Managing Disk Space</a></li></ul></li><li class="topicref"><a href="topics/impala_security.html">Impala Security</a><ul><li class="topicref"><a href="topics/impala_security_guidelines.html">Security Guidelines for Impala</a></li><li class="topicref"><a href="topics/impala_security_files.html">Securing Impala Data and Log Files</a></li><li class="topicref"><a href="topics/impala_security_install.html">Installation Considerations for Impala Security</a></li><li class="topicref"><a href="topics/impala_security_metastore.html">Securing the Hive Metastore Database</a></li><li class="topicref"><a href="topics/impala_security_webui.html">Securing the Impala Web User Interface</a></li><li class="topicref"><a href="topics/impala_ssl.html">Configuring TLS/SSL for Impala</a></li><li class="topicref"><a href="topics/impala_authorization.html">Enabling Sentry Authorization for Impala</a></li><li class="topicref"><a href="topics/impala_authentication.html">Impala Authentication</a><ul><li class="topicref"><a href="topics/impala_kerberos.html">Enabling Kerberos Authentication for Impala</a></li><li class="topicref"><a href="topics/impala_ldap.html">Enabling LDAP Authentication for Impala</a></li><li class="topicref"><a href="topics/impala_mixed_security.html">Using Multiple Authentication Methods with Impala</a></li><li class="topicref"><a href="topics/impala_delegation.html">Configuring Impala Delegation for Hue and BI Tools</a></li></ul></li><li class="topicref"><a href="topics/impala_auditing.html">Auditing</a></li><li class="topicref"><a href="topics/impala_lineage.html">Viewing Lineage Info</a></li></ul></li><li class="topicref"><a href="topics/impala_langref.html">SQL Reference</a><ul><li class="topicref"><a href="topics/impala_comments.html">Comments</a></li><li class="topicref"><a href="topics/impala_datatypes.html">Data Types</a><ul><li class="topicref"><a href="topics/impala_array.html">ARRAY Complex Type (Impala 2.3 or higher only)</a></li><li class="topicref"><a href="topics/impala_bigint.html">BIGINT</a></li><li class="topicref"><a href="topics/impala_boolean.html">BOOLEAN</a></li><li class="topicref"><a href="topics/impala_char.html">CHAR</a></li><li class="topicref"><a href="topics/impala_decimal.html">DECIMAL</a></li><li class="topicref"><a href="topics/impala_double.html">DOUBLE</a></li><li class="topicref"><a href="topics/impala_float.html">FLOAT</a></li><li class="topicref"><a href="topics/impala_int.html">INT</a></li><li class="topicref"><a href="topics/impala_map.html">MAP Complex Type (Impala 2.3 or higher only)</a></li><li class="topicref"><a href="topics/impala_real.html">REAL</a></li><li class="topicref"><a href="topics/impala_smallint.html">SMALLINT</a></li><li class="topicref"><a href="topics/impala_string.html">STRING</a></li><li class="topicref"><a href="topics/impala_struct.html">STRUCT Complex Type (Impala 2.3 or higher only)</a></li><li class="topicref"><a href="topics/impala_timestamp.html">TIMESTAMP</a></li><li class="topicref"><a href="topics/impala_tinyint.html">TINYINT</a></li><li class="topicref"><a href="topics/impala_varchar.html">VARCHAR</a></li><li class="topicref"><a href="topics/impala_complex_types.html">Complex Types (Impala 2.3 or higher only)</a></li></ul></li><li class="topicref"><a href="topics/impala_literals.html">Literals</a></li><li class="topicref"><a href="topics/impala_operators.html">SQL Operators</a></li><li class="topicref"><a href="topics/impala_schema_objects.html">Schema Objects and Object Names</a><ul><li class="topicref"><a href="topics/impala_aliases.html">Aliases</a></li><li class="topicref"><a href="topics/impala_databases.html">Databases</a></li><li class="topicref"><a href="topics/impala_functions_overview.html">Functions</a></li><li class="topicref"><a href="topics/impala_identifiers.html">Identifiers</a></li><li class="topicref"><a href="topics/impala_tables.html">Tables</a></li><li class="topicref"><a href="topics/impala_views.html">Views</a></li></ul></li><li class="topicref"><a href="topics/impala_langref_sql.html">SQL Statements</a><ul><li class="topicref"><a href="topics/impala_ddl.html">DDL Statements</a></li><li class="topicref"><a href="topics/impala_dml.html">DML Statements</a></li><li class="topicref"><a href="topics/impala_alter_table.html">ALTER TABLE</a></li><li class="topicref"><a href="topics/impala_alter_view.html">ALTER VIEW</a></li><li class="topicref"><a href="topics/impala_compute_stats.html">COMPUTE STATS</a></li><li class="topicref"><a href="topics/impala_create_database.html">CREATE DATABASE</a></li><li class="topicref"><a href="topics/impala_create_function.html">CREATE FUNCTION</a></li><li class="topicref"><a href="topics/impala_create_role.html">CREATE ROLE</a></li><li class="topicref"><a href="topics/impala_create_table.html">CREATE TABLE</a></li><li class="topicref"><a href="topics/impala_create_view.html">CREATE VIEW</a></li><li class="topicref"><a href="topics/impala_delete.html">DELETE</a></li><li class="topicref"><a href="topics/impala_describe.html">DESCRIBE</a></li><li class="topicref"><a href="topics/impala_drop_database.html">DROP DATABASE</a></li><li class="topicref"><a href="topics/impala_drop_function.html">DROP FUNCTION</a></li><li class="topicref"><a href="topics/impala_drop_role.html">DROP ROLE</a></li><li class="topicref"><a href="topics/impala_drop_stats.html">DROP STATS</a></li><li class="topicref"><a href="topics/impala_drop_table.html">DROP TABLE</a></li><li class="topicref"><a href="topics/impala_drop_view.html">DROP VIEW</a></li><li class="topicref"><a href="topics/impala_explain.html">EXPLAIN</a></li><li class="topicref"><a href="topics/impala_grant.html">GRANT</a></li><li class="topicref"><a href="topics/impala_insert.html">INSERT</a></li><li class="topicref"><a href="topics/impala_invalidate_metadata.html">INVALIDATE METADATA</a></li><li class="topicref"><a href="topics/impala_load_data.html">LOAD DATA</a></li><li class="topicref"><a href="topics/impala_refresh.html">REFRESH</a></li><li class="topicref"><a href="topics/impala_revoke.html">REVOKE</a></li><li class="topicref"><a href="topics/impala_select.html">SELECT</a><ul><li class="topicref"><a href="topics/impala_joins.html">Joins</a></li><li class="topicref"><a href="topics/impala_order_by.html">ORDER BY Clause</a></li><li class="topicref"><a href="topics/impala_group_by.html">GROUP BY Clause</a></li><li class="topicref"><a href="topics/impala_having.html">HAVING Clause</a></li><li class="topicref"><a href="topics/impala_limit.html">LIMIT Clause</a></li><li class="topicref"><a href="topics/impala_offset.html">OFFSET Clause</a></li><li class="topicref"><a href="topics/impala_union.html">UNION Clause</a></li><li class="topicref"><a href="topics/impala_subqueries.html">Subqueries</a></li><li class="topicref"><a href="topics/impala_with.html">WITH Clause</a></li><li class="topicref"><a href="topics/impala_distinct.html">DISTINCT Operator</a></li><li class="topicref"><a href="topics/impala_hints.html">Hints</a></li></ul></li><li class="topicref"><a href="topics/impala_set.html">SET</a><ul><li class="topicref"><a href="topics/impala_query_options.html">Query Options for the SET Statement</a><ul><li class="topicref"><a href="topics/impala_abort_on_default_limit_exceeded.html">ABORT_ON_DEFAULT_LIMIT_EXCEEDED</a></li><li class="topicref"><a href="topics/impala_abort_on_error.html">ABORT_ON_ERROR</a></li><li class="topicref"><a href="topics/impala_allow_unsupported_formats.html">ALLOW_UNSUPPORTED_FORMATS</a></li><li class="topicref"><a href="topics/impala_appx_count_distinct.html">APPX_COUNT_DISTINCT</a></li><li class="topicref"><a href="topics/impala_batch_size.html">BATCH_SIZE</a></li><li class="topicref"><a href="topics/impala_compression_codec.html">COMPRESSION_CODEC</a></li><li class="topicref"><a href="topics/impala_debug_action.html">DEBUG_ACTION</a></li><li class="topicref"><a href="topics/impala_default_order_by_limit.html">DEFAULT_ORDER_BY_LIMIT</a></li><li class="topicref"><a href="topics/impala_disable_codegen.html">DISABLE_CODEGEN</a></li><li class="topicref"><a href="topics/impala_disable_row_runtime_filtering.html">DISABLE_ROW_RUNTIME_FILTERING</a></li><li class="topicref"><a href="topics/impala_disable_streaming_preaggregations.html">DISABLE_STREAMING_PREAGGREGATIONS</a></li><li class="topicref"><a href="topics/impala_disable_unsafe_spills.html">DISABLE_UNSAFE_SPILLS</a></li><li class="topicref"><a href="topics/impala_exec_single_node_rows_threshold.html">EXEC_SINGLE_NODE_ROWS_THRESHOLD</a></li><li class="topicref"><a href="topics/impala_explain_level.html">EXPLAIN_LEVEL</a></li><li class="topicref"><a href="topics/impala_hbase_cache_blocks.html">HBASE_CACHE_BLOCKS</a></li><li class="topicref"><a href="topics/impala_hbase_caching.html">HBASE_CACHING</a></li><li class="topicref"><a href="topics/impala_live_progress.html">LIVE_PROGRESS</a></li><li class="topicref"><a href="topics/impala_live_summary.html">LIVE_SUMMARY</a></li><li class="topicref"><a href="topics/impala_max_errors.html">MAX_ERRORS</a></li><li class="topicref"><a href="topics/impala_max_io_buffers.html">MAX_IO_BUFFERS</a></li><li class="topicref"><a href="topics/impala_max_scan_range_length.html">MAX_SCAN_RANGE_LENGTH</a></li><li class="topicref"><a href="topics/impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS</a></li><li class="topicref"><a href="topics/impala_mem_limit.html">MEM_LIMIT</a></li><li class="topicref"><a href="topics/impala_mt_dop.html">MT_DOP</a></li><li class="topicref"><a href="topics/impala_num_nodes.html">NUM_NODES</a></li><li class="topicref"><a href="topics/impala_num_scanner_threads.html">NUM_SCANNER_THREADS</a></li><li class="topicref"><a href="topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS</a></li><li class="topicref"><a href="topics/impala_parquet_compression_codec.html">PARQUET_COMPRESSION_CODEC</a></li><li class="topicref"><a href="topics/impala_parquet_annotate_strings_utf8.html">PARQUET_ANNOTATE_STRINGS_UTF8</a></li><li class="topicref"><a href="topics/impala_parquet_fallback_schema_resolution.html">PARQUET_FALLBACK_SCHEMA_RESOLUTION</a></li><li class="topicref"><a href="topics/impala_parquet_file_size.html">PARQUET_FILE_SIZE</a></li><li class="topicref"><a href="topics/impala_prefetch_mode.html">PREFETCH_MODE</a></li><li class="topicref"><a href="topics/impala_query_timeout_s.html">QUERY_TIMEOUT_S</a></li><li class="topicref"><a href="topics/impala_request_pool.html">REQUEST_POOL</a></li><li class="topicref"><a href="topics/impala_replica_preference.html">REPLICA_PREFERENCE</a></li><li class="topicref"><a href="topics/impala_reservation_request_timeout.html">RESERVATION_REQUEST_TIMEOUT</a></li><li class="topicref"><a href="topics/impala_runtime_bloom_filter_size.html">RUNTIME_BLOOM_FILTER_SIZE</a></li><li class="topicref"><a href="topics/impala_runtime_filter_max_size.html">RUNTIME_FILTER_MAX_SIZE</a></li><li class="topicref"><a href="topics/impala_runtime_filter_min_size.html">RUNTIME_FILTER_MIN_SIZE</a></li><li class="topicref"><a href="topics/impala_runtime_filter_mode.html">RUNTIME_FILTER_MODE</a></li><li class="topicref"><a href="topics/impala_runtime_filter_wait_time_ms.html">RUNTIME_FILTER_WAIT_TIME_MS</a></li><li class="topicref"><a href="topics/impala_s3_skip_insert_staging.html">S3_SKIP_INSERT_STAGING</a></li><li class="topicref"><a href="topics/impala_scan_node_codegen_threshold.html">SCAN_NODE_CODEGEN_THRESHOLD</a></li><li class="topicref"><a href="topics/impala_scratch_limit.html">SCRATCH_LIMIT</a></li><li class="topicref"><a href="topics/impala_schedule_random_replica.html">SCHEDULE_RANDOM_REPLICA</a></li><li class="topicref"><a href="topics/impala_support_start_over.html">SUPPORT_START_OVER</a></li><li class="topicref"><a href="topics/impala_sync_ddl.html">SYNC_DDL</a></li><li class="topicref"><a href="topics/impala_v_cpu_cores.html">V_CPU_CORES</a></li></ul></li></ul></li><li class="topicref"><a href="topics/impala_show.html">SHOW</a></li><li class="topicref"><a href="topics/impala_truncate_table.html">TRUNCATE TABLE</a></li><li class="topicref"><a href="topics/impala_update.html">UPDATE</a></li><li class="topicref"><a href="topics/impala_upsert.html">UPSERT</a></li><li class="topicref"><a href="topics/impala_use.html">USE</a></li></ul></li><li class="topicref"><a href="topics/impala_functions.html">Built-In Functions</a><ul><li class="topicref"><a href="topics/impala_math_functions.html">Mathematical Functions</a></li><li class="topicref"><a href="topics/impala_bit_functions.html">Bit Functions</a></li><li class="topicref"><a href="topics/impala_conversion_functions.html">Type Conversion Functions</a></li><li class="topicref"><a href="topics/impala_datetime_functions.html">Date and Time Functions</a></li><li class="topicref"><a href="topics/impala_conditional_functions.html">Conditional Functions</a></li><li class="topicref"><a href="topics/impala_string_functions.html">String Functions</a></li><li class="topicref"><a href="topics/impala_misc_functions.html">Miscellaneous Functions</a></li><li class="topicref"><a href="topics/impala_aggregate_functions.html">Aggregate Functions</a><ul><li class="topicref"><a href="topics/impala_appx_median.html">APPX_MEDIAN</a></li><li class="topicref"><a href="topics/impala_avg.html">AVG</a></li><li class="topicref"><a href="topics/impala_count.html">COUNT</a></li><li class="topicref"><a href="topics/impala_group_concat.html">GROUP_CONCAT</a></li><li class="topicref"><a href="topics/impala_max.html">MAX</a></li><li class="topicref"><a href="topics/impala_min.html">MIN</a></li><li class="topicref"><a href="topics/impala_ndv.html">NDV</a></li><li class="topicref"><a href="topics/impala_stddev.html">STDDEV, STDDEV_SAMP, STDDEV_POP</a></li><li class="topicref"><a href="topics/impala_sum.html">SUM</a></li><li class="topicref"><a href="topics/impala_variance.html">VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP</a></li></ul></li><li class="topicref"><a href="topics/impala_analytic_functions.html">Analytic Functions</a></li><li class="topicref"><a href="topics/impala_udf.html">Impala User-Defined Functions (UDFs)</a></li></ul></li><li class="topicref"><a href="topics/impala_langref_unsupported.html">SQL Differences Between Impala and Hive</a></li><li class="topicref"><a href="topics/impala_porting.html">Porting SQL</a></li></ul></li><li class="topicref"><a href="topics/impala_impala_shell.html">The Impala Shell</a><ul><li class="topicref"><a href="topics/impala_shell_options.html">Configuration Options</a></li><li class="topicref"><a href="topics/impala_connecting.html">Connecting to impalad</a></li><li class="topicref"><a href="topics/impala_shell_running_commands.html">Running Commands and SQL Statements</a></li><li class="topicref"><a href="topics/impala_shell_commands.html">Command Reference</a></li></ul></li><li class="topicref"><a href="topics/impala_performance.html">Performance Tuning</a><ul><li class="topicref"><a href="topics/impala_perf_cookbook.html">Performance Best Practices</a></li><li class="topicref"><a href="topics/impala_perf_joins.html">Join Performance</a></li><li class="topicref"><a href="topics/impala_perf_stats.html">Table and Column Statistics</a></li><li class="topicref"><a href="topics/impala_perf_benchmarking.html">Benchmarking</a></li><li class="topicref"><a href="topics/impala_perf_resources.html">Controlling Resource Usage</a></li><li class="topicref"><a href="topics/impala_runtime_filtering.html">Runtime Filtering</a></li><li class="topicref"><a href="topics/impala_perf_hdfs_caching.html">HDFS Caching</a></li><li class="topicref"><a href="topics/impala_perf_testing.html">Testing Impala Performance</a></li><li class="topicref"><a href="topics/impala_explain_plan.html">EXPLAIN Plans and Query Profiles</a></li><li class="topicref"><a href="topics/impala_perf_skew.html">HDFS Block Skew</a></li></ul></li><li class="topicref"><a href="topics/impala_scalability.html">Scalability Considerations</a></li><li class="topicref"><a href="topics/impala_partitioning.html">Partitioning</a></li><li class="topicref"><a href="topics/impala_file_formats.html">File Formats</a><ul><li class="topicref"><a href="topics/impala_txtfile.html">Text Data Files</a></li><li class="topicref"><a href="topics/impala_parquet.html">Parquet Data Files</a></li><li class="topicref"><a href="topics/impala_avro.html">Avro Data Files</a></li><li class="topicref"><a href="topics/impala_rcfile.html">RCFile Data Files</a></li><li class="topicref"><a href="topics/impala_seqfile.html">SequenceFile Data Files</a></li></ul></li><li class="topicref"><a href="topics/impala_kudu.html">Using Impala to Query Kudu Tables</a></li><li class="topicref"><a href="topics/impala_hbase.html">HBase Tables</a></li><li class="topicref"><a href="topics/impala_s3.html">S3 Tables</a></li><li class="topicref"><a href="topics/impala_isilon.html">Isilon Storage</a></li><li class="topicref"><a href="topics/impala_logging.html">Logging</a></li><li class="topicref"><a href="topics/impala_troubleshooting.html">Troubleshooting Impala</a><ul><li class="topicref"><a href="topics/impala_webui.html">Web User Interface</a></li><li class="topicref"><a href="topics/impala_breakpad.html">Breakpad Minidumps</a></li></ul></li><li class="topicref"><a href="topics/impala_ports.html">Ports Used by Impala</a></li><li class="topicref"><a href="topics/impala_reserved_words.html">Impala Reserved Words</a></li><li class="topicref"><a href="topics/impala_faq.html">Impala Frequently Asked Questions</a></li><li class="topicref"><a href="topics/impala_release_notes.html">Impala Release Notes</a><ul><li class="topicref"><a href="topics/impala_relnotes.html">Impala Release Notes</a></li><li class="topicref"><a href="topics/impala_new_features.html">New Features in Apache Impala (incubating)</a></li><li class="topicref"><a href="topics/impala_incompatible_changes.html">Incompatible Changes and Limitations in Apache Impala (incubating)</a></li><li class="topicref"><a href="topics/impala_known_issues.html">Known Issues and Workarounds in Impala</a></li><li class="topicref"><a href="topics/impala_fixed_issues.html">Fixed Issues in Apache Impala (incubating)</a></li></ul></li></ul></nav></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_abort_on_default_limit_exceeded.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_abort_on_default_limit_exceeded.html b/docs/build/html/topics/impala_abort_on_default_limit_exceeded.html
new file mode 100644
index 0000000..812ce13
--- /dev/null
+++ b/docs/build/html/topics/impala_abort_on_default_limit_exceeded.html
@@ -0,0 +1,24 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="abort_on_default_limit_exceeded"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ABORT_ON_DEFAULT_LIMIT_EXCEEDED Query Option</title></head><body id="abort_on_default_limit_exceeded"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">ABORT_ON_DEFAULT_LIMIT_EXCEEDED Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+        Now that the <code class="ph codeph">ORDER BY</code> clause no longer requires an accompanying <code class="ph codeph">LIMIT</code>
+        clause in Impala 1.4.0 and higher, this query option is deprecated and has no effect.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value is interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_abort_on_error.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_abort_on_error.html b/docs/build/html/topics/impala_abort_on_error.html
new file mode 100644
index 0000000..c110544
--- /dev/null
+++ b/docs/build/html/topics/impala_abort_on_error.html
@@ -0,0 +1,42 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="abort_on_error"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ABORT_ON_ERROR Query Option</title></head><body id="abort_on_error"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">ABORT_ON_ERROR Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      When this option is enabled, Impala cancels a query immediately when any of the nodes encounters an error,
+      rather than continuing and possibly returning incomplete results. This option is disabled by default, to help
+      gather maximum diagnostic information when an error occurs, for example, whether the same problem occurred on
+      all nodes or only a single node. Currently, the errors that Impala can skip over involve data corruption,
+      such as a column that contains a string value where an integer value is expected.
+    </p>
+
+    <p class="p">
+      To control how much logging Impala does for non-fatal errors when <code class="ph codeph">ABORT_ON_ERROR</code> is turned
+      off, use the <code class="ph codeph">MAX_ERRORS</code> option.
+    </p>
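As a minimal sketch of how these two options interact in an <code class="ph codeph">impala-shell</code> session (the table name is a placeholder, not from this document):

```sql
-- Fail fast: cancel the query on the first error encountered on any node.
SET ABORT_ON_ERROR=true;

-- Or keep going, but cap how many non-fatal errors are logged.
SET ABORT_ON_ERROR=false;
SET MAX_ERRORS=100;

-- With ABORT_ON_ERROR=false, a scan over corrupt data continues and
-- reports warnings instead of cancelling. (some_table is hypothetical.)
SELECT COUNT(*) FROM some_table;
```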
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value is interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_max_errors.html#max_errors">MAX_ERRORS Query Option</a>,
+      <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_admin.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_admin.html b/docs/build/html/topics/impala_admin.html
new file mode 100644
index 0000000..bb1384e
--- /dev/null
+++ b/docs/build/html/topics/impala_admin.html
@@ -0,0 +1,52 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admission.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_resource_management.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_timeouts.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_proxy.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disk_space.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="admin"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Admini
 stration</title></head><body id="admin"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Administration</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      As an administrator, you monitor Impala's use of resources and take action when necessary to keep Impala
+      running smoothly and avoid conflicts with other Hadoop components running on the same cluster. When you
+      detect that an issue has happened or could happen in the future, you reconfigure Impala or other components
+      such as HDFS or even the hardware of the cluster itself to resolve or avoid problems.
+    </p>
+
+    <p class="p toc"></p>
+
+    <p class="p">
+      <strong class="ph b">Related tasks:</strong>
+    </p>
+
+    <p class="p">
+      As an administrator, you can expect to perform installation, upgrade, and configuration tasks for Impala on
+      all machines in a cluster. See <a class="xref" href="impala_install.html#install">Installing Impala</a>,
+      <a class="xref" href="impala_upgrading.html#upgrading">Upgrading Impala</a>, and <a class="xref" href="impala_config.html#config">Managing Impala</a> for details.
+    </p>
+
+    <p class="p">
+      For security tasks typically performed by administrators, see <a class="xref" href="impala_security.html#security">Impala Security</a>.
+    </p>
+
+    <div class="p">
+      Administrators also decide how to allocate cluster resources so that all Hadoop components can run smoothly
+      together. For Impala, this task primarily involves:
+      <ul class="ul">
+        <li class="li">
+          Deciding how many Impala queries can run concurrently and with how much memory, through the admission
+          control feature. See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details.
+        </li>
+
+        <li class="li">
+          Dividing cluster resources such as memory between Impala and other components, using YARN for overall
+          resource management, and Llama to mediate resource requests from Impala to YARN. See
+          <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a> for details.
+        </li>
+      </ul>
+    </div>
+
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_admission.html">Admission Control and Query Queuing</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_resource_management.html">Resource Management for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_timeouts.html">Setting Timeout Periods for Daemons, Queries, and Sessions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_proxy.html">Using Impala through a Proxy for High Availability</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disk_space.html">Managing Disk Space for Impala Data</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file


[30/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_incompatible_changes.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_incompatible_changes.html b/docs/build/html/topics/impala_incompatible_changes.html
new file mode 100644
index 0000000..642a334
--- /dev/null
+++ b/docs/build/html/topics/impala_incompatible_changes.html
@@ -0,0 +1,1443 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="incompatible_changes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Incompatible Changes and Limitations in Apache Impala (incubating)</title></head><body id="incompatible_changes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Incompatible Changes and Limitations in Apache Impala (incubating)</span></h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The Impala version covered by this documentation library contains the following incompatible changes. These
+      are things such as file format changes, removed features, or changes to implementation, default
+      configuration, dependencies, or prerequisites that could cause issues during or after an Impala upgrade.
+    </p>
+
+    <p class="p">
+      Even newly added SQL statements or clauses can produce incompatibilities if you have databases, tables, or columns
+      whose names conflict with the new keywords. <span class="ph">See
+      <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> for the set of reserved words for the current
+      release, and the quoting techniques to avoid name conflicts.</span>
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="incompatible_changes__incompatible_changes_28x">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Incompatible Changes Introduced in Impala 2.8.x</h2>
+
+    <div class="body conbody">
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Llama support is removed completely from Impala. Related flags (<code class="ph codeph">--enable_rm</code>)
+            and query options (such as <code class="ph codeph">V_CPU_CORES</code>) remain but do not have any effect.
+          </p>
+          <p class="p">
+            If <code class="ph codeph">--enable_rm</code> is passed to Impala, a warning is printed to the log on startup.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The syntax related to Kudu tables includes a number of new reserved words,
+            such as <code class="ph codeph">COMPRESSION</code>, <code class="ph codeph">DEFAULT</code>, and <code class="ph codeph">ENCODING</code>, that
+            might conflict with names of existing tables, columns, or other identifiers from older Impala versions.
+            See <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> for the full list of reserved words.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The DDL syntax for Kudu tables, particularly in the <code class="ph codeph">CREATE TABLE</code> statement, is different
+            from the special <code class="ph codeph">impala_next</code> fork that was previously used for accessing Kudu tables
+            from Impala:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">DISTRIBUTE BY</code> clause is now <code class="ph codeph">PARTITIONED BY</code>.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">INTO <var class="keyword varname">N</var> BUCKETS</code>
+                clause is now <code class="ph codeph">PARTITIONS <var class="keyword varname">N</var></code>.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                The <code class="ph codeph">SPLIT ROWS</code> clause is replaced by different syntax for specifying
+                the ranges covered by each partition.
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">DESCRIBE</code> output for Kudu tables includes several extra columns.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Non-primary-key columns can contain <code class="ph codeph">NULL</code> values by default. The
+            <code class="ph codeph">SHOW CREATE TABLE</code> output for these columns displays the <code class="ph codeph">NULL</code>
+            attribute. There was a period during early experimental versions of Impala + Kudu where
+            non-primary-key columns had the <code class="ph codeph">NOT NULL</code> attribute by default.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">IGNORE</code> keyword that was present in early experimental versions of Impala + Kudu
+            is no longer present. The behavior of the <code class="ph codeph">IGNORE</code> keyword is now the default:
+            DML statements continue with warnings, instead of failing with errors, if they encounter conditions
+            such as <span class="q">"primary key already exists"</span> for an <code class="ph codeph">INSERT</code> statement or
+            <span class="q">"primary key already deleted"</span> for a <code class="ph codeph">DELETE</code> statement.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The replication factor for Kudu tables must be an odd number.
+          </p>
+        </li>
+      </ul>
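For identifiers that collide with the new reserved words, the usual workaround is backtick quoting, as covered in the reserved words topic linked above. A hedged sketch (<code class="ph codeph">legacy_table</code> and its column names are hypothetical):

```sql
-- A pre-2.8 table might have columns named after the new reserved words.
-- Quoting the identifiers with backticks keeps such queries working.
SELECT `compression`, `default`, `encoding` FROM legacy_table;
```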
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="incompatible_changes__incompatible_changes_27x">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Incompatible Changes Introduced in Impala 2.7.x</h2>
+
+    <div class="body conbody">
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Bug fixes related to parsing of floating-point values (IMPALA-1731 and IMPALA-3868) can change
+            the results of casting strings that represent invalid floating-point values.
+            For example, string values beginning or ending with <code class="ph codeph">inf</code>,
+            such as <code class="ph codeph">1.23inf</code> or <code class="ph codeph">infinite</code>, are now converted to <code class="ph codeph">NULL</code>
+            when interpreted as a floating-point value.
+            Formerly, they were interpreted as the special <span class="q">"infinity"</span> value when converting from string to floating-point.
+            Similarly, now only the string <code class="ph codeph">NaN</code> (case-sensitive) is interpreted as the special <span class="q">"not a number"</span>
+            value. String values containing multiple dots, such as <code class="ph codeph">3..141</code> or <code class="ph codeph">3.1.4.1</code>,
+            are now interpreted as <code class="ph codeph">NULL</code> rather than being converted to valid floating-point values.
+          </p>
+        </li>
+      </ul>
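The effect of these parsing fixes can be sketched with a few casts; the before/after results restate the behavior described above, for Impala 2.7.x and higher:

```sql
SELECT CAST('1.23inf' AS DOUBLE);   -- now NULL; formerly the special "infinity" value
SELECT CAST('3.1.4.1' AS DOUBLE);   -- now NULL; formerly converted to a valid value
SELECT CAST('NaN' AS DOUBLE);       -- still the special "not a number" value
SELECT CAST('nan' AS DOUBLE);       -- NULL; only case-sensitive NaN is special
```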
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="incompatible_changes__incompatible_changes_26x">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Incompatible Changes Introduced in Impala 2.6.x</h2>
+
+    <div class="body conbody">
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            The default for the <code class="ph codeph">RUNTIME_FILTER_MODE</code>
+            query option is changed to <code class="ph codeph">GLOBAL</code> (the highest setting).
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> setting is now only used
+            as a fallback if statistics are not available; otherwise, Impala
+            uses the statistics to estimate the appropriate size to use for each filter.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Admission control and dynamic resource pools are enabled by default.
+            When upgrading from an earlier release, you must turn on these settings yourself
+            if they are not already enabled.
+            See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details
+            about admission control.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Impala reserves some new keywords, in preparation for support for Kudu syntax:
+            <code class="ph codeph">buckets</code>, <code class="ph codeph">delete</code>, <code class="ph codeph">distribute</code>,
+            <code class="ph codeph">hash</code>, <code class="ph codeph">ignore</code>, <code class="ph codeph">split</code>, and <code class="ph codeph">update</code>.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            For Kerberized clusters, the Catalog service now uses
+            the Kerberos principal instead of the operating system user that runs
+            the <span class="keyword cmdname">catalogd</span> daemon.
+            This eliminates the requirement to configure a <code class="ph codeph">hadoop.user.group.static.mapping.overrides</code>
+            setting to put the OS user into the Sentry administrative group, on clusters where the principal
+            and the OS user name for this user are different.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The mechanism for interpreting <code class="ph codeph">DECIMAL</code> literals is
+            improved, no longer going through an intermediate conversion step
+            to <code class="ph codeph">DOUBLE</code>:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                Casting a <code class="ph codeph">DECIMAL</code> value to <code class="ph codeph">TIMESTAMP</code>
+                produces a more precise value for the <code class="ph codeph">TIMESTAMP</code> than formerly.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Certain function calls involving <code class="ph codeph">DECIMAL</code> literals
+                now succeed, when formerly they failed due to lack of a function
+                signature with a <code class="ph codeph">DOUBLE</code> argument.
+              </p>
+            </li>
+          </ul>
+        </li>
+        <li class="li">
+          <p class="p">
+            Improved type accuracy for <code class="ph codeph">CASE</code> return values.
+            If all <code class="ph codeph">WHEN</code> clauses of the <code class="ph codeph">CASE</code>
+            expression are of <code class="ph codeph">CHAR</code> type, the final result
+            is also <code class="ph codeph">CHAR</code> instead of being converted to
+            <code class="ph codeph">STRING</code>.
+          </p>
+        </li>
+        <li class="li">
+          <div class="p">
+        The initial release of <span class="keyword">Impala 2.5</span> sometimes has a higher peak memory usage than in previous releases
+        while reading Parquet files.
+        The following query options might help to reduce memory consumption in the Parquet scanner:
+        <ul class="ul">
+          <li class="li">
+            Reduce the number of scanner threads, for example: <code class="ph codeph">set num_scanner_threads=30</code>
+          </li>
+          <li class="li">
+            Reduce the batch size, for example: <code class="ph codeph">set batch_size=512</code>
+          </li>
+          <li class="li">
+            Increase the memory limit, for example: <code class="ph codeph">set mem_limit=64g</code>
+          </li>
+        </ul>
+        You can track the status of the fix for this issue at
+        <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3662" target="_blank">IMPALA-3662</a>.
+      </div>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option, which is enabled by
+            default, increases the speed of <code class="ph codeph">INSERT</code> operations for S3 tables.
+            The speedup applies to regular <code class="ph codeph">INSERT</code>, but not <code class="ph codeph">INSERT OVERWRITE</code>.
+            The tradeoff is the possibility of inconsistent output files left behind if a
+            node fails during <code class="ph codeph">INSERT</code> execution.
+            See <a class="xref" href="impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+          </p>
+        </li>
+      </ul>
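If the possibility of leftover partial files after a node failure is unacceptable for a workload, the previous staged-write behavior can presumably be restored by disabling the option per session (a sketch; the table names are hypothetical):

```sql
-- Revert to staged INSERT behavior for S3 tables in this session.
SET S3_SKIP_INSERT_STAGING=false;
INSERT INTO s3_table SELECT * FROM staging_table;
```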
+      <p class="p">
+        Certain features are turned off by default, to avoid regressions or unexpected
+        behavior following an upgrade. Consider turning on these features after suitable testing:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Impala now recognizes the <code class="ph codeph">auth_to_local</code> setting,
+            specified through the HDFS configuration setting
+            <code class="ph codeph">hadoop.security.auth_to_local</code>.
+            This feature is disabled by default; to enable it,
+            specify <code class="ph codeph">--load_auth_to_local_rules=true</code>
+            in the <span class="keyword cmdname">impalad</span> configuration settings.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            A new query option, <code class="ph codeph">PARQUET_ANNOTATE_STRINGS_UTF8</code>,
+            makes Impala include the <code class="ph codeph">UTF-8</code> annotation
+            metadata for <code class="ph codeph">STRING</code>, <code class="ph codeph">CHAR</code>,
+            and <code class="ph codeph">VARCHAR</code> columns in Parquet files created
+            by <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code>
+            statements.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            A new query option,
+            <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION</code>,
+            lets Impala locate columns within Parquet files based on
+            column name rather than ordinal position.
+            This enhancement improves interoperability with applications
+            that write Parquet files with a different order or subset of
+            columns than are used in the Impala table.
+          </p>
+        </li>
+      </ul>
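Both options can be turned on per session with <code class="ph codeph">SET</code> statements along these lines. The value <code class="ph codeph">NAME</code> for the second option is an assumption, not spelled out in this excerpt; see the query option's own topic for the recognized values.

```sql
-- Include UTF-8 annotation metadata in Parquet files written by Impala.
SET PARQUET_ANNOTATE_STRINGS_UTF8=true;

-- Resolve Parquet columns by column name instead of ordinal position.
-- (The value NAME is an assumption; it is not stated in this excerpt.)
SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME;
```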
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="incompatible_changes__incompatible_changes_25x">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Incompatible Changes Introduced in Impala 2.5.x</h2>
+
+    <div class="body conbody">
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            The admission control default limit for concurrent queries (the <span class="ph uicontrol">max requests</span>
+            setting) is now unlimited instead of 200.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Multiplying a mixture of <code class="ph codeph">DECIMAL</code> and <code class="ph codeph">FLOAT</code> or
+            <code class="ph codeph">DOUBLE</code> values now returns
+            <code class="ph codeph">DOUBLE</code> rather than <code class="ph codeph">DECIMAL</code>. This
+            change avoids some cases where an intermediate value would underflow or overflow
+            and become <code class="ph codeph">NULL</code> unexpectedly. The results of
+            multiplying <code class="ph codeph">DECIMAL</code> and <code class="ph codeph">FLOAT</code> or
+            <code class="ph codeph">DOUBLE</code> might now be slightly less precise than
+            before. Previously, the intermediate types and thus the final result
+            depended on the exact order of the values of different types being
+            multiplied, which made the final result values difficult to
+            reason about.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Previously, the <code class="ph codeph">_</code> and <code class="ph codeph">%</code> wildcard
+            characters for the <code class="ph codeph">LIKE</code> operator would not match
+            characters on the second or subsequent lines of multi-line string values. The fix for issue
+            <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2204" target="_blank">IMPALA-2204</a> causes
+            the wildcard matching to apply to the entire string for values
+            containing embedded <code class="ph codeph">\n</code> characters. This could cause
+            different results than in previous Impala releases for identical
+            queries on identical data.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Formerly, all Impala UDFs and UDAs required running the
+            <code class="ph codeph">CREATE FUNCTION</code> statements to
+            re-create them after each <span class="keyword cmdname">catalogd</span> restart.
+            In <span class="keyword">Impala 2.5</span> and higher, functions written in C++ are persisted across
+            restarts, and the requirement to
+            re-create functions only applies to functions written in Java. Adapt any
+            function-reloading logic that you have added to your Impala environment.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+              <code class="ph codeph">CREATE TABLE LIKE</code> no longer inherits HDFS caching settings from the source table.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">SHOW DATABASES</code> statement now returns two columns rather than one.
+            The second column includes the associated comment string, if any, for each database.
+            Adjust any application code that examines the list of databases and assumes the
+            result set contains only a single column.
+          </p>
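One defensive way to adapt such code is to read only the first column of each row, regardless of how many columns the result set carries. The rows below are hypothetical sample data, not output from a live cluster.

```python
# Hypothetical rows as returned by a client library for SHOW DATABASES.
# Older releases: one value per row; newer releases: (name, comment).
rows = [("default", "Default Hive database"), ("tpcds", "")]

def database_names(rows):
    """Extract just the database names, tolerating either the old
    single-column or the new two-column result set shape."""
    return [row[0] if isinstance(row, tuple) else row for row in rows]

print(database_names(rows))          # ['default', 'tpcds']
print(database_names(["default"]))   # old single-column shape still works
```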
+        </li>
+        <li class="li">
+          <p class="p">
+            The output of the <code class="ph codeph">SHOW FUNCTIONS</code> statement includes
+            two new columns, showing the kind of the function (for example,
+            <code class="ph codeph">BUILTIN</code>) and whether or not the function persists
+            across catalog server restarts. For example, the <code class="ph codeph">SHOW
+            FUNCTIONS</code> output for the
+            <code class="ph codeph">_impala_builtins</code> database starts with:
+          </p>
+<pre class="pre codeblock"><code>
++--------------+-------------------------------------------------+-------------+---------------+
+| return type  | signature                                       | binary type | is persistent |
++--------------+-------------------------------------------------+-------------+---------------+
+| BIGINT       | abs(BIGINT)                                     | BUILTIN     | true          |
+| DECIMAL(*,*) | abs(DECIMAL(*,*))                               | BUILTIN     | true          |
+| DOUBLE       | abs(DOUBLE)                                     | BUILTIN     | true          |
+...
+</code></pre>
+        </li>
+      </ul>
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="incompatible_changes__incompatible_changes_24x">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Incompatible Changes Introduced in Impala 2.4.x</h2>
+
+    <div class="body conbody">
+      <p class="p">
+        Other than support for DSSD storage, the Impala feature set for <span class="keyword">Impala 2.4</span> is the same as for <span class="keyword">Impala 2.3</span>.
+        Therefore, there are no incompatible changes for Impala introduced in <span class="keyword">Impala 2.4</span>.
+      </p>
+    </div>
+
+  </article>
+
+
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="incompatible_changes__incompatible_changes_23x">
+
+    <h2 class="title topictitle2" id="ariaid-title7">Incompatible Changes Introduced in Impala 2.3.x</h2>
+
+    <div class="body conbody">
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          The use of the Llama component for integrated resource management within YARN
+          is no longer supported with <span class="keyword">Impala 2.3</span> and higher.
+          The Llama support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher.
+        </p>
+        <p class="p">
+          For clusters running Impala alongside
+          other data management components, you define static service pools to define the resources
+          available to Impala and other components. Then within the area allocated for Impala,
+          you can create dynamic service pools, each with its own settings for the Impala admission control feature.
+        </p>
+      </div>
+
+      <ul class="ul">
+        
+        <li class="li">
+          <p class="p">
+            If Impala encounters a Parquet file that is invalid because of an incorrect magic number,
+            the query skips the file. This change is caused by the fix for issue <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2130" target="_blank">IMPALA-2130</a>.
+            Previously, Impala would attempt to read the file despite the possibility that the file was corrupted.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Previously, calls to overloaded built-in functions could treat parameters as <code class="ph codeph">DOUBLE</code>
+            or <code class="ph codeph">FLOAT</code> when no overload had a signature that matched the exact argument types.
+            Now Impala prefers the function signature with <code class="ph codeph">DECIMAL</code> parameters in this case.
+            This change avoids a possible loss of precision in function calls such as <code class="ph codeph">greatest(0, 99999.8888)</code>;
+            now both parameters are treated as <code class="ph codeph">DECIMAL</code> rather than <code class="ph codeph">DOUBLE</code>, avoiding
+            any loss of precision in the fractional value.
+            This could cause slightly different results than in previous Impala releases for certain function calls.
+          </p>
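Python's decimal module shows why the DECIMAL overload is preferable for a literal like 99999.8888: the value has no exact binary floating-point representation, so treating it as DOUBLE introduces a small rounding error that exact decimal arithmetic avoids. This is a language-level analogy, not Impala code.

```python
from decimal import Decimal

# The literal cannot be represented exactly as a binary double:
as_double = Decimal(99999.8888)      # constructed from the float value
as_decimal = Decimal("99999.8888")   # exact decimal representation

print(as_double == as_decimal)       # False: the double carries rounding error
print(max(Decimal(0), as_decimal))   # exact result, analogous to the
                                     # DECIMAL overload of greatest()
```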
+        </li>
+        <li class="li">
+          <p class="p">
+            Formerly, adding or subtracting a large interval value to a <code class="ph codeph">TIMESTAMP</code> could produce
+            a nonsensical result. Now when the result goes outside the range of <code class="ph codeph">TIMESTAMP</code> values,
+            Impala returns <code class="ph codeph">NULL</code>.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Formerly, it was possible to create a table with identical row and column delimiters.
+            This could happen unintentionally when specifying one of the delimiters and using the

+            default value for the other. Now an attempt to use identical delimiters still succeeds,
+            but displays a warning message.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Formerly, Impala could include snippets of table data in log files by default, for example
+            when reporting conversion errors for data values. Now any such log messages are only produced
+            at higher logging levels that you would enable only during debugging.
+          </p>
+        </li>
+
+      </ul>
+    </div>
+
+  </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="incompatible_changes__incompatible_changes_22x">
+
+    <h2 class="title topictitle2" id="ariaid-title8">Incompatible Changes Introduced in Impala 2.2.x</h2>
+
+    <div class="body conbody">
+
+      <section class="section" id="incompatible_changes_22x__files_220"><h3 class="title sectiontitle">
+        Changes to File Handling
+      </h3>
+      
+        <p class="p">
+        Impala queries ignore files with extensions commonly used for temporary work files by Hadoop tools. Any
+        files with extensions <code class="ph codeph">.tmp</code> or <code class="ph codeph">.copying</code> are not considered part of the
+        Impala table. The suffix matching is case-insensitive, so for example Impala ignores both
+        <code class="ph codeph">.copying</code> and <code class="ph codeph">.COPYING</code> suffixes.
+      </p>
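The case-insensitive suffix filtering described above can be sketched as follows; the file list is hypothetical.

```python
# Suffixes that mark temporary work files written by Hadoop tools.
IGNORED_SUFFIXES = (".tmp", ".copying")

def visible_data_files(filenames):
    """Drop temporary work files the way described above: any name whose
    extension matches .tmp or .copying, compared case-insensitively."""
    return [f for f in filenames
            if not f.lower().endswith(IGNORED_SUFFIXES)]

files = ["part-0.parq", "part-1.parq.tmp", "data.COPYING", "part-2.parq"]
print(visible_data_files(files))   # ['part-0.parq', 'part-2.parq']
```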
+        <p class="p">
+          The log rotation feature in Impala 2.2.0 and higher
+          means that older log files are now removed by default.
+          The default is to preserve the latest 10 log files for each
+          severity level, for each Impala-related daemon. If you have
+          set up your own log rotation processes that expect older
+          files to be present, either adjust your procedures or
+          change the Impala <code class="ph codeph">-max_log_files</code> setting.
+          <span class="ph">See <a class="xref" href="impala_logging.html#logs_rotate">Rotating Impala Logs</a> for details.</span>
+        </p>
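The pruning behavior can be modeled as keeping only the newest N files, analogous to the <code class="ph codeph">-max_log_files</code> setting. This is an illustrative sketch using (filename, mtime) pairs rather than a real log directory, and is not Impala source code.

```python
def prune_logs(files_with_mtime, max_log_files=10):
    """Return the files to keep: the max_log_files most recent ones,
    mirroring the rotation behavior described above."""
    newest_first = sorted(files_with_mtime, key=lambda f: f[1], reverse=True)
    return [name for name, _ in newest_first[:max_log_files]]

# 15 log files with increasing modification times; only the newest 10 survive.
logs = [("impalad.INFO.%d" % i, i) for i in range(15)]
kept = prune_logs(logs, max_log_files=10)
print(len(kept), kept[0])   # 10 impalad.INFO.14
```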
+      </section>
+
+      <section class="section" id="incompatible_changes_22x__prereqs_210"><h3 class="title sectiontitle">
+        Changes to Prerequisites
+      </h3>
+      
+        <p class="p">
+        The prerequisite for CPU architecture has been relaxed in Impala 2.2.0 and higher. From this release
+        onward, Impala works on CPUs that have the SSSE3 instruction set. The SSE4 instruction set is no longer
+        required. This relaxed requirement simplifies the upgrade planning from Impala 1.x releases, which also
+        worked on SSSE3-enabled processors.
+      </p>
+      </section>
+
+    </div>
+  </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="incompatible_changes__incompatible_changes_21x">
+
+    <h2 class="title topictitle2" id="ariaid-title9">Incompatible Changes Introduced in Impala 2.1.x</h2>
+
+    <div class="body conbody">
+
+      <section class="section" id="incompatible_changes_21x__prereqs_210"><h3 class="title sectiontitle">
+        Changes to Prerequisites
+      </h3>
+      
+        <p class="p">
+          Currently, Impala 2.1.x does not function on CPUs without the SSE4.1 instruction set. This minimum CPU
+          requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check
+          the CPU level of the hosts in your cluster before upgrading to <span class="keyword">Impala 2.1</span>.
+        </p>
+      </section>
+
+      <section class="section" id="incompatible_changes_21x__output_format_210"><h3 class="title sectiontitle">
+        Changes to Output Format
+      </h3>
+      
+        <p class="p">
+          The <span class="q">"small query"</span> optimization feature introduces some new information in the
+          <code class="ph codeph">EXPLAIN</code> plan, which you might need to account for if you parse the text of the plan
+          output.
+        </p>
+      </section>
+
+      <section class="section" id="incompatible_changes_21x__reserved_words_210"><h3 class="title sectiontitle">
+        New Reserved Words
+      </h3>
+      
+      <p class="p">
+        New SQL syntax introduces additional reserved words:
+        <code class="ph codeph">FOR</code>, <code class="ph codeph">GRANT</code>, <code class="ph codeph">REVOKE</code>, <code class="ph codeph">ROLE</code>, <code class="ph codeph">ROLES</code>,
+        <code class="ph codeph">INCREMENTAL</code>.
+        <span class="ph">As always, see <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>
+        for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.</span>
+      </p>
+      </section>
+    </div>
+  </article>
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="incompatible_changes__incompatible_changes_205">
+
+    <h2 class="title topictitle2" id="ariaid-title10">Incompatible Changes Introduced in Impala 2.0.5</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        No incompatible changes.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="incompatible_changes__incompatible_changes_204">
+
+    <h2 class="title topictitle2" id="ariaid-title11">Incompatible Changes Introduced in Impala 2.0.4</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        No incompatible changes.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="incompatible_changes__incompatible_changes_203">
+
+    <h2 class="title topictitle2" id="ariaid-title12">Incompatible Changes Introduced in Impala 2.0.3</h2>
+
+    <div class="body conbody">
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="incompatible_changes__incompatible_changes_202">
+
+    <h2 class="title topictitle2" id="ariaid-title13">Incompatible Changes Introduced in Impala 2.0.2</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        No incompatible changes.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="incompatible_changes__incompatible_changes_201">
+
+    <h2 class="title topictitle2" id="ariaid-title14">Incompatible Changes Introduced in Impala 2.0.1</h2>
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+        The <code class="ph codeph">INSERT</code> statement has always left behind a hidden work directory inside the data
+        directory of the table. Formerly, this hidden work directory was named
+        <span class="ph filepath">.impala_insert_staging</span>. In Impala 2.0.1 and later, this directory name is changed to
+        <span class="ph filepath">_impala_insert_staging</span>. (While HDFS tools are expected to treat names beginning
+        with either an underscore or a dot as hidden, in practice names beginning with an underscore are more widely
+        supported.) If you have any scripts, cleanup jobs, and so on that rely on the name of this work directory,
+        adjust them to use the new name.
+      </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">abs()</code> function now takes a broader range of numeric types as arguments, and the
+            return type is the same as the argument type.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Shorthand notation for character classes in regular expressions, such as <code class="ph codeph">\d</code> for digit,
+            is now available again in regular expression operators and functions such as
+            <code class="ph codeph">regexp_extract()</code> and <code class="ph codeph">regexp_replace()</code>. Some other differences in
+            regular expression behavior remain between Impala 1.x and Impala 2.x releases. See
+            <a class="xref" href="impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+          </p>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="incompatible_changes__incompatible_changes_200">
+
+    <h2 class="title topictitle2" id="ariaid-title15">Incompatible Changes Introduced in Impala 2.0.0</h2>
+
+    <div class="body conbody">
+
+      <section class="section" id="incompatible_changes_200__prereqs_200"><h3 class="title sectiontitle">
+        Changes to Prerequisites
+      </h3>
+      
+        <p class="p">
+          Currently, Impala 2.0.x does not function on CPUs without the SSE4.1 instruction set. This minimum CPU
+          requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check
+          the CPU level of the hosts in your cluster before upgrading to <span class="keyword">Impala 2.0</span>.
+        </p>
+      </section>
+
+      <section class="section" id="incompatible_changes_200__queries_200"><h3 class="title sectiontitle">
+        Changes to Query Syntax
+      </h3>
+      
+
+        <p class="p">
+          The new syntax where query hints are allowed in comments causes some changes in the way comments are
+          parsed in the <span class="keyword cmdname">impala-shell</span> interpreter. Previously, you could end a
+          <code class="ph codeph">--</code> comment line with a semicolon and <span class="keyword cmdname">impala-shell</span> would treat that
+          as a no-op statement. Now, a comment line ending with a semicolon is passed as an empty statement to
+          the Impala daemon, where it is flagged as an error.
+        </p>
+
+        <p class="p">
+          Impala 2.0 and later uses a different support library for regular expression parsing than in earlier
+          Impala versions. Now, Impala uses the
+          <a class="xref" href="https://code.google.com/p/re2/" target="_blank">Google RE2 library</a>
+          rather than Boost for evaluating regular expressions. This implementation change causes some
+          differences in the allowed regular expression syntax, and in the way certain regex operators are
+          interpreted. The following are some of the major differences (not necessarily a complete list):
+        </p>
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              <code class="ph codeph">.*?</code> notation for non-greedy matches is now supported, where it was not in earlier
+              Impala releases.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              By default, <code class="ph codeph">^</code> and <code class="ph codeph">$</code> now match only begin/end of buffer, not
+              begin/end of each line. This behavior can be overridden in the regex itself using the
+              <code class="ph codeph">m</code> flag.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              By default, <code class="ph codeph">.</code> does not match newline. This behavior can be overridden in the regex
+              itself using the <code class="ph codeph">s</code> flag.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              <code class="ph codeph">\Z</code> is not supported.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              <code class="ph codeph">&lt;</code> and <code class="ph codeph">&gt;</code> for start of word and end of word are not
+              supported.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              Lookahead and lookbehind are not supported.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              Shorthand notation for character classes, such as <code class="ph codeph">\d</code> for digit, is not recognized.
+              (This restriction is lifted in Impala 2.0.1, which restores the shorthand notation.)
+            </p>
+          </li>
+        </ul>
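Python's <code class="ph codeph">re</code> module happens to share the first three defaults listed above, so it can illustrate them; note that Impala evaluates its regular expressions server-side with RE2, and this sketch only mirrors the described semantics.

```python
import re

text = "first\nsecond"

# Non-greedy matching with .*? is supported:
assert re.search("f.*?s", text).group() == "firs"

# ^ and $ match only begin/end of the whole string by default;
# the m flag restores per-line anchoring:
assert re.search("^second$", text) is None
assert re.search("(?m)^second$", text) is not None

# . does not match newline by default; the s flag overrides that:
assert re.search("first.second", text) is None
assert re.search("(?s)first.second", text) is not None

print("all regex default checks passed")
```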
+      </section>
+
+      <section class="section" id="incompatible_changes_200__output_format_210"><h3 class="title sectiontitle">
+        Changes to Output Format
+      </h3>
+      
+
+        <p class="p">
+        In Impala 2.0 and later, <code class="ph codeph">user()</code> returns the full Kerberos principal string, such as
+        <code class="ph codeph">user@example.com</code>, in a Kerberized environment.
+      </p>
+
+        <p class="p">
+          The changed format for the user name in secure environments is also reflected where the user name is
+          displayed in the output of the <code class="ph codeph">PROFILE</code> command.
+        </p>
+
+        <p class="p">
+          In the output from <code class="ph codeph">SHOW FUNCTIONS</code>, <code class="ph codeph">SHOW AGGREGATE FUNCTIONS</code>, and
+          <code class="ph codeph">SHOW ANALYTIC FUNCTIONS</code>, arguments and return types of arbitrary
+          <code class="ph codeph">DECIMAL</code> scale and precision are represented as <code class="ph codeph">DECIMAL(*,*)</code>.
+          Formerly, these items were displayed as <code class="ph codeph">DECIMAL(-1,-1)</code>.
+        </p>
+
+      </section>
+
+      <section class="section" id="incompatible_changes_200__query_options_200"><h3 class="title sectiontitle">
+        Changes to Query Options
+      </h3>
+      
+        <p class="p">
+          The <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code> query option has been replaced by the
+          <code class="ph codeph">COMPRESSION_CODEC</code> query option.
+          <span class="ph">See <a class="xref" href="impala_compression_codec.html#compression_codec">COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</a> for details.</span>
+        </p>
+      </section>
+
+      <section class="section" id="incompatible_changes_200__config_options_200"><h3 class="title sectiontitle">
+        Changes to Configuration Options
+      </h3>
+      
+
+        <p class="p">
+          The meaning of the <code class="ph codeph">--idle_query_timeout</code> configuration option is changed, to
+          accommodate the new <code class="ph codeph">QUERY_TIMEOUT_S</code> query option. Rather than setting an absolute
+          timeout period that applies to all queries, it now sets a maximum timeout period, which can be adjusted
+          downward for individual queries by specifying a value for the <code class="ph codeph">QUERY_TIMEOUT_S</code> query
+          option. In sessions where no <code class="ph codeph">QUERY_TIMEOUT_S</code> query option is specified, the
+          <code class="ph codeph">--idle_query_timeout</code> timeout period applies the same as in earlier versions.
+        </p>
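The interaction can be summarized as a ceiling that sessions may lower but not raise. The following sketch is illustrative only, not Impala source code.

```python
def effective_timeout(idle_query_timeout, query_timeout_s=None):
    """--idle_query_timeout acts as a ceiling; QUERY_TIMEOUT_S can only
    shorten the timeout for a session, never extend it past the ceiling."""
    if query_timeout_s is None:
        return idle_query_timeout
    return min(idle_query_timeout, query_timeout_s)

print(effective_timeout(600))          # 600: no session override
print(effective_timeout(600, 120))     # 120: shortened per session
print(effective_timeout(600, 3600))    # 600: cannot exceed the ceiling
```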
+
+        <p class="p">
+          The <code class="ph codeph">--strict_unicode</code> option of <span class="keyword cmdname">impala-shell</span> was removed. To avoid
+          problems with Unicode values in <span class="keyword cmdname">impala-shell</span>, define the following locale setting
+          before running <span class="keyword cmdname">impala-shell</span>:
+        </p>
+<pre class="pre codeblock"><code>export LC_CTYPE=en_US.UTF-8
+</code></pre>
+
+      </section>
+
+      <section class="section" id="incompatible_changes_200__reserved_words_210"><h3 class="title sectiontitle">
+        New Reserved Words
+      </h3>
+      
+        <p class="p">
+          Some new SQL syntax requires the addition of new reserved words: <code class="ph codeph">ANTI</code>,
+          <code class="ph codeph">ANALYTIC</code>, <code class="ph codeph">OVER</code>, <code class="ph codeph">PRECEDING</code>,
+          <code class="ph codeph">UNBOUNDED</code>, <code class="ph codeph">FOLLOWING</code>, <code class="ph codeph">CURRENT</code>,
+          <code class="ph codeph">ROWS</code>, <code class="ph codeph">RANGE</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>.
+          <span class="ph">As always, see <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>
+          for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.</span>
+        </p>
+      </section>
+
+      <section class="section" id="incompatible_changes_200__output_files_200"><h3 class="title sectiontitle">
+        Changes to Data Files
+      </h3>
+      
+
+        <p class="p" id="incompatible_changes_200__parquet_block_size">
+          The default Parquet block size for Impala is changed from 1 GB to 256 MB. This change could have
+          implications for the sizes of Parquet files produced by <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE
+          TABLE AS SELECT</code> statements.
+        </p>
+        <p class="p">
+          Although older Impala releases typically produced files that were smaller than the old default size of
+          1 GB, now the file size matches more closely whatever value is specified for the
+          <code class="ph codeph">PARQUET_FILE_SIZE</code> query option. Thus, if you use a non-default value for this setting,
+          the output files could be larger than before. They still might be somewhat smaller than the specified
+          value, because Impala makes conservative estimates about the space needed to represent each column as
+          it encodes the data.
+        </p>
+        <p class="p">
+          When you do not specify an explicit value for the <code class="ph codeph">PARQUET_FILE_SIZE</code> query option,
+          Impala tries to keep the file size within the 256 MB default size, but Impala might adjust the file
+          size to be somewhat larger if needed to accommodate the layout for <dfn class="term">wide</dfn> tables, that is,
+          tables with hundreds or thousands of columns.
+        </p>
+        <p class="p">
+          This change is unlikely to affect memory usage while writing Parquet files, because Impala does not
+          pre-allocate the memory needed to hold the entire Parquet block.
+        </p>
+
+      </section>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title16" id="incompatible_changes__incompatible_changes_144">
+    <h2 class="title topictitle2" id="ariaid-title16">Incompatible Changes Introduced in Impala 1.4.4</h2>
+    <div class="body conbody">
+      <p class="p">
+        No incompatible changes.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title17" id="incompatible_changes__incompatible_changes_143">
+
+    <h2 class="title topictitle2" id="ariaid-title17">Incompatible Changes Introduced in Impala 1.4.3</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        No incompatible changes. The TLS/SSL security fix does not require any change in the way you interact with
+        Impala.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title18" id="incompatible_changes__incompatible_changes_142">
+
+    <h2 class="title topictitle2" id="ariaid-title18">Incompatible Changes Introduced in Impala 1.4.2</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        None. Impala 1.4.2 is purely a bug-fix release. It does not include any incompatible changes.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title19" id="incompatible_changes__incompatible_changes_141">
+
+    <h2 class="title topictitle2" id="ariaid-title19">Incompatible Changes Introduced in Impala 1.4.1</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        None. Impala 1.4.1 is purely a bug-fix release. It does not include any incompatible changes.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title20" id="incompatible_changes__incompatible_changes_140">
+
+    <h2 class="title topictitle2" id="ariaid-title20">Incompatible Changes Introduced in Impala 1.4.0</h2>
+  
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            There is a slight change to required security privileges in the Sentry framework. To create a new
+            object, now you need the <code class="ph codeph">ALL</code> privilege on the parent object. For example, to create a
+            new table, view, or function requires having the <code class="ph codeph">ALL</code> privilege on the database
+            containing the new object. See <a class="xref" href="impala_authorization.html">Enabling Sentry Authorization for Impala</a> for a full list of operations and
+            associated privileges.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            With the ability of <code class="ph codeph">ORDER BY</code> queries to process unlimited amounts of data with no
+            <code class="ph codeph">LIMIT</code> clause, the query options <code class="ph codeph">DEFAULT_ORDER_BY_LIMIT</code> and
+            <code class="ph codeph">ABORT_ON_DEFAULT_LIMIT_EXCEEDED</code> are now deprecated and have no effect.
+            <span class="ph">See <a class="xref" href="impala_order_by.html#order_by">ORDER BY Clause</a> for details about improvements to
+            the <code class="ph codeph">ORDER BY</code> clause.</span>
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            There are some changes to the list of reserved words. <span class="ph">See
+            <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> for the most current list.</span> The following
+            keywords are new:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <code class="ph codeph">API_VERSION</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">BINARY</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">CACHED</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">CLASS</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">PARTITIONS</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">PRODUCED</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">UNCACHED</code>
+            </li>
+          </ul>
+          <p class="p">
+            The following were formerly reserved keywords, but are no longer reserved:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <code class="ph codeph">COUNT</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">GROUP_CONCAT</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">NDV</code>
+            </li>
+
+            <li class="li">
+              <code class="ph codeph">SUM</code>
+            </li>
+          </ul>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The fix for issue
+            <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-973" target="_blank">IMPALA-973</a>
+            changes the behavior of the <code class="ph codeph">INVALIDATE METADATA</code> statement regarding nonexistent
+            tables. In Impala 1.4.0 and higher, the statement returns an error if the specified table is not in the
+            metastore database at all. It completes successfully if the specified table is in the metastore
+            database but not yet recognized by Impala, for example if the table was created through Hive. Formerly,
+            you could issue this statement for a completely nonexistent table, with no error.
+          </p>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title21" id="incompatible_changes__incompatible_changes_133">
+
+    <h2 class="title topictitle2" id="ariaid-title21">Incompatible Changes Introduced in Impala 1.3.3</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        No incompatible changes. The TLS/SSL security fix does not require any change in the way you interact with
+        Impala.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title22" id="incompatible_changes__incompatible_changes_132">
+
+    <h2 class="title topictitle2" id="ariaid-title22">Incompatible Changes Introduced in Impala 1.3.2</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        With the fix for IMPALA-1019, you can use HDFS caching for files that are accessed by Impala.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title23" id="incompatible_changes__incompatible_changes_131">
+
+    <h2 class="title topictitle2" id="ariaid-title23">Incompatible Changes Introduced in Impala 1.3.1</h2>
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+        In Impala 1.3.1 and higher, the <code class="ph codeph">REGEXP</code> and <code class="ph codeph">RLIKE</code> operators now match a
+        regular expression string that occurs anywhere inside the target string, the same as if the regular
+        expression was enclosed on each side by <code class="ph codeph">.*</code>. See
+        <a class="xref" href="../shared/../topics/impala_operators.html#regexp">REGEXP Operator</a> for examples. Previously, these operators only
+        succeeded when the regular expression matched the entire target string. This change improves compatibility
+        with the regular expression support for popular database systems. There is no change to the behavior of the
+        <code class="ph codeph">regexp_extract()</code> and <code class="ph codeph">regexp_replace()</code> built-in functions.
+      </p>
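+          <p class="p">
+            For example (an illustrative query, not from the original release notes), the following
+            comparison now succeeds because the pattern can match anywhere inside the string; anchor
+            the pattern with <code class="ph codeph">^</code> and <code class="ph codeph">$</code> to require a
+            full-string match as in earlier releases:
+          </p>
+<pre class="pre codeblock"><code>-- True in Impala 1.3.1 and higher; false in earlier releases,
+-- where the pattern had to match the entire string.
+select 'impala' regexp 'mpal';
+
+-- Anchored pattern, equivalent to the pre-1.3.1 full-string behavior.
+select 'impala' regexp '^mpal$';</code></pre>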
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The result set for the <code class="ph codeph">SHOW FUNCTIONS</code> statement includes a new first column, with the
+            data type of the return value. <span class="ph">See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for
+            examples.</span>
+          </p>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title24" id="incompatible_changes__incompatible_changes_130">
+
+    <h2 class="title topictitle2" id="ariaid-title24">Incompatible Changes Introduced in Impala 1.3.0</h2>
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">EXPLAIN_LEVEL</code> query option now accepts numeric options from 0 (most concise) to 3
+            (most verbose), rather than only 0 or 1. If you formerly used <code class="ph codeph">SET EXPLAIN_LEVEL=1</code> to
+            get detailed explain plans, switch to <code class="ph codeph">SET EXPLAIN_LEVEL=3</code>. If you used the mnemonic
+            keyword (<code class="ph codeph">SET EXPLAIN_LEVEL=verbose</code>), you do not need to change your code because now
+            level 3 corresponds to <code class="ph codeph">verbose</code>. <span class="ph">See
+            <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a> for details about the allowed explain levels, and
+            <a class="xref" href="impala_explain_plan.html#explain_plan">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</a> for usage information.</span>
+          </p>
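+          <p class="p">
+            The following illustrative statements show the old and new settings side by side:
+          </p>
+<pre class="pre codeblock"><code>-- Formerly the most verbose level; now an intermediate level.
+set explain_level=1;
+
+-- The new most verbose level.
+set explain_level=3;
+
+-- The mnemonic keyword still selects the most verbose level,
+-- so existing code using it needs no change.
+set explain_level=verbose;</code></pre>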
+        </li>
+
+        <li class="li">
+          <div class="p">
+            The keyword <code class="ph codeph">DECIMAL</code> is now a reserved word. If you have any databases, tables,
+            columns, or other objects already named <code class="ph codeph">DECIMAL</code>, quote any references to them using
+            backticks (<code class="ph codeph">``</code>) to avoid name conflicts with the keyword.
+            <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+              Although the <code class="ph codeph">DECIMAL</code> keyword is a reserved word, currently Impala does not support
+              <code class="ph codeph">DECIMAL</code> as a data type for columns.
+            </div>
+          </div>
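+          <p class="p">
+            For example, with a hypothetical pre-existing column named <code class="ph codeph">decimal</code>,
+            quote the identifier with backticks to avoid a conflict with the keyword:
+          </p>
+<pre class="pre codeblock"><code>select `decimal` from t1;</code></pre>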
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The query option formerly named <code class="ph codeph">YARN_POOL</code> is now named
+            <code class="ph codeph">REQUEST_POOL</code> to reflect its broader use with the Impala admission control feature.
+            <span class="ph">See <a class="xref" href="impala_request_pool.html#request_pool">REQUEST_POOL Query Option</a> for information about the
+            option, and <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details about its use with the
+            admission control feature.</span>
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            There are some changes to the list of reserved words. <span class="ph">See
+            <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> for the most current list.</span>
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                The names of aggregate functions are no longer reserved words, so you can have databases, tables,
+                columns, or other objects named <code class="ph codeph">AVG</code>, <code class="ph codeph">MIN</code>, and so on without any
+                name conflicts.
+              </p>
+            </li>
+
+            <li class="li">
+              <p class="p">
+                The internal function names <code class="ph codeph">DISTINCTPC</code> and <code class="ph codeph">DISTINCTPCSA</code> are no
+                longer reserved words, although <code class="ph codeph">DISTINCT</code> is still a reserved word.
+              </p>
+            </li>
+
+            <li class="li">
+              <p class="p">
+                The keywords <code class="ph codeph">CLOSE_FN</code> and <code class="ph codeph">PREPARE_FN</code> are now reserved words.
+                <span class="ph">See <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a> for their role in
+                the <code class="ph codeph">CREATE FUNCTION</code> statement, and <a class="xref" href="impala_udf.html#udf_threads">Thread-Safe Work Area for UDFs</a> for
+                usage information.</span>
+              </p>
+            </li>
+          </ul>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The HDFS property <code class="ph codeph">dfs.client.file-block-storage-locations.timeout</code> was renamed to
+            <code class="ph codeph">dfs.client.file-block-storage-locations.timeout.millis</code>, to emphasize that the unit of
+            measure is milliseconds, not seconds. Impala requires a timeout of at least 10 seconds, making the
+            minimum value for this setting 10000. If you are not using cluster management software, you might need to
+            edit the <span class="ph filepath">hdfs-site.xml</span> file in the Impala configuration directory for the new name
+            and minimum value.
+          </p>
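+          <p class="p">
+            For example, the renamed property would appear in <span class="ph filepath">hdfs-site.xml</span>
+            as follows, shown here with the minimum allowed value:
+          </p>
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;dfs.client.file-block-storage-locations.timeout.millis&lt;/name&gt;
+  &lt;value&gt;10000&lt;/value&gt;
+&lt;/property&gt;</code></pre>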
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title25" id="incompatible_changes__incompatible_changes_124">
+
+    <h2 class="title topictitle2" id="ariaid-title25">Incompatible Changes Introduced in Impala 1.2.4</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        There are no incompatible changes introduced in Impala 1.2.4.
+      </p>
+
+      <p class="p">
+        Previously, after creating a table in Hive, you had to issue the <code class="ph codeph">INVALIDATE METADATA</code>
+        statement with no table name, a potentially expensive operation on clusters with many databases, tables,
+        and partitions. Starting in Impala 1.2.4, you can issue the statement <code class="ph codeph">INVALIDATE METADATA
+        <var class="keyword varname">table_name</var></code> for a table newly created through Hive. Loading the metadata for
+        only this one table is faster and involves less network overhead. Therefore, you might revisit your setup
+        DDL scripts to add the table name to <code class="ph codeph">INVALIDATE METADATA</code> statements, in cases where you
+        create and populate the tables through Hive before querying them through Impala.
+      </p>
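+      <p class="p">
+        For example, after creating and populating a table through Hive (the table name here is
+        hypothetical), load the metadata for just that one table:
+      </p>
+<pre class="pre codeblock"><code>-- In impala-shell, after CREATE TABLE new_table ... was issued in Hive:
+invalidate metadata new_table;</code></pre>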
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title26" id="incompatible_changes__incompatible_changes_123">
+
+    <h2 class="title topictitle2" id="ariaid-title26">Incompatible Changes Introduced in Impala 1.2.3</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Because the feature set of Impala 1.2.3 is identical to Impala 1.2.2, there are no new incompatible
+        changes. See <a class="xref" href="impala_incompatible_changes.html#incompatible_changes_122">Incompatible Changes Introduced in Impala 1.2.2</a> if you are upgrading
+        from Impala 1.2.1 or 1.1.x.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title27" id="incompatible_changes__incompatible_changes_122">
+
+    <h2 class="title topictitle2" id="ariaid-title27">Incompatible Changes Introduced in Impala 1.2.2</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following changes to SQL syntax and semantics in Impala 1.2.2 could require updates to your SQL code,
+        or schema objects such as tables or views:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            With the addition of the <code class="ph codeph">CROSS JOIN</code> keyword, you might need to rewrite any queries
+            that refer to a table named <code class="ph codeph">CROSS</code> or use the name <code class="ph codeph">CROSS</code> as a table
+            alias:
+          </p>
+<pre class="pre codeblock"><code>-- Formerly, 'cross' in this query was an alias for t1
+-- and it was a normal join query.
+-- In 1.2.2 and higher, CROSS JOIN is a keyword, so 'cross'
+-- is not interpreted as a table alias, and the query
+-- uses the special CROSS JOIN processing rather than a
+-- regular join.
+select * from t1 cross join t2...
+
+-- Now if CROSS is used in other context such as a table or column name,
+-- use backticks to escape it.
+create table `cross` (x int);
+select * from `cross`;</code></pre>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Formerly, a <code class="ph codeph">DROP DATABASE</code> statement in Impala would not remove the top-level HDFS
+            directory for that database. The <code class="ph codeph">DROP DATABASE</code> has been enhanced to remove that
+            directory. (You still need to drop all the tables inside the database first; this change only applies
+            to the top-level directory for the entire database.)
+          </p>
+        </li>
+
+        <li class="li">
+          The keyword <code class="ph codeph">PARQUET</code> is introduced as a synonym for <code class="ph codeph">PARQUETFILE</code> in the
+          <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements, because that is the common
+          name for the file format (unlike SequenceFile and RCFile, where the <span class="q">"File"</span> suffix is part of
+          the format name). Documentation examples have been changed to prefer the new, shorter keyword. The
+          <code class="ph codeph">PARQUETFILE</code> keyword is still available for backward compatibility with older Impala
+          versions.
+        </li>
+
+        <li class="li">
+          New overloads are available for several operators and built-in functions, allowing you to insert their
+          result values into smaller numeric columns such as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>,
+          <code class="ph codeph">TINYINT</code>, and <code class="ph codeph">FLOAT</code> without using a <code class="ph codeph">CAST()</code> call. If you
+          remove the <code class="ph codeph">CAST()</code> calls from <code class="ph codeph">INSERT</code> statements, those statements might
+          not work with earlier versions of Impala.
+        </li>
+      </ul>
+
+      <p class="p">
+        Because many users are likely to upgrade straight from Impala 1.x to Impala 1.2.2, also read
+        <a class="xref" href="impala_incompatible_changes.html#incompatible_changes_121">Incompatible Changes Introduced in Impala 1.2.1</a> for things to note about upgrading
+        to Impala 1.2.x in general.
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title28" id="incompatible_changes__incompatible_changes_121">
+
+    <h2 class="title topictitle2" id="ariaid-title28">Incompatible Changes Introduced in Impala 1.2.1</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following changes to SQL syntax and semantics in Impala 1.2.1 could require updates to your SQL code,
+        or schema objects such as tables or views:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+        In Impala 1.2.1 and higher, all <code class="ph codeph">NULL</code> values come at the end of the result set for
+        <code class="ph codeph">ORDER BY ... ASC</code> queries, and at the beginning of the result set for <code class="ph codeph">ORDER BY ...
+        DESC</code> queries. In effect, <code class="ph codeph">NULL</code> is considered greater than all other values for
+        sorting purposes. The original Impala behavior always put <code class="ph codeph">NULL</code> values at the end, even for
+        <code class="ph codeph">ORDER BY ... DESC</code> queries. The new behavior in Impala 1.2.1 makes Impala more compatible
+        with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting
+        behavior for <code class="ph codeph">NULL</code> by adding the clause <code class="ph codeph">NULLS FIRST</code> or <code class="ph codeph">NULLS
+        LAST</code> at the end of the <code class="ph codeph">ORDER BY</code> clause.
+      </p>
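+          <p class="p">
+            The following illustrative queries show the new default placement and the explicit override:
+          </p>
+<pre class="pre codeblock"><code>-- NULL values now sort last for ascending order...
+select x from t1 order by x;
+
+-- ...and first for descending order.
+select x from t1 order by x desc;
+
+-- Override the default placement explicitly:
+select x from t1 order by x desc nulls last;</code></pre>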
+          <p class="p">
+            See <a class="xref" href="impala_literals.html#null">NULL</a> for more information.
+          </p>
+        </li>
+      </ul>
+
+      <p class="p">
+        The new <span class="keyword cmdname">catalogd</span> service might require changes to any user-written scripts that stop,
+        start, or restart Impala services, install or upgrade Impala packages, or issue <code class="ph codeph">REFRESH</code> or
+        <code class="ph codeph">INVALIDATE METADATA</code> statements:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            See <a class="xref" href="../shared/../topics/impala_install.html#install">Installing Impala</a>,
+            <a class="xref" href="../shared/../topics/impala_upgrading.html#upgrading">Upgrading Impala</a> and
+            <a class="xref" href="../shared/../topics/impala_processes.html#processes">Starting Impala</a>, for usage information for the
+            <span class="keyword cmdname">catalogd</span> daemon.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements are no longer needed
+            when the <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-changing or
+            data-changing operation is performed through Impala. These statements are still needed if such
+            operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the
+            statements only need to be issued on one Impala node rather than on all nodes. See
+            <a class="xref" href="../shared/../topics/impala_refresh.html#refresh">REFRESH Statement</a> and
+            <a class="xref" href="../shared/../topics/impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest usage
+            information for those statements.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            See <a class="xref" href="../shared/../topics/impala_components.html#intro_catalogd">The Impala Catalog Service</a> for background information on the
+            <span class="keyword cmdname">catalogd</span> service.
+          </p>
+        </li>
+      </ul>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title29" id="incompatible_changes__incompatible_changes_120">
+
+    <h2 class="title topictitle2" id="ariaid-title29">Incompatible Changes Introduced in Impala 1.2.0 (Beta)</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        There are no incompatible changes to SQL syntax in Impala 1.2.0 (beta).
+      </p>
+
+      <p class="p">
+        The new <span class="keyword cmdname">catalogd</span> service might require changes to any user-written scripts that stop,
+        start, or restart Impala services, install or upgrade Impala packages, or issue <code class="ph codeph">REFRESH</code> or
+        <code class="ph codeph">INVALIDATE METADATA</code> statements:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            See <a class="xref" href="../shared/../topics/impala_install.html#install">Installing Impala</a>,
+            <a class="xref" href="../shared/../topics/impala_upgrading.html#upgrading">Upgrading Impala</a> and
+            <a class="xref" href="../shared/../topics/impala_processes.html#processes">Starting Impala</a>, for usage information for the
+            <span class="keyword cmdname">catalogd</span> daemon.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements are no longer needed
+            when the <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-changing or
+            data-changing operation is performed through Impala. These statements are still needed if such
+            operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the
+            statements only need to be issued on one Impala node rather than on all nodes. See
+            <a class="xref" href="../shared/../topics/impala_refresh.html#refresh">REFRESH Statement</a> and
+            <a class="xref" href="../shared/../topics/impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest usage
+            information for those statements.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            See <a class="xref" href="../shared/../topics/impala_components.html#intro_catalogd">The Impala Catalog Service</a> for background information on the
+            <span class="keyword cmdname">catalogd</span> service.
+          </p>
+        </li>
+      </ul>
+
+      <p class="p">
+        The new resource management feature interacts with both YARN and Llama services.
+        <span class="ph">See
+        <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a> for usage information for Impala resource
+        management.</span>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title30" id="incompatible_changes__incompatible_changes_111">
+
+    <h2 class="title topictitle2" id="ariaid-title30">Incompatible Changes Introduced in Impala 1.1.1</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        There are no incompatible changes in Impala 1.1.1.
+      </p>
+
+
+
+
+
+
+
+
+
+
+
+      <p class="p">
+        Previously, it was not possible to create Parquet data through Impala and reuse that table within Hive. Now
+        that Parquet support is available for Hive 10, reusing existing Impala Parquet data files in Hive requires
+        updating the table metadata. Use the following command if you are already running Impala 1.1.1:
+      </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET FILEFORMAT PARQUETFILE;
+</code></pre>
+
+      <p class="p">
+        If you are running an Impala version older than 1.1.1, do the metadata update through Hive:
+      </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
+ALTER TABLE <var class="keyword varname">table_name</var> SET FILEFORMAT
+  INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
+  OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";
+</code></pre>
+
+      <p class="p">
+        Impala 1.1.1 and higher can reuse Parquet data files created by Hive, without any action required.
+      </p>
+
+      <p class="p">
+        As usual, make sure to upgrade the Impala LZO package to the latest level at the same
+        time as you upgrade the Impala server.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title31" id="incompatible_changes__incompatible_changes_11">
+
+    <h2 class="title topictitle2" id="ariaid-title31">Incompatible Change Introduced in Impala 1.1</h2>
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">REFRESH</code> statement now requires a table name; in Impala 1.0, the table name was
+            optional. This syntax change is part of the internal rework to make <code class="ph codeph">REFRESH</code> a true
+            Impala SQL statement so that it can be called through the JDBC and ODBC APIs. <code class="ph codeph">REFRESH</code>
+            now reloads the metadata immediately, rather than marking it for update the next time any affected
+            table is accessed. The previous behavior, where omitting the table name caused a refresh of the entire
+            Impala metadata catalog, is available through the new <code class="ph codeph">INVALIDATE METADATA</code> statement.
+            <code class="ph codeph">INVALIDATE METADATA</code> can be specified with a table name to affect a single table, or
+            without a table name to affect the entire metadata catalog; the relevant metadata is reloaded the next
+            time it is requested during the processing for a SQL statement. See
+            <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> and
+            <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest details about these
+            statements.
+          </p>
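+          <p class="p">
+            For example (the table name is hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- In Impala 1.1 and higher, REFRESH requires a table name.
+refresh t1;
+
+-- The former REFRESH with no table name is now expressed as:
+invalidate metadata;</code></pre>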
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title32" id="incompatible_changes__incompatible_changes_10">
+
+    <h2 class="title topictitle2" id="ariaid-title32">Incompatible Changes Introduced in Impala 1.0</h2>
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          If you use LZO-compressed text files, when you upgrade Impala to version 1.0, also update the
+          Impala LZO package to the latest level. See <a class="xref" href="impala_txtfile.html#lzo">Using LZO-Compressed Text Files</a> for
+          details.
+        </li>
+      </ul>
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file


[07/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_shell_options.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_shell_options.html b/docs/build/html/topics/impala_shell_options.html
new file mode 100644
index 0000000..be21f0b
--- /dev/null
+++ b/docs/build/html/topics/impala_shell_options.html
@@ -0,0 +1,564 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shell_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>impala-shell Configuration Options</title></head><body id="shell_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">impala-shell Configuration Options</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      You can specify the following options when starting the <code class="ph codeph">impala-shell</code> command to change how
+      shell commands are executed. The table shows the format to use when specifying each option on the command
+      line, or through the <span class="ph filepath">$HOME/.impalarc</span> configuration file.
+    </p>
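+    <p class="p">
+      For example, the output-related options can be given on the command line or set persistently in
+      the configuration file. (The query, table, and file names here are hypothetical.)
+    </p>
+<pre class="pre codeblock"><code>$ impala-shell -B --output_delimiter=',' -o results.csv -q 'select * from t1'
+
+# Equivalent settings in $HOME/.impalarc:
+[impala]
+write_delimited=true
+output_delimiter=,
+output_file=results.csv</code></pre>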
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        These options are different from the configuration options for the <code class="ph codeph">impalad</code> daemon itself.
+        For the <code class="ph codeph">impalad</code> options, see <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>.
+      </p>
+    </div>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="shell_options__shell_option_summary">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Summary of impala-shell Configuration Options</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following table shows the names and allowed arguments for the <span class="keyword cmdname">impala-shell</span>
+        configuration options. You can specify options on the command line, or in a configuration file as described
+        in <a class="xref" href="impala_shell_options.html#shell_config_file">impala-shell Configuration File</a>.
+      </p>
+
+      <table class="table"><caption></caption><colgroup><col style="width:25%"><col style="width:25%"><col style="width:50%"></colgroup><thead class="thead">
+            <tr class="row">
+              <th class="entry nocellnorowborder" id="shell_option_summary__entry__1">
+                Command-Line Option
+              </th>
+              <th class="entry nocellnorowborder" id="shell_option_summary__entry__2">
+                Configuration File Setting
+              </th>
+              <th class="entry nocellnorowborder" id="shell_option_summary__entry__3">
+                Explanation
+              </th>
+            </tr>
+          </thead><tbody class="tbody">
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -B or --delimited
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  write_delimited=true
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Causes all query results to be printed in plain format as a delimited text file. Useful for
+                  producing data files to be used with other Hadoop components. Also useful for avoiding the
+                  performance overhead of pretty-printing all output, especially when running benchmark tests using
+                  queries returning large result sets. Specify the delimiter character with the
+                  <code class="ph codeph">--output_delimiter</code> option. Use the <code class="ph codeph">-o</code> option to
+                  store all query results in a file rather than printing them to the screen. Added in Impala 1.0.1.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  --print_header
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  print_header=true
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Causes the column names to be printed as the first row of the query results. Most useful in
+                  combination with the <code class="ph codeph">-B</code> option, which omits the header row by default.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -o <var class="keyword varname">filename</var> or --output_file <var class="keyword varname">filename</var>
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  output_file=<var class="keyword varname">filename</var>
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Stores all query results in the specified file. Typically used to store the results of a single
+                  query issued from the command line with the <code class="ph codeph">-q</code> option. Also works for
+                  interactive sessions; you see the messages such as number of rows fetched, but not the actual
+                  result set. To suppress these incidental messages when combining the <code class="ph codeph">-q</code> and
+                  <code class="ph codeph">-o</code> options, redirect <code class="ph codeph">stderr</code> to <code class="ph codeph">/dev/null</code>.
+                  Added in Impala 1.0.1.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  --output_delimiter=<var class="keyword varname">character</var>
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  output_delimiter=<var class="keyword varname">character</var>
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Specifies the character to use as a delimiter between fields when query results are printed in
+                  plain format by the <code class="ph codeph">-B</code> option. Defaults to tab (<code class="ph codeph">'\t'</code>). If an
+                  output value contains the delimiter character, that field is quoted, escaped by doubling quotation marks, or both. Added in
+                  Impala 1.0.1.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -p or --show_profiles
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  show_profiles=true
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Displays the query execution plan (same output as the <code class="ph codeph">EXPLAIN</code> statement) and a
+                  more detailed low-level breakdown of execution steps, for every query executed by the shell.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -h or --help
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  N/A
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Displays help information.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -i <var class="keyword varname">hostname</var> or
+                  --impalad=<var class="keyword varname">hostname</var>[:<var class="keyword varname">portnum</var>]
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  impalad=<var class="keyword varname">hostname</var>[:<var class="keyword varname">portnum</var>]
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Connects to the <code class="ph codeph">impalad</code> daemon on the specified host. The default port of 21000
+                  is assumed unless you provide another value. You can connect to any host in your cluster that is
+                  running <code class="ph codeph">impalad</code>. If you connect to an instance of <code class="ph codeph">impalad</code> that
+                  was started with an alternate port specified by the <code class="ph codeph">--fe_port</code> flag, provide that
+                  alternative port.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -q <var class="keyword varname">query</var> or --query=<var class="keyword varname">query</var>
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  query=<var class="keyword varname">query</var>
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Passes a query or other <span class="keyword cmdname">impala-shell</span> command from the command line. The
+                  <span class="keyword cmdname">impala-shell</span> interpreter immediately exits after processing the statement. It
+                  is limited to a single statement, which could be a <code class="ph codeph">SELECT</code>, <code class="ph codeph">CREATE
+                  TABLE</code>, <code class="ph codeph">SHOW TABLES</code>, or any other statement recognized in
+                  <code class="ph codeph">impala-shell</code>. Because you cannot pass a <code class="ph codeph">USE</code> statement and
+                  another query, fully qualify the names for any tables outside the <code class="ph codeph">default</code>
+                  database. (Or use the <code class="ph codeph">-f</code> option to pass a file with a <code class="ph codeph">USE</code>
+                  statement followed by other queries.)
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -f <var class="keyword varname">query_file</var> or --query_file=<var class="keyword varname">query_file</var>
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  query_file=<var class="keyword varname">path_to_query_file</var>
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Passes a SQL query from a file. Multiple statements must be delimited by semicolons (;).
+                  <span class="ph">In <span class="keyword">Impala 2.3</span> and higher, you can specify a filename of <code class="ph codeph">-</code>
+                  to represent standard input. This feature makes it convenient to use <span class="keyword cmdname">impala-shell</span>
+                  as part of a Unix pipeline where SQL statements are generated dynamically by other tools.</span>
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -k or --kerberos
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  use_kerberos=true
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Kerberos authentication is used when the shell connects to <code class="ph codeph">impalad</code>. If Kerberos
+                  is not enabled on the instance of <code class="ph codeph">impalad</code> to which you are connecting, errors
+                  are displayed.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -s <var class="keyword varname">kerberos_service_name</var> or --kerberos_service_name=<var class="keyword varname">name</var>
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  kerberos_service_name=<var class="keyword varname">name</var>
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Instructs <code class="ph codeph">impala-shell</code> to authenticate to a particular <code class="ph codeph">impalad</code>
+                  service principal. If a <var class="keyword varname">kerberos_service_name</var> is not specified,
+                  <code class="ph codeph">impala</code> is used by default. If this option is used in conjunction with a
+                  connection in which Kerberos is not supported, errors are returned.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -V or --verbose
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  verbose=true
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Enables verbose output.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  --quiet
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  verbose=false
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Disables verbose output.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -v or --version
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  version=true
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Displays version information.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -c
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  ignore_query_failure=true
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Continues on query failure.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -r or --refresh_after_connect
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  refresh_after_connect=true
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Updates Impala metadata upon connection. Same as running the
+                  <code class="ph codeph"><a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE
+                  METADATA</a></code> statement after connecting. (This option was originally named when the
+                  <code class="ph codeph">REFRESH</code> statement did the extensive metadata updates now performed by
+                  <code class="ph codeph">INVALIDATE METADATA</code>.)
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                <p class="p">
+                  -d <var class="keyword varname">default_db</var> or --database=<var class="keyword varname">default_db</var>
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                <p class="p">
+                  default_db=<var class="keyword varname">default_db</var>
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                <p class="p">
+                  Specifies the database to be used on startup. Same as running the
+                  <code class="ph codeph"><a class="xref" href="impala_use.html#use">USE</a></code> statement after connecting. If not
+                  specified, a database named <code class="ph codeph">default</code> is used.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                -ssl
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                ssl=true
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                Enables TLS/SSL for <span class="keyword cmdname">impala-shell</span>.
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                --ca_cert=<var class="keyword varname">path_to_certificate</var>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                ca_cert=<var class="keyword varname">path_to_certificate</var>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                The local pathname pointing to the third-party CA certificate, or to a copy of the server
+                certificate for self-signed server certificates. If <code class="ph codeph">--ca_cert</code> is not set,
+                <span class="keyword cmdname">impala-shell</span> enables TLS/SSL, but does not validate the server certificate. This is
+                useful for connecting to a known-good Impala that is only running over TLS/SSL, when a copy of the
+                certificate is not available (such as when debugging customer installations).
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                -l
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                use_ldap=true
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                Enables LDAP authentication.
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                -u
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                user=<var class="keyword varname">user_name</var>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                Supplies the username, when LDAP authentication is enabled by the <code class="ph codeph">-l</code> option.
+                (Specify the short username, not the full LDAP distinguished name.) The shell then prompts
+                interactively for the password.
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                --ldap_password_cmd=<var class="keyword varname">command</var>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                N/A
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                Specifies a command to run to retrieve the LDAP password,
+                when LDAP authentication is enabled by the <code class="ph codeph">-l</code> option.
+                If the command includes space-separated arguments, enclose the command and
+                its arguments in quotation marks.
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+                --config_file=<var class="keyword varname">path_to_config_file</var>
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+                N/A
+              </td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                Specifies the path of the file containing <span class="keyword cmdname">impala-shell</span> configuration settings.
+                The default is <span class="ph filepath">$HOME/.impalarc</span>. This setting can only be specified on the
+                command line.
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">--live_progress</td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">N/A</td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">Prints a progress bar showing roughly the percentage complete for each query.
+              The information is updated interactively as the query progresses.
+              See <a class="xref" href="impala_live_progress.html#live_progress">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a>.</td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">--live_summary</td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">N/A</td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">Prints a detailed report, similar to the <code class="ph codeph">SUMMARY</code> command, showing progress details for each phase of query execution.
+              The information is updated interactively as the query progresses.
+              See <a class="xref" href="impala_live_summary.html#live_summary">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a>.</td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">--var=<var class="keyword varname">variable_name</var>=<var class="keyword varname">value</var></td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">N/A</td>
+              <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+                Defines a substitution variable that can be used within the <span class="keyword cmdname">impala-shell</span> session.
+                The variable can be substituted into statements processed by the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options,
+                or in an interactive shell session.
+                Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>.
+                This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+              </td>
+            </tr>
+          </tbody></table>
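The quoting behavior described for <code class="ph codeph">--output_delimiter</code> (a field containing the delimiter is quoted, with embedded quotation marks escaped by doubling) matches conventional CSV-style quoting. As an illustrative sketch only (the raw output below is hypothetical, and exact quoting details may vary by impala-shell version), such delimited output can be parsed with Python's <code class="ph codeph">csv</code> module:

```python
import csv
import io

# Hypothetical output from: impala-shell -B --output_delimiter=',' ...
# One field contains the delimiter, so it is quoted; an embedded
# quotation mark is escaped by doubling it.
raw_output = 'id,comment\n1,"hello, world"\n2,"she said ""hi"""\n'

rows = list(csv.reader(io.StringIO(raw_output), delimiter=',',
                       quotechar='"', doublequote=True))

# Each field is restored with its embedded delimiter and quotes intact.
print(rows[1])  # ['1', 'hello, world']
print(rows[2])  # ['2', 'she said "hi"']
```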
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="shell_options__shell_config_file">
+
+    <h2 class="title topictitle2" id="ariaid-title3">impala-shell Configuration File</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can define a set of default options for your <span class="keyword cmdname">impala-shell</span> environment, stored in the
+        file <span class="ph filepath">$HOME/.impalarc</span>. This file consists of key-value pairs, one option per line.
+        Everything after a <code class="ph codeph">#</code> character on a line is treated as a comment and ignored.
+      </p>
+
+      <p class="p">
+        The configuration file must contain a header label <code class="ph codeph">[impala]</code>, followed by the options
+        specific to <span class="keyword cmdname">impala-shell</span>. (This standard convention for configuration files lets you
+        use a single file to hold configuration options for multiple applications.)
+      </p>
+
+      <p class="p">
+        To specify a different filename or path for the configuration file, specify the argument
+        <code class="ph codeph">--config_file=<var class="keyword varname">path_to_config_file</var></code> on the
+        <span class="keyword cmdname">impala-shell</span> command line.
+      </p>
+
+      <p class="p">
+        The names of the options in the configuration file are similar (although not necessarily identical) to the
+        long-form command-line arguments to the <span class="keyword cmdname">impala-shell</span> command. For the names to use, see
+        <a class="xref" href="impala_shell_options.html#shell_option_summary">Summary of impala-shell Configuration Options</a>.
+      </p>
+
+      <p class="p">
+        Any options you specify on the <span class="keyword cmdname">impala-shell</span> command line override any corresponding
+        options within the configuration file.
+      </p>
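The key-value layout, the <code class="ph codeph">[impala]</code> header, the <code class="ph codeph">#</code> comments, and the command-line precedence described above can be sketched with Python's <code class="ph codeph">configparser</code>. This is a minimal illustration of the documented behavior, not impala-shell's actual implementation:

```python
import configparser

# An .impalarc-style file: one option per line under an [impala] header.
impalarc = """
[impala]
verbose=true            # text after '#' is ignored
default_db=tpc_benchmarking
"""

parser = configparser.ConfigParser(inline_comment_prefixes=('#',))
parser.read_string(impalarc)
options = dict(parser['impala'])

# Options given on the command line override the configuration file.
cli_options = {'default_db': 'site_stats'}
options.update(cli_options)

print(options['verbose'])     # 'true', from the file
print(options['default_db'])  # 'site_stats', overridden on the command line
```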
+
+      <p class="p">
+        The following example shows a configuration file that you might use during benchmarking tests. It sets
+        verbose mode, so that the output from each SQL query is followed by timing information.
+        <span class="keyword cmdname">impala-shell</span> starts inside the database containing the tables with the benchmark data,
+        avoiding the need to issue a <code class="ph codeph">USE</code> statement or use fully qualified table names.
+      </p>
+
+      <p class="p">
+        In this example, the query output is formatted as delimited text rather than enclosed in ASCII art boxes,
+        and is stored in a file rather than printed to the screen. Those options are appropriate for benchmark
+        situations, so that the overhead of <span class="keyword cmdname">impala-shell</span> formatting and printing the result set
+        does not factor into the timing measurements. It also enables the <code class="ph codeph">show_profiles</code> option.
+        That option prints detailed performance information after each query, which might be valuable in
+        understanding the performance of benchmark queries.
+      </p>
+
+<pre class="pre codeblock"><code>[impala]
+verbose=true
+default_db=tpc_benchmarking
+write_delimited=true
+output_delimiter=,
+output_file=/home/tester1/benchmark_results.csv
+show_profiles=true
+</code></pre>
+
+      <p class="p">
+        The following example shows a configuration file that connects to a specific remote Impala node, runs a
+        single query within a particular database, then exits. You would typically use this kind of single-purpose
+        configuration setting with the <span class="keyword cmdname">impala-shell</span> command-line option
+        <code class="ph codeph">--config_file=<var class="keyword varname">path_to_config_file</var></code>, to easily select between many
+        predefined queries that could be run against different databases, hosts, or even different clusters. To run
+        a sequence of statements instead of a single query, specify the configuration option
+        <code class="ph codeph">query_file=<var class="keyword varname">path_to_query_file</var></code> instead.
+      </p>
+
+<pre class="pre codeblock"><code>[impala]
+impalad=impala-test-node1.example.com
+default_db=site_stats
+# Issue a predefined query and immediately exit.
+query=select count(*) from web_traffic where event_date = trunc(now(),'dd')
+</code></pre>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_shell_running_commands.html

----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_shell_running_commands.html b/docs/build/html/topics/impala_shell_running_commands.html
new file mode 100644
index 0000000..e0e8880
--- /dev/null
+++ b/docs/build/html/topics/impala_shell_running_commands.html
@@ -0,0 +1,257 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shell_running_commands"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Running Commands and SQL Statements in impala-shell</title></head><body id="shell_running_commands"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Running Commands and SQL Statements in impala-shell</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      For information on available commands, see
+      <a class="xref" href="impala_shell_commands.html#shell_commands">impala-shell Command Reference</a>. You can see the full set of available
+      commands by pressing TAB twice, for example:
+    </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt;
+connect   describe  explain   help      history   insert    quit      refresh   select    set       shell     show      use       version
+[impalad-host:21000] &gt;</code></pre>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Commands must be terminated by a semicolon. A command can span multiple lines.
+    </div>
+
+    <p class="p">
+      For example:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select *
+                  &gt; from t1
+                  &gt; limit 5;
++---------+-----------+
+| s1      | s2        |
++---------+-----------+
+| hello   | world     |
+| goodbye | cleveland |
++---------+-----------+
+</code></pre>
+
+    <p class="p">
+      A comment is considered part of the statement it precedes, so when you enter a <code class="ph codeph">--</code> or
+      <code class="ph codeph">/* */</code> comment, you get a continuation prompt until you finish entering a statement ending
+      with a semicolon:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; -- This is a test comment
+                  &gt; show tables like 't*';
++--------+
+| name   |
++--------+
+| t1     |
+| t2     |
+| tab1   |
+| tab2   |
+| tab3   |
+| text_t |
++--------+
+</code></pre>
+
+    <p class="p">
+      Use the up-arrow and down-arrow keys to cycle through and edit previous commands.
+      <span class="keyword cmdname">impala-shell</span> uses the <code class="ph codeph">readline</code> library and so supports a standard set of
+      keyboard shortcuts for editing and cursor movement, such as <code class="ph codeph">Ctrl-A</code> for beginning of line and
+      <code class="ph codeph">Ctrl-E</code> for end of line.
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.5</span> and higher, you can define substitution variables to be used within SQL statements
+      processed by <span class="keyword cmdname">impala-shell</span>. On the command line, you specify the option
+      <code class="ph codeph">--var=<var class="keyword varname">variable_name</var>=<var class="keyword varname">value</var></code>.
+      Within an interactive session or a script file processed by the <code class="ph codeph">-f</code> option, you specify
+      a <code class="ph codeph">SET</code> command using the notation <code class="ph codeph">SET VAR:<var class="keyword varname">variable_name</var>=<var class="keyword varname">value</var></code>.
+      Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>.
+    </p>
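The <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code> substitution can be sketched as a simple text rewrite. The following is an illustrative Python sketch of that notation, not impala-shell's actual code:

```python
import re

# Variables as they might be defined with --var=name=value or SET VAR:name=value.
variables = {'tname': 'table1', 'colname': 'x'}

def substitute(statement, variables):
    # Replace each ${var:name} occurrence with the variable's value.
    return re.sub(r'\$\{var:(\w+)\}',
                  lambda m: variables[m.group(1)],
                  statement)

sql = 'select ${var:colname} from ${var:tname}'
print(substitute(sql, variables))  # select x from table1
```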
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Because this feature is part of <span class="keyword cmdname">impala-shell</span> rather than the <span class="keyword cmdname">impalad</span>
+      backend, make sure the client system you are connecting from has the most recent <span class="keyword cmdname">impala-shell</span>.
+      You can use this feature with a new <span class="keyword cmdname">impala-shell</span> connecting to an older <span class="keyword cmdname">impalad</span>,
+      but not the reverse.
+    </div>
+
+    <p class="p">
+      For example, here are some <span class="keyword cmdname">impala-shell</span> commands that define substitution variables and then
+      use them in SQL statements executed through the <code class="ph codeph">-q</code> and <code class="ph codeph">-f</code> options.
+      Notice how the <code class="ph codeph">-q</code> argument strings are single-quoted to prevent shell expansion of the
+      <code class="ph codeph">${var:value}</code> notation, and any string literals within the queries are enclosed by double quotation marks.
+    </p>
+
+<pre class="pre codeblock"><code>
+$ impala-shell --var=tname=table1 --var=colname=x --var=coltype=string -q 'create table ${var:tname} (${var:colname} ${var:coltype}) stored as parquet'
+Starting Impala Shell without Kerberos authentication
+Connected to <var class="keyword varname">hostname</var>
+Server version: <var class="keyword varname">impalad_version</var>
+Query: create table table1 (x string) stored as parquet
+
+$ NEW_STRING="hello world"
+$ impala-shell --var=tname=table1 --var=insert_val="$NEW_STRING" -q 'insert into ${var:tname} values ("${var:insert_val}")'
+Starting Impala Shell without Kerberos authentication
+Connected to <var class="keyword varname">hostname</var>
+Server version: <var class="keyword varname">impalad_version</var>
+Query: insert into table1 values ("hello world")
+Inserted 1 row(s) in 1.40s
+
+$ for VAL in foo bar bletch
+do
+  impala-shell --var=tname=table1 --var=insert_val="$VAL" -q 'insert into ${var:tname} values ("${var:insert_val}")'
+done
+...
+Query: insert into table1 values ("foo")
+Inserted 1 row(s) in 0.22s
+Query: insert into table1 values ("bar")
+Inserted 1 row(s) in 0.11s
+Query: insert into table1 values ("bletch")
+Inserted 1 row(s) in 0.21s
+
+$ echo "Search for what substring?" ; read answer
+Search for what substring?
+b
+$ impala-shell --var=tname=table1 -q 'select x from ${var:tname} where x like "%${var:answer}%"'
+Starting Impala Shell without Kerberos authentication
+Connected to <var class="keyword varname">hostname</var>
+Server version: <var class="keyword varname">impalad_version</var>
+Query: select x from table1 where x like "%b%"
++--------+
+| x      |
++--------+
+| bletch |
+| bar    |
++--------+
+Fetched 2 row(s) in 0.83s
+</code></pre>
+
+    <p class="p">
+      Here is a substitution variable passed in by the <code class="ph codeph">--var</code> option,
+      and then referenced by statements issued interactively. Then the variable is
+      cleared with the <code class="ph codeph">UNSET</code> command, and defined again with the
+      <code class="ph codeph">SET</code> command.
+    </p>
+
+<pre class="pre codeblock"><code>
+$ impala-shell --quiet --var=tname=table1
+Starting Impala Shell without Kerberos authentication
+***********************************************************************************
+<var class="keyword varname">banner_message</var>
+***********************************************************************************
+[<var class="keyword varname">hostname</var>:21000] &gt; select count(*) from ${var:tname};
++----------+
+| count(*) |
++----------+
+| 4        |
++----------+
+[<var class="keyword varname">hostname</var>:21000] &gt; unset var:tname;
+Unsetting variable TNAME
+[<var class="keyword varname">hostname</var>:21000] &gt; select count(*) from ${var:tname};
+Error: Unknown variable TNAME
+[<var class="keyword varname">hostname</var>:21000] &gt; set var:tname=table1;
+[<var class="keyword varname">hostname</var>:21000] &gt; select count(*) from ${var:tname};
++----------+
+| count(*) |
++----------+
+| 4        |
++----------+
+</code></pre>
+
+    <p class="p">
+      The following example shows how the <code class="ph codeph">SOURCE</code> command can execute
+      a series of statements from a file:
+    </p>
+
+<pre class="pre codeblock"><code>
+$ cat commands.sql
+show databases;
+show tables in default;
+show functions in _impala_builtins like '*minute*';
+
+$ impala-shell -i localhost
+...
+[localhost:21000] &gt; source commands.sql;
+Query: show databases
++------------------+----------------------------------------------+
+| name             | comment                                      |
++------------------+----------------------------------------------+
+| _impala_builtins | System database for Impala builtin functions |
+| default          | Default Hive database                        |
++------------------+----------------------------------------------+
+Fetched 2 row(s) in 0.06s
+Query: show tables in default
++-----------+
+| name      |
++-----------+
+| customers |
+| sample_07 |
+| sample_08 |
+| web_logs  |
++-----------+
+Fetched 4 row(s) in 0.02s
+Query: show functions in _impala_builtins like '*minute*'
++-------------+--------------------------------+-------------+---------------+
+| return type | signature                      | binary type | is persistent |
++-------------+--------------------------------+-------------+---------------+
+| INT         | minute(TIMESTAMP)              | BUILTIN     | true          |
+| TIMESTAMP   | minutes_add(TIMESTAMP, BIGINT) | BUILTIN     | true          |
+| TIMESTAMP   | minutes_add(TIMESTAMP, INT)    | BUILTIN     | true          |
+| TIMESTAMP   | minutes_sub(TIMESTAMP, BIGINT) | BUILTIN     | true          |
+| TIMESTAMP   | minutes_sub(TIMESTAMP, INT)    | BUILTIN     | true          |
++-------------+--------------------------------+-------------+---------------+
+Fetched 5 row(s) in 0.03s
+</code></pre>
+
+    <p class="p">
+      The following example shows how a file that is run by the <code class="ph codeph">SOURCE</code> command,
+      or through the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options of <span class="keyword cmdname">impala-shell</span>,
+      can contain additional <code class="ph codeph">SOURCE</code> commands.
+      The first file, <span class="ph filepath">nested1.sql</span>, runs an <span class="keyword cmdname">impala-shell</span> command
+      and then also runs the commands from <span class="ph filepath">nested2.sql</span>.
+      This ability for scripts to call each other is often useful for code that sets up schemas for applications
+      or test environments.
+    </p>
+
+<pre class="pre codeblock"><code>
+$ cat nested1.sql
+show functions in _impala_builtins like '*minute*';
+source nested2.sql
+$ cat nested2.sql
+show functions in _impala_builtins like '*hour*'
+
+$ impala-shell -i localhost -f nested1.sql
+Starting Impala Shell without Kerberos authentication
+Connected to localhost:21000
+...
+Query: show functions in _impala_builtins like '*minute*'
++-------------+--------------------------------+-------------+---------------+
+| return type | signature                      | binary type | is persistent |
++-------------+--------------------------------+-------------+---------------+
+| INT         | minute(TIMESTAMP)              | BUILTIN     | true          |
+| TIMESTAMP   | minutes_add(TIMESTAMP, BIGINT) | BUILTIN     | true          |
+| TIMESTAMP   | minutes_add(TIMESTAMP, INT)    | BUILTIN     | true          |
+| TIMESTAMP   | minutes_sub(TIMESTAMP, BIGINT) | BUILTIN     | true          |
+| TIMESTAMP   | minutes_sub(TIMESTAMP, INT)    | BUILTIN     | true          |
++-------------+--------------------------------+-------------+---------------+
+Fetched 5 row(s) in 0.01s
+Query: show functions in _impala_builtins like '*hour*'
++-------------+------------------------------+-------------+---------------+
+| return type | signature                    | binary type | is persistent |
++-------------+------------------------------+-------------+---------------+
+| INT         | hour(TIMESTAMP)              | BUILTIN     | true          |
+| TIMESTAMP   | hours_add(TIMESTAMP, BIGINT) | BUILTIN     | true          |
+| TIMESTAMP   | hours_add(TIMESTAMP, INT)    | BUILTIN     | true          |
+| TIMESTAMP   | hours_sub(TIMESTAMP, BIGINT) | BUILTIN     | true          |
+| TIMESTAMP   | hours_sub(TIMESTAMP, INT)    | BUILTIN     | true          |
++-------------+------------------------------+-------------+---------------+
+Fetched 5 row(s) in 0.01s
+</code></pre>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav></article></main></body></html>
\ No newline at end of file



[47/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_appx_median.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_appx_median.html b/docs/build/html/topics/impala_appx_median.html
new file mode 100644
index 0000000..1883f2c
--- /dev/null
+++ b/docs/build/html/topics/impala_appx_median.html
@@ -0,0 +1,127 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="appx_median"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>APPX_MEDIAN Function</title></head><body id="appx_median"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">APPX_MEDIAN Function</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      An aggregate function that returns a value that is approximately the median (midpoint)
+      of the set of input values.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>APPX_MEDIAN([DISTINCT | ALL] <var class="keyword varname">expression</var>)
+</code></pre>
+
+    <p class="p">
+      This function works with any input type, because the only requirement is that the type supports less-than and
+      greater-than comparison operators.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Because the return value represents the estimated midpoint, it might not reflect the precise midpoint value,
+      especially if the cardinality of the input values is very high. If the cardinality is low (up to
+      approximately 20,000), the result is more accurate because the sampling considers all or almost all of the
+      different values.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+        arguments which produce a <code class="ph codeph">STRING</code> result
+      </p>
+
+    <p class="p">
+      The return value is always the same as one of the input values, not an <span class="q">"in-between"</span> value produced by
+      averaging.
+    </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+        This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example uses a table of a million random floating-point numbers ranging up to approximately
+      50,000. The average is approximately 25,000. Because of the random distribution, we would expect the median
+      to be close to this same number. Computing the precise median is a more intensive operation than computing
+      the average, because it requires keeping track of every distinct value and how many times each occurs. The
+      <code class="ph codeph">APPX_MEDIAN()</code> function uses a sampling algorithm to return an approximate result, which in
+      this case is close to the expected value. To make sure that the value is not substantially out of range due
+      to a skewed distribution, subsequent queries confirm that there are approximately 500,000 values higher than
+      the <code class="ph codeph">APPX_MEDIAN()</code> value, and approximately 500,000 values lower than the
+      <code class="ph codeph">APPX_MEDIAN()</code> value.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select min(x), max(x), avg(x) from million_numbers;
++-------------------+-------------------+-------------------+
+| min(x)            | max(x)            | avg(x)            |
++-------------------+-------------------+-------------------+
+| 4.725693727250069 | 49994.56852674231 | 24945.38563793553 |
++-------------------+-------------------+-------------------+
+[localhost:21000] &gt; select appx_median(x) from million_numbers;
++----------------+
+| appx_median(x) |
++----------------+
+| 24721.6        |
++----------------+
+[localhost:21000] &gt; select count(x) as higher from million_numbers where x &gt; (select appx_median(x) from million_numbers);
++--------+
+| higher |
++--------+
+| 502013 |
++--------+
+[localhost:21000] &gt; select count(x) as lower from million_numbers where x &lt; (select appx_median(x) from million_numbers);
++--------+
+| lower  |
++--------+
+| 497987 |
++--------+
+</code></pre>
+
+    <p class="p">
+      The following example computes the approximate median using a subset of the values from the table, and then
+      confirms that the result is a reasonable estimate for the midpoint.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select appx_median(x) from million_numbers where x between 1000 and 5000;
++-------------------+
+| appx_median(x)    |
++-------------------+
+| 3013.107787358159 |
++-------------------+
+[localhost:21000] &gt; select count(x) as higher from million_numbers where x between 1000 and 5000 and x &gt; 3013.107787358159;
++--------+
+| higher |
++--------+
+| 37692  |
++--------+
+[localhost:21000] &gt; select count(x) as lower from million_numbers where x between 1000 and 5000 and x &lt; 3013.107787358159;
++-------+
+| lower |
++-------+
+| 37089 |
++-------+
+</code></pre>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_array.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_array.html b/docs/build/html/topics/impala_array.html
new file mode 100644
index 0000000..45c9a42
--- /dev/null
+++ b/docs/build/html/topics/impala_array.html
@@ -0,0 +1,321 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="array"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ARRAY Complex Type (Impala 2.3 or higher only)</title></head><body id="array"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">ARRAY Complex Type (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A complex data type that can represent an arbitrary number of ordered elements.
+      The elements can be scalars or another complex type (<code class="ph codeph">ARRAY</code>,
+      <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>).
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> ARRAY &lt; <var class="keyword varname">type</var> &gt;
+
+type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword varname">complex_type</var>
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        Because complex types are often used in combination,
+        for example an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+        elements, if you are unfamiliar with the Impala complex types,
+        start with <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for
+        background information and usage examples.
+      </p>
+
+      <p class="p">
+        The elements of the array have no names. You refer to the value of the array item using the
+        <code class="ph codeph">ITEM</code> pseudocolumn, or its position in the array with the <code class="ph codeph">POS</code>
+        pseudocolumn. See <a class="xref" href="impala_complex_types.html#item">ITEM and POS Pseudocolumns</a> for information about
+        these pseudocolumns.
+      </p>
+
+
+
+    <p class="p">
+      Each row can have a different number of elements (including none) in the array for that row.
+    </p>
+
+
+
+      <p class="p">
+        When an array contains items of scalar types, you can use aggregation functions on the array elements without using join notation. For
+        example, you can find the <code class="ph codeph">COUNT()</code>, <code class="ph codeph">AVG()</code>, <code class="ph codeph">SUM()</code>, and so on of numeric array
+        elements, or the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> of any scalar array elements by referring to
+        <code class="ph codeph"><var class="keyword varname">table_name</var>.<var class="keyword varname">array_column</var></code> in the <code class="ph codeph">FROM</code> clause of the query. When
+        you need to cross-reference values from the array with scalar values from the same row, such as by including a <code class="ph codeph">GROUP
+        BY</code> clause to produce a separate aggregated result for each row, then the join clause is required.
+      </p>
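+
+      <p class="p">
+        For example, using the <code class="ph codeph">array_demo</code> table defined later in this topic,
+        the two styles of aggregation look like the following. (These queries are illustrative
+        sketches rather than captured output from a live cluster.)
+      </p>
+
+<pre class="pre codeblock"><code>-- Aggregate across the scalar array elements from all rows;
+-- no join notation is needed.
+SELECT count(pets.item) FROM array_demo.pets;
+
+-- To produce a separate aggregate for each row, join the table
+-- with its own array column and add a GROUP BY clause.
+SELECT id, count(pets.item) AS num_pets
+  FROM array_demo, array_demo.pets
+GROUP BY id;
+</code></pre>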
+
+      <p class="p">
+        A common usage pattern with complex types is to have an array as the top-level type for the column:
+        an array of structs, an array of maps, or an array of arrays.
+        For example, you can model a denormalized table by creating a column that is an <code class="ph codeph">ARRAY</code>
+        of <code class="ph codeph">STRUCT</code> elements; each item in the array represents a row from a table that would
+        normally be used in a join query. This kind of data structure lets you essentially denormalize tables by
+        associating multiple rows from one table with the matching row in another table.
+      </p>
+
+      <p class="p">
+        You typically do not create more than one top-level <code class="ph codeph">ARRAY</code> column, because when there is
+        some relationship between the elements of multiple arrays, it is more convenient to model the data as
+        a single array whose elements are of a complex type (either <code class="ph codeph">STRUCT</code> or <code class="ph codeph">MAP</code>).
+      </p>
+
+      <p class="p">
+        You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+        to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+        column and visualize its structure as if it were a table.
+        For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+        <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+        If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+        and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+        you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+        An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+        <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+        A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+        representing a column in the table.
+        A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+        <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Columns with this data type can only be used in tables or partitions with the Parquet file format.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Columns with this data type cannot be used as partition key columns in a partitioned table.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">COMPUTE STATS</code> statement does not produce any statistics for columns of this data type.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p" id="array__d6e2889">
+            The maximum length of the column definition for any complex type, including declarations for any nested types,
+            is 4000 characters.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations and Restrictions for Complex Types</a> for a full list of limitations
+            and associated guidelines about complex type columns.
+          </p>
+        </li>
+      </ul>
+
+      <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+      <p class="p">
+        Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+        <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Many of the complex type examples refer to tables
+      such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+      adapted from the tables used in the TPC-H benchmark.
+      See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+      for the table definitions.
+      </div>
+
+      <p class="p">
+        The following example shows how to construct a table with various kinds of <code class="ph codeph">ARRAY</code> columns,
+        both at the top level and nested within other complex types.
+        Whenever the <code class="ph codeph">ARRAY</code> elements are scalar values, such as in the <code class="ph codeph">PETS</code>
+        column or the <code class="ph codeph">CHILDREN</code> field, you can see that future expansion is limited.
+        For example, you could not easily evolve the schema to record the kind of pet or the child's birthday alongside the name.
+        Therefore, it is more common to use an <code class="ph codeph">ARRAY</code> whose elements are of <code class="ph codeph">STRUCT</code> type,
+        to associate multiple fields with each array element.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        Practice the <code class="ph codeph">CREATE TABLE</code> and query notation for complex type columns
+        using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably.
+      </div>
+
+
+
+<pre class="pre codeblock"><code>CREATE TABLE array_demo
+(
+  id BIGINT,
+  name STRING,
+-- An ARRAY of scalar type as a top-level column.
+  pets ARRAY &lt;STRING&gt;,
+
+-- An ARRAY with elements of complex type (STRUCT).
+  places_lived ARRAY &lt; STRUCT &lt;
+    place: STRING,
+    start_year: INT
+  &gt;&gt;,
+
+-- An ARRAY as a field (CHILDREN) within a STRUCT.
+-- (The STRUCT is inside another ARRAY, because it is rare
+-- for a STRUCT to be a top-level column.)
+  marriages ARRAY &lt; STRUCT &lt;
+    spouse: STRING,
+    children: ARRAY &lt;STRING&gt;
+  &gt;&gt;,
+
+-- An ARRAY as the value part of a MAP.
+-- The first MAP field (the key) would be a value such as
+-- 'Parent' or 'Grandparent', and the corresponding array would
+-- represent 2 parents, 4 grandparents, and so on.
+  ancestors MAP &lt; STRING, ARRAY &lt;STRING&gt; &gt;
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+    <p class="p">
+      The following example shows how to examine the structure of a table containing one or more <code class="ph codeph">ARRAY</code> columns by using the
+      <code class="ph codeph">DESCRIBE</code> statement. You can visualize each <code class="ph codeph">ARRAY</code> as its own two-column table, with columns
+      <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+    </p>
+
+
+
+<pre class="pre codeblock"><code>DESCRIBE array_demo;
++--------------+---------------------------+
+| name         | type                      |
++--------------+---------------------------+
+| id           | bigint                    |
+| name         | string                    |
+| pets         | array&lt;string&gt;             |
+| marriages    | array&lt;struct&lt;             |
+|              |   spouse:string,          |
+|              |   children:array&lt;string&gt;  |
+|              | &gt;&gt;                        |
+| places_lived | array&lt;struct&lt;             |
+|              |   place:string,           |
+|              |   start_year:int          |
+|              | &gt;&gt;                        |
+| ancestors    | map&lt;string,array&lt;string&gt;&gt; |
++--------------+---------------------------+
+
+DESCRIBE array_demo.pets;
++------+--------+
+| name | type   |
++------+--------+
+| item | string |
+| pos  | bigint |
++------+--------+
+
+DESCRIBE array_demo.marriages;
++------+--------------------------+
+| name | type                     |
++------+--------------------------+
+| item | struct&lt;                  |
+|      |   spouse:string,         |
+|      |   children:array&lt;string&gt; |
+|      | &gt;                        |
+| pos  | bigint                   |
++------+--------------------------+
+
+DESCRIBE array_demo.places_lived;
++------+------------------+
+| name | type             |
++------+------------------+
+| item | struct&lt;          |
+|      |   place:string,  |
+|      |   start_year:int |
+|      | &gt;                |
+| pos  | bigint           |
++------+------------------+
+
+DESCRIBE array_demo.ancestors;
++-------+---------------+
+| name  | type          |
++-------+---------------+
+| key   | string        |
+| value | array&lt;string&gt; |
++-------+---------------+
+
+</code></pre>
+
+    <p class="p">
+      The following example shows queries involving <code class="ph codeph">ARRAY</code> columns containing elements of scalar or complex types. You
+      <span class="q">"unpack"</span> each <code class="ph codeph">ARRAY</code> column by referring to it in a join query, as if it were a separate table with
+      <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns. If the array element is a scalar type, you refer to its value using the
+      <code class="ph codeph">ITEM</code> pseudocolumn. If the array element is a <code class="ph codeph">STRUCT</code>, you refer to the <code class="ph codeph">STRUCT</code> fields
+      using dot notation and the field names. If the array element is another <code class="ph codeph">ARRAY</code> or a <code class="ph codeph">MAP</code>, you use
+      another level of join to unpack the nested collection elements.
+    </p>
+
+
+
+<pre class="pre codeblock"><code>-- Array of scalar values.
+-- Each array element represents a single string, plus we know its position in the array.
+SELECT id, name, pets.pos, pets.item FROM array_demo, array_demo.pets;
+
+-- Array of structs.
+-- Now each array element has named fields, possibly of different types.
+-- You can consider an ARRAY of STRUCT to represent a table inside another table.
+SELECT id, name, places_lived.pos, places_lived.item.place, places_lived.item.start_year
+FROM array_demo, array_demo.places_lived;
+
+-- The .ITEM name is optional for array elements that are structs.
+-- The following query is equivalent to the previous one, with .ITEM
+-- removed from the column references.
+SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year
+  FROM array_demo, array_demo.places_lived;
+
+-- To filter specific items from the array, do comparisons against the .POS or .ITEM
+-- pseudocolumns, or names of struct fields, in the WHERE clause.
+SELECT id, name, pets.item FROM array_demo, array_demo.pets
+  WHERE pets.pos in (0, 1, 3);
+
+SELECT id, name, pets.item FROM array_demo, array_demo.pets
+  WHERE pets.item LIKE 'Mr. %';
+
+SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year
+  FROM array_demo, array_demo.places_lived
+WHERE places_lived.place like '%California%';
+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a>,
+
+      <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>, <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a>
+    </p>
+
+  </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_auditing.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_auditing.html b/docs/build/html/topics/impala_auditing.html
new file mode 100644
index 0000000..bcd6d9f
--- /dev/null
+++ b/docs/build/html/topics/impala_auditing.html
@@ -0,0 +1,222 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="auditing"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Auditing Impala Operations</title></head><body id="auditing"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Auditing Impala Operations</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      To monitor how Impala data is being used within your organization, and to detect
+      attempts at intrusion or unauthorized access to Impala data, you can use the
+      auditing feature in Impala 1.2.1 and higher. The audit logs also help you confirm
+      that your Impala authorization and authentication policies are effective:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Enable auditing by including the option
+        <code class="ph codeph">-audit_event_log_dir=<var class="keyword varname">directory_path</var></code>
+        in your <span class="keyword cmdname">impalad</span> startup options.
+        The log directory must be a local directory on the
+        server, not an HDFS directory.
+      </li>
+
+      <li class="li">
+        Decide how many queries will be represented in each log file. By default,
+        Impala starts a new log file every 5000 queries. To specify a different number,
+        <span class="ph">include
+        the option <code class="ph codeph">-max_audit_event_log_file_size=<var class="keyword varname">number_of_queries</var></code>
+        in the <span class="keyword cmdname">impalad</span> startup options</span>.
+      </li>
+
+      <li class="li"> 
+        Use a cluster manager with governance capabilities to filter, visualize,
+        and produce reports based on the audit logs collected
+        from all the hosts in the cluster. 
+      </li>
+    </ul>
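+
+    <p class="p">
+      For example, the startup options for each <span class="keyword cmdname">impalad</span> daemon
+      might include settings like the following. (The directory path and the number of queries
+      per log file are illustrative values, not recommendations.)
+    </p>
+
+<pre class="pre codeblock"><code>impalad -audit_event_log_dir=/var/log/impala/audit \
+  -max_audit_event_log_file_size=5000 \
+  <var class="keyword varname">other_options</var>
+</code></pre>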
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="auditing__auditing_performance">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Durability and Performance Considerations for Impala Auditing</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        The auditing feature imposes performance overhead only while auditing is enabled.
+      </p>
+
+      <p class="p">
+        Because any Impala host can process a query, enable auditing on all hosts where the
+        <span class="ph"><span class="keyword cmdname">impalad</span> daemon</span>
+         runs. Each host stores its own log
+        files, in a directory in the local filesystem. The log data is periodically flushed to disk (through an
+        <code class="ph codeph">fsync()</code> system call) to limit the amount of audit data lost in case of a crash.
+      </p>
+
+      <p class="p"> 
+        The runtime overhead of auditing applies to whichever host serves as the coordinator
+        for the query, that is, the host you connect to when you issue the query. This might
+        be the same host for all queries, or different applications or users might connect to
+        and issue queries through different hosts. 
+      </p>
+
+      <p class="p"> 
+        To avoid excessive I/O overhead on busy coordinator hosts, Impala syncs the audit log
+        data (using the <code class="ph codeph">fsync()</code> system call) periodically rather than after
+        every query. Currently, the <code class="ph codeph">fsync()</code> calls are issued at a fixed
+        interval, every 5 seconds. 
+      </p>
+
+      <p class="p">
+        By default, if an error occurs during a logging operation (such as a disk full error),
+        Impala avoids losing any audit log data by immediately shutting down
+        <span class="keyword cmdname">impalad</span> on the host where the auditing problem occurred.
+        <span class="ph">You can override this setting by specifying the option
+        <code class="ph codeph">-abort_on_failed_audit_event=false</code> in the <span class="keyword cmdname">impalad</span> startup options.</span>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="auditing__auditing_format">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Format of the Audit Log Files</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p"> 
+        The audit log files represent the query information in JSON format, one query per line.
+        Typically, rather than looking at the log files themselves, you should use cluster-management
+        software to consolidate the log data from all Impala hosts and filter and visualize the results
+        in useful ways. (If you do examine the raw log data, you might run the files through
+        a JSON pretty-printer first.) 
+     </p>
+
+      <p class="p">
+        All the information about schema objects accessed by the query is encoded in a single nested record on the
+        same line. For example, the audit log for an <code class="ph codeph">INSERT ... SELECT</code> statement records that a
+        select operation occurs on the source table and an insert operation occurs on the destination table. The
+        audit log for a query against a view records the base table accessed by the view, or multiple base tables
+        in the case of a view that includes a join query. Every Impala operation that corresponds to a SQL
+        statement is recorded in the audit logs, whether the operation succeeds or fails. Impala records more
+        information for a successful operation than for a failed one, because an unauthorized query is stopped
+        immediately, before all the query planning is completed.
+      </p>
+
+
+
+      <p class="p">
+        The information logged for each query includes:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Client session state:
+          <ul class="ul">
+            <li class="li">
+              Session ID
+            </li>
+
+            <li class="li">
+              User name
+            </li>
+
+            <li class="li">
+              Network address of the client connection
+            </li>
+          </ul>
+        </li>
+
+        <li class="li">
+          SQL statement details:
+          <ul class="ul">
+            <li class="li">
+              Query ID
+            </li>
+
+            <li class="li">
+              Statement Type - DML, DDL, and so on
+            </li>
+
+            <li class="li">
+              SQL statement text
+            </li>
+
+            <li class="li">
+              Execution start time, in local time
+            </li>
+
+            <li class="li">
+              Execution Status - Details on any errors that were encountered
+            </li>
+
+            <li class="li">
+              Target Catalog Objects:
+              <ul class="ul">
+                <li class="li">
+                  Object Type - Table, View, or Database
+                </li>
+
+                <li class="li">
+                  Fully qualified object name
+                </li>
+
+                <li class="li">
+                  Privilege - How the object is being used (<code class="ph codeph">SELECT</code>, <code class="ph codeph">INSERT</code>,
+                  <code class="ph codeph">CREATE</code>, and so on)
+                </li>
+              </ul>
+            </li>
+          </ul>
+        </li>
+      </ul>
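Because each audit record is a single JSON object per line, a short script can consolidate and filter the fields listed above. A minimal sketch follows; the key names used here (<code class="ph codeph">user</code>, <code class="ph codeph">status</code>, <code class="ph codeph">sql_statement</code>) are illustrative, so inspect your own audit files for the exact names, which vary by Impala version:

```python
import json

# Example audit-log lines; real files contain one JSON object per line.
# The key names below are illustrative, not the exact Impala schema.
sample_lines = [
    '{"user": "alice", "status": "", "sql_statement": "SELECT * FROM t1"}',
    '{"user": "bob", "status": "AuthorizationException", "sql_statement": "DROP TABLE t2"}',
]

def failed_statements(lines):
    """Yield (user, statement) for records whose status field is non-empty."""
    for line in lines:
        record = json.loads(line)
        if record.get("status"):   # a non-empty status indicates an error
            yield record["user"], record["sql_statement"]

print(list(failed_statements(sample_lines)))
```

The same pattern extends to any of the logged fields, such as grouping by user or by target catalog object.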
+
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="auditing__auditing_exceptions">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Which Operations Are Audited</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The kinds of SQL queries represented in the audit log are:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Queries that are prevented due to lack of authorization.
+        </li>
+
+        <li class="li">
+          Queries that Impala can analyze and parse to determine that they are authorized. The audit data is
+          recorded immediately after Impala finishes its analysis, before the query is actually executed.
+        </li>
+      </ul>
+
+      <p class="p">
+        The audit log does not contain entries for queries that could not be parsed and analyzed. For example, a
+        query that fails due to a syntax error is not recorded in the audit log. The audit log also omits queries
+        that fail because they reference a table that does not exist, even when you would have been authorized to
+        access that table had it existed.
+      </p>
+
+      <p class="p">
+        Certain statements in the <span class="keyword cmdname">impala-shell</span> interpreter, such as <code class="ph codeph">CONNECT</code>,
+        <code class="ph codeph">SUMMARY</code>, <code class="ph codeph">PROFILE</code>, <code class="ph codeph">SET</code>, and
+        <code class="ph codeph">QUIT</code>, do not correspond to actual SQL queries, and these statements are not reflected in
+        the audit log.
+      </p>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_authentication.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_authentication.html b/docs/build/html/topics/impala_authentication.html
new file mode 100644
index 0000000..504f6c7
--- /dev/null
+++ b/docs/build/html/topics/impala_authentication.html
@@ -0,0 +1,37 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_kerberos.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ldap.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mixed_security.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_delegation.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="authentication"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Authentication</title></head><body id="authentication"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Authentication</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Authentication is the mechanism to ensure that only specified hosts and users can connect to Impala. It also
+      verifies that when clients connect to Impala, they are connected to a legitimate server. This feature
+      prevents spoofing such as <dfn class="term">impersonation</dfn> (setting up a phony client system with the same account
+      and group names as a legitimate user) and <dfn class="term">man-in-the-middle attacks</dfn> (intercepting application
+      requests before they reach Impala and eavesdropping on sensitive information in the requests or the results).
+    </p>
+
+    <p class="p">
+      Impala supports authentication using either Kerberos or LDAP.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files
+      owned by the same user (typically <code class="ph codeph">impala</code>). To implement user-level access to different
+      databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in
+      <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>.
+    </div>
+
+    <p class="p toc"></p>
+
+    <p class="p">
+      Once you are finished setting up authentication, move on to authorization, which involves specifying what
+      databases, tables, HDFS directories, and so on can be accessed by particular users when they connect through
+      Impala. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_kerberos.html">Enabling Kerberos Authentication for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_ldap.html">Enabling LDAP Authentication for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mixed_security.html">Using Multiple Authentication Methods with Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_delegation.html">Configuring Impala Delegation for Hue and BI Tools</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
\ No newline at end of file



[43/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_complex_types.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_complex_types.html b/docs/build/html/topics/impala_complex_types.html
new file mode 100644
index 0000000..ac55311
--- /dev/null
+++ b/docs/build/html/topics/impala_complex_types.html
@@ -0,0 +1,2606 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="complex_types"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Complex Types (Impala 2.3 or higher only)</title></head><body id="complex_types"><main role="main"><article role="article" aria-labelledby="complex_types__nested_types">
+
+  <h1 class="title topictitle1" id="complex_types__nested_types">Complex Types (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+
+      
+      <dfn class="term">Complex types</dfn> (also referred to as <dfn class="term">nested types</dfn>) let you represent multiple data values within a single
+      row/column position. They differ from the familiar column types such as <code class="ph codeph">BIGINT</code> and <code class="ph codeph">STRING</code>, known as
+      <dfn class="term">scalar types</dfn> or <dfn class="term">primitive types</dfn>, which represent a single data value within a given row/column position.
+      Impala supports the complex types <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> in <span class="keyword">Impala 2.3</span>
+      and higher. The Hive <code class="ph codeph">UNION</code> type is not currently supported.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+    <p class="p">
+      Once you understand the basics of complex types, refer to the individual type topics when you need to refresh your memory about syntax
+      and examples:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a>
+      </li>
+    </ul>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="complex_types__complex_types_benefits">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Benefits of Impala Complex Types</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The reasons for using Impala complex types include the following:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            You already have data produced by Hive or another non-Impala component that uses complex type columns.
+            You might need to convert the underlying data to Parquet to use it with Impala.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Your data model originates with a non-SQL programming language or a NoSQL data management system. For example, if you are
+            representing Python data expressed as nested lists, dictionaries, and tuples, those data structures correspond closely to Impala
+            <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> types.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Your analytic queries involving multiple tables could benefit from greater locality during join processing. By packing more
+            related data items within each HDFS data block, complex types let join queries avoid the network overhead of the traditional
+            Hadoop shuffle or broadcast join techniques.
+          </p>
+        </li>
+      </ul>
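The correspondence with Python data structures mentioned above can be made concrete. This sketch models one logical row of a hypothetical table whose columns include an <code class="ph codeph">ARRAY</code> (a Python list), a <code class="ph codeph">MAP</code> (a dict), and a <code class="ph codeph">STRUCT</code> (a fixed set of named fields, modeled here as a dict); the table and column names are invented for illustration:

```python
# One logical row of a hypothetical "customers" table with complex columns.
# ARRAY  -> list: arbitrary number of same-typed elements
# MAP    -> dict: arbitrary number of key/value pairs
# STRUCT -> fixed set of named fields (modeled here as a dict)
row = {
    "id": 1,                                        # scalar BIGINT
    "name": "Alice",                                # scalar STRING
    "phone_numbers": ["555-0100", "555-0199"],      # ARRAY<STRING>
    "preferences": {"theme": "dark", "lang": "en"}, # MAP<STRING,STRING>
    "address": {"street": "1 Main St",              # STRUCT<street:STRING,
                "zip": "02134"},                    #        zip:STRING>
}

# Nesting can continue: for example, an ARRAY whose elements are STRUCTs.
row["orders"] = [{"order_id": 100, "items": ["apple", "pear"]}]

print(len(row["phone_numbers"]), row["address"]["zip"])
```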
+
+      <p class="p">
+        The Impala complex type support produces result sets with all scalar values, and the scalar components of complex types can be used
+        with all SQL clauses, such as <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">ORDER BY</code>, all kinds of joins, subqueries, and inline
+        views. The ability to process complex type data entirely in SQL reduces the need to write application-specific code in Java or other
+        programming languages to deconstruct the underlying data structures.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="complex_types__complex_types_overview">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala Complex Types</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+
+        The <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> types are closely related: they represent collections with arbitrary numbers of
+        elements, where each element is the same type. In contrast, <code class="ph codeph">STRUCT</code> groups together a fixed number of items into a
+        single element. The parts of a <code class="ph codeph">STRUCT</code> element (the <dfn class="term">fields</dfn>) can be of different types, and each field
+        has a name.
+      </p>
+
+      <p class="p">
+        The elements of an <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code>, or the fields of a <code class="ph codeph">STRUCT</code>, can also be other
+        complex types. You can construct elaborate data structures with up to 100 levels of nesting. For example, you can make an
+        <code class="ph codeph">ARRAY</code> whose elements are <code class="ph codeph">STRUCT</code>s. Within each <code class="ph codeph">STRUCT</code>, you can have some fields
+        that are <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, or another kind of <code class="ph codeph">STRUCT</code>. The Impala documentation uses the
+        terms complex and nested types interchangeably; for simplicity, it primarily uses the term complex types, to encompass all the
+        properties of these types.
+      </p>
+
+      <p class="p">
+        When visualizing your data model in familiar SQL terms, you can think of each <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code> as a
+        miniature table, and each <code class="ph codeph">STRUCT</code> as a row within such a table. By default, the table represented by an
+        <code class="ph codeph">ARRAY</code> has two columns, <code class="ph codeph">POS</code> to represent ordering of elements, and <code class="ph codeph">ITEM</code>
+        representing the value of each element. Likewise, by default, the table represented by a <code class="ph codeph">MAP</code> encodes key-value
+        pairs, and therefore has two columns, <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code>.
+
+      </p>
+
+      <p class="p">
+        The <code class="ph codeph">ITEM</code> and <code class="ph codeph">VALUE</code> names are only required for the very simplest kinds of <code class="ph codeph">ARRAY</code>
+        and <code class="ph codeph">MAP</code> columns, ones that hold only scalar values. When the elements within the <code class="ph codeph">ARRAY</code> or
+        <code class="ph codeph">MAP</code> are of type <code class="ph codeph">STRUCT</code> rather than a scalar type, then the result set contains columns with names
+        corresponding to the <code class="ph codeph">STRUCT</code> fields rather than <code class="ph codeph">ITEM</code> or <code class="ph codeph">VALUE</code>.
+      </p>
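The <q>miniature table</q> model described above can be sketched in a few lines: flattening an <code class="ph codeph">ARRAY</code> yields <code class="ph codeph">(POS, ITEM)</code> rows and flattening a <code class="ph codeph">MAP</code> yields <code class="ph codeph">(KEY, VALUE)</code> rows, which is essentially the shape of the result set Impala produces when you join against a complex column. The data values here are illustrative:

```python
# Flatten one row's ARRAY column into (pos, item) rows, and its MAP
# column into (key, value) rows -- mirroring Impala's pseudocolumns.
pets = ["dog", "cat"]                      # ARRAY<STRING>
traits = {"dog": "loyal", "cat": "aloof"}  # MAP<STRING,STRING>

# POS is synthesized from element order, just as Impala synthesizes it.
array_rows = [(pos, item) for pos, item in enumerate(pets)]
map_rows = sorted((key, value) for key, value in traits.items())

print(array_rows)   # [(0, 'dog'), (1, 'cat')]
print(map_rows)     # [('cat', 'aloof'), ('dog', 'loyal')]
```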
+
+
+
+      <p class="p">
+        You write most queries that process complex type columns using familiar join syntax, even though the data for both sides of the join
+        resides in a single table. The join notation brings together the scalar values from a row with the values from the complex type
+        columns for that same row. The final result set contains all scalar values, allowing you to do all the familiar filtering,
+        aggregation, ordering, and so on for the complex data entirely in SQL or using business intelligence tools that issue SQL queries.
+
+      </p>
+
+      <p class="p">
+        Behind the scenes, Impala ensures that the processing for each row is done efficiently on a single host, without the network traffic
+        involved in broadcast or shuffle joins. The most common type of join query for tables with complex type columns is <code class="ph codeph">INNER
+        JOIN</code>, which returns results only in those cases where the complex type contains some elements. Therefore, most query
+        examples in this section use either the <code class="ph codeph">INNER JOIN</code> clause or the equivalent comma notation.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          Although Impala can query complex types that are present in Parquet files, Impala currently cannot create new Parquet files
+          containing complex types. Therefore, the discussion and examples presume that you are working with existing Parquet data produced
+          through Hive, Spark, or some other source. See <a class="xref" href="#complex_types_ex_hive_etl">Constructing Parquet Files with Complex Columns Using Hive</a> for examples of constructing Parquet data
+          files with complex type columns.
+        </p>
+
+        <p class="p">
+          For learning purposes, you can create empty tables with complex type columns and practice query syntax, even if you do not have
+          sample data with the required structure.
+        </p>
+      </div>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="complex_types__complex_types_design">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Design Considerations for Complex Types</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        When planning to use Impala complex types, and designing the Impala schema, first learn how this kind of schema differs from
+        traditional table layouts from the relational database and data warehousing fields. Because you might have already encountered
+        complex types in a Hadoop context while using Hive for ETL, also learn how to write high-performance analytic queries for complex
+        type data using Impala SQL syntax.
+      </p>
+
+      <p class="p toc inpage"></p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="complex_types_design__complex_types_vs_rdbms">
+
+      <h3 class="title topictitle3" id="ariaid-title5">How Complex Types Differ from Traditional Data Warehouse Schemas</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Complex types let you associate arbitrary data structures with a particular row. If you are familiar with schema design for
+          relational database management systems or data warehouses, a schema with complex types has the following differences:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              Logically, related values can now be grouped tightly together in the same table.
+            </p>
+
+            <p class="p">
+              In traditional data warehousing, related values were typically arranged in one of two ways:
+            </p>
+            <ul class="ul">
+              <li class="li">
+                <p class="p">
+                  Split across multiple normalized tables. Foreign key columns specified which rows from each table were associated with
+                  each other. This arrangement avoided duplicate data and therefore the data was compact, but join queries could be
+                  expensive because the related data had to be retrieved from separate locations. (In the case of distributed Hadoop
+                  queries, the joined tables might even be transmitted between different hosts in a cluster.)
+                </p>
+              </li>
+
+              <li class="li">
+                <p class="p">
+                  Flattened into a single denormalized table. Although this layout eliminated some potential performance issues by removing
+                  the need for join queries, the table typically became larger because values were repeated. The extra data volume could
+                  cause performance issues in other parts of the workflow, such as longer ETL cycles or more expensive full-table scans
+                  during queries.
+                </p>
+              </li>
+            </ul>
+            <p class="p">
+              Complex types represent a middle ground that addresses these performance and volume concerns. By physically locating related
+              data within the same data files, complex types increase locality and reduce the expense of join queries. By associating an
+              arbitrary amount of data with a single row, complex types avoid the need to repeat lengthy values such as strings. Because
+              Impala knows which complex type values are associated with each row, you can save storage by avoiding artificial foreign key
+              values that are only used for joins. The flexibility of the <code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, and
+              <code class="ph codeph">MAP</code> types lets you model familiar constructs such as fact and dimension tables from a data warehouse, and
+              wide tables representing sparse matrixes.
+            </p>
+          </li>
+        </ul>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="complex_types_design__complex_types_physical">
+
+      <h3 class="title topictitle3" id="ariaid-title6">Physical Storage for Complex Types</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Physically, the scalar and complex columns in each row are located adjacent to each other in the same Parquet data file, ensuring
+          that they are processed on the same host rather than being broadcast across the network when cross-referenced within a query. This
+          co-location simplifies the process of copying, converting, and backing up all the columns at once. Because of the column-oriented
+          layout of Parquet files, you can still query only the scalar columns of a table without imposing the I/O penalty of reading the
+          (possibly large) values of the composite columns.
+        </p>
+
+        <p class="p">
+          Within each Parquet data file, the constituent parts of complex type columns are stored in column-oriented format:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              Each field of a <code class="ph codeph">STRUCT</code> type is stored like a column, with all the scalar values adjacent to each other and
+              encoded, compressed, and so on using the Parquet space-saving techniques.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              For an <code class="ph codeph">ARRAY</code> containing scalar values, all those values (represented by the <code class="ph codeph">ITEM</code>
+              pseudocolumn) are stored adjacent to each other.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              For a <code class="ph codeph">MAP</code>, the values of the <code class="ph codeph">KEY</code> pseudocolumn are stored adjacent to each other. If the
+              <code class="ph codeph">VALUE</code> pseudocolumn is a scalar type, its values are also stored adjacent to each other.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              If an <code class="ph codeph">ARRAY</code> element, <code class="ph codeph">STRUCT</code> field, or <code class="ph codeph">MAP</code> <code class="ph codeph">VALUE</code> part is
+              another complex type, the column-oriented storage applies to the next level down (or the next level after that, and so on for
+              deeply nested types) where the final elements, fields, or values are of scalar types.
+            </p>
+          </li>
+        </ul>
+
+        <p class="p">
+          The numbers represented by the <code class="ph codeph">POS</code> pseudocolumn of an <code class="ph codeph">ARRAY</code> are not physically stored in the
+          data files. They are synthesized at query time based on the order of the <code class="ph codeph">ARRAY</code> elements associated with each row.
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="complex_types_design__complex_types_file_formats">
+
+      <h3 class="title topictitle3" id="ariaid-title7">File Format Support for Impala Complex Types</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Currently, Impala queries support complex type data only in the Parquet file format. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+          for details about the performance benefits and physical layout of this file format.
+        </p>
+
+        <p class="p">
+          Each table, or each partition within a table, can have a separate file format, and you can change file format at the table or
+          partition level through an <code class="ph codeph">ALTER TABLE</code> statement. Because this flexibility makes it difficult to guarantee ahead
+          of time that all the data files for a table or partition are in a compatible format, Impala does not throw any errors when you
+          change the file format for a table or partition using <code class="ph codeph">ALTER TABLE</code>. Any errors come at runtime when Impala
+          actually processes a table or partition that contains nested types and is not in one of the supported formats. If a query on a
+          partitioned table only processes some partitions, and all those partitions are in one of the supported formats, the query
+          succeeds.
+        </p>
+
+        <p class="p">
+          Because Impala does not parse the data structures containing nested types for unsupported formats such as text, Avro,
+          SequenceFile, or RCFile, you cannot use data files in these formats with Impala, even if the query does not refer to the nested
+          type columns. Also, if a table using an unsupported format originally contained nested type columns, and then those columns were
+          dropped from the table using <code class="ph codeph">ALTER TABLE ... DROP COLUMN</code>, any existing data files in the table still contain the
+          nested type data and Impala queries on that table will generate errors.
+        </p>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          <p class="p">
+            The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types.
+            Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher.
+          </p>
+        </div>
+
+        <p class="p">
+          You can perform DDL operations (even <code class="ph codeph">CREATE TABLE</code>) for tables involving complex types in file formats other than
+          Parquet. The DDL support lets you set up intermediate tables in your ETL pipeline, to be populated by Hive, before the final stage
+          where the data resides in a Parquet table and is queryable by Impala. Also, you can have a partitioned table with complex type
+          columns that uses a non-Parquet format, and use <code class="ph codeph">ALTER TABLE</code> to change the file format to Parquet for individual
+          partitions. When you put Parquet data files into those partitions, Impala can execute queries against that data as long as the
+          query does not involve any of the non-Parquet partitions.
+        </p>
+
+        <p class="p">
+          If you use the <span class="keyword cmdname">parquet-tools</span> command to examine the structure of a Parquet data file that includes complex
+          types, you see that both <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> are represented as a <code class="ph codeph">Bag</code> in Parquet
+          terminology, with all fields marked <code class="ph codeph">Optional</code> because Impala allows any column to be nullable.
+        </p>
+
+        <p class="p">
+          Impala supports both 2-level and 3-level encoding within each Parquet data file. When constructing Parquet data files outside
+          Impala, use either encoding style but do not mix 2-level and 3-level encoding within the same data file.
+        </p>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="complex_types_design__complex_types_vs_normalization">
+
+      <h3 class="title topictitle3" id="ariaid-title8">Choosing Between Complex Types and Normalized Tables</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Choosing between multiple normalized fact and dimension tables, or a single table containing complex types, is an important design
+          decision.
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              If you are coming from a traditional database or data warehousing background, you might be familiar with how to split up data
+              between tables. Your business intelligence tools might already be optimized for dealing with this kind of multi-table scenario
+              through join queries.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              If you are pulling data from Impala into an application written in a programming language that has data structures analogous
+              to the complex types, such as Python or Java, complex types in Impala could simplify data interchange and improve
+              understandability and reliability of your program logic.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              You might already be faced with existing infrastructure or receive high volumes of data that assume one layout or the other.
+              For example, complex types are popular with web-oriented applications, such as for keeping information about an online user
+              all in one place for convenient lookup and analysis, or for dealing with sparse or constantly evolving data fields.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              If some parts of the data change over time while related data remains constant, using multiple normalized tables lets you
+              replace certain parts of the data without reloading the entire data set. Conversely, if you receive related data all bundled
+              together, such as in JSON files, using complex types can save the overhead of splitting the related items across multiple
+              tables.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              From a performance perspective:
+            </p>
+            <ul class="ul">
+              <li class="li">
+                <p class="p">
+                  In Parquet tables, Impala can skip columns that are not referenced in a query, avoiding the I/O penalty of reading the
+                  embedded data. When complex types are nested within a column, the data is physically divided at a very granular level; for
+                  example, a query referring to data nested multiple levels deep in a complex type column does not have to read all the data
+                  from that column, only the data for the relevant parts of the column type hierarchy.
+                </p>
+              </li>
+
+              <li class="li">
+                <p class="p">
+                  Complex types avoid the possibility of expensive join queries when data from fact and dimension tables is processed in
+                  parallel across multiple hosts. All the information for a row containing complex types typically resides in the same data
+                  block, and therefore does not need to be transmitted across the network when joining fields that are all part of the same
+                  row.
+                </p>
+              </li>
+
+              <li class="li">
+                <p class="p">
+                  The tradeoff with complex types is that fewer rows fit in each data block. Whether it is better to have more data blocks
+                  with fewer rows, or fewer data blocks with many rows, depends on the distribution of your data and the characteristics of
+                  your query workload. If the complex columns are rarely referenced, using them might lower efficiency. If you are seeing
+                  low parallelism due to a small volume of data (relatively few data blocks) in each table partition, increasing the row
+                  size by including complex columns might produce more data blocks and thus spread the work more evenly across the cluster.
+                  See <a class="xref" href="impala_scalability.html#scalability">Scalability Considerations for Impala</a> for more on this advanced topic.
+                </p>
+              </li>
+            </ul>
+          </li>
+        </ul>
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="complex_types_design__complex_types_hive">
+
+      <h3 class="title topictitle3" id="ariaid-title9">Differences Between Impala and Hive Complex Types</h3>
+
+      <div class="body conbody">
+
+
+
+
+
+
+
+        <p class="p">
+          Impala can query Parquet tables containing <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> columns
+          produced by Hive. There are some differences to be aware of between the Impala SQL and HiveQL syntax for complex types, primarily
+          for queries.
+        </p>
+
+        <p class="p">
+          The syntax for specifying <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> types in a <code class="ph codeph">CREATE
+          TABLE</code> statement is compatible between Impala and Hive.
+        </p>
+
+        <p class="p">
+          Because Impala <code class="ph codeph">STRUCT</code> columns include user-specified field names, you use the <code class="ph codeph">NAMED_STRUCT()</code>
+          constructor in Hive rather than the <code class="ph codeph">STRUCT()</code> constructor when you populate an Impala <code class="ph codeph">STRUCT</code>
+          column using a Hive <code class="ph codeph">INSERT</code> statement.
+        </p>
+
+        <p class="p">
+          The Hive <code class="ph codeph">UNION</code> type is not currently supported in Impala.
+        </p>
+
+        <p class="p">
+          While Impala usually aims for a high degree of compatibility with HiveQL query syntax, Impala syntax differs from Hive for queries
+          involving complex types. The differences are intended to provide extra flexibility for queries involving these kinds of tables.
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            Impala uses dot notation for referring to element names or elements within complex types, and join notation for
+            cross-referencing scalar columns with the elements of complex types within the same row, rather than the <code class="ph codeph">LATERAL
+            VIEW</code> clause and <code class="ph codeph">EXPLODE()</code> function of HiveQL.
+          </li>
+
+          <li class="li">
+            Using join notation lets you use all kinds of join queries with complex type columns. For example, you can use a
+            <code class="ph codeph">LEFT OUTER JOIN</code>, <code class="ph codeph">LEFT ANTI JOIN</code>, or <code class="ph codeph">LEFT SEMI JOIN</code> query to evaluate
+            different scenarios where the complex columns do or do not contain any elements.
+          </li>
+
+          <li class="li">
+            You can include references to collection types inside subqueries and inline views. For example, you can construct a
+            <code class="ph codeph">FROM</code> clause where one of the <span class="q">"tables"</span> is a subquery against a complex type column, or use a subquery
+            against a complex type column as the argument to an <code class="ph codeph">IN</code> or <code class="ph codeph">EXISTS</code> clause.
+          </li>
+
+          <li class="li">
+            The Impala pseudocolumn <code class="ph codeph">POS</code> lets you retrieve the position of elements in an array along with the elements
+            themselves, equivalent to the <code class="ph codeph">POSEXPLODE()</code> function of HiveQL. You do not use index notation to retrieve a
+            single array element in a query; the join query loops through the array elements and you use <code class="ph codeph">WHERE</code> clauses to
+            specify which elements to return.
+          </li>
+
+          <li class="li">
+            <p class="p">
+              Join clauses involving complex type columns do not require an <code class="ph codeph">ON</code> or <code class="ph codeph">USING</code> clause. Impala
+              implicitly applies the join key so that the correct array entries or map elements are associated with the correct row from the
+              table.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              Impala does not currently support the <code class="ph codeph">UNION</code> complex type.
+            </p>
+          </li>
+        </ul>
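+
+        <p class="p">
+          As an illustration of the query syntax difference, the following hypothetical queries both flatten an
+          <code class="ph codeph">ARRAY &lt;STRING&gt;</code> column named <code class="ph codeph">phones</code> in a table named
+          <code class="ph codeph">contacts</code> (neither the table nor the column comes from the examples elsewhere in this section):
+        </p>
+
+<pre class="pre codeblock"><code>-- Impala: join notation, with the ITEM and POS pseudocolumns.
+SELECT c.name, p.pos, p.item
+  FROM contacts c, c.phones p;
+
+-- Hive equivalent: LATERAL VIEW with the EXPLODE() function.
+SELECT c.name, pv.phone
+  FROM contacts c
+  LATERAL VIEW EXPLODE(c.phones) pv AS phone;
+</code></pre>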
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="complex_types_design__complex_types_limits">
+
+      <h3 class="title topictitle3" id="ariaid-title10">Limitations and Restrictions for Complex Types</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Complex type columns can only be used in tables or partitions with the Parquet file format.
+        </p>
+
+        <p class="p">
+          Complex type columns cannot be used as partition key columns in a partitioned table.
+        </p>
+
+        <p class="p">
+          When you use complex types with the <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">HAVING</code>, or
+          <code class="ph codeph">WHERE</code> clauses, you cannot refer to the column name by itself. Instead, you refer to the names of the scalar
+          values within the complex type, such as the <code class="ph codeph">ITEM</code>, <code class="ph codeph">POS</code>, <code class="ph codeph">KEY</code>, or
+          <code class="ph codeph">VALUE</code> pseudocolumns, or the field names from a <code class="ph codeph">STRUCT</code>.
+        </p>
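+
+        <p class="p">
+          For example, in this hypothetical sketch (assuming a table <code class="ph codeph">t</code> with an
+          <code class="ph codeph">ARRAY &lt;INT&gt;</code> column <code class="ph codeph">a</code>), the <code class="ph codeph">WHERE</code> and
+          <code class="ph codeph">ORDER BY</code> clauses refer to the <code class="ph codeph">ITEM</code> pseudocolumn rather than to the
+          array column itself:
+        </p>
+
+<pre class="pre codeblock"><code>-- Not allowed: a clause such as WHERE a &gt; 10 referring to the complex column by itself.
+-- Allowed: referring to the scalar ITEM values within the array.
+SELECT t.id, unpacked.item
+  FROM t, t.a unpacked
+  WHERE unpacked.item &gt; 10
+  ORDER BY unpacked.item;
+</code></pre>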
+
+        <p class="p">
+          The maximum depth of nesting for complex types is 100 levels.
+        </p>
+
+        <p class="p">
+          The maximum length of the column definition for any complex type, including declarations for any nested types,
+          is 4000 characters.
+        </p>
+
+        <p class="p">
+          For ideal performance and scalability, use small or medium-sized collections, where all the complex columns contain at most a few
+          hundred megabytes per row. Remember, all the columns of a row are stored in the same HDFS data block, whose size in Parquet files
+          typically ranges from 256 MB to 1 GB.
+        </p>
+
+        <p class="p">
+          Including complex type columns in a table introduces some overhead that might make queries that do not reference those columns
+          somewhat slower than Impala queries against tables without any complex type columns. Expect at most a 2x slowdown compared to
+          tables that do not have any complex type columns.
+        </p>
+
+        <p class="p">
+          Currently, the <code class="ph codeph">COMPUTE STATS</code> statement does not collect any statistics for columns containing complex types.
+          Impala uses heuristics to construct execution plans involving complex type columns.
+        </p>
+
+        <p class="p">
+          Currently, Impala built-in functions and user-defined functions cannot accept complex types as parameters or produce them as
+          function return values. (When the complex type values are materialized in an Impala result set, the result set contains the scalar
+          components of the values, such as the <code class="ph codeph">POS</code> or <code class="ph codeph">ITEM</code> for an <code class="ph codeph">ARRAY</code>, the
+          <code class="ph codeph">KEY</code> or <code class="ph codeph">VALUE</code> for a <code class="ph codeph">MAP</code>, or the fields of a <code class="ph codeph">STRUCT</code>; these
+          scalar data items <em class="ph i">can</em> be used with built-in functions and UDFs as usual.)
+        </p>
+
+        <p class="p">
+        Impala currently cannot write new data files containing complex type columns.
+        Therefore, although the <code class="ph codeph">SELECT</code> statement works for queries
+        involving complex type columns, you cannot use a statement form that writes
+        data to complex type columns, such as <code class="ph codeph">CREATE TABLE AS SELECT</code> or <code class="ph codeph">INSERT ... SELECT</code>.
+        To create data files containing complex type data, use the Hive <code class="ph codeph">INSERT</code> statement, or another
+        ETL mechanism such as MapReduce jobs, Spark jobs, Pig, and so on.
+      </p>
+
+        <p class="p">
+          Currently, Impala can query complex type columns only from Parquet tables or Parquet partitions within partitioned tables.
+          Although you can use complex types in tables with Avro, text, and other file formats as part of your ETL pipeline, for example as
+          intermediate tables populated through Hive, doing analytics through Impala requires that the data eventually ends up in a Parquet
+          table. The requirement for Parquet data files means that you can use complex types with Impala tables hosted on other kinds of
+          file storage systems such as Isilon and Amazon S3, but you cannot use Impala to query complex types from HBase tables. See
+          <a class="xref" href="impala_complex_types.html#complex_types_file_formats">File Format Support for Impala Complex Types</a> for more details.
+        </p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="complex_types__complex_types_using">
+
+    <h2 class="title topictitle2" id="ariaid-title11">Using Complex Types from SQL</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        When using complex types through SQL in Impala, you learn the notation for <code class="ph codeph">&lt; &gt;</code> delimiters for the complex
+        type columns in <code class="ph codeph">CREATE TABLE</code> statements, and how to construct join queries to <span class="q">"unpack"</span> the scalar values
+        nested inside the complex data structures. You might need to condense a traditional RDBMS or data warehouse schema into a smaller
+        number of Parquet tables, and use Hive, Spark, Pig, or another mechanism outside Impala to populate the tables with data.
+      </p>
+
+      <p class="p toc inpage"></p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="complex_types_using__nested_types_ddl">
+
+      <h3 class="title topictitle3" id="ariaid-title12">Complex Type Syntax for DDL Statements</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          The definition of <var class="keyword varname">data_type</var>, as seen in the <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+          statements, now includes complex types in addition to primitive types:
+        </p>
+
+<pre class="pre codeblock"><code>  primitive_type
+| array_type
+| map_type
+| struct_type
+</code></pre>
+
+        <p class="p">
+          Unions are not currently supported.
+        </p>
+
+        <p class="p">
+          Array, struct, and map column type declarations are specified in the <code class="ph codeph">CREATE TABLE</code> statement. You can also add or
+          change the type of complex columns through the <code class="ph codeph">ALTER TABLE</code> statement.
+        </p>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          <p class="p">
+            Currently, Impala queries allow complex types only in tables that use the Parquet format. If an Impala query encounters complex
+            types in a table or partition using another file format, the query returns a runtime error.
+          </p>
+
+          <p class="p">
+            The Impala DDL support for complex types works for all file formats, so that you can create tables using text or other
+            non-Parquet formats for Hive to use as staging tables in an ETL cycle that ends with the data in a Parquet table. You can also
+            use <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT PARQUET</code> to change the file format of an existing table containing complex
+            types to Parquet, after which Impala can query it. Make sure to load Parquet files into the table after changing the file
+            format, because the <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT</code> statement does not convert existing data to the new file
+            format.
+          </p>
+        </div>
+
+        <p class="p">
+        Partitioned tables can contain complex type columns.
+        All the partition key columns must be scalar types.
+      </p>
+
+        <p class="p">
+          Because use cases for Impala complex types require that you already have Parquet data files produced outside of Impala, you can
+          use the Impala <code class="ph codeph">CREATE TABLE LIKE PARQUET</code> syntax to produce a table with columns that match the structure of an
+          existing Parquet file, including complex type columns for nested data structures. Remember to include the <code class="ph codeph">STORED AS
+          PARQUET</code> clause in this case, because even with <code class="ph codeph">CREATE TABLE LIKE PARQUET</code>, the default file format of the
+          resulting table is still text.
+        </p>
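+
+        <p class="p">
+          For example, assuming an existing Parquet data file at a hypothetical HDFS path, you might derive the table
+          structure like so (note the <code class="ph codeph">STORED AS PARQUET</code> clause):
+        </p>
+
+<pre class="pre codeblock"><code>-- The path below is a placeholder; substitute the location of your own Parquet file.
+CREATE TABLE nested_contacts
+  LIKE PARQUET '/user/etl/sample_nested.parq'
+  STORED AS PARQUET;
+</code></pre>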
+
+        <p class="p">
+          Because the complex columns are omitted from the result set of an Impala <code class="ph codeph">SELECT *</code> or <code class="ph codeph">SELECT
+          <var class="keyword varname">col_name</var></code> query, and because Impala currently does not support writing Parquet files with complex type
+          columns, you cannot use the <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax to create a table with nested type columns.
+        </p>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          <p class="p">
+            Once you have a table set up with complex type columns, use the <code class="ph codeph">DESCRIBE</code> and <code class="ph codeph">SHOW CREATE TABLE</code>
+            statements to see the correct notation with <code class="ph codeph">&lt;</code> and <code class="ph codeph">&gt;</code> delimiters and comma and colon
+            separators within the complex type definitions. If you do not have existing data with the same layout as the table, you can
+            query the empty table to practice with the notation for the <code class="ph codeph">SELECT</code> statement. In the <code class="ph codeph">SELECT</code>
+            list, you use dot notation and pseudocolumns such as <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, and <code class="ph codeph">VALUE</code> for
+            referring to items within the complex type columns. In the <code class="ph codeph">FROM</code> clause, you use join notation to construct
+            table aliases for any referenced <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> columns.
+          </p>
+        </div>
+
+
+
+        <p class="p">
+          For example, when defining a table that holds contact information, you might represent phone numbers differently depending on the
+          expected layout and relationships of the data, and how well you can predict those properties in advance.
+        </p>
+
+        <p class="p">
+          Here are different ways that you might represent phone numbers in a traditional relational schema, with equivalent representations
+          using complex types.
+        </p>
+
+        <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_flat_fixed"><figcaption><span class="fig--title-label">Figure 1. </span>Traditional Relational Representation of Phone Numbers: Single Table</figcaption>
+
+          
+
+          <p class="p">
+            The traditional, simplest way to represent phone numbers in a relational table is to store all contact info in a single table,
+            with all columns having scalar types, and each potential phone number represented as a separate column. In this example, each
+            person can only have these 3 types of phone numbers. If the person does not have a particular kind of phone number, the
+            corresponding column is <code class="ph codeph">NULL</code> for that row.
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE contacts_fixed_phones
+(
+    id BIGINT
+  , name STRING
+  , address STRING
+  , home_phone STRING
+  , work_phone STRING
+  , mobile_phone STRING
+) STORED AS PARQUET;
+</code></pre>
+
+        </figure>
+
+        <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_array"><figcaption><span class="fig--title-label">Figure 2. </span>An Array of Phone Numbers</figcaption>
+
+          
+
+          <p class="p">
+            Using a complex type column to represent the phone numbers adds some extra flexibility. Now there could be an unlimited number
+            of phone numbers. Because the array elements have an order but not symbolic names, you could decide in advance that
+            phone_number[0] is the home number, [1] is the work number, [2] is the mobile number, and so on. (In subsequent examples, you
+            will see how to create a more flexible naming scheme using other complex type variations, such as a <code class="ph codeph">MAP</code> or an
+            <code class="ph codeph">ARRAY</code> where each element is a <code class="ph codeph">STRUCT</code>.)
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE contacts_array_of_phones
+(
+    id BIGINT
+  , name STRING
+  , address STRING
+  , phone_number ARRAY &lt; STRING &gt;
+) STORED AS PARQUET;
+
+</code></pre>
+
+        </figure>
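+
+        <p class="p">
+          For example, a query against the table above might use join notation and the <code class="ph codeph">POS</code>
+          pseudocolumn to list each person's phone numbers along with their positions in the array (a sketch; the data is
+          assumed to be already loaded):
+        </p>
+
+<pre class="pre codeblock"><code>SELECT c.name, p.pos, p.item
+  FROM contacts_array_of_phones c, c.phone_number p
+ORDER BY c.name, p.pos;
+</code></pre>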
+
+        <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_map"><figcaption><span class="fig--title-label">Figure 3. </span>A Map of Phone Numbers</figcaption>
+
+          
+
+          <p class="p">
+            Another way to represent an arbitrary set of phone numbers is with a <code class="ph codeph">MAP</code> column. With a <code class="ph codeph">MAP</code>,
+            each element is associated with a key value that you specify, which could be a numeric, string, or other scalar type. This
+            example uses a <code class="ph codeph">STRING</code> key to give each phone number a name, such as <code class="ph codeph">'home'</code> or
+            <code class="ph codeph">'mobile'</code>. A query could filter the data based on the key values, or display the key values in reports.
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE contacts_unlimited_phones
+(
+  id BIGINT, name STRING, address STRING, phone_number MAP &lt; STRING,STRING &gt;
+) STORED AS PARQUET;
+
+</code></pre>
+
+        </figure>
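+
+        <p class="p">
+          For example, a query might use the <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> pseudocolumns
+          to look up a particular category of phone number (a sketch against the table above):
+        </p>
+
+<pre class="pre codeblock"><code>SELECT c.name, p.value AS mobile_number
+  FROM contacts_unlimited_phones c, c.phone_number p
+WHERE p.key = 'mobile';
+</code></pre>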
+
+        <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_flat_normalized"><figcaption><span class="fig--title-label">Figure 4. </span>Traditional Relational Representation of Phone Numbers: Normalized Tables</figcaption>
+
+          
+
+          <p class="p">
+            If you are an experienced database designer, you already know how to work around the limitations of the single-table schema from
+            <a class="xref" href="#nested_types_ddl__complex_types_phones_flat_fixed">Figure 1</a>. By normalizing the schema, with the phone numbers in their own
+            table, you can associate an arbitrary set of phone numbers with each person, and associate additional details with each phone
+            number, such as whether it is a home, work, or mobile phone.
+          </p>
+
+          <p class="p">
+            The flexibility of this approach comes with some drawbacks. Reconstructing all the data for a particular person requires a join
+            query, which might require performance tuning on Hadoop because the data from each table might be transmitted from a different
+            host. Data management tasks such as backups and refreshing the data require dealing with multiple tables instead of a single
+            table.
+          </p>
+
+          <p class="p">
+            This example illustrates a traditional database schema to store contact info normalized across 2 tables. The fact table
+            establishes the identity and basic information about each person. A dimension table stores information only about phone numbers,
+            using an ID value to associate each phone number with a person ID from the fact table. Each person can have 0, 1, or many
+            phones; the categories are not restricted to a few predefined ones; and the phone table can contain as many columns as desired,
+            to represent all sorts of details about each phone number.
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE fact_contacts (id BIGINT, name STRING, address STRING) STORED AS PARQUET;
+CREATE TABLE dim_phones
+(
+    contact_id BIGINT
+  , category STRING
+  , international_code STRING
+  , area_code STRING
+  , exchange STRING
+  , extension STRING
+  , mobile BOOLEAN
+  , carrier STRING
+  , current BOOLEAN
+  , service_start_date TIMESTAMP
+  , service_end_date TIMESTAMP
+)
+STORED AS PARQUET;
+</code></pre>
+
+        </figure>
+
+        <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_array_struct"><figcaption><span class="fig--title-label">Figure 5. </span>Phone Numbers Represented as an Array of Structs</figcaption>
+
+          
+
+          <p class="p">
+            To represent a schema equivalent to the one from <a class="xref" href="#nested_types_ddl__complex_types_phones_flat_normalized">Figure 4</a> using
+            complex types, this example uses an <code class="ph codeph">ARRAY</code> where each array element is a <code class="ph codeph">STRUCT</code>. As with the
+            earlier complex type examples, each person can have an arbitrary set of associated phone numbers. Making each array element into
+            a <code class="ph codeph">STRUCT</code> lets us associate multiple data items with each phone number, and give a separate name and type to
+            each data item. The <code class="ph codeph">STRUCT</code> fields of the <code class="ph codeph">ARRAY</code> elements reproduce the columns of the dimension
+            table from the previous example.
+          </p>
+
+          <p class="p">
+            You can do all the same kinds of queries with the complex type schema as with the normalized schema from the previous example.
+            The advantages of the complex type design are in the areas of convenience and performance. Now your backup and ETL processes
+            only deal with a single table. When a query uses a join to cross-reference the information about a person with their associated
+            phone numbers, all the relevant data for each row resides in the same HDFS data block, meaning each row can be processed on a
+            single host without requiring network transmission.
+          </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE contacts_detailed_phones
+(
+  id BIGINT, name STRING, address STRING
+    , phone ARRAY &lt; STRUCT &lt;
+        category: STRING
+      , international_code: STRING
+      , area_code: STRING
+      , exchange: STRING
+      , extension: STRING
+      , mobile: BOOLEAN
+      , carrier: STRING
+      , current: BOOLEAN
+      , service_start_date: TIMESTAMP
+      , service_end_date: TIMESTAMP
+    &gt;&gt;
+) STORED AS PARQUET;
+
+</code></pre>
+
+        </figure>
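+
+        <p class="p">
+          For example, because each array element is a <code class="ph codeph">STRUCT</code>, a query can refer to the individual
+          fields through dot notation on the <code class="ph codeph">ITEM</code> pseudocolumn (a sketch against the table above):
+        </p>
+
+<pre class="pre codeblock"><code>SELECT c.name, p.item.category, p.item.area_code, p.item.exchange
+  FROM contacts_detailed_phones c, c.phone p
+WHERE p.item.current = TRUE;
+</code></pre>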
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="complex_types_using__complex_types_sql">
+
+      <h3 class="title topictitle3" id="ariaid-title13">SQL Statements that Support Complex Types</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          The Impala SQL statements that support complex types are currently
+          <code class="ph codeph"><a class="xref" href="impala_create_table.html#create_table">CREATE TABLE</a></code>,
+          <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE</a></code>,
+          <code class="ph codeph"><a class="xref" href="impala_describe.html#describe">DESCRIBE</a></code>,
+          <code class="ph codeph"><a class="xref" href="impala_load_data.html#load_data">LOAD DATA</a></code>, and
+          <code class="ph codeph"><a class="xref" href="impala_select.html#select">SELECT</a></code>. That is, currently Impala can create or alter tables
+          containing complex type columns, examine the structure of a table containing complex type columns, import existing data files
+          containing complex type columns into a table, and query Parquet tables containing complex types.
+        </p>
+
+        <p class="p">
+        Impala currently cannot write new data files containing complex type columns.
+        Therefore, although the <code class="ph codeph">SELECT</code> statement works for queries
+        involving complex type columns, you cannot use a statement form that writes
+        data to complex type columns, such as <code class="ph codeph">CREATE TABLE AS SELECT</code> or <code class="ph codeph">INSERT ... SELECT</code>.
+        To create data files containing complex type data, use the Hive <code class="ph codeph">INSERT</code> statement, or another
+        ETL mechanism such as MapReduce jobs, Spark jobs, Pig, and so on.
+      </p>
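+
+        <p class="p">
+          As a sketch of that workflow (the table and column names here are hypothetical, not part of the
+          examples elsewhere in this section), a Hive session might populate a complex type Parquet table
+          using Hive's <code class="ph codeph">array()</code>, <code class="ph codeph">map()</code>, and
+          <code class="ph codeph">named_struct()</code> constructor functions:
+        </p>
+
+<pre class="pre codeblock"><code>-- Run in Hive, not Impala. FLAT_CONTACTS is a hypothetical table
+-- with one scalar column per phone number.
+CREATE TABLE contacts (id BIGINT, phones ARRAY &lt;STRING&gt;) STORED AS PARQUET;
+
+INSERT INTO TABLE contacts
+  SELECT id, array(home_phone, work_phone) FROM flat_contacts;
+
+-- Afterward, refresh the table metadata in Impala and query it there.
+</code></pre>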
+
+        <p class="p toc inpage"></p>
+
+      </div>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title14" id="complex_types_sql__complex_types_ddl">
+
+        <h4 class="title topictitle4" id="ariaid-title14">DDL Statements and Complex Types</h4>
+
+        <div class="body conbody">
+
+          <p class="p">
+            Column specifications for complex or nested types use <code class="ph codeph">&lt;</code> and <code class="ph codeph">&gt;</code> delimiters:
+          </p>
+
+<pre class="pre codeblock"><code>-- What goes inside the &lt; &gt; for an ARRAY is a single type, either a scalar or another
+-- complex type (ARRAY, STRUCT, or MAP).
+CREATE TABLE array_t
+(
+  id BIGINT,
+  a1 ARRAY &lt;STRING&gt;,
+  a2 ARRAY &lt;BIGINT&gt;,
+  a3 ARRAY &lt;TIMESTAMP&gt;,
+  a4 ARRAY &lt;STRUCT &lt;f1: STRING, f2: INT, f3: BOOLEAN&gt;&gt;
+)
+STORED AS PARQUET;
+
+-- What goes inside the &lt; &gt; for a MAP is two comma-separated types specifying the types of the key-value pair:
+-- a scalar type representing the key, and a scalar or complex type representing the value.
+CREATE TABLE map_t
+(
+  id BIGINT,
+  m1 MAP &lt;STRING, STRING&gt;,
+  m2 MAP &lt;STRING, BIGINT&gt;,
+  m3 MAP &lt;BIGINT, STRING&gt;,
+  m4 MAP &lt;BIGINT, BIGINT&gt;,
+  m5 MAP &lt;STRING, ARRAY &lt;STRING&gt;&gt;
+)
+STORED AS PARQUET;
+
+-- What goes inside the &lt; &gt; for a STRUCT is a comma-separated list of fields, each field defined as
+-- name:type. The type can be a scalar or a complex type. The field names for each STRUCT do not clash
+-- with the names of table columns or fields in other STRUCTs. A STRUCT is most often used inside
+-- an ARRAY or a MAP rather than as a top-level column.
+CREATE TABLE struct_t
+(
+  id BIGINT,
+  s1 STRUCT &lt;f1: STRING, f2: BIGINT&gt;,
+  s2 ARRAY &lt;STRUCT &lt;f1: INT, f2: TIMESTAMP&gt;&gt;,
+  s3 MAP &lt;BIGINT, STRUCT &lt;name: STRING, birthday: TIMESTAMP&gt;&gt;
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+        </div>
+
+      </article>
+
+      <article class="topic concept nested3" aria-labelledby="ariaid-title15" id="complex_types_sql__complex_types_queries">
+
+        <h4 class="title topictitle4" id="ariaid-title15">Queries and Complex Types</h4>
+
+        <div class="body conbody">
+
+
+
+
+
+          <p class="p">
+            The result set of an Impala query can contain only scalar types; the elements and fields within any complex type columns must
+            be <span class="q">"unpacked"</span> using join queries. A query cannot directly retrieve the entire value for a complex type column. Impala
+            returns an error in this case. Queries using <code class="ph codeph">SELECT *</code> are allowed for tables with complex types, but the
+            columns with complex types are skipped.
+          </p>
+
+          <p class="p">
+            The following example shows how referring directly to a complex type column returns an error, while <code class="ph codeph">SELECT *</code> on
+            the same table succeeds, but only retrieves the scalar columns.
+          </p>
+
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Many of the complex type examples refer to tables
+      such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+      adapted from the tables used in the TPC-H benchmark.
+      See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+      for the table definitions.
+      </div>
+
+
+
+<pre class="pre codeblock"><code>SELECT c_orders FROM customer LIMIT 1;
+ERROR: AnalysisException: Expr 'c_orders' in select list returns a complex type 'ARRAY&lt;STRUCT&lt;o_orderkey:BIGINT,o_orderstatus:STRING, ... l_receiptdate:STRING,l_shipinstruct:STRING,l_shipmode:STRING,l_comment:STRING&gt;&gt;&gt;&gt;'.
+Only scalar types are allowed in the select list.
+
+-- The original table has several scalar columns and one complex column.
+DESCRIBE customer;
++--------------+------------------------------------+
+| name         | type                               |
++--------------+------------------------------------+
+| c_custkey    | bigint                             |
+| c_name       | string                             |
+...
+| c_orders     | array&lt;struct&lt;                      |
+|              |   o_orderkey:bigint,               |
+|              |   o_orderstatus:string,            |
+|              |   o_totalprice:decimal(12,2),      |
+...
+|              | &gt;&gt;                                 |
++--------------+------------------------------------+
+
+-- When we SELECT * from that table, only the scalar columns come back in the result set.
+CREATE TABLE select_star_customer STORED AS PARQUET AS SELECT * FROM customer;
++------------------------+
+| summary                |
++------------------------+
+| Inserted 150000 row(s) |
++------------------------+
+
+-- The c_orders column, being of complex type, was not included in the SELECT * result set.
+DESC select_star_customer;
++--------------+---------------+
+| name         | type          |
++--------------+---------------+
+| c_custkey    | bigint        |
+| c_name       | string        |
+| c_address    | string        |
+| c_nationkey  | smallint      |
+| c_phone      | string        |
+| c_acctbal    | decimal(12,2) |
+| c_mktsegment | string        |
+| c_comment    | string        |
++--------------+---------------+
+
+</code></pre>
+
+
+
+          <p class="p">
+            References to fields within <code class="ph codeph">STRUCT</code> columns use dot notation. If the field name is unambiguous, you can omit
+            qualifiers such as table name, column name, or even the <code class="ph codeph">ITEM</code> or <code class="ph codeph">VALUE</code> pseudocolumn names for
+            <code class="ph codeph">STRUCT</code> elements inside an <code class="ph codeph">ARRAY</code> or a <code class="ph codeph">MAP</code>.
+          </p>
+
+
+
+
+
+
+
+<pre class="pre codeblock"><code>SELECT id, address.city FROM customers WHERE address.zip = 94305;
+</code></pre>
+
+          <p class="p">
+            References to elements within <code class="ph codeph">ARRAY</code> columns use the <code class="ph codeph">ITEM</code> pseudocolumn:
+          </p>
+
+
+
+<pre class="pre codeblock"><code>SELECT r_name, r_nations.item.n_name FROM region, region.r_nations LIMIT 7;
++--------+----------------+
+| r_name | item.n_name    |
++--------+----------------+
+| EUROPE | UNITED KINGDOM |
+| EUROPE | RUSSIA         |
+| EUROPE | ROMANIA        |
+| EUROPE | GERMANY        |
+| EUROPE | FRANCE         |
+| ASIA   | VIETNAM        |
+| ASIA   | CHINA          |
++--------+----------------+
+</code></pre>
+
+          <p class="p">
+            References to fields within <code class="ph codeph">MAP</code> columns use the <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> pseudocolumns.
+            In this example, once the query establishes the alias <code class="ph codeph">MAP_FIELD</code> for a <code class="ph codeph">MAP</code> column with a
+            <code class="ph codeph">STRING</code> key and an <code class="ph codeph">INT</code> value, the query can refer to <code class="ph codeph">MAP_FIELD.KEY</code> and
+            <code class="ph codeph">MAP_FIELD.VALUE</code>, which have zero, one, or many instances for each row from the containing table.
+          </p>
+
+<pre class="pre codeblock"><code>DESCRIBE table_0;
++---------+-----------------------+
+| name    | type                  |
++---------+-----------------------+
+| field_0 | string                |
+| field_1 | map&lt;string,int&gt;       |
+...
+
+SELECT field_0, map_field.key, map_field.value
+  FROM table_0, table_0.field_1 AS map_field
+WHERE length(field_0) = 1
+LIMIT 10;
++---------+-----------+-------+
+| field_0 | key       | value |
++---------+-----------+-------+
+| b       | gshsgkvd  | NULL  |
+| b       | twrtcxj6  | 18    |
+| b       | 2vp5      | 39    |
+| b       | fh0s      | 13    |
+| v       | 2         | 41    |
+| v       | 8b58mz    | 20    |
+| v       | hw        | 16    |
+| v       | 65l388pyt | 29    |
+| v       | 03k68g91z | 30    |
+| v       | r2hlg5b   | NULL  |
++---------+-----------+-------+
+
+</code></pre>
+
+
+
+          <p class="p">
+            When complex types are nested inside each other, you use a combination of joins, pseudocolumn names, and dot notation to refer
+            to specific fields at the appropriate level. This is the most frequent form of query syntax for complex columns, because the
+            typical use case involves two levels of complex types, such as an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> elements.
+          </p>
+
+
+
+
+
+<pre class="pre codeblock"><code>SELECT id, phone_numbers.area_code FROM contact_info_many_structs INNER JOIN contact_info_many_structs.phone_numbers phone_numbers LIMIT 3;
+</code></pre>
+
+          <p class="p">
+            You can express relationships between <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> columns at different levels as joins. You
+            include comparison operators between fields at the top level and within the nested type columns so that Impala can do the
+            appropriate join operation.
+          </p>
+
+
+
+
+
+
+
+
+
+
+          <p class="p">
+            For example, the following queries work equivalently. They each return customer and order data for customers that have at least
+            one order.
+          </p>
+
+<pre class="pre codeblock"><code>SELECT c.c_name, o.o_orderkey FROM customer c, c.c_orders o LIMIT 5;
++--------------------+------------+
+| c_name             | o_orderkey |
++--------------------+------------+
+| Customer#000072578 | 558821     |
+| Customer#000072578 | 2079810    |
+| Customer#000072578 | 5768068    |
+| Customer#000072578 | 1805604    |
+| Customer#000072578 | 3436389    |
++--------------------+------------+
+
+SELECT c.c_name, o.o_orderkey FROM customer c INNER JOIN c.c_orders o LIMIT 5;
++--------------------+------------+
+| c_name             | o_orderkey |
++--------------------+------------+
+| Customer#000072578 | 558821     |
+| Customer#000072578 | 2079810    |
+| Customer#000072578 | 5768068    |
+| Customer#000072578 | 1805604    |
+| Customer#000072578 | 3436389    |
++--------------------+------------+
+</code></pre>
+
+          <p class="p">
+            The following query using an outer join returns customers that have orders, plus customers with no orders (no entries in the
+            <code class="ph codeph">C_ORDERS</code> array):
+          </p>
+
+<pre class="pre codeblock"><code>SELECT c.c_custkey, o.o_orderkey
+  FROM customer c LEFT OUTER JOIN c.c_orders o
+LIMIT 5;
++-----------+------------+
+| c_custkey | o_orderkey |
++-----------+------------+
+| 60210     | NULL       |
+| 147873    | NULL       |
+| 72578     | 558821     |
+| 72578     | 2079810    |
+| 72578     | 5768068    |
++-----------+------------+
+
+</code></pre>
+
+          <p class="p">
+            The following query returns <em class="ph i">only</em> customers that have no orders. (With <code class="ph codeph">LEFT ANTI JOIN</code> or <code class="ph codeph">LEFT
+            SEMI JOIN</code>, the query can only refer to columns from the left-hand table, because by definition there is no matching
+            information in the right-hand table.)
+          </p>
+
+<pre class="pre codeblock"><code>SELECT c.c_custkey, c.c_name
+  FROM customer c LEFT ANTI JOIN c.c_orders o
+LIMIT 5;
++-----------+--------------------+
+| c_custkey | c_name             |
++-----------+--------------------+
+| 60210     | Customer#000060210 |
+| 147873    | Customer#000147873 |
+| 141576    | Customer#000141576 |
+| 85365     | Customer#000085365 |
+| 70998     | Customer#000070998 |
++-----------+--------------------+
+
+</code></pre>
+
+
+
+          <p class="p">
+            You can also perform correlated subqueries to examine the properties of complex type columns for each row in the result set.
+          </p>
+
+          <p class="p">
+            Count the number of orders per customer. Note the correlated reference to the table alias <code class="ph codeph">C</code>. The
+            <code class="ph codeph">COUNT(*)</code> operation applies to all the elements of the <code class="ph codeph">C_ORDERS</code> array for the corresponding
+            row, avoiding the need for a <code class="ph codeph">GROUP BY</code> clause.
+          </p>
+
+<pre class="pre codeblock"><code>SELECT c_name, howmany FROM customer c, (SELECT COUNT(*) howmany FROM c.c_orders) v LIMIT 5;
++--------------------+---------+
+| c_name             | howmany |
++--------------------+---------+
+| Customer#000030065 | 15      |
+| Customer#000065455 | 18      |
+| Customer#000113644 | 21      |
+| Customer#000111078 | 0       |
+| Customer#000024621 | 0       |
++--------------------+---------+
+</code></pre>
+
+          <p class="p">
+            Count the number of orders per customer, ignoring any customers that have not placed any orders:
+          </p>
+
+<pre class="pre codeblock"><code>SELECT c_name, howmany_orders
+FROM
+  customer c,
+  (SELECT COUNT(*) howmany_orders FROM c.c_orders) subq1
+WHERE howmany_orders &gt; 0
+LIMIT 5;
++--------------------+----------------+
+| c_name             | howmany_orders |
++--------------------+----------------+
+| Customer#000072578 | 7              |
+| Customer#000046378 | 26             |
+| Customer#000069815 | 11             |
+| Customer#000079058 | 12             |
+| Customer#000092239 | 26             |
++--------------------+----------------+
+</code></pre>
+
+          <p class="p">
+            Count the number of line items in each order. The reference to <code class="ph codeph">C.C_ORDERS</code> in the <code class="ph codeph">FROM</code> clause
+            is needed because the <code class="ph codeph">O_ORDERKEY</code> field is a member of the elements in the <code class="ph codeph">C_ORDERS</code> array. The
+            subquery labelled <code class="ph codeph">SUBQ1</code> is correlated: it is re-evaluated for the <code class="ph codeph">C_ORDERS.O_LINEITEMS</code> array
+            from each row of the <code class="ph codeph">CUSTOMER</code> table.
+          </p>
+
+<pre class="pre codeblock"><code>SELECT c_name, o_orderkey, howmany_line_items
+FROM
+  customer c,
+  c.c_orders t2,
+  (SELECT COUNT(*) howmany_line_items FROM c.c_orders.o_lineitems) subq1
+WHERE howmany_line_items &gt; 0
+LIMIT 5;
++--------------------+------------+--------------------+
+| c_name             | o_orderkey | howmany_line_items |
++--------------------+------------+--------------------+
+| Customer#000020890 | 1884930    | 95                 |
+| Customer#000020890 | 4570754    | 95                 |
+| Customer#000020890 | 3771072    | 95                 |
+| Customer#000020890 | 2555489    | 95                 |
+| Customer#000020890 | 919171     | 95                 |
++--------------------+------------+--------------------+
+</code></pre>
+
+          <p class="p">
+            Get the number of orders, the average order price, and the maximum items in any order per customer. For this example, the
+            subqueries labelled <code class="ph codeph">SUBQ1</code> and <code class="ph codeph">SUBQ2</code> are correlated: they are re-evaluated for each row from
+            the original <code class="ph codeph">CUSTOMER</code> table, and only apply to the complex columns associated with that row.
+          </p>
+
+<pre class="pre codeblock"><code>SELECT c_name, howmany, average_price, most_items
+FROM
+  customer c,
+  (SELECT COUNT(*) howmany, AVG(o_totalprice) average_price FROM c.c_orders) subq1,
+  (SELECT MAX(l_quantity) most_items FROM c.c_orders.o_lineitems ) subq2
+LIMIT 5;
++--------------------+---------+---------------+------------+
+| c_name             | howmany | average_price | most_items |
++--------------------+---------+---------------+------------+
+| Customer#000030065 | 15      | 128908.34     | 50.00      |
+| Customer#000088191 | 0       | NULL          | NULL       |
+| Customer#000101555 | 10      | 164250.31     | 50.00      |
+| Customer#000022092 | 0       | NULL          | NULL       |
+| Customer#000036277 | 27      | 166040.06     | 50.00      |
++--------------------+---------+---------------+------------+
+</code></pre>
+
+          <p class="p">
+            For example, these queries show how to access information about the <code class="ph codeph">ARRAY</code> elements within the
+            <code class="ph codeph">CUSTOMER</code> table from the <span class="q">"nested TPC-H"</span> schema, starting with the initial <code class="ph codeph">ARRAY</code> elements
+            and progressing to examine the <code class="ph codeph">STRUCT</code> fields of the <code class="ph codeph">ARRAY</code>, and then the elements nested within
+            another <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>:
+          </p>
+
+<pre class="pre codeblock"><code>-- How many orders does each customer have?
+-- The type of the ARRAY column doesn't matter; this is just counting the elements.
+SELECT c_custkey, count(*)
+  FROM customer, customer.c_orders
+GROUP BY c_custkey
+LIMIT 5;
++-----------+----------+
+| c_custkey | count(*) |
++-----------+----------+
+| 61081     | 21       |
+| 115987    | 15       |
+| 69685     | 19       |
+| 109124    | 15       |
+| 50491     | 12       |
++-----------+----------+
+
+-- How many line items are part of each customer order?
+-- Now we examine a field from a STRUCT nested inside the ARRAY.
+SELECT c_custkey, c_orders.o_orderkey, count(*)
+  FROM customer, customer.c_orders c_orders, c_orders.o_lineitems
+GROUP BY c_custkey, c_orders.o_orderkey
+LIMIT 5;
++-----------+------------+----------+
+| c_custkey | o_orderkey | count(*) |
++-----------+------------+----------+
+| 63367     | 4985959    | 7        |
+| 53989     | 1972230    | 2        |
+| 143513    | 5750498    | 5        |
+| 17849     | 4857989    | 1        |
+| 89881     | 1046437    | 1        |
++-----------+------------+----------+
+
+-- What are the line items in each customer order?
+-- One of the STRUCT fields inside the ARRAY is another
+-- ARRAY containing STRUCT elements. The join finds
+-- all the related items from both levels of ARRAY.
+SELECT c_custkey, o_orderkey, l_partkey
+  FROM customer, customer.c_orders, c_orders.o_lineitems
+LIMIT 5;
++-----------+------------+-----------+
+| c_custkey | o_orderkey | l_partkey |
++-----------+------------+-----------+
+| 113644    | 2738497    | 175846    |
+| 113644    | 2738497    | 27309     |
+| 113644    | 2738497    | 175873    |
+| 113644    | 2738497    | 88559     |
+| 113644    | 2738497    | 8032      |
++-----------+------------+-----------+
+
+</code></pre>
+
+        </div>
+
+      </article>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title16" id="complex_types_using__pseudocolumns">
+
+      <h3 class="title topictitle3" id="ariaid-title16">Pseudocolumns for ARRAY and MAP Types</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Each element in an <code class="ph codeph">ARRAY</code> type has a position, indexed starting from zero, and a value. Each element in a
+          <code class="ph codeph">MAP</code> type represents a key-value pair. Impala provides pseudocolumns that let you retrieve this metadata as part
+          of a query, or filter query results by including such things in a <code class="ph codeph">WHERE</code> clause. You refer to the pseudocolumns as
+          part of qualified column names in queries:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            <code class="ph codeph">ITEM</code>: The value of an array element. If the <code class="ph codeph">ARRAY</code> contains <code class="ph codeph">STRUCT</code> elements,
+            you can refer to either <code class="ph codeph"><var class="keyword varname">array_name</var>.ITEM.<var class="keyword varname">field_name</var></code> or use the shorthand
+            <code class="ph codeph"><var class="keyword varname">array_name</var>.<var class="keyword varname">field_name</var></code>.
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">POS</code>: The position of an element within an array.
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">KEY</code>: The value forming the first part of a key-value pair in a map. It is not necessarily unique.
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">VALUE</code>: The data item forming the second part of a key-value pair in a map. If the <code class="ph codeph">VALUE</code> part
+            of the <code class="ph codeph">MAP</code> element is a <code class="ph codeph">STRUCT</code>, you can refer to either
+            <code class="ph codeph"><var class="keyword varname">map_name</var>.VALUE.<var class="keyword varname">field_name</var></code> or use the shorthand
+            <code class="ph codeph"><var class="keyword varname">map_name</var>.<var class="keyword varname">field_name</var></code>.
+          </li>
+        </ul>
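+
+        <p class="p">
+          As a quick illustration of the shorthand forms (the table, column, and field names here are
+          hypothetical), the following pairs of queries are equivalent:
+        </p>
+
+<pre class="pre codeblock"><code>-- For an ARRAY of STRUCT, ITEM can be spelled out or omitted.
+SELECT a.ITEM.f1 FROM t, t.arr_col a;
+SELECT a.f1 FROM t, t.arr_col a;
+
+-- For a MAP whose VALUE part is a STRUCT, VALUE can likewise be omitted.
+SELECT m.VALUE.f1 FROM t2, t2.map_col m;
+SELECT m.f1 FROM t2, t2.map_col m;
+</code></pre>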
+
+
+
+        <p class="p toc inpage"></p>
+
+      </div>
+
+      <article class="topic concept nested3" aria-labelledby="item__pos" id="pseudocolumns__item">
+
+        <h4 class="title topictitle4" id="item__pos">ITEM and POS Pseudocolumns</h4>
+
+        <div class="body conbody">
+
+          <p class="p">
+            When an <code class="ph codeph">ARRAY</code> column contains <code class="ph codeph">STRUCT</code> elements, you can refer to a field within the
+            <code class="ph codeph">STRUCT</code> using a qualified name of the form
+            <code class="ph codeph"><var class="keyword varname">array_column</var>.<var class="keyword varname">field_name</var></code>. If the <code class="ph codeph">ARRAY</code> contains scalar
+            values, Impala recognizes the special name <code class="ph codeph"><var class="keyword varname">array_column</var>.ITEM</code> to represent the value of each
+            scalar array element. For example, if a column contained an <code class="ph codeph">ARRAY</code> where each element was a
+            <code class="ph codeph">STRING</code>, you would use <code class="ph codeph"><var class="keyword varname">array_name</var>.ITEM</code> to refer to each scalar value in the
+            <code class="ph codeph">SELECT</code> list, or the <code class="ph codeph">WHERE</code> or other clauses.
+          </p>
+
+          <p class="p">
+            This example shows a table with two <code class="ph codeph">ARRAY</code> columns whose elements are of the scalar type
+            <code class="ph codeph">STRING</code>. When referring to the values of the array elements in the <code class="ph codeph">SELECT</code> list,
+            <code class="ph codeph">WHERE</code> clause, or <code class="ph codeph">ORDER BY</code> clause, you use the <code class="ph codeph">ITEM</code> pseudocolumn because
+            within the array, the individual elements have no defined names.
+          </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE persons_of_interest
+(
+person_id BIGINT,
+aliases ARRAY &lt;STRING&gt;,
+associates ARRAY &lt;STRING&gt;,
+real_name STRING
+)
+STORED AS PARQUET;
+
+-- Get all the aliases of each person.
+SELECT real_name, aliases.ITEM
+  FROM persons_of_interest, persons_of_interest.aliases
+ORDER BY real_name, aliases.item;
+
+-- Search for particular associates of each person.
+SELECT real_name, associates.ITEM
+  FROM persons_of_interest, persons_of_interest.associates
+WHERE associates.item LIKE '% MacGuffin';
+
+</code></pre>
+
+          <p class="p">
+            Because an array is inherently an ordered data structure, Impala recognizes the special name
+            <code class="ph codeph"><var class="keyword varname">array_column</var>.POS</code> to represent the numeric position of each element within the array. The
+            <code class="ph codeph">POS</code> pseudocolumn lets you filter or reorder the result set based on the sequence of array elements.
+          </p>
+
+          <p class="p">
+            The following example uses a table from a flattened version of the TPC-H schema. The <code class="ph codeph">REGION</code> table only has a
+            few rows, such as one row for Europe and one for Asia. The row for each region represents all the countries in that region as an
+            <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> elements:
+          </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; desc region;
++-------------+--------------------------------------------------------------------+
+| name        | type                                                               |
++-------------+--------------------------------------------------------------------+
+| r_regionkey | smallint                                                           |
+| r_name      | string                                                             |
+| r_comment   | string                                                             |
+| r_nations   | array&lt;struct&lt;n_nationkey:smallint,n_name:string,n_comment:string&gt;&gt; |
++-------------+--------------------------------------------------------------------+
+
+</code></pre>
+
+          <p class="p">
+            To find the countries within a specific region, you use a join query. To find out the order of elements in the array, you also
+            refer to the <code class="ph codeph">POS</code> pseudocolumn in the select list:
+          </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; SELECT r1.r_name, r2.n_name, <strong class="ph b">r2.POS</strong>
+                  &gt; FROM region r1 INNER JOIN r1.r_nations r2
+                  &gt; WHERE r1.r_name = 'ASIA';
++--------+-----------+-----+
+| r_name | n_name    | pos |
++--------+-----------+-----+
+| ASIA   | VIETNAM   | 0   |
+| ASIA   | CHINA     | 1   |
+| ASIA   | JAPAN     | 2   |
+| ASIA   | INDONESIA | 3   |
+| ASIA   | INDIA     | 4   |
++--------+-----------+-----+
+</code></pre>
+
+          <p class="p">
+            Once you know the positions of the elements, you can use that information in subsequent queries, for example to change the
+            ordering of results from the complex type column or to filter certain elements from the array:
+          </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; SELECT r1.r_name, r2.n_name, r2.POS
+                  &gt; FROM region r1 INNER JOIN r1.r_nations r2
+                  &gt; WHERE r1.r_name = 'ASIA'
+                  &gt; <strong class="ph b">ORDER BY r2.POS DESC</strong>;
++--------+-----------+-----+
+| r_name | n_name    | pos |
++--------+-----------+-----+
+| ASIA   | INDIA     | 4   |
+| ASIA   | INDONESIA | 3   |
+| ASIA   | JAPAN     | 2   |
+| ASIA   | CHINA     | 1   |
+| ASIA   | VIETNAM   | 0   |
++--------+-----------+-----+
+[localhost:21000] &gt; SELECT r1.r_name, r2.n_name, r2.POS
+                  &gt; FROM region r1 INNER JOIN r1.r_nations r2
+                  &gt; WHERE r1.r_name = 'ASIA' AND <strong class="ph b">r2.POS BETWEEN 1 and 3</strong>;
++--------+-----------+-----+
+| r_name | n_name    | pos |
++--------+-----------+-----+
+| ASIA   | CHINA     | 1   |
+| ASIA   | JAPAN     | 2   |
+| ASIA   | INDONESIA | 3   |
++--------+-----------+-----+
+</code></pre>
+
+        </div>
+
+      </article>
+
+      <article class="topic concept nested3" aria-labelledby="key__value" id="pseudocolumns__key">
+
+        <h4 class="title topictitle4" id="key__value">KEY and VALUE Pseudocolumns</h4>
+
+        <div class="body conbody">
+
+          <p class="p">
+            The <code class="ph codeph">MAP</code> data type is suitable for representing sparse or wide data structures, where each row might only have
+            entries for a small subset of named fields. Because the element names (the map keys) vary depending on the row, a query must be
+            able to refer to both the key and the value parts of each key-value pair. The <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code>
+            pseudocolumns let you refer to the parts of the key-value pair independently within the query, as
+            <code class="ph codeph"><var class="keyword varname">map_column</var>.KEY</code> and <code class="ph codeph"><var class="keyword varname">map_column</var>.VALUE</code>.
+          </p>
+
+          <p class="p">
+            The <code class="ph codeph">KEY</code> must always be a scalar type, such as <code class="ph codeph">STRING</code>, <code class="ph codeph">BIGINT</code>, or
+            <code class="ph codeph">TIMESTAMP</code>. It can be <code class="ph codeph">NULL</code>. Values of the <code class="ph codeph">KEY</code> field are not necessarily unique
+            within the same <code class="ph codeph">MAP</code>. You apply any required <code class="ph codeph">DISTINCT</code>, <code class="ph codeph">GROUP BY</code>, and other
+            clauses in the query, and loop through the result set to process all the values matching any specified keys.
+          </p>
+
+          <p class="p">
+            The <code class="ph codeph">VALUE</code> can be either a scalar type or another complex type. If the <code class="ph codeph">VALUE</code> is a
+            <code class="ph codeph">STRUCT</code>, you can construct a qualified name
+            <code class="ph codeph"><var class="keyword varname">map_column</var>.VALUE.<var class="keyword varname">struct_field</var></code> to refer to the individual fields inside
+            the value part. If the <code class="ph codeph">VALUE</code> is an <code class="ph codeph">ARRAY</code> or another <code class="ph codeph">MAP</code>, you must include
+            another join condition that establishes a table alias for <code class="ph codeph"><var class="keyword varname">map_column</var>.VALUE</code>, and then
+            construct another qualified name using that alias, for example <code class="ph codeph"><var class="keyword varname">table_alias</var>.ITEM</code> or
+            <code class="ph codeph"><var class="keyword varname">table_alias</var>.KEY</code> and <code class="ph codeph"><var class="keyword varname">table_alias</var>.VALUE</code>.
+          </p>
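+
+          <p class="p">
+            The following sketch shows that two-level case for a <code class="ph codeph">MAP</code> whose
+            <code class="ph codeph">VALUE</code> part is an <code class="ph codeph">ARRAY</code>. (The table and
+            column names are hypothetical.)
+          </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE tag_sets
+(
+  id BIGINT,
+  tags MAP &lt;STRING, ARRAY &lt;STRING&gt;&gt;
+)
+STORED AS PARQUET;
+
+-- The first join clause establishes the alias T for the MAP column;
+-- the second establishes the alias A for the ARRAY inside each VALUE.
+SELECT id, t.KEY, a.ITEM
+  FROM tag_sets, tag_sets.tags t, t.VALUE a;
+</code></pre>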
+
+          <p class="p">
+            The following example shows different ways to access a <code class="ph codeph">MAP</code> column using the <code class="ph codeph">KEY</code> and
+            <code class="ph codeph">VALUE</code> pseudocolumns. The <code class="ph codeph">DETAILS</code> column has a <code class="ph codeph">STRING</code> first part with short,
+            standardized values such as <code class="ph codeph">'Recurring'</code>, <code class="ph codeph">'Lucid'</code>, or <code class="ph codeph">'Anxiety'</code>. This is the
+            <span class="q">"key"</span> that is used to look up particular kinds of elements from the <code class="ph codeph">MAP</code>. The second part, also a
+            <code class="ph codeph">STRING</code>, is a longer free-form explanation. Impala gives you the standard pseudocolumn names
+            <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> for the two parts, and you apply your own conventions and interpretations to the
+            underlying values.
+          </p>
+
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            If you find that the single-item nature of the <code class="ph codeph">VALUE</code> makes it difficult to model your data accurately, the
+            solution is typically to add some nesting to the complex type. For example, to have several sets of key-value pairs, make the
+            column an <code class="ph codeph">ARRAY</code> whose elements are <code class="ph codeph">MAP</code>. To make a set of key-value pairs that holds more
+            elaborate information, make a <code class="ph codeph">MAP</code> column whose <code class="ph codeph">VALUE</code> part contains an <code class="ph codeph">ARRAY</code>
+            or a <code class="ph codeph">STRUCT</code>.
+          </div>
+
+<pre class="pre codeblock"><code>CREATE TABLE dream_journal
+(
+  dream_id BIGINT,
+  details MAP &lt;STRING,STRING&gt;
+)
+STORED AS PARQUET;
+
+
+-- What are all the types of dreams that are recorded?
+SELECT DISTINCT details.KEY FROM dream_journal, dream_journal.details;
+
+-- How many lucid dreams were recorded?
+-- Because there is no GROUP BY, we count the 'Lucid' keys across all rows.
+SELECT <strong class="ph b">COUNT(details.KEY)</strong>
+  FROM dream_journal, dream_journal.details
+WHERE <strong class="ph b">details.KEY = 'Lucid'</strong>;
+
+-- Print a report of a subset of dreams, filtering based on both the lookup key
+-- and the detailed value.
+SELECT dream_id, <strong class="ph b">details.KEY AS "Dream Type"</strong>, <strong class="ph b">details.VALUE AS "Dream Summary"</strong>
+  FROM dream_journal, dream_journal.details
+WHERE
+  <strong class="ph b">details.KEY IN ('Happy', 'Pleasant', 'Joyous')</strong>
+  AND <strong class="ph b">details.VALUE LIKE '%childhood%'</strong>;
+</code></pre>
+
+          <p class="p">
+            The following example shows a more elaborate version of the previous table, where the <code class="ph codeph">VALUE</code> part of the
+            <code class="ph codeph">MAP</code> entry is a <code class="ph codeph">STRUCT</code> rather than a scalar type. Now instead of referring to the
+            <code class="ph codeph">VALUE</code> pseudocolumn directly, you use dot notation to refer to the <code class="ph codeph">STRUCT</code> fields inside it.
+          </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE better_dream_journal
+(
+  dream_id BIGINT,
+  details MAP &lt;STRING,STRUCT &lt;summary: STRING, when_happened: TIMESTAMP, duration: DECIMAL(5,2), woke_up: BOOLEAN&gt; &gt;
+)
+STORED AS PARQUET;
+
+
+-- Do more elaborate reporting and filtering by examining multiple attributes within the same dream.
+SELECT dream_id, <strong class="ph b">details.KEY AS "Dream Type"</strong>, <strong class="ph b">details.VALUE.summary AS "Dream Summary"</strong>, <strong class="ph b">details.VALUE.duration AS "Duration"</strong>
+  FROM better_dream_journal, better_dream_journal.details
+WHERE
+  <strong class="ph b">details.KEY IN ('Anxiety', 'Nightmare')</strong>
+  AND <strong class="ph b">details.VALUE.duration &gt; 60</strong>
+  AND <strong class="ph b">details.VALUE.woke_up = TRUE</strong>;
+
+-- Remember that if the ITEM or VALUE contains a STRUCT, you can reference
+-- the STRUCT fields directly without the .ITEM or .VALUE qualifier.
+SELECT dream_id, <strong class="ph b">details.KEY AS "Dream Type"</strong>, <strong class="ph b">details.summary AS "Dream Summary"</strong>, <strong class="ph b">details.duration AS "Duration"</strong>
+  FROM better_dream_journal, better_dream_journal.details
+WHERE
+  <strong class="ph b">details.KEY IN ('Anxiety', 'Nightmare')</strong>
+  AND <strong class="ph b">details.duration &gt; 60</strong>
+  AND <strong class="ph b">details.woke_up = TRUE</strong>;
+</code></pre>
+
+        </div>
+
+      </article>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title19" id="complex_types_using__complex_types_etl">
+
+
+
+      <h3 class="title topictitle3" id="ariaid-title19">Loading Data Containing Complex Types</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          Because the Impala <code class="ph codeph">INSERT</code> statement does not currently support creating new data with complex type columns, or
+          copying existing complex type values from one table to another, you primarily use Impala to query Parquet tables with complex
+          types where the data was inserted through Hive, or create tables with compl

<TRUNCATED>


[08/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_security_guidelines.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_security_guidelines.html b/docs/build/html/topics/impala_security_guidelines.html
new file mode 100644
index 0000000..4b1a738
--- /dev/null
+++ b/docs/build/html/topics/impala_security_guidelines.html
@@ -0,0 +1,99 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_guidelines"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Security Guidelines for Impala</title></head><body id="security_guidelines"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Security Guidelines for Impala</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The following are the major steps to harden a cluster running Impala against accidents and mistakes, or
+      malicious attackers trying to access sensitive data:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+      <p class="p">
+        Secure the <code class="ph codeph">root</code> account. The <code class="ph codeph">root</code> user can tamper with the
+        <span class="keyword cmdname">impalad</span> daemon, read and write the data files in HDFS, log into other user accounts, and
+        access other system services that are beyond the control of Impala.
+      </p>
+      </li>
+
+      <li class="li">
+      <p class="p">
+        Restrict membership in the <code class="ph codeph">sudoers</code> list (in the <span class="ph filepath">/etc/sudoers</span> file).
+        The users who can run the <code class="ph codeph">sudo</code> command can do many of the same things as the
+        <code class="ph codeph">root</code> user.
+      </p>
+      </li>
+
+      <li class="li">
+      <p class="p">
+        Ensure the Hadoop ownership and permissions for Impala data files are restricted.
+      </p>
+      </li>
+
+      <li class="li">
+      <p class="p">
+        Ensure the Hadoop ownership and permissions for Impala log files are restricted.
+      </p>
+      </li>
+
+      <li class="li">
+      <p class="p">
+        Ensure that the Impala web UI (available by default on port 25000 on each Impala node) is
+        password-protected. See <a class="xref" href="impala_webui.html#webui">Impala Web User Interface for Debugging</a> for details.
+      </p>
+      </li>
+
+      <li class="li">
+      <p class="p">
+        Create a policy file that specifies which Impala privileges are available to users in particular Hadoop
+        groups (which by default map to Linux OS groups). Create the associated Linux groups using the
+        <span class="keyword cmdname">groupadd</span> command if necessary.
+      </p>
+      </li>
+
+      <li class="li">
+      <p class="p">
+        The Impala authorization feature makes use of the HDFS file ownership and permissions mechanism; for
+        background information, see the
+        <a class="xref" href="https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html" target="_blank">HDFS Permissions Guide</a>.
+        Set up users and assign them to groups at the OS level, corresponding to the
+        different categories of users with different access levels for various databases, tables, and HDFS
+        locations (URIs). Create the associated Linux users using the <span class="keyword cmdname">useradd</span> command if
+        necessary, and add them to the appropriate groups with the <span class="keyword cmdname">usermod</span> command.
+      </p>
+      </li>
+
+      <li class="li">
+      <p class="p">
+        Design your databases, tables, and views so that the database and table structure allows you to write
+        simple, consistent policy rules. For example, if all tables related to an application are inside a single
+        database, you can assign privileges for that database and use the <code class="ph codeph">*</code> wildcard for the table
+        name. If you are creating views with different privileges than the underlying base tables, you might put
+        the views in a separate database so that you can use the <code class="ph codeph">*</code> wildcard for the database
+        containing the base tables, while specifying the precise names of the individual views. (For specifying
+        table or database names, you either specify the exact name or <code class="ph codeph">*</code> to mean all the databases
+        on a server, or all the tables and views in a database.)
+      </p>
+      </li>
+
+      <li class="li">
+      <p class="p">
+        Enable authorization by running the <code class="ph codeph">impalad</code> daemons with the <code class="ph codeph">-server_name</code>
+        and <code class="ph codeph">-authorization_policy_file</code> options on all nodes. (The authorization feature does not
+        apply to the <span class="keyword cmdname">statestored</span> daemon, which has no access to schema objects or data files.)
+      </p>
+      </li>
+
+      <li class="li">
+      <p class="p">
+        Set up authentication using Kerberos, to make sure users really are who they say they are.
+      </p>
+      </li>
+    </ul>
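+
+    <p class="p">
+      As an illustration of the OS-level and policy-file steps above, the following sketch uses hypothetical
+      group, user, server, and role names and a hypothetical file path; adapt all of them to your own environment:
+    </p>
+
+<pre class="pre codeblock"><code># Create an OS group (which maps to a Hadoop group by default)
+# and add an existing user to it. Run as root.
+groupadd analysts
+usermod -aG analysts alice
+
+# Hypothetical policy file, /etc/impala/policy.ini, granting members of
+# the analysts group read-only access to all tables in one database:
+#
+#   [groups]
+#   analysts = analyst_role
+#   [roles]
+#   analyst_role = server=server1-&gt;db=sales-&gt;table=*-&gt;action=select
+
+# Enable authorization by starting impalad with the corresponding options.
+impalad -server_name=server1 -authorization_policy_file=/etc/impala/policy.ini
+</code></pre>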
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_security_install.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_security_install.html b/docs/build/html/topics/impala_security_install.html
new file mode 100644
index 0000000..f9724ef
--- /dev/null
+++ b/docs/build/html/topics/impala_security_install.html
@@ -0,0 +1,17 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_install"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Installation Considerations for Impala Security</title></head><body id="security_install"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Installation Considerations for Impala Security</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Impala 1.1 comes set up with all the software and settings needed to enable security when you run the
+      <span class="keyword cmdname">impalad</span> daemon with the new security-related options (<code class="ph codeph">-server_name</code> and
+      <code class="ph codeph">-authorization_policy_file</code>). You do not need to change any environment variables or install
+      any additional JAR files.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_security_metastore.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_security_metastore.html b/docs/build/html/topics/impala_security_metastore.html
new file mode 100644
index 0000000..cc852ad
--- /dev/null
+++ b/docs/build/html/topics/impala_security_metastore.html
@@ -0,0 +1,30 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_metastore"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Securing the Hive Metastore Database</title></head><body id="security_metastore"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Securing the Hive Metastore Database</h1>
+  
+
+  <div class="body conbody">
+
+
+
+    <p class="p">
+      It is important to secure the Hive metastore, so that users cannot access the names or other information
+      about databases and tables through the Hive client or by querying the metastore database. Do this by
+      turning on Hive metastore security, using the instructions in
+      <span class="xref">the documentation for your Apache Hadoop distribution</span> for securing different Hive components:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Secure the Hive Metastore.
+      </li>
+
+      <li class="li">
+        In addition, allow access to the metastore only from the HiveServer2 server, and then disable local access
+        to the HiveServer2 server.
+      </li>
+    </ul>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_security_webui.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_security_webui.html b/docs/build/html/topics/impala_security_webui.html
new file mode 100644
index 0000000..6286012
--- /dev/null
+++ b/docs/build/html/topics/impala_security_webui.html
@@ -0,0 +1,57 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_webui"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Securing the Impala Web User Interface</title></head><body id="security_webui"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Securing the Impala Web User Interface</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The instructions in this section presume you are familiar with the
+      <a class="xref" href="http://en.wikipedia.org/wiki/.htpasswd" target="_blank">
+      <span class="ph filepath">.htpasswd</span> mechanism</a> commonly used to password-protect pages on web servers.
+    </p>
+
+    <p class="p">
+      Password-protect the Impala web UI that listens on port 25000 by default. Set up a
+      <span class="ph filepath">.htpasswd</span> file in the <code class="ph codeph">$IMPALA_HOME</code> directory, or start both the
+      <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> daemons with the
+      <code class="ph codeph">--webserver_password_file</code> option to specify a different location (including the filename).
+    </p>
+
+    <p class="p">
+      This file should only be readable by the Impala process and machine administrators, because it contains
+      (hashed) versions of passwords. The username / password pairs are not derived from Unix usernames, Kerberos
+      users, or any other system. The <code class="ph codeph">domain</code> field in the password file must match the domain
+      supplied to Impala by the new command-line option <code class="ph codeph">--webserver_authentication_domain</code>. The
+      default is <code class="ph codeph">mydomain.com</code>.
+
+    </p>
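+
+    <p class="p">
+      For illustration, the following commands sketch this setup. The file path, username, and domain are
+      hypothetical; one common way to create a password file with a domain (realm) field is the
+      <span class="keyword cmdname">htdigest</span> utility from the Apache HTTP Server tools package. Confirm
+      that the resulting file format matches what your Impala version expects:
+    </p>
+
+<pre class="pre codeblock"><code># Create a password file with one user in the example.com domain
+# (prompts for the password).
+htdigest -c /etc/impala/.htpasswd example.com admin
+
+# Start the daemons pointing at that file, with a matching domain.
+impalad --webserver_password_file=/etc/impala/.htpasswd \
+  --webserver_authentication_domain=example.com
+statestored --webserver_password_file=/etc/impala/.htpasswd \
+  --webserver_authentication_domain=example.com
+</code></pre>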
+
+    <p class="p">
+      Impala also supports using HTTPS for secure web traffic. To do so, set
+      <code class="ph codeph">--webserver_certificate_file</code> to refer to a valid <code class="ph codeph">.pem</code> TLS/SSL certificate file.
+      Impala will automatically start using HTTPS once the TLS/SSL certificate has been read and validated. A
+      <code class="ph codeph">.pem</code> file is basically a private key, followed by a signed TLS/SSL certificate; make sure to
+      concatenate both parts when constructing the <code class="ph codeph">.pem</code> file.
+
+    </p>
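+
+    <p class="p">
+      For example, assuming the private key and the signed certificate are in separate files (the file names
+      here are hypothetical), you could construct the combined file and enable HTTPS as follows:
+    </p>
+
+<pre class="pre codeblock"><code># Concatenate the private key and the signed certificate,
+# in that order, into a single .pem file.
+cat impala.key impala.crt &gt; /etc/impala/impala.pem
+
+# Point the daemon at the combined file; the web UI then uses https://.
+impalad --webserver_certificate_file=/etc/impala/impala.pem
+</code></pre>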
+
+    <p class="p">
+      If Impala cannot find or parse the <code class="ph codeph">.pem</code> file, it prints an error message and quits.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        If the private key is encrypted using a passphrase, Impala will ask for that passphrase on startup, which
+        is not useful for a large cluster. In that case, remove the passphrase and make the <code class="ph codeph">.pem</code>
+        file readable only by Impala and administrators.
+      </p>
+      <p class="p">
+        When you turn on TLS/SSL for the Impala web UI, the associated URLs change from <code class="ph codeph">http://</code>
+        prefixes to <code class="ph codeph">https://</code>. Adjust any bookmarks or application code that refers to those URLs.
+      </p>
+    </div>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_select.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_select.html b/docs/build/html/topics/impala_select.html
new file mode 100644
index 0000000..7a12c42
--- /dev/null
+++ b/docs/build/html/topics/impala_select.html
@@ -0,0 +1,227 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_joins.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_order_by.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_group_by.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_having.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_offset.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_union.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_subqueries.html"><meta name="DC.Relation" scheme="U
 RI" content="../topics/impala_with.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_distinct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hints.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="select"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SELECT Statement</title></head><body id="select"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">SELECT Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">SELECT</code> statement performs queries, retrieving data from one or more tables and producing
+      result sets consisting of rows and columns.
+    </p>
+
+    <p class="p">
+      The Impala <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement also typically ends
+      with a <code class="ph codeph">SELECT</code> statement, to define data to copy from one table to another.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>[WITH <em class="ph i">name</em> AS (<em class="ph i">select_expression</em>) [, ...] ]
+SELECT
+  [ALL | DISTINCT]
+  [STRAIGHT_JOIN]
+  <em class="ph i">expression</em> [, <em class="ph i">expression</em> ...]
+FROM <em class="ph i">table_reference</em> [, <em class="ph i">table_reference</em> ...]
+[[FULL | [LEFT | RIGHT] INNER | [LEFT | RIGHT] OUTER | [LEFT | RIGHT] SEMI | [LEFT | RIGHT] ANTI | CROSS]
+  JOIN <em class="ph i">table_reference</em>
+  [ON <em class="ph i">join_equality_clauses</em> | USING (<var class="keyword varname">col1</var>[, <var class="keyword varname">col2</var> ...])] ...
+WHERE <em class="ph i">conditions</em>
+GROUP BY { <em class="ph i">column</em> | <em class="ph i">expression</em> [, ...] }
+HAVING <em class="ph i">conditions</em>
+ORDER BY { <em class="ph i">column</em> | <em class="ph i">expression</em> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, ...] }
+LIMIT <em class="ph i">expression</em> [OFFSET <em class="ph i">expression</em>]
+[UNION [ALL] <em class="ph i">select_statement</em>] ...]
+</code></pre>
+
+    <p class="p">
+      Impala <code class="ph codeph">SELECT</code> queries support:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        SQL scalar data types: <code class="ph codeph"><a class="xref" href="impala_boolean.html#boolean">BOOLEAN</a></code>,
+        <code class="ph codeph"><a class="xref" href="impala_tinyint.html#tinyint">TINYINT</a></code>,
+        <code class="ph codeph"><a class="xref" href="impala_smallint.html#smallint">SMALLINT</a></code>,
+        <code class="ph codeph"><a class="xref" href="impala_int.html#int">INT</a></code>,
+        <code class="ph codeph"><a class="xref" href="impala_bigint.html#bigint">BIGINT</a></code>,
+        <code class="ph codeph"><a class="xref" href="impala_decimal.html#decimal">DECIMAL</a></code>,
+        <code class="ph codeph"><a class="xref" href="impala_float.html#float">FLOAT</a></code>,
+        <code class="ph codeph"><a class="xref" href="impala_double.html#double">DOUBLE</a></code>,
+        <code class="ph codeph"><a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP</a></code>,
+        <code class="ph codeph"><a class="xref" href="impala_string.html#string">STRING</a></code>,
+        <code class="ph codeph"><a class="xref" href="impala_varchar.html#varchar">VARCHAR</a></code>,
+        <code class="ph codeph"><a class="xref" href="impala_char.html#char">CHAR</a></code>.
+      </li>
+
+
+      <li class="li">
+        The complex data types <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>,
+        are available in <span class="keyword">Impala 2.3</span> and higher.
+        Queries involving these types typically involve special qualified names
+        using dot notation for referring to the complex column fields,
+        and join clauses for bringing the complex columns into the result set.
+        See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+      </li>
+
+      <li class="li">
+        An optional <a class="xref" href="impala_with.html#with"><code class="ph codeph">WITH</code> clause</a> before the
+        <code class="ph codeph">SELECT</code> keyword, to define a subquery whose name or column names can be referenced from
+        later in the main query. This clause lets you abstract repeated clauses, such as aggregation functions,
+        that are referenced multiple times in the same query.
+      </li>
+
+      <li class="li">
+        By default, one <code class="ph codeph">DISTINCT</code> clause per query. See <a class="xref" href="impala_distinct.html#distinct">DISTINCT Operator</a>
+        for details. See <a class="xref" href="impala_appx_count_distinct.html#appx_count_distinct">APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</a> for a query option to
+        allow multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions in the same query.
+      </li>
+
+      <li class="li">
+        Subqueries in a <code class="ph codeph">FROM</code> clause. In <span class="keyword">Impala 2.0</span> and higher,
+        subqueries can also go in the <code class="ph codeph">WHERE</code> clause, for example with the
+        <code class="ph codeph">IN()</code>, <code class="ph codeph">EXISTS</code>, and <code class="ph codeph">NOT EXISTS</code> operators.
+      </li>
+
+      <li class="li">
+        <code class="ph codeph">WHERE</code>, <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">HAVING</code> clauses.
+      </li>
+
+      <li class="li">
+        <code class="ph codeph"><a class="xref" href="impala_order_by.html#order_by">ORDER BY</a></code>. Prior to Impala 1.4.0, Impala
+        required that queries using an <code class="ph codeph">ORDER BY</code> clause also include a
+        <code class="ph codeph"><a class="xref" href="impala_limit.html#limit">LIMIT</a></code> clause. In Impala 1.4.0 and higher, this
+        restriction is lifted; sort operations that would exceed the Impala memory limit automatically use a
+        temporary disk work area to perform the sort.
+      </li>
+
+      <li class="li">
+        <p class="p">
+        Impala supports a wide variety of <code class="ph codeph">JOIN</code> clauses. Left, right, semi, full, and outer joins
+        are supported in all Impala versions. The <code class="ph codeph">CROSS JOIN</code> operator is available in Impala 1.2.2
+        and higher. During performance tuning, you can override the reordering of join clauses that Impala does
+        internally by including the keyword <code class="ph codeph">STRAIGHT_JOIN</code> immediately after the
+        <code class="ph codeph">SELECT</code> keyword.
+      </p>
+        <p class="p">
+          See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details and examples of join queries.
+        </p>
+      </li>
+
+      <li class="li">
+        <code class="ph codeph">UNION ALL</code>.
+      </li>
+
+      <li class="li">
+        <code class="ph codeph">LIMIT</code>.
+      </li>
+
+      <li class="li">
+        External tables.
+      </li>
+
+      <li class="li">
+        Relational operators such as greater than, less than, or equal to.
+      </li>
+
+      <li class="li">
+        Arithmetic operators such as addition or subtraction.
+      </li>
+
+      <li class="li">
+        Logical/Boolean operators <code class="ph codeph">AND</code>, <code class="ph codeph">OR</code>, and <code class="ph codeph">NOT</code>. Impala does
+        not support the corresponding symbols <code class="ph codeph">&amp;&amp;</code>, <code class="ph codeph">||</code>, and
+        <code class="ph codeph">!</code>.
+      </li>
+
+      <li class="li">
+        Common SQL built-in functions such as <code class="ph codeph">COUNT</code>, <code class="ph codeph">SUM</code>, <code class="ph codeph">CAST</code>,
+        <code class="ph codeph">LIKE</code>, <code class="ph codeph">IN</code>, <code class="ph codeph">BETWEEN</code>, and <code class="ph codeph">COALESCE</code>. Impala
+        specifically supports built-ins described in <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>.
+      </li>
+    </ul>
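+
+    <p class="p">
+      The following query sketches how several of these clauses fit together: a <code class="ph codeph">WITH</code>
+      clause, a join, grouping, a <code class="ph codeph">HAVING</code> filter, ordering, and a limit. The table and
+      column names are hypothetical, not part of any sample schema:
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical schema: SALES (region_id, amount) and REGIONS (id, name).
+WITH big_sales AS
+  (SELECT region_id, amount FROM sales WHERE amount &gt; 1000)
+SELECT r.name, COUNT(*) AS num_sales, SUM(b.amount) AS total_amount
+  FROM big_sales b JOIN regions r ON (b.region_id = r.id)
+GROUP BY r.name
+HAVING SUM(b.amount) &gt; 50000
+ORDER BY total_amount DESC
+LIMIT 10;
+</code></pre>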
+
+    <p class="p">
+        Impala queries ignore files with extensions commonly used for temporary work files by Hadoop tools. Any
+        files with extensions <code class="ph codeph">.tmp</code> or <code class="ph codeph">.copying</code> are not considered part of the
+        Impala table. The suffix matching is case-insensitive, so for example Impala ignores both
+        <code class="ph codeph">.copying</code> and <code class="ph codeph">.COPYING</code> suffixes.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+    <p class="p">
+        If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+        identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+        other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+    <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+        For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+        Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+        in the <span class="ph filepath">core-site.xml</span> configuration file determines
+        how Impala divides the I/O work of reading the data files. This configuration
+        setting is specified in bytes. By default, this
+        value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+        as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+        Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+        to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+        Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+        to 268435456 (256 MB) to match the row group size produced by Impala.
+      </p>
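+
+    <p class="p">
+      For example, to raise the block size to 128 MB, you would add a stanza such as the
+      following to <span class="ph filepath">core-site.xml</span> (the value shown is an
+      illustration; choose the size that matches your own files):
+    </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;fs.s3a.block.size&lt;/name&gt;
+  &lt;value&gt;134217728&lt;/value&gt; &lt;!-- 128 MB = 128 * 1024 * 1024 bytes --&gt;
+&lt;/property&gt;</code></pre>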
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Can be cancelled. To cancel this statement, use Ctrl-C from the
+        <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
+        <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
+        in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
+        (port 25000).
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, must have read
+      permissions for the files in all applicable directories in all source tables,
+      and read and execute permissions for the relevant data directories.
+      (A <code class="ph codeph">SELECT</code> operation could read files from multiple different HDFS directories
+      if the source table is partitioned.)
+      If a query attempts to read a data file and is unable to because of an HDFS permission error,
+      the query halts and does not return any further results.
+    </p>
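+
+    <p class="p">
+      For example, you might check the permissions with a command such as the following
+      (the warehouse path is illustrative; substitute the location of your own table):
+    </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -ls -R /user/hive/warehouse/sample_db.db/sample_table
+# The 'impala' user needs read permission on the data files listed here,
+# plus execute permission on each enclosing directory.</code></pre>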
+
+    <p class="p toc"></p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      The <code class="ph codeph">SELECT</code> syntax is so extensive that it forms its own category of statements: queries. The
+      other major classifications of SQL statements are data definition language (see
+      <a class="xref" href="impala_ddl.html#ddl">DDL Statements</a>) and data manipulation language (see <a class="xref" href="impala_dml.html#dml">DML Statements</a>).
+    </p>
+
+    <p class="p">
+      Because the focus of Impala is on fast queries with interactive response times over huge data sets, query
+      performance and scalability are important considerations. See
+      <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> and <a class="xref" href="impala_scalability.html#scalability">Scalability Considerations for Impala</a> for
+      details.
+    </p>
+  </div>
+
+  
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_joins.html">Joins in Impala SELECT Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_order_by.html">ORDER BY Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_group_by.html">GROUP BY Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_having.html">HAVING Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_limit.html">LIMIT Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_offset.html">OFFSET Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_union.html">UNION Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_subqueries.html">Subqueries in Impala SELECT Statements</a></strong><
 br></li><li class="link ulchildlink"><strong><a href="../topics/impala_with.html">WITH Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_distinct.html">DISTINCT Operator</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hints.html">Query Hints in Impala SELECT Statements</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_seqfile.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_seqfile.html b/docs/build/html/topics/impala_seqfile.html
new file mode 100644
index 0000000..53a0eaf
--- /dev/null
+++ b/docs/build/html/topics/impala_seqfile.html
@@ -0,0 +1,240 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="seqfile"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the SequenceFile File Format with Impala Tables</title></head><body id="seqfile"><main role="main"><article role="article" aria-labelledby="seqfile__sequencefile">
+
+  <h1 class="title topictitle1" id="seqfile__sequencefile">Using the SequenceFile File Format with Impala Tables</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Impala supports using SequenceFile data files.
+    </p>
+
+    <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">SequenceFile Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+          <tr class="row">
+            <th class="entry nocellnorowborder" id="seqfile__entry__1">
+              File Type
+            </th>
+            <th class="entry nocellnorowborder" id="seqfile__entry__2">
+              Format
+            </th>
+            <th class="entry nocellnorowborder" id="seqfile__entry__3">
+              Compression Codecs
+            </th>
+            <th class="entry nocellnorowborder" id="seqfile__entry__4">
+              Impala Can CREATE?
+            </th>
+            <th class="entry nocellnorowborder" id="seqfile__entry__5">
+              Impala Can INSERT?
+            </th>
+          </tr>
+        </thead><tbody class="tbody">
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="seqfile__entry__1 ">
+              <a class="xref" href="impala_seqfile.html#seqfile">SequenceFile</a>
+            </td>
+            <td class="entry nocellnorowborder" headers="seqfile__entry__2 ">
+              Structured
+            </td>
+            <td class="entry nocellnorowborder" headers="seqfile__entry__3 ">
+              Snappy, gzip, deflate, bzip2
+            </td>
+            <td class="entry nocellnorowborder" headers="seqfile__entry__4 ">Yes.</td>
+            <td class="entry nocellnorowborder" headers="seqfile__entry__5 ">
+              No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+              <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+            </td>
+
+          </tr>
+        </tbody></table>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="seqfile__seqfile_create">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Creating SequenceFile Tables and Loading Data</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        If you do not have an existing data file to use, begin by creating one in the appropriate format.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">To create a SequenceFile table:</strong>
+      </p>
+
+      <p class="p">
+        In the <code class="ph codeph">impala-shell</code> interpreter, issue a command similar to:
+      </p>
+
+<pre class="pre codeblock"><code>create table sequencefile_table (<var class="keyword varname">column_specs</var>) stored as sequencefile;</code></pre>
+
+      <p class="p">
+        Because Impala can query some kinds of tables that it cannot currently write to, after creating tables of
+        certain file formats, you might use the Hive shell to load the data. See
+        <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. After loading data into a table through
+        Hive or other mechanism outside of Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+        statement the next time you connect to the Impala node, before querying the table, to make Impala recognize
+        the new data.
+      </p>
+
+      <p class="p">
+        For example, here is how you might create some SequenceFile tables in Impala (by specifying the columns
+        explicitly, or cloning the structure of another table), load data through Hive, and query them through
+        Impala:
+      </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost
+[localhost:21000] &gt; create table seqfile_table (x int) stored as sequencefile;
+[localhost:21000] &gt; create table seqfile_clone like some_other_table stored as sequencefile;
+[localhost:21000] &gt; quit;
+
+$ hive
+hive&gt; insert into table seqfile_table select x from some_other_table;
+3 Rows loaded to seqfile_table
+Time taken: 19.047 seconds
+hive&gt; quit;
+
+$ impala-shell -i localhost
+[localhost:21000] &gt; select * from seqfile_table;
+Returned 0 row(s) in 0.23s
+[localhost:21000] &gt; -- Make Impala recognize the data loaded through Hive;
+[localhost:21000] &gt; refresh seqfile_table;
+[localhost:21000] &gt; select * from seqfile_table;
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
++---+
+Returned 3 row(s) in 0.23s</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+        Although you can create tables in this file format using
+        the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+        and <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher,
+        currently, Impala can query these types only in Parquet tables.
+        <span class="ph">
+        The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types.
+        Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher.
+        </span>
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="seqfile__seqfile_compression">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Enabling Compression for SequenceFile Tables</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        You may want to enable compression on existing tables. Enabling compression provides performance gains in
+        most cases and is supported for SequenceFile tables. For example, to enable Snappy compression, you would
+        specify the following additional settings when loading data through the Hive shell:
+      </p>
+
+<pre class="pre codeblock"><code>hive&gt; SET hive.exec.compress.output=true;
+hive&gt; SET mapred.max.split.size=256000000;
+hive&gt; SET mapred.output.compression.type=BLOCK;
+hive&gt; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive&gt; insert overwrite table <var class="keyword varname">new_table</var> select * from <var class="keyword varname">old_table</var>;</code></pre>
+
+      <p class="p">
+        If you are converting partitioned tables, you must complete additional steps. In such a case, specify
+        additional settings similar to the following:
+      </p>
+
+<pre class="pre codeblock"><code>hive&gt; create table <var class="keyword varname">new_table</var> (<var class="keyword varname">your_cols</var>) partitioned by (<var class="keyword varname">partition_cols</var>) stored as <var class="keyword varname">new_format</var>;
+hive&gt; SET hive.exec.dynamic.partition.mode=nonstrict;
+hive&gt; SET hive.exec.dynamic.partition=true;
+hive&gt; insert overwrite table <var class="keyword varname">new_table</var> partition(<var class="keyword varname">comma_separated_partition_cols</var>) select * from <var class="keyword varname">old_table</var>;</code></pre>
+
+      <p class="p">
+        Remember that Hive does not require you to specify a source format. Consider the case of
+        converting a table with two partition columns called <code class="ph codeph">year</code> and <code class="ph codeph">month</code> to a
+        Snappy-compressed SequenceFile. Combining the components outlined previously to complete this table
+        conversion, you would specify settings similar to the following:
+      </p>
+
+<pre class="pre codeblock"><code>hive&gt; create table TBL_SEQ (int_col int, string_col string) STORED AS SEQUENCEFILE;
+hive&gt; SET hive.exec.compress.output=true;
+hive&gt; SET mapred.max.split.size=256000000;
+hive&gt; SET mapred.output.compression.type=BLOCK;
+hive&gt; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive&gt; SET hive.exec.dynamic.partition.mode=nonstrict;
+hive&gt; SET hive.exec.dynamic.partition=true;
+hive&gt; INSERT OVERWRITE TABLE tbl_seq SELECT * FROM tbl;</code></pre>
+
+      <p class="p">
+        To complete a similar process for a table that includes partitions, you would specify settings similar to
+        the following:
+      </p>
+
+<pre class="pre codeblock"><code>hive&gt; CREATE TABLE tbl_seq (int_col INT, string_col STRING) PARTITIONED BY (year INT) STORED AS SEQUENCEFILE;
+hive&gt; SET hive.exec.compress.output=true;
+hive&gt; SET mapred.max.split.size=256000000;
+hive&gt; SET mapred.output.compression.type=BLOCK;
+hive&gt; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive&gt; SET hive.exec.dynamic.partition.mode=nonstrict;
+hive&gt; SET hive.exec.dynamic.partition=true;
+hive&gt; INSERT OVERWRITE TABLE tbl_seq PARTITION(year) SELECT * FROM tbl;</code></pre>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          The compression type is specified in the following command:
+        </p>
+<pre class="pre codeblock"><code>SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;</code></pre>
+        <p class="p">
+          You could elect to specify alternative codecs such as <code class="ph codeph">GzipCodec</code> here.
+        </p>
+      </div>
+    </div>
+  </article>
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="seqfile__seqfile_performance">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala SequenceFile Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        In general, expect query performance with SequenceFile tables to be
+        faster than with tables using text data, but slower than with
+        Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+        for information about using the Parquet file format for
+        high-performance analytic queries.
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+        For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+        Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+        in the <span class="ph filepath">core-site.xml</span> configuration file determines
+        how Impala divides the I/O work of reading the data files. This configuration
+        setting is specified in bytes. By default, this
+        value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+        as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+        Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+        to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+        Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+        to 268435456 (256 MB) to match the row group size produced by Impala.
+      </p>
+
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_set.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_set.html b/docs/build/html/topics/impala_set.html
new file mode 100644
index 0000000..b16ff7b
--- /dev/null
+++ b/docs/build/html/topics/impala_set.html
@@ -0,0 +1,200 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="set"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SET Statement</title></head><body id="set"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">SET Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Specifies values for query options that control the runtime behavior of other statements within the same
+      session.
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.5</span> and higher, <code class="ph codeph">SET</code> also defines user-specified substitution variables for
+      the <span class="keyword cmdname">impala-shell</span> interpreter. This feature uses the <code class="ph codeph">SET</code> command
+      built into <span class="keyword cmdname">impala-shell</span> instead of the SQL <code class="ph codeph">SET</code> statement.
+      Therefore, the substitution mechanism works only with queries processed by <span class="keyword cmdname">impala-shell</span>,
+      not with queries submitted through JDBC or ODBC.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>SET [<var class="keyword varname">query_option</var>=<var class="keyword varname">option_value</var>]
+</code></pre>
+
+    <p class="p">
+      <code class="ph codeph">SET</code> with no arguments returns a result set consisting of all available query options and
+      their current values.
+    </p>
+
+    <p class="p">
+      The query option name and any string argument values are case-insensitive.
+    </p>
+
+    <p class="p">
+      Each query option has a specific allowed notation for its arguments. Boolean options can be enabled and
+      disabled by assigning a value of either <code class="ph codeph">true</code> or <code class="ph codeph">false</code>, or
+      <code class="ph codeph">1</code> or <code class="ph codeph">0</code>. Some numeric options accept a final character signifying the unit,
+      such as <code class="ph codeph">2g</code> for 2 gigabytes or <code class="ph codeph">100m</code> for 100 megabytes. See
+      <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a> for the details of each query option.
+    </p>
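+
+    <p class="p">
+      For example, the following statements show the accepted notations (the option values
+      themselves are arbitrary illustrations):
+    </p>
+
+<pre class="pre codeblock"><code>-- Boolean options accept true/false or 1/0.
+set disable_unsafe_spills=true;
+set disable_unsafe_spills=1;   -- same effect as the previous statement
+
+-- Some numeric options accept a final character signifying the unit.
+set mem_limit=2g;              -- same as set mem_limit=2147483648;
+set mem_limit=100m;</code></pre>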
+
+    <p class="p">
+      <strong class="ph b">User-specified substitution variables:</strong>
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.5</span> and higher, you can specify your own names and string substitution values
+      within the <span class="keyword cmdname">impala-shell</span> interpreter. Once a substitution variable is set up,
+      its value is inserted into any SQL statement in that same <span class="keyword cmdname">impala-shell</span> session
+      that contains the notation <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>.
+      Using <code class="ph codeph">SET</code> in an interactive <span class="keyword cmdname">impala-shell</span> session overrides
+      any value for that same variable passed in through the <code class="ph codeph">--var=<var class="keyword varname">varname</var>=<var class="keyword varname">value</var></code>
+      command-line option.
+    </p>
+
+    <p class="p">
+      For example, to set up some default parameters for report queries, but then override those default
+      within an <span class="keyword cmdname">impala-shell</span> session, you might issue commands and statements such as
+      the following:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Initial setup for this example.
+create table staging_table (s string);
+insert into staging_table values ('foo'), ('bar'), ('bletch');
+
+create table production_table (s string);
+insert into production_table values ('North America'), ('EMEA'), ('Asia');
+quit;
+
+-- Start impala-shell with user-specified substitution variables,
+-- run a query, then override the variables with SET and run the query again.
+$ impala-shell --var=table_name=staging_table --var=cutoff=2
+... <var class="keyword varname">banner message</var> ...
+[localhost:21000] &gt; select s from ${var:table_name} order by s limit ${var:cutoff};
+Query: select s from staging_table order by s limit 2
++--------+
+| s      |
++--------+
+| bar    |
+| bletch |
++--------+
+Fetched 2 row(s) in 1.06s
+
+[localhost:21000] &gt; set var:table_name=production_table;
+Variable TABLE_NAME set to production_table
+[localhost:21000] &gt; set var:cutoff=3;
+Variable CUTOFF set to 3
+
+[localhost:21000] &gt; select s from ${var:table_name} order by s limit ${var:cutoff};
+Query: select s from production_table order by s limit 3
++---------------+
+| s             |
++---------------+
+| Asia          |
+| EMEA          |
+| North America |
++---------------+
+</code></pre>
+
+    <p class="p">
+      The following example shows how <code class="ph codeph">SET</code> with no parameters displays
+      all user-specified substitution variables, and how <code class="ph codeph">UNSET</code> removes
+      the substitution variable entirely:
+    </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] &gt; set;
+Query options (defaults shown in []):
+  ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0]
+  ...
+  V_CPU_CORES: [0]
+
+Shell Options
+  LIVE_PROGRESS: False
+  LIVE_SUMMARY: False
+
+Variables:
+  CUTOFF: 3
+  TABLE_NAME: staging_table
+
+[localhost:21000] &gt; unset var:cutoff;
+Unsetting variable CUTOFF
+[localhost:21000] &gt; select s from ${var:table_name} order by s limit ${var:cutoff};
+Error: Unknown variable CUTOFF
+</code></pre>
+
+    <p class="p">
+      See <a class="xref" href="impala_shell_running_commands.html">Running Commands and SQL Statements in impala-shell</a> for more examples of using the
+      <code class="ph codeph">--var</code>, <code class="ph codeph">SET</code>, and <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>
+      substitution technique in <span class="keyword cmdname">impala-shell</span>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      <code class="ph codeph">MEM_LIMIT</code> is probably the most commonly used query option. You can specify a high value to
+      allow a resource-intensive query to complete. For testing how queries would work on memory-constrained
+      systems, you might specify an artificially low value.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example sets some numeric and some Boolean query options to control usage of memory, disk
+      space, and timeout periods, then runs a query whose success could depend on the options in effect:
+    </p>
+
+<pre class="pre codeblock"><code>set mem_limit=64g;
+set DISABLE_UNSAFE_SPILLS=true;
+set parquet_file_size=400m;
+set RESERVATION_REQUEST_TIMEOUT=900000;
+insert overwrite parquet_table select c1, c2, count(c3) from text_table group by c1, c2, c3;
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+    <p class="p">
+      <code class="ph codeph">SET</code> has always been available as an <span class="keyword cmdname">impala-shell</span> command. Promoting it to
+      a SQL statement lets you use this feature in client applications through the JDBC and ODBC APIs.
+    </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      See <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a> for the query options you can adjust using this
+      statement.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_query_options.html">Query Options for the SET Statement</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_shell_commands.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_shell_commands.html b/docs/build/html/topics/impala_shell_commands.html
new file mode 100644
index 0000000..d2bee6c
--- /dev/null
+++ b/docs/build/html/topics/impala_shell_commands.html
@@ -0,0 +1,392 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shell_commands"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>impala-shell Command Reference</title></head><body id="shell_commands"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">impala-shell Command Reference</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Use the following commands within <code class="ph codeph">impala-shell</code> to pass requests to the
+      <code class="ph codeph">impalad</code> daemon that the shell is connected to. You can enter a command interactively at the
+      prompt, or pass it as the argument to the <code class="ph codeph">-q</code> option of <code class="ph codeph">impala-shell</code>. Most
+      of these commands are passed to the Impala daemon as SQL statements; refer to the corresponding
+      <a class="xref" href="impala_langref_sql.html#langref_sql">SQL language reference sections</a> for full syntax
+      details.
+    </p>
+
+    <table class="table"><caption></caption><colgroup><col style="width:20%"><col style="width:80%"></colgroup><thead class="thead">
+          <tr class="row">
+            <th class="entry nocellnorowborder" id="shell_commands__entry__1">
+              Command
+            </th>
+            <th class="entry nocellnorowborder" id="shell_commands__entry__2">
+              Explanation
+            </th>
+          </tr>
+        </thead><tbody class="tbody">
+          <tr class="row" id="shell_commands__alter_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">alter</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Changes the underlying structure or settings of an Impala table, or a table shared between Impala
+                and Hive. See <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> and
+                <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a> for details.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__compute_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">compute stats</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Gathers important performance-related information for a table, used by Impala to optimize queries.
+                See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for details.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__connect_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">connect</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Connects to the specified instance of <code class="ph codeph">impalad</code>. The default port of 21000 is
+                assumed unless you provide another value. You can connect to any host in your cluster that is
+                running <code class="ph codeph">impalad</code>. If you connect to an instance of <code class="ph codeph">impalad</code> that
+                was started with an alternate port specified by the <code class="ph codeph">--fe_port</code> flag, you must
+                provide that alternate port. See <a class="xref" href="impala_connecting.html#connecting">Connecting to impalad through impala-shell</a> for examples.
+              </p>
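+
+              <p class="p">
+                For example, the following connects to a non-default host on the default port
+                (the hostname shown is a placeholder):
+              </p>
+
+<pre class="pre codeblock"><code>connect impala-host.example.com:21000;</code></pre>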
+
+              <p class="p">
+        The <code class="ph codeph">SET</code> statement has no effect until the <span class="keyword cmdname">impala-shell</span> interpreter is
+        connected to an Impala server. Once you are connected, any query options you set remain in effect even after
+        you issue a subsequent <code class="ph codeph">CONNECT</code> command to connect to a different Impala host.
+      </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__describe_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">describe</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Shows the columns, column data types, and any column comments for a specified table.
+                <code class="ph codeph">DESCRIBE FORMATTED</code> shows additional information such as the HDFS data directory,
+                partitions, and internal properties for the table. See <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>
+                for details about the basic <code class="ph codeph">DESCRIBE</code> output and the <code class="ph codeph">DESCRIBE
+                FORMATTED</code> variant. You can use <code class="ph codeph">DESC</code> as shorthand for the
+                <code class="ph codeph">DESCRIBE</code> command.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__drop_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">drop</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Removes a schema object, and in some cases its associated data files. See
+                <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>, <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>,
+                <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>, and
+                <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a> for details.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__explain_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">explain</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Provides the execution plan for a query. <code class="ph codeph">EXPLAIN</code> represents a query as a series of
+                steps. For example, these steps might be map/reduce stages, metastore operations, or file system
+                operations such as move or rename. See <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> and
+                <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for details.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__help_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">help</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Displays a list of all available commands and options.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__history_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">history</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Maintains an enumerated cross-session command history. This history is stored in the
+                <span class="ph filepath">~/.impalahistory</span> file.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__insert_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">insert</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Writes the results of a query to a specified table. This either overwrites table data or appends
+                data to the existing table content. See <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for details.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__invalidate_metadata_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">invalidate metadata</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Updates <span class="keyword cmdname">impalad</span> metadata for table existence and structure. Use this command
+                after creating, dropping, or altering databases, tables, or partitions in Hive. See
+                <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for details.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__profile_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">profile</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Displays low-level information about the most recent query. Used for performance diagnosis and
+                tuning. <span class="ph"> The report starts with the same information as produced by the
+                <code class="ph codeph">EXPLAIN</code> statement and the <code class="ph codeph">SUMMARY</code> command.</span> See
+                <a class="xref" href="impala_explain_plan.html#perf_profile">Using the Query Profile for Performance Tuning</a> for details.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__quit_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">quit</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Exits the shell. Remember to include the final semicolon so that the shell recognizes the end of
+                the command.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__refresh_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">refresh</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Refreshes <span class="keyword cmdname">impalad</span> metadata for the locations of HDFS blocks corresponding to
+                Impala data files. Use this command after loading new data files into an Impala table through Hive
+                or through HDFS commands. See <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for details.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__select_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">select</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Specifies the data set on which to perform some action. All information returned by
+                <code class="ph codeph">select</code> can be sent to an output such as the console or a file, or can be
+                used as part of another query. See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for details.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__set_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">set</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Manages query options for an <span class="keyword cmdname">impala-shell</span> session. The available options are the
+                ones listed in <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a>. These options are used for
+                query tuning and troubleshooting. Issue <code class="ph codeph">SET</code> with no arguments to see the current
+                query options, either based on the <span class="keyword cmdname">impalad</span> defaults, as specified by you at
+                <span class="keyword cmdname">impalad</span> startup, or based on earlier <code class="ph codeph">SET</code> statements in the same
+                session. To modify option values, issue commands with the syntax <code class="ph codeph">set
+                <var class="keyword varname">option</var>=<var class="keyword varname">value</var></code>. To restore an option to its default,
+                use the <code class="ph codeph">unset</code> command. Some options take Boolean values of <code class="ph codeph">true</code>
+                and <code class="ph codeph">false</code>. Others take numeric arguments, or quoted string values.
+              </p>
+
+              <p class="p">
+        The <code class="ph codeph">SET</code> statement has no effect until the <span class="keyword cmdname">impala-shell</span> interpreter is
+        connected to an Impala server. Once you are connected, any query options you set remain in effect even after
+        you issue a subsequent <code class="ph codeph">CONNECT</code> command to connect to a different Impala host.
+      </p>
+
+              <p class="p">
+                In Impala 2.0 and later, <code class="ph codeph">SET</code> is available as a SQL statement for any kind of
+                application, not only through <span class="keyword cmdname">impala-shell</span>. See
+                <a class="xref" href="impala_set.html#set">SET Statement</a> for details.
+              </p>
+
+              <p class="p">
+                In Impala 2.5 and later, you can use <code class="ph codeph">SET</code> to define your own substitution variables
+                within an <span class="keyword cmdname">impala-shell</span> session.
+                Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>.
+              </p>
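+
+              <p class="p">
+                For example, the following hypothetical sequence defines a substitution variable and then
+                references it in a query (the variable and table names are placeholders):
+              </p>
+
+<pre class="pre codeblock"><code>set var:tname=t1;
+select count(*) from ${var:tname};</code></pre>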
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__shell_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">shell</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Executes the specified command in the operating system shell without exiting
+                <code class="ph codeph">impala-shell</code>. You can use the <code class="ph codeph">!</code> character as shorthand for the
+                <code class="ph codeph">shell</code> command.
+              </p>
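+
+              <p class="p">
+                For example, either of the following runs the operating system
+                <code class="ph codeph">pwd</code> command without leaving the interpreter:
+              </p>
+
+<pre class="pre codeblock"><code>shell pwd;
+! pwd;</code></pre>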
+
+              <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+                Quote any instances of the <code class="ph codeph">--</code> or <code class="ph codeph">/*</code> tokens to avoid them being
+                interpreted as the start of a comment. To embed comments within <code class="ph codeph">source</code> or
+                <code class="ph codeph">!</code> commands, use the shell comment character <code class="ph codeph">#</code> before the comment
+                portion of the line.
+              </div>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__show_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">show</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Displays metastore data for schema objects created and accessed through Impala, Hive, or both.
+                <code class="ph codeph">show</code> can be used to gather information about objects such as databases, tables, and functions.
+                See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__source_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">source</code> or <code class="ph codeph">src</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Executes one or more statements residing in a specified file from the local filesystem.
+                Allows you to perform the same kinds of batch operations as with the <code class="ph codeph">-f</code> option,
+                but interactively within the interpreter. The file can contain SQL statements and other
+                <span class="keyword cmdname">impala-shell</span> commands, including additional <code class="ph codeph">SOURCE</code> commands
+                to perform a flexible sequence of actions. Each command or statement, except the last one in the file,
+                must end with a semicolon.
+                See <a class="xref" href="impala_shell_running_commands.html#shell_running_commands">Running Commands and SQL Statements in impala-shell</a> for examples.
+              </p>
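+
+              <p class="p">
+                For example, assuming a local file named <span class="ph filepath">setup.sql</span>
+                (a hypothetical name) that contains a sequence of statements:
+              </p>
+
+<pre class="pre codeblock"><code>source setup.sql;</code></pre>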
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__summary_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">summary</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Summarizes the work performed in various stages of a query. It provides a higher-level view of the
+                information displayed by the <code class="ph codeph">EXPLAIN</code> command. Added in Impala 1.4.0. See
+                <a class="xref" href="impala_explain_plan.html#perf_summary">Using the SUMMARY Report for Performance Tuning</a> for details about the report format
+                and how to interpret it.
+              </p>
+              <p class="p">
+                In <span class="keyword">Impala 2.3</span> and higher, you can see a continuously updated report of
+                the summary information while a query is in progress.
+                See <a class="xref" href="impala_live_summary.html#live_summary">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a> for details.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__unset_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">unset</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Removes any user-specified value for a query option and returns the option to its default value.
+                See <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a> for the available query options.
+              </p>
+              <p class="p">
+                In <span class="keyword">Impala 2.5</span> and higher, it can also remove user-specified substitution variables
+                using the notation <code class="ph codeph">UNSET VAR:<var class="keyword varname">variable_name</var></code>.
+              </p>
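+
+              <p class="p">
+                For example, to return the <code class="ph codeph">MEM_LIMIT</code> query option to its default
+                and discard a previously defined substitution variable (the variable name is a placeholder):
+              </p>
+
+<pre class="pre codeblock"><code>unset mem_limit;
+unset var:tname;</code></pre>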
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__use_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">use</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Indicates the database against which to execute subsequent commands. Lets you avoid using fully
+                qualified names when referring to tables in databases other than <code class="ph codeph">default</code>. See
+                <a class="xref" href="impala_use.html#use">USE Statement</a> for details. Not effective with the <code class="ph codeph">-q</code> option,
+                because that option only allows a single statement in the argument.
+              </p>
+            </td>
+          </tr>
+          <tr class="row" id="shell_commands__version_cmd">
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+              <p class="p">
+                <code class="ph codeph">version</code>
+              </p>
+            </td>
+            <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+              <p class="p">
+                Returns Impala version information.
+              </p>
+            </td>
+          </tr>
+        </tbody></table>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[23/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_math_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_math_functions.html b/docs/build/html/topics/impala_math_functions.html
new file mode 100644
index 0000000..318dd56
--- /dev/null
+++ b/docs/build/html/topics/impala_math_functions.html
@@ -0,0 +1,1498 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="math_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Mathematical Functions</title></head><body id="math_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Mathematical Functions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Mathematical functions, or arithmetic functions, perform numeric calculations that are typically more complex
+      than basic addition, subtraction, multiplication, and division. For example, these functions include
+      trigonometric, logarithmic, and base conversion operations.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      In Impala, exponentiation uses the <code class="ph codeph">pow()</code> function rather than an exponentiation operator
+      such as <code class="ph codeph">**</code>.
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      The mathematical functions operate mainly on these data types: <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+      <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>,
+      <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>, <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a>,
+      <a class="xref" href="impala_float.html#float">FLOAT Data Type</a>, and <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>. For the operators that
+      perform the standard operations such as addition, subtraction, multiplication, and division, see
+      <a class="xref" href="impala_operators.html#arithmetic_operators">Arithmetic Operators</a>.
+    </p>
+
+    <p class="p">
+      Functions that perform bitwise operations are explained in <a class="xref" href="impala_bit_functions.html#bit_functions">Impala Bit Functions</a>.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Function reference:</strong>
+    </p>
+
+    <p class="p">
+      Impala supports the following mathematical functions:
+    </p>
+
+    <dl class="dl">
+      
+
+        <dt class="dt dlterm" id="math_functions__abs">
+          <code class="ph codeph">abs(numeric_type a)</code>
+
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the absolute value of the argument.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> Use this function to ensure all return values are positive. This is different from
+            the <code class="ph codeph">positive()</code> function, which returns its argument unchanged (even if the argument
+            was negative).
+          </p>
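+
+          <p class="p">
+            For example:
+          </p>
+<pre class="pre codeblock"><code>select abs(-12.5);
++------------+
+| abs(-12.5) |
++------------+
+| 12.5       |
++------------+</code></pre>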
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__acos">
+          <code class="ph codeph">acos(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the arccosine of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__asin">
+          <code class="ph codeph">asin(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the arcsine of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__atan">
+          <code class="ph codeph">atan(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the arctangent of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__atan2">
+          <code class="ph codeph">atan2(double a, double b)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the arctangent of the two arguments, with the signs of the arguments used to determine the
+          quadrant of the result.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__bin">
+          <code class="ph codeph">bin(bigint a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the binary representation of an integer value, that is, a string of 0 and 1
+          digits.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
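+
+          <p class="p">
+            For example:
+          </p>
+<pre class="pre codeblock"><code>select bin(8);
++--------+
+| bin(8) |
++--------+
+| 1000   |
++--------+</code></pre>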
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__ceil">
+          <code class="ph codeph">ceil(double a)</code>,
+          <code class="ph codeph">ceil(decimal(p,s) a)</code>,
+          <code class="ph codeph" id="math_functions__ceiling">ceiling(double a)</code>,
+          <code class="ph codeph">ceiling(decimal(p,s) a)</code>,
+          <code class="ph codeph" id="math_functions__dceil">dceil(double a)</code>,
+          <code class="ph codeph">dceil(decimal(p,s) a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the smallest integer that is greater than or equal to the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code> or <code class="ph codeph">decimal(p,s)</code> depending on the type of the
+            input argument
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__conv">
+          <code class="ph codeph">conv(bigint num, int from_base, int to_base), conv(string num, int from_base, int
+          to_base)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a string representation of an integer value in a particular base. The input value
+          can be a string, for example to convert a hexadecimal number such as <code class="ph codeph">fce2</code> to decimal. To
+          use the return value as a number (for example, when converting to base 10), use <code class="ph codeph">CAST()</code>
+          to convert to the appropriate type.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
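+
+          <p class="p">
+            For example, converting the hexadecimal string <code class="ph codeph">fce2</code> to its
+            base 10 representation:
+          </p>
+<pre class="pre codeblock"><code>select conv('fce2', 16, 10);
++-----------------------+
+| conv('fce2', 16, 10)  |
++-----------------------+
+| 64738                 |
++-----------------------+</code></pre>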
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__cos">
+          <code class="ph codeph">cos(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the cosine of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__cosh">
+          <code class="ph codeph">cosh(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the hyperbolic cosine of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__cot">
+          <code class="ph codeph">cot(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the cotangent of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__degrees">
+          <code class="ph codeph">degrees(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Converts argument value from radians to degrees.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__e">
+          <code class="ph codeph">e()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the
+          <a class="xref" href="https://en.wikipedia.org/wiki/E_(mathematical_constant)" target="_blank">mathematical
+          constant e</a>.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__exp">
+          <code class="ph codeph">exp(double a)</code>,
+          <code class="ph codeph" id="math_functions__dexp">dexp(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the
+          <a class="xref" href="https://en.wikipedia.org/wiki/E_(mathematical_constant)" target="_blank">mathematical
+          constant e</a> raised to the power of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__factorial">
+          <code class="ph codeph">factorial(integer_type a)</code>
+        </dt>
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Computes the <a class="xref" href="https://en.wikipedia.org/wiki/Factorial" target="_blank">factorial</a> of an integer value.
+          It works with any integer type.
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> You can use either the <code class="ph codeph">factorial()</code> function or the <code class="ph codeph">!</code> operator.
+            The factorial of 0 is 1, and <code class="ph codeph">factorial()</code> also returns 1 for any negative value.
+            The maximum positive value for the input argument is 20; a value of 21 or greater overflows the
+            range for a <code class="ph codeph">BIGINT</code> and causes an error.
+          </p>
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">bigint</code>
+          </p>
+<pre class="pre codeblock"><code>select factorial(5);
++--------------+
+| factorial(5) |
++--------------+
+| 120          |
++--------------+
+
+select 5!;
++-----+
+| 5!  |
++-----+
+| 120 |
++-----+
+
+select factorial(0);
++--------------+
+| factorial(0) |
++--------------+
+| 1            |
++--------------+
+
+select factorial(-100);
++-----------------+
+| factorial(-100) |
++-----------------+
+| 1               |
++-----------------+
+</code></pre>
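The 20-argument limit follows directly from the range of <code class="ph codeph">BIGINT</code>. A quick Python sketch (shown for illustration only; Impala evaluates this in SQL) confirms where the overflow boundary falls:

```python
import math

BIGINT_MAX = 2**63 - 1  # Impala BIGINT is a signed 64-bit integer

# 20! still fits in a signed 64-bit value; 21! does not,
# which is why factorial(21) and above cause an error.
assert math.factorial(20) <= BIGINT_MAX
assert math.factorial(21) > BIGINT_MAX
print(math.factorial(20))  # 2432902008176640000
```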
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__floor">
+          <code class="ph codeph">floor(double a)</code>,
+          <code class="ph codeph">floor(decimal(p,s) a)</code>,
+          <code class="ph codeph" id="math_functions__dfloor">dfloor(double a)</code>,
+          <code class="ph codeph">dfloor(decimal(p,s) a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the largest integer that is less than or equal to the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">bigint</code> or <code class="ph codeph">decimal(p,s)</code> depending on the type of
+            the input argument
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__fmod">
+          <code class="ph codeph">fmod(double a, double b), fmod(float a, float b)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the modulus of a floating-point number. Equivalent to the <code class="ph codeph">%</code> arithmetic operator.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">float</code> or <code class="ph codeph">double</code>, depending on type of arguments
+          </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> Impala 1.1.1
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Because this function operates on <code class="ph codeph">DOUBLE</code> or <code class="ph codeph">FLOAT</code>
+            values, it is subject to potential rounding errors for values that cannot be
+            represented precisely. Prefer to use whole numbers, or values that you know
+            can be represented precisely by the <code class="ph codeph">DOUBLE</code> or <code class="ph codeph">FLOAT</code>
+            types.
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show equivalent operations with the <code class="ph codeph">fmod()</code>
+            function and the <code class="ph codeph">%</code> arithmetic operator, for values not subject
+            to any rounding error.
+          </p>
+<pre class="pre codeblock"><code>select fmod(10,3);
++-------------+
+| fmod(10, 3) |
++-------------+
+| 1           |
++-------------+
+
+select fmod(5.5,2);
++--------------+
+| fmod(5.5, 2) |
++--------------+
+| 1.5          |
++--------------+
+
+select 10 % 3;
++--------+
+| 10 % 3 |
++--------+
+| 1      |
++--------+
+
+select 5.5 % 2;
++---------+
+| 5.5 % 2 |
++---------+
+| 1.5     |
++---------+
+</code></pre>
+          <p class="p">
+            The following examples show operations with the <code class="ph codeph">fmod()</code>
+            function for values that cannot be represented precisely by the
+            <code class="ph codeph">DOUBLE</code> or <code class="ph codeph">FLOAT</code> types, and thus are
+            subject to rounding error. <code class="ph codeph">fmod(9.9,3.0)</code> returns a value
+            slightly different from the expected 0.9 because of rounding.
+            <code class="ph codeph">fmod(9.9,3.3)</code> returns a value quite different from
+            the expected value of 0 because of rounding error during intermediate
+            calculations.
+          </p>
+<pre class="pre codeblock"><code>select fmod(9.9,3.0);
++--------------------+
+| fmod(9.9, 3.0)     |
++--------------------+
+| 0.8999996185302734 |
++--------------------+
+
+select fmod(9.9,3.3);
++-------------------+
+| fmod(9.9, 3.3)    |
++-------------------+
+| 3.299999713897705 |
++-------------------+
+</code></pre>
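The rounding behavior shown above is inherent to binary floating point rather than specific to Impala. A Python sketch using <code class="ph codeph">math.fmod()</code>, which follows the same IEEE 754 semantics (the exact residue printed may differ from Impala's), shows why 9.9 and 3.3 cannot divide evenly once stored as doubles:

```python
import math

# 9.9 and 3.3 have no exact binary representation, so the
# remainder is a tiny nonzero value rather than exactly 0.
r = math.fmod(9.9, 3.3)
assert r != 0.0 and abs(r) < 1e-9

# Values exactly representable as doubles behave as expected.
assert math.fmod(10.0, 3.0) == 1.0
assert math.fmod(5.5, 2.0) == 1.5
```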
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__fnv_hash">
+          <code class="ph codeph">fnv_hash(type v)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a consistent 64-bit value derived from the input argument, for convenience of
+          implementing hashing logic in an application.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">BIGINT</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            You might use the return value in an application where you perform load balancing, bucketing, or some
+            other technique to divide processing or storage.
+          </p>
+          <p class="p">
+            Because the result can be any 64-bit value, to restrict the value to a particular range, you can use an
+            expression that includes the <code class="ph codeph">ABS()</code> function and the <code class="ph codeph">%</code> (modulo)
+            operator. For example, to produce a hash value in the range 0-9, you could use the expression
+            <code class="ph codeph">ABS(FNV_HASH(x)) % 10</code>.
+          </p>
+          <p class="p">
+            This function implements the same algorithm that Impala uses internally for hashing, on systems where
+            the CRC32 instructions are not available.
+          </p>
+          <p class="p">
+            This function implements the
+            <a class="xref" href="http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function" target="_blank">Fowler–Noll–Vo
+            hash function</a>, in particular the FNV-1a variation. This is not a perfect hash function: some
+            combinations of values could produce the same result value. It is not suitable for cryptographic use.
+          </p>
+          <p class="p">
+            Similar input values of different types could produce different hash values, for example the same
+            numeric value represented as <code class="ph codeph">SMALLINT</code> or <code class="ph codeph">BIGINT</code>,
+            <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>, or <code class="ph codeph">DECIMAL(5,2)</code> or
+            <code class="ph codeph">DECIMAL(20,5)</code>.
+          </p>
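For reference, the core of 64-bit FNV-1a over a byte string can be sketched in a few lines of Python. This illustrates the algorithm only; the values Impala produces depend on how each column type is serialized to bytes, so do not expect this sketch to reproduce Impala's output:

```python
def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime."""
    h = 0xcbf29ce484222325              # FNV-1a 64-bit offset basis
    for b in data:
        h ^= b
        h = (h * 0x100000001b3) % 2**64  # FNV 64-bit prime, wrapped to 64 bits
    return h

# Restrict to a bucket range, analogous to ABS(FNV_HASH(x)) % 10 in SQL.
bucket = fnv1a_64(b"hello") % 10
assert 0 <= bucket < 10
```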
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table h (x int, s string);
+[localhost:21000] &gt; insert into h values (0, 'hello'), (1,'world'), (1234567890,'antidisestablishmentarianism');
+[localhost:21000] &gt; select x, fnv_hash(x) from h;
++------------+----------------------+
+| x          | fnv_hash(x)          |
++------------+----------------------+
+| 0          | -2611523532599129963 |
+| 1          | 4307505193096137732  |
+| 1234567890 | 3614724209955230832  |
++------------+----------------------+
+[localhost:21000] &gt; select s, fnv_hash(s) from h;
++------------------------------+---------------------+
+| s                            | fnv_hash(s)         |
++------------------------------+---------------------+
+| hello                        | 6414202926103426347 |
+| world                        | 6535280128821139475 |
+| antidisestablishmentarianism | -209330013948433970 |
++------------------------------+---------------------+
+[localhost:21000] &gt; select s, abs(fnv_hash(s)) % 10 from h;
++------------------------------+-------------------------+
+| s                            | abs(fnv_hash(s)) % 10.0 |
++------------------------------+-------------------------+
+| hello                        | 8                       |
+| world                        | 6                       |
+| antidisestablishmentarianism | 4                       |
++------------------------------+-------------------------+</code></pre>
+          <p class="p">
+            For short argument values, the high-order bits of the result have relatively low entropy:
+          </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table b (x boolean);
+[localhost:21000] &gt; insert into b values (true), (true), (false), (false);
+[localhost:21000] &gt; select x, fnv_hash(x) from b;
++-------+---------------------+
+| x     | fnv_hash(x)         |
++-------+---------------------+
+| true  | 2062020650953872396 |
+| true  | 2062020650953872396 |
+| false | 2062021750465500607 |
+| false | 2062021750465500607 |
++-------+---------------------+</code></pre>
+          <p class="p">
+            <strong class="ph b">Added in:</strong> Impala 1.2.2
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__greatest">
+          <code class="ph codeph">greatest(bigint a[, bigint b ...])</code>, <code class="ph codeph">greatest(double a[, double b ...])</code>,
+          <code class="ph codeph">greatest(decimal(p,s) a[, decimal(p,s) b ...])</code>, <code class="ph codeph">greatest(string a[, string b
+          ...])</code>, <code class="ph codeph">greatest(timestamp a[, timestamp b ...])</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the largest value from a list of expressions.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+        <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+        <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+      </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__hex">
+          <code class="ph codeph">hex(bigint a), hex(string a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the hexadecimal representation of an integer value, or of the characters in a
+          string.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__is_inf">
+          <code class="ph codeph">is_inf(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Tests whether a value is equal to the special value <span class="q">"inf"</span>, signifying infinity.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">boolean</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+        Infinity and NaN can be specified in text data files as <code class="ph codeph">inf</code> and <code class="ph codeph">nan</code>
+        respectively, and Impala interprets them as these special values. They can also be produced by certain
+        arithmetic expressions; for example, <code class="ph codeph">pow(-1, 0.5)</code> returns <code class="ph codeph">NaN</code> and
+        <code class="ph codeph">1/0</code> returns <code class="ph codeph">Infinity</code>. You can also cast the literal values, such as <code class="ph codeph">CAST('nan' AS
+        DOUBLE)</code> or <code class="ph codeph">CAST('inf' AS DOUBLE)</code>.
+      </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__is_nan">
+          <code class="ph codeph">is_nan(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Tests whether a value is equal to the special value <span class="q">"NaN"</span>, signifying <span class="q">"not a
+          number"</span>.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">boolean</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+        Infinity and NaN can be specified in text data files as <code class="ph codeph">inf</code> and <code class="ph codeph">nan</code>
+        respectively, and Impala interprets them as these special values. They can also be produced by certain
+        arithmetic expressions; for example, <code class="ph codeph">pow(-1, 0.5)</code> returns <code class="ph codeph">NaN</code> and
+        <code class="ph codeph">1/0</code> returns <code class="ph codeph">Infinity</code>. You can also cast the literal values, such as <code class="ph codeph">CAST('nan' AS
+        DOUBLE)</code> or <code class="ph codeph">CAST('inf' AS DOUBLE)</code>.
+      </p>
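The same special values exist in any IEEE 754 implementation. A small Python sketch, analogous to the <code class="ph codeph">CAST()</code> calls above (note that Python raises exceptions for some expressions, such as <code class="ph codeph">1/0</code>, that SQL can evaluate to a special value):

```python
import math

inf = float("inf")   # analogous to CAST('inf' AS DOUBLE)
nan = float("nan")   # analogous to CAST('nan' AS DOUBLE)

assert math.isinf(inf)
assert math.isnan(nan)
# NaN is the only value that does not compare equal to itself,
# which is why a dedicated test function such as is_nan() is needed.
assert nan != nan
```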
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__least">
+          <code class="ph codeph">least(bigint a[, bigint b ...])</code>, <code class="ph codeph">least(double a[, double b ...])</code>,
+          <code class="ph codeph">least(decimal(p,s) a[, decimal(p,s) b ...])</code>, <code class="ph codeph">least(string a[, string b
+          ...])</code>, <code class="ph codeph">least(timestamp a[, timestamp b ...])</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the smallest value from a list of expressions.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+        <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+        <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+      </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__ln">
+          <code class="ph codeph">ln(double a)</code>,
+          <code class="ph codeph" id="math_functions__dlog1">dlog1(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          
+          <strong class="ph b">Purpose:</strong> Returns the
+          <a class="xref" href="https://en.wikipedia.org/wiki/Natural_logarithm" target="_blank">natural
+          logarithm</a> of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__log">
+          <code class="ph codeph">log(double base, double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the logarithm of the second argument to the specified base.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
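The general-base form relates to the natural logarithm through the change-of-base identity log<sub>base</sub>(a) = ln(a) / ln(base). A quick Python check of that identity (illustrative only; note the SQL signature puts the base first, while Python's <code class="ph codeph">math.log()</code> puts it second):

```python
import math

# SQL log(base, a) corresponds to Python math.log(a, base).
assert math.isclose(math.log(8, 2), 3.0)
assert math.isclose(math.log(1000, 10), 3.0)
# Change-of-base identity: log_b(a) == ln(a) / ln(b)
assert math.isclose(math.log(50, 7), math.log(50) / math.log(7))
```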
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__log10">
+          <code class="ph codeph">log10(double a)</code>,
+          <code class="ph codeph" id="math_functions__dlog10">dlog10(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          
+          <strong class="ph b">Purpose:</strong> Returns the logarithm of the argument to the base 10.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__log2">
+          <code class="ph codeph">log2(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the logarithm of the argument to the base 2.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__max_int">
+          <code class="ph codeph">max_int(), <span class="ph" id="math_functions__max_tinyint">max_tinyint()</span>, <span class="ph" id="math_functions__max_smallint">max_smallint()</span>,
+          <span class="ph" id="math_functions__max_bigint">max_bigint()</span></code>
+        </dt>
+
+        <dd class="dd">
+          
+          
+          
+          
+          <strong class="ph b">Purpose:</strong> Returns the largest value of the associated integral type.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> The same as the integral type being checked.
+          </p>
+          <p class="p">
+
+            <strong class="ph b">Usage notes:</strong> Use the corresponding <code class="ph codeph">min_</code> and <code class="ph codeph">max_</code> functions to
+            check if all values in a column are within the allowed range, before copying data or altering column
+            definitions. If not, switch to the next higher integral type or to a <code class="ph codeph">DECIMAL</code> with
+            sufficient precision.
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__min_int">
+          <code class="ph codeph">min_int(), <span class="ph" id="math_functions__min_tinyint">min_tinyint()</span>, <span class="ph" id="math_functions__min_smallint">min_smallint()</span>,
+          <span class="ph" id="math_functions__min_bigint">min_bigint()</span></code>
+        </dt>
+
+        <dd class="dd">
+          
+          
+          
+          
+          <strong class="ph b">Purpose:</strong> Returns the smallest value of the associated integral type (a negative number).
+          <p class="p">
+            <strong class="ph b">Return type:</strong> The same as the integral type being checked.
+          </p>
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> Use the corresponding <code class="ph codeph">min_</code> and <code class="ph codeph">max_</code> functions to
+            check if all values in a column are within the allowed range, before copying data or altering column
+            definitions. If not, switch to the next higher integral type or to a <code class="ph codeph">DECIMAL</code> with
+            sufficient precision.
+          </p>
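The ranges that these functions report follow directly from the widths of the signed integral types (8, 16, 32, and 64 bits). A Python sketch of the underlying two's-complement arithmetic, shown here for illustration:

```python
def int_range(bits: int) -> tuple:
    """Return (min, max) for a signed two's-complement integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

assert int_range(8)  == (-128, 127)                # TINYINT
assert int_range(16) == (-32768, 32767)            # SMALLINT
assert int_range(32) == (-2147483648, 2147483647)  # INT
assert int_range(64)[1] == 9223372036854775807     # BIGINT
```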
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__mod">
+          <code class="ph codeph">mod(<var class="keyword varname">numeric_type</var> a, <var class="keyword varname">same_type</var> b)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the modulus of a number. Equivalent to the <code class="ph codeph">%</code> arithmetic operator.
+          Works with any size integer type, any size floating-point type, and <code class="ph codeph">DECIMAL</code>
+          with any precision and scale.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+          <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+      </p>
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Because this function works with <code class="ph codeph">DECIMAL</code> values, prefer it over <code class="ph codeph">fmod()</code>
+            when working with fractional values. It is not subject to the rounding errors that make
+            <code class="ph codeph">fmod()</code> problematic with floating-point numbers.
+            The <code class="ph codeph">%</code> arithmetic operator now uses the <code class="ph codeph">mod()</code> function
+            in cases where its arguments can be interpreted as <code class="ph codeph">DECIMAL</code> values,
+            increasing the accuracy of that operator.
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show how the <code class="ph codeph">mod()</code> function works for
+            whole numbers and fractional values, and how the <code class="ph codeph">%</code> operator
+            works the same way. In the case of <code class="ph codeph">mod(9.9,3)</code>,
+            the type conversion for the second argument results in the first argument
+            being interpreted as <code class="ph codeph">DOUBLE</code>, so to produce an accurate
+            <code class="ph codeph">DECIMAL</code> result requires casting the second argument
+            or writing it as a <code class="ph codeph">DECIMAL</code> literal, 3.0.
+          </p>
+<pre class="pre codeblock"><code>select mod(10,3);
++-------------+
+| fmod(10, 3) |
++-------------+
+| 1           |
++-------------+
+
+select mod(5.5,2);
++-------------+
+| mod(5.5, 2) |
++-------------+
+| 1.5         |
++-------------+
+
+select 10 % 3;
++--------+
+| 10 % 3 |
++--------+
+| 1      |
++--------+
+
+select 5.5 % 2;
++---------+
+| 5.5 % 2 |
++---------+
+| 1.5     |
++---------+
+
+select mod(9.9,3.3);
++---------------+
+| mod(9.9, 3.3) |
++---------------+
+| 0.0           |
++---------------+
+
+select mod(9.9,3);
++--------------------+
+| mod(9.9, 3)        |
++--------------------+
+| 0.8999996185302734 |
++--------------------+
+
+select mod(9.9, cast(3 as decimal(2,1)));
++-----------------------------------+
+| mod(9.9, cast(3 as decimal(2,1))) |
++-----------------------------------+
+| 0.9                               |
++-----------------------------------+
+
+select mod(9.9,3.0);
++---------------+
+| mod(9.9, 3.0) |
++---------------+
+| 0.9           |
++---------------+
+</code></pre>
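The accuracy difference between <code class="ph codeph">mod()</code> and <code class="ph codeph">fmod()</code> comes down to decimal versus binary arithmetic. Python's <code class="ph codeph">decimal</code> module makes the same distinction, so the same calculations can be sketched as:

```python
from decimal import Decimal
import math

# Exact decimal arithmetic: 9.9 is an exact multiple of 3.3.
assert Decimal("9.9") % Decimal("3.3") == Decimal("0.0")
assert Decimal("9.9") % Decimal("3.0") == Decimal("0.9")

# Binary floating point: the same operands leave a tiny nonzero remainder.
assert math.fmod(9.9, 3.3) != 0.0
```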
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__negative">
+          <code class="ph codeph">negative(numeric_type a)</code>
+
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the argument with the sign reversed; returns a positive value if the argument was
+          already negative.
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> Use <code class="ph codeph">-abs(a)</code> instead if you need to ensure all return values are
+            negative.
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__pi">
+          <code class="ph codeph">pi()</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the constant pi.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__pmod">
+          <code class="ph codeph">pmod(bigint a, bigint b), pmod(double a, double b)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the positive modulus of a number.
+          Primarily for <a class="xref" href="https://issues.apache.org/jira/browse/HIVE-656" target="_blank">HiveQL compatibility</a>.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code> or <code class="ph codeph">double</code>, depending on type of arguments
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show how the <code class="ph codeph">fmod()</code> function can return a negative value,
+            depending on the signs of its arguments, and how the <code class="ph codeph">pmod()</code> function returns either
+            the same value as <code class="ph codeph">fmod()</code> or that value with the sign flipped.
+          </p>
+<pre class="pre codeblock"><code>select fmod(-5,2);
++-------------+
+| fmod(-5, 2) |
++-------------+
+| -1          |
++-------------+
+
+select pmod(-5,2);
++-------------+
+| pmod(-5, 2) |
++-------------+
+| 1           |
++-------------+
+
+select fmod(-5,-2);
++--------------+
+| fmod(-5, -2) |
++--------------+
+| -1           |
++--------------+
+
+select pmod(-5,-2);
++--------------+
+| pmod(-5, -2) |
++--------------+
+| -1           |
++--------------+
+
+select fmod(5,-2);
++-------------+
+| fmod(5, -2) |
++-------------+
+| 1           |
++-------------+
+
+select pmod(5,-2);
++-------------+
+| pmod(5, -2) |
++-------------+
+| -1          |
++-------------+
+</code></pre>
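The sign behavior in these examples can be summarized as: take the <code class="ph codeph">fmod()</code> remainder, add the divisor, and take <code class="ph codeph">fmod()</code> again. A Python sketch of that rule, which reproduces the three outputs above (an illustration of the observed behavior, not Impala's implementation):

```python
import math

def pmod(a: float, b: float) -> float:
    """Positive-modulus rule: fold the fmod() remainder once by the divisor."""
    return math.fmod(math.fmod(a, b) + b, b)

assert pmod(-5, 2) == 1.0    # fmod(-5, 2) is -1; pmod flips it to 1
assert pmod(-5, -2) == -1.0  # same value as fmod(-5, -2)
assert pmod(5, -2) == -1.0   # fmod(5, -2) is 1; pmod flips it to -1
```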
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__positive">
+          <code class="ph codeph">positive(numeric_type a)</code>
+
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the original argument unchanged (even if the argument is negative).
+          <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value
+      </p>
+
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> Use <code class="ph codeph">abs()</code> instead if you need to ensure all return values are
+            positive.
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__pow">
+          <code class="ph codeph">pow(double a, double p)</code>,
+          <code class="ph codeph" id="math_functions__power">power(double a, double p)</code>,
+          <code class="ph codeph" id="math_functions__dpow">dpow(double a, double p)</code>,
+          <code class="ph codeph" id="math_functions__fpow">fpow(double a, double p)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          
+          
+          
+          <strong class="ph b">Purpose:</strong> Returns the first argument raised to the power of the second argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__precision">
+          <code class="ph codeph">precision(<var class="keyword varname">numeric_expression</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Computes the precision (number of decimal digits) needed to represent the type of the
+          argument expression as a <code class="ph codeph">DECIMAL</code> value.
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Typically used in combination with the <code class="ph codeph">scale()</code> function, to determine the appropriate
+            <code class="ph codeph">DECIMAL(<var class="keyword varname">precision</var>,<var class="keyword varname">scale</var>)</code> type to declare in a
+            <code class="ph codeph">CREATE TABLE</code> statement or <code class="ph codeph">CAST()</code> function.
+          </p>
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <div class="p">
+        The following examples demonstrate how to check the precision and scale of numeric literals or other
+        numeric expressions. Impala represents numeric literals in the smallest appropriate type. 5 is a
+        <code class="ph codeph">TINYINT</code> value, which ranges from -128 to 127; therefore, 3 decimal digits are needed to
+        represent the entire range, and because it is an integer value there are no fractional digits. 1.333 is
+        interpreted as a <code class="ph codeph">DECIMAL</code> value, with 4 digits total and 3 digits after the decimal point.
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select precision(5), scale(5);
++--------------+----------+
+| precision(5) | scale(5) |
++--------------+----------+
+| 3            | 0        |
++--------------+----------+
+[localhost:21000] &gt; select precision(1.333), scale(1.333);
++------------------+--------------+
+| precision(1.333) | scale(1.333) |
++------------------+--------------+
+| 4                | 3            |
++------------------+--------------+
+[localhost:21000] &gt; with t1 as
+  ( select cast(12.34 as decimal(20,2)) x union select cast(1 as decimal(8,6)) x )
+  select precision(x), scale(x) from t1 limit 1;
++--------------+----------+
+| precision(x) | scale(x) |
++--------------+----------+
+| 24           | 6        |
++--------------+----------+
+</code></pre>
+      </div>
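The 24/6 result in the last query follows the usual rule for combining two <code class="ph codeph">DECIMAL</code> types: the result keeps the larger scale plus enough integer digits for the wider operand. A Python sketch of that rule, as implied by the example (Impala may additionally cap precision at its maximum of 38):

```python
def union_decimal(p1: int, s1: int, p2: int, s2: int) -> tuple:
    """Combine two DECIMAL(p,s) types: widest integer part plus widest scale.
    A sketch of the rule implied by the example above."""
    scale = max(s1, s2)
    int_digits = max(p1 - s1, p2 - s2)
    return int_digits + scale, scale

# DECIMAL(20,2) combined with DECIMAL(8,6) -> DECIMAL(24,6)
assert union_decimal(20, 2, 8, 6) == (24, 6)
```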
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__quotient">
+          <code class="ph codeph">quotient(bigint numerator, bigint denominator)</code>,
+          <code class="ph codeph">quotient(double numerator, double denominator)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the first argument divided by the second argument, discarding any fractional
+          part. Avoids promoting integer arguments to <code class="ph codeph">DOUBLE</code> as happens with the <code class="ph codeph">/</code> SQL
+          operator. <span class="ph">Also includes an overload that accepts <code class="ph codeph">DOUBLE</code> arguments,
+          discards the fractional part of each argument value before dividing, and again returns <code class="ph codeph">BIGINT</code>.
+          With integer arguments, this function works the same as the <code class="ph codeph">DIV</code> operator.</span>
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">bigint</code>
+          </p>
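+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following illustrative queries contrast <code class="ph codeph">quotient()</code> with the
+            <code class="ph codeph">/</code> and <code class="ph codeph">DIV</code> operators; expected results are shown in comments.
+          </p>
+<pre class="pre codeblock"><code>-- Returns 3: the fractional part of 11 / 3 is discarded.
+select quotient(11, 3);
+-- Also returns 3: with integer arguments, DIV behaves the same way.
+select 11 div 3;
+-- Returns a DOUBLE value of approximately 3.667, because the / operator
+-- promotes integer arguments to DOUBLE.
+select 11 / 3;
+</code></pre>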
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__radians">
+          <code class="ph codeph">radians(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Converts argument value from degrees to radians.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
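+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following illustrative calls show the conversion; the results are approximate
+            <code class="ph codeph">DOUBLE</code> values.
+          </p>
+<pre class="pre codeblock"><code>-- Returns pi (approximately 3.14159).
+select radians(180);
+-- Returns approximately 1.5708 (pi / 2).
+select radians(90);
+</code></pre>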
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__rand">
+          <code class="ph codeph">rand()</code>, <code class="ph codeph">rand(int seed)</code>,
+          <code class="ph codeph" id="math_functions__random">random()</code>,
+          <code class="ph codeph">random(int seed)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a random value between 0 and 1. After <code class="ph codeph">rand()</code> is called with a
+          seed argument, it produces a consistent random sequence based on the seed value.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+          <p class="p">
+            <strong class="ph b">Usage notes:</strong> Currently, the random sequence is reset after each query, and multiple calls to
+            <code class="ph codeph">rand()</code> within the same query return the same value each time. To produce a
+            sequence that differs for each query, pass a unique seed value to each call to
+            <code class="ph codeph">rand()</code>. For example, <code class="ph codeph">select rand(unix_timestamp()) from ...</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following examples show how <code class="ph codeph">rand()</code> can produce sequences of varying predictability,
+            so that you can reproduce query results involving random values or generate unique sequences of random
+            values for each query.
+            When <code class="ph codeph">rand()</code> is called with no argument, it generates the same sequence of values each time,
+            regardless of the ordering of the result set.
+            When <code class="ph codeph">rand()</code> is called with a constant integer, it generates a different sequence of values,
+            but still always the same sequence for the same seed value.
+            If you pass in a seed value that changes, such as the return value of the expression <code class="ph codeph">unix_timestamp(now())</code>,
+            each query will use a different sequence of random values, potentially more useful in probability calculations although
+            more difficult to reproduce at a later time. Therefore, the final two examples with an unpredictable seed value
+            also include the seed in the result set, to make it possible to reproduce the same random sequence later.
+          </p>
+<pre class="pre codeblock"><code>select x, rand() from three_rows;
++---+-----------------------+
+| x | rand()                |
++---+-----------------------+
+| 1 | 0.0004714746030380365 |
+| 2 | 0.5895895192351144    |
+| 3 | 0.4431900859080209    |
++---+-----------------------+
+
+select x, rand() from three_rows order by x desc;
++---+-----------------------+
+| x | rand()                |
++---+-----------------------+
+| 3 | 0.0004714746030380365 |
+| 2 | 0.5895895192351144    |
+| 1 | 0.4431900859080209    |
++---+-----------------------+
+
+select x, rand(1234) from three_rows order by x;
++---+----------------------+
+| x | rand(1234)           |
++---+----------------------+
+| 1 | 0.7377511392057646   |
+| 2 | 0.009428468537250751 |
+| 3 | 0.208117277924026    |
++---+----------------------+
+
+select x, rand(1234) from three_rows order by x desc;
++---+----------------------+
+| x | rand(1234)           |
++---+----------------------+
+| 3 | 0.7377511392057646   |
+| 2 | 0.009428468537250751 |
+| 1 | 0.208117277924026    |
++---+----------------------+
+
+select x, unix_timestamp(now()), rand(unix_timestamp(now()))
+  from three_rows order by x;
++---+-----------------------+-----------------------------+
+| x | unix_timestamp(now()) | rand(unix_timestamp(now())) |
++---+-----------------------+-----------------------------+
+| 1 | 1440777752            | 0.002051228658320023        |
+| 2 | 1440777752            | 0.5098743483004506          |
+| 3 | 1440777752            | 0.9517714925817081          |
++---+-----------------------+-----------------------------+
+
+select x, unix_timestamp(now()), rand(unix_timestamp(now()))
+  from three_rows order by x desc;
++---+-----------------------+-----------------------------+
+| x | unix_timestamp(now()) | rand(unix_timestamp(now())) |
++---+-----------------------+-----------------------------+
+| 3 | 1440777761            | 0.9985985015512437          |
+| 2 | 1440777761            | 0.3251255333074953          |
+| 1 | 1440777761            | 0.02422675025846192         |
++---+-----------------------+-----------------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__round">
+          <code class="ph codeph">round(double a)</code>,
+          <code class="ph codeph">round(double a, int d)</code>,
+          <code class="ph codeph">round(decimal a, int_type d)</code>,
+          <code class="ph codeph" id="math_functions__dround">dround(double a)</code>,
+          <code class="ph codeph">dround(double a, int d)</code>,
+          <code class="ph codeph">dround(decimal(p,s) a, int_type d)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          
+          <strong class="ph b">Purpose:</strong> Rounds a floating-point value. By default (with a single argument), rounds to the nearest
+          integer. Values ending in .5 are rounded up for positive numbers, down for negative numbers (that is,
+          away from zero). The optional second argument specifies how many digits to leave after the decimal point;
+          values greater than zero produce a floating-point return value rounded to the requested number of digits
+          to the right of the decimal point.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">bigint</code> for a single <code class="ph codeph">double</code> argument;
+            <code class="ph codeph">double</code> for the two-argument signature when the second argument is greater than zero.
+            For <code class="ph codeph">DECIMAL</code> values, the smallest
+            <code class="ph codeph">DECIMAL(<var class="keyword varname">p</var>,<var class="keyword varname">s</var>)</code> type with appropriate precision and
+            scale.
+          </p>
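+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            The following illustrative calls demonstrate the round-away-from-zero behavior for values ending in .5,
+            and the effect of the second argument; expected results are shown in comments.
+          </p>
+<pre class="pre codeblock"><code>-- Returns 3 and -3: values ending in .5 round away from zero.
+select round(2.5), round(-2.5);
+-- Returns 1.47: two digits are kept to the right of the decimal point.
+select round(1.4706, 2);
+</code></pre>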
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__scale">
+          <code class="ph codeph">scale(<var class="keyword varname">numeric_expression</var>)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Computes the scale (number of decimal digits to the right of the decimal point) needed to
+          represent the type of the argument expression as a <code class="ph codeph">DECIMAL</code> value.
+          <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+          <p class="p">
+            Typically used in combination with the <code class="ph codeph">precision()</code> function, to determine the
+            appropriate <code class="ph codeph">DECIMAL(<var class="keyword varname">precision</var>,<var class="keyword varname">scale</var>)</code> type to
+            declare in a <code class="ph codeph">CREATE TABLE</code> statement or <code class="ph codeph">CAST()</code> function.
+          </p>
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <div class="p">
+        The following examples demonstrate how to check the precision and scale of numeric literals or other
+        numeric expressions. Impala represents numeric literals in the smallest appropriate type. 5 is a
+        <code class="ph codeph">TINYINT</code> value, which ranges from -128 to 127, therefore 3 decimal digits are needed to
+        represent the entire range, and because it is an integer value there are no fractional digits. 1.333 is
+        interpreted as a <code class="ph codeph">DECIMAL</code> value, with 4 digits total and 3 digits after the decimal point.
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select precision(5), scale(5);
++--------------+----------+
+| precision(5) | scale(5) |
++--------------+----------+
+| 3            | 0        |
++--------------+----------+
+[localhost:21000] &gt; select precision(1.333), scale(1.333);
++------------------+--------------+
+| precision(1.333) | scale(1.333) |
++------------------+--------------+
+| 4                | 3            |
++------------------+--------------+
+[localhost:21000] &gt; with t1 as
+  ( select cast(12.34 as decimal(20,2)) x union select cast(1 as decimal(8,6)) x )
+  select precision(x), scale(x) from t1 limit 1;
++--------------+----------+
+| precision(x) | scale(x) |
++--------------+----------+
+| 24           | 6        |
++--------------+----------+
+</code></pre>
+      </div>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__sign">
+          <code class="ph codeph">sign(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns -1, 0, or 1 to indicate the sign of the argument value: -1 for negative values, 0 for zero, and 1 for positive values.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+          </p>
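+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            A brief illustrative call; expected results are shown in the comment.
+          </p>
+<pre class="pre codeblock"><code>-- Returns -1, 0, and 1 respectively.
+select sign(-6.5), sign(0), sign(13);
+</code></pre>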
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__sin">
+          <code class="ph codeph">sin(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the sine of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__sinh">
+          <code class="ph codeph">sinh(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the hyperbolic sine of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__sqrt">
+          <code class="ph codeph">sqrt(double a)</code>,
+          <code class="ph codeph" id="math_functions__dsqrt">dsqrt(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          
+          <strong class="ph b">Purpose:</strong> Returns the square root of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__tan">
+          <code class="ph codeph">tan(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the tangent of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__tanh">
+          <code class="ph codeph">tanh(double a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns the hyperbolic tangent of the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+          </p>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__truncate">
+          <code class="ph codeph">truncate(double_or_decimal a[, digits_to_leave])</code>,
+          <span class="ph" id="math_functions__dtrunc"><code class="ph codeph">dtrunc(double_or_decimal a[, digits_to_leave])</code></span>
+        </dt>
+
+        <dd class="dd">
+          
+          
+          <strong class="ph b">Purpose:</strong> Removes some or all fractional digits from a numeric value.
+          With a single argument, removes all fractional digits, leaving an integer value.
+          The optional second argument specifies the number of fractional digits to include
+          in the return value, and only applies when the argument type is <code class="ph codeph">DECIMAL</code>.
+          <code class="ph codeph">truncate()</code> and <code class="ph codeph">dtrunc()</code> are aliases for the same function.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">decimal</code> for <code class="ph codeph">DECIMAL</code> arguments;
+            <code class="ph codeph">bigint</code> for <code class="ph codeph">DOUBLE</code> arguments
+          </p>
+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>select truncate(3.45)
++----------------+
+| truncate(3.45) |
++----------------+
+| 3              |
++----------------+
+
+select truncate(-3.45)
++-----------------+
+| truncate(-3.45) |
++-----------------+
+| -3              |
++-----------------+
+
+select truncate(3.456,1)
++--------------------+
+| truncate(3.456, 1) |
++--------------------+
+| 3.4                |
++--------------------+
+
+select dtrunc(3.456,1)
++------------------+
+| dtrunc(3.456, 1) |
++------------------+
+| 3.4              |
++------------------+
+
+select truncate(3.456,2)
++--------------------+
+| truncate(3.456, 2) |
++--------------------+
+| 3.45               |
++--------------------+
+
+select truncate(3.456,7)
++--------------------+
+| truncate(3.456, 7) |
++--------------------+
+| 3.4560000          |
++--------------------+
+</code></pre>
+        </dd>
+
+      
+
+      
+
+        <dt class="dt dlterm" id="math_functions__unhex">
+          <code class="ph codeph">unhex(string a)</code>
+        </dt>
+
+        <dd class="dd">
+          
+          <strong class="ph b">Purpose:</strong> Returns a string of characters with ASCII values corresponding to pairs of hexadecimal
+          digits in the argument.
+          <p class="p">
+            <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+          </p>
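+          <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+          <p class="p">
+            An illustrative call; each pair of hexadecimal digits maps to one ASCII character.
+          </p>
+<pre class="pre codeblock"><code>-- Returns 'Impala': 49, 6D, 70, 61, 6C, and 61 are the ASCII codes
+-- for the characters I, m, p, a, l, a.
+select unhex('496D70616C61');
+</code></pre>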
+        </dd>
+
+      
+    </dl>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_max.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_max.html b/docs/build/html/topics/impala_max.html
new file mode 100644
index 0000000..fd3d74c
--- /dev/null
+++ b/docs/build/html/topics/impala_max.html
@@ -0,0 +1,298 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX Function</title></head><body id="max"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">MAX Function</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      An aggregate function that returns the maximum value from a set of numbers. Opposite of the
+      <code class="ph codeph">MIN</code> function. Its single argument can be a numeric column, or the numeric result of a function
+      or expression applied to the column value. Rows with a <code class="ph codeph">NULL</code> value for the specified column
+      are ignored. If the table is empty, or all the values supplied to <code class="ph codeph">MAX</code> are
+      <code class="ph codeph">NULL</code>, <code class="ph codeph">MAX</code> returns <code class="ph codeph">NULL</code>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>MAX([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre>
+
+    <p class="p">
+      When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of
+      grouping values.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong> In Impala 2.0 and higher, this function can be used as an analytic function, but with restrictions on any window clause.
+        For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause is only allowed if the start
+        bound is <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+        arguments which produce a <code class="ph codeph">STRING</code> result
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+        If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+        <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+        query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+        See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+        for the kinds of queries that this option applies to, and slight differences in how partitions are
+        evaluated when this query option is enabled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+        To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        in an aggregation function, you unpack the individual elements using join notation in the query,
+        and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+        See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+      </p>
+
+    <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric aggregates such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">N_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+  from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name      | item.n_nationkey |
++-------------+------------------+
+| AFRICA      | 0                |
+| AFRICA      | 5                |
+| AFRICA      | 14               |
+| AFRICA      | 15               |
+| AFRICA      | 16               |
+| AMERICA     | 1                |
+| AMERICA     | 2                |
+| AMERICA     | 3                |
+| AMERICA     | 17               |
+| AMERICA     | 24               |
+| ASIA        | 8                |
+| ASIA        | 9                |
+| ASIA        | 12               |
+| ASIA        | 18               |
+| ASIA        | 21               |
+| EUROPE      | 6                |
+| EUROPE      | 7                |
+| EUROPE      | 19               |
+| EUROPE      | 22               |
+| EUROPE      | 23               |
+| MIDDLE EAST | 4                |
+| MIDDLE EAST | 10               |
+| MIDDLE EAST | 11               |
+| MIDDLE EAST | 13               |
+| MIDDLE EAST | 20               |
++-------------+------------------+
+
+select
+  r_name,
+  count(r_nations.item.n_nationkey) as count,
+  sum(r_nations.item.n_nationkey) as sum,
+  avg(r_nations.item.n_nationkey) as avg,
+  min(r_nations.item.n_name) as minimum,
+  max(r_nations.item.n_name) as maximum,
+  ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+  region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name      | count | sum | avg  | minimum   | maximum        | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA      | 5     | 50  | 10   | ALGERIA   | MOZAMBIQUE     | 5             |
+| AMERICA     | 5     | 47  | 9.4  | ARGENTINA | UNITED STATES  | 5             |
+| ASIA        | 5     | 68  | 13.6 | CHINA     | VIETNAM        | 5             |
+| EUROPE      | 5     | 77  | 15.4 | FRANCE    | UNITED KINGDOM | 5             |
+| MIDDLE EAST | 5     | 58  | 11.6 | EGYPT     | SAUDI ARABIA   | 5             |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>-- Find the largest value for this column in the table.
+select max(c1) from t1;
+-- Find the largest value for this column from a subset of the table.
+select max(c1) from t1 where month = 'January' and year = '2013';
+-- Find the largest value from a set of numeric function results.
+select max(length(s)) from t1;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, max(purchase_price) from store_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select max(distinct x) from t1;
+</code></pre>
+
+    <div class="p">
+      The following examples show how to use <code class="ph codeph">MAX()</code> in an analytic context. They use a table
+      containing integers from 1 to 10. Notice how the <code class="ph codeph">MAX()</code> is reported for each input value, as
+      opposed to the <code class="ph codeph">GROUP BY</code> clause which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, max(x) over (partition by property) as max from int_t where property in ('odd','even');
++----+----------+-----+
+| x  | property | max |
++----+----------+-----+
+| 2  | even     | 10  |
+| 4  | even     | 10  |
+| 6  | even     | 10  |
+| 8  | even     | 10  |
+| 10 | even     | 10  |
+| 1  | odd      | 9   |
+| 3  | odd      | 9   |
+| 5  | odd      | 9   |
+| 7  | odd      | 9   |
+| 9  | odd      | 9   |
++----+----------+-----+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">MAX()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to display the largest value of <code class="ph codeph">X</code>
+encountered up to each row in the result set. The examples use two columns in the <code class="ph codeph">ORDER BY</code>
+clause to produce a sequence of values that rises and falls, to illustrate how the <code class="ph codeph">MAX()</code>
+result only increases or stays the same throughout each partition within the result set.
+The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+therefore all of these examples produce the same results:
+
+<pre class="pre codeblock"><code>select x, property,
+  max(x) <strong class="ph b">over (order by property, x desc)</strong> as 'maximum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | maximum to this point |
++---+----------+-----------------------+
+| 7 | prime    | 7                     |
+| 5 | prime    | 7                     |
+| 3 | prime    | 7                     |
+| 2 | prime    | 7                     |
+| 9 | square   | 9                     |
+| 4 | square   | 9                     |
+| 1 | square   | 9                     |
++---+----------+-----------------------+
+
+select x, property,
+  max(x) over
+  (
+    <strong class="ph b">order by property, x desc</strong>
+    <strong class="ph b">rows between unbounded preceding and current row</strong>
+  ) as 'maximum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | maximum to this point |
++---+----------+-----------------------+
+| 7 | prime    | 7                     |
+| 5 | prime    | 7                     |
+| 3 | prime    | 7                     |
+| 2 | prime    | 7                     |
+| 9 | square   | 9                     |
+| 4 | square   | 9                     |
+| 1 | square   | 9                     |
++---+----------+-----------------------+
+
+select x, property,
+  max(x) over
+  (
+    <strong class="ph b">order by property, x desc</strong>
+    <strong class="ph b">range between unbounded preceding and current row</strong>
+  ) as 'maximum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | maximum to this point |
++---+----------+-----------------------+
+| 7 | prime    | 7                     |
+| 5 | prime    | 7                     |
+| 3 | prime    | 7                     |
+| 2 | prime    | 7                     |
+| 9 | square   | 9                     |
+| 4 | square   | 9                     |
+| 1 | square   | 9                     |
++---+----------+-----------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running maximum taking into account all rows before
+and 1 row after the current row.
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code> clause.
+Because of an extra Impala restriction on the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> functions in an
+analytic context, the lower bound must be <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+<pre class="pre codeblock"><code>select x, property,
+  max(x) over
+  (
+    <strong class="ph b">order by property, x</strong>
+    <strong class="ph b">rows between unbounded preceding and 1 following</strong>
+  ) as 'local maximum'
+from int_t where property in ('prime','square');
++---+----------+---------------+
+| x | property | local maximum |
++---+----------+---------------+
+| 2 | prime    | 3             |
+| 3 | prime    | 5             |
+| 5 | prime    | 7             |
+| 7 | prime    | 7             |
+| 1 | square   | 7             |
+| 4 | square   | 9             |
+| 9 | square   | 9             |
++---+----------+---------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+  max(x) over
+  (
+    <strong class="ph b">order by property, x</strong>
+    <strong class="ph b">range between unbounded preceding and 1 following</strong>
+  ) as 'local maximum'
+from int_t where property in ('prime','square');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>, <a class="xref" href="impala_min.html#min">MIN Function</a>,
+      <a class="xref" href="impala_avg.html#avg">AVG Function</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_max_errors.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_max_errors.html b/docs/build/html/topics/impala_max_errors.html
new file mode 100644
index 0000000..72a6594
--- /dev/null
+++ b/docs/build/html/topics/impala_max_errors.html
@@ -0,0 +1,40 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_errors"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_ERRORS Query Option</title></head><body id="max_errors"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">MAX_ERRORS Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Maximum number of non-fatal errors for any particular query that are recorded in the Impala log file. For
+      example, if a billion-row table had a non-fatal data error in every row, you could diagnose the problem
+      without all billion errors being logged. A value of 0, or leaving the option unspecified, means the built-in default of 1000 errors.
+    </p>
+
+    <p class="p">
+      This option only controls how many errors are reported. To specify whether Impala continues or halts when it
+      encounters such errors, use the <code class="ph codeph">ABORT_ON_ERROR</code> option.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0 (meaning 1000 errors)
+    </p>
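+
+    <p class="p">
+      For example, to log more detail while diagnosing a problematic data load, and to
+      continue past bad rows rather than halting (a hypothetical session; the table name
+      is illustrative):
+    </p>
+<pre class="pre codeblock"><code>set max_errors=10000;
+set abort_on_error=false;
+-- Any non-fatal data errors, up to 10,000 of them, are recorded in the log.
+select count(*) from t1;
+</code></pre>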
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_abort_on_error.html#abort_on_error">ABORT_ON_ERROR Query Option</a>,
+      <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_max_io_buffers.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_max_io_buffers.html b/docs/build/html/topics/impala_max_io_buffers.html
new file mode 100644
index 0000000..3c5ec1e
--- /dev/null
+++ b/docs/build/html/topics/impala_max_io_buffers.html
@@ -0,0 +1,23 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_io_buffers"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_IO_BUFFERS Query Option</title></head><body id="max_io_buffers"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">MAX_IO_BUFFERS Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Deprecated query option. Currently has no effect.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_max_num_runtime_filters.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_max_num_runtime_filters.html b/docs/build/html/topics/impala_max_num_runtime_filters.html
new file mode 100644
index 0000000..37309ae
--- /dev/null
+++ b/docs/build/html/topics/impala_max_num_runtime_filters.html
@@ -0,0 +1,65 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_num_runtime_filters"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</title></head><body id="max_num_runtime_filters"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">MAX_NUM_RUNTIME_FILTERS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">MAX_NUM_RUNTIME_FILTERS</code> query option
+      sets an upper limit on the number of runtime filters that can be produced for each query.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> integer
+      </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 10
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Each runtime filter imposes some memory overhead on the query.
+      Depending on the setting of the <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code>
+      query option, each filter might consume between 1 and 16 megabytes
+      per plan fragment. There are typically 5 or fewer filters per plan fragment.
+    </p>
+
+    <p class="p">
+      Impala evaluates the effectiveness of each filter, and keeps the
+      ones that eliminate the largest number of partitions or rows.
+      Therefore, this setting can protect against
+      potential problems due to excessive memory overhead for filter production,
+      while still allowing a high level of optimization for suitable queries.
+    </p>
+
+    <p class="p">
+        Because the runtime filtering feature applies mainly to resource-intensive
+        and long-running queries, only adjust this query option when tuning long-running queries
+        involving some combination of large partitioned tables and joins involving large tables.
+      </p>
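+
+    <p class="p">
+      For example, to reduce filter memory overhead for a multi-way join
+      (a hypothetical tuning session; table and column names are illustrative):
+    </p>
+<pre class="pre codeblock"><code>set max_num_runtime_filters=5;
+-- At most 5 runtime filters are produced for this query; Impala keeps
+-- the ones that eliminate the most partitions or rows.
+select count(*) from huge_fact f
+  join big_dim1 d1 on f.k1 = d1.k1
+  join big_dim2 d2 on f.k2 = d2.k2;
+</code></pre>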
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+      
+      <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>,
+      <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_max_scan_range_length.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_max_scan_range_length.html b/docs/build/html/topics/impala_max_scan_range_length.html
new file mode 100644
index 0000000..df7aa6e
--- /dev/null
+++ b/docs/build/html/topics/impala_max_scan_range_length.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_scan_range_length"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_SCAN_RANGE_LENGTH Query Option</title></head><body id="max_scan_range_length"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">MAX_SCAN_RANGE_LENGTH Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Maximum length of the scan range. Interacts with the number of HDFS blocks in the table to determine how many
+      CPU cores across the cluster are involved with the processing for a query. (Each core processes one scan
+      range.)
+    </p>
+
+    <p class="p">
+      Lowering the value can sometimes increase parallelism if you have unused CPU capacity, but a too-small value
+      can limit query performance because each scan range involves extra overhead.
+    </p>
+
+    <p class="p">
+      Only applicable to HDFS tables. Has no effect on Parquet tables. Unspecified or 0 indicates backend default,
+      which is the same as the HDFS block size for each table.
+    </p>
+
+    <p class="p">
+      Although the scan range can be arbitrarily long, Impala internally uses an 8 MB read buffer so that it can
+      query tables with huge block sizes without allocating equivalent blocks of memory.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.7</span> and higher, the argument value can include unit specifiers,
+      such as <code class="ph codeph">100m</code> or <code class="ph codeph">100mb</code>. In previous versions,
+      Impala interpreted such formatted values as 0, leading to query failures.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0
+    </p>
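+
+    <p class="p">
+      For example, to experiment with smaller scan ranges on a table with large HDFS blocks
+      (a hypothetical tuning session; the right value depends on your data and cluster):
+    </p>
+<pre class="pre codeblock"><code>-- In Impala 2.7 and higher, the value can include a unit suffix.
+set max_scan_range_length=64mb;
+select count(*) from big_table;
+-- Revert to the default, the HDFS block size of each table.
+set max_scan_range_length=0;
+</code></pre>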
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[25/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_langref_unsupported.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_langref_unsupported.html b/docs/build/html/topics/impala_langref_unsupported.html
new file mode 100644
index 0000000..66a4b19
--- /dev/null
+++ b/docs/build/html/topics/impala_langref_unsupported.html
@@ -0,0 +1,329 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="langref_hiveql_delta"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SQL Differences Between Impala and Hive</title></head><body id="langref_hiveql_delta"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">SQL Differences Between Impala and Hive</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      
+      Impala's SQL syntax follows the SQL-92 standard and includes many industry extensions in areas such as
+      built-in functions. See <a class="xref" href="impala_porting.html#porting">Porting SQL from Other Database Systems to Impala</a> for a general discussion of adapting SQL
+      code from a variety of database systems to Impala.
+    </p>
+
+    <p class="p">
+      Because Impala and Hive share the same metastore database and their tables are often used interchangeably,
+      the following sections cover differences between Impala and Hive in detail.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="langref_hiveql_delta__langref_hiveql_unsupported">
+
+    <h2 class="title topictitle2" id="ariaid-title2">HiveQL Features not Available in Impala</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The current release of Impala does not support the following SQL features that you might be familiar with
+        from HiveQL:
+      </p>
+
+      
+
+      <ul class="ul">
+
+
+        <li class="li">
+          Extensibility mechanisms such as <code class="ph codeph">TRANSFORM</code>, custom file formats, or custom SerDes.
+        </li>
+
+        <li class="li">
+          The <code class="ph codeph">DATE</code> data type.
+        </li>
+
+        <li class="li">
+          XML and JSON functions.
+        </li>
+
+        <li class="li">
+          Certain aggregate functions from HiveQL: <code class="ph codeph">covar_pop</code>, <code class="ph codeph">covar_samp</code>,
+          <code class="ph codeph">corr</code>, <code class="ph codeph">percentile</code>, <code class="ph codeph">percentile_approx</code>,
+          <code class="ph codeph">histogram_numeric</code>, <code class="ph codeph">collect_set</code>; Impala supports the set of aggregate
+          functions listed in <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a> and analytic
+          functions listed in <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>.
+        </li>
+
+        <li class="li">
+          Sampling.
+        </li>
+
+        <li class="li">
+          Lateral views. In <span class="keyword">Impala 2.3</span> and higher, Impala supports queries on complex types
+          (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>), using join notation
+          rather than the <code class="ph codeph">EXPLODE()</code> keyword.
+          See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types.
+        </li>
+
+        <li class="li">
+          Multiple <code class="ph codeph">DISTINCT</code> clauses per query, although Impala includes some workarounds for this
+          limitation.
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          By default, Impala only allows a single <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">columns</var>)</code>
+          expression in each query.
+        </p>
+        <p class="p">
+          If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by
+          specifying <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>; a query can contain multiple instances of
+          <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>. To make Impala automatically rewrite
+          <code class="ph codeph">COUNT(DISTINCT)</code> expressions to <code class="ph codeph">NDV()</code>, enable the
+          <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option.
+        </p>
+        <p class="p">
+          To produce the same result as multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions, you can use the
+          following technique for queries involving a single table:
+        </p>
+<pre class="pre codeblock"><code>select v1.c1 result1, v2.c1 result2 from
+  (select count(distinct col1) as c1 from t1) v1
+    cross join
+  (select count(distinct col2) as c1 from t1) v2;
+</code></pre>
+        <p class="p">
+          Because <code class="ph codeph">CROSS JOIN</code> is an expensive operation, prefer to use the <code class="ph codeph">NDV()</code>
+          technique wherever practical.
+        </p>
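+        <p class="p">
+          For example, with the <code class="ph codeph">NDV()</code> technique a single pass over the
+          table estimates the distinct values of several columns at once (column names are
+          illustrative):
+        </p>
+<pre class="pre codeblock"><code>select ndv(col1), ndv(col2), ndv(col3) from t1;
+</code></pre>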
+      </div>
+        </li>
+      </ul>
+
+      <div class="p">
+        User-defined functions (UDFs) are supported starting in Impala 1.2. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>
+        for full details on Impala UDFs.
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently
+              support user-defined table generating functions (UDTFs).
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              Java-based UDFs can use only the column types that Impala supports.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+        The Hive <code class="ph codeph">current_user()</code> function cannot be
+        called from a Java UDF through Impala.
+      </p>
+          </li>
+        </ul>
+      </div>
+
+      <p class="p">
+        Impala does not currently support these HiveQL statements:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <code class="ph codeph">ANALYZE TABLE</code> (the Impala equivalent is <code class="ph codeph">COMPUTE STATS</code>)
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">DESCRIBE COLUMN</code>
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">DESCRIBE DATABASE</code>
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">EXPORT TABLE</code>
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">IMPORT TABLE</code>
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">SHOW TABLE EXTENDED</code>
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">SHOW INDEXES</code>
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">SHOW COLUMNS</code>
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">INSERT OVERWRITE DIRECTORY</code>; use <code class="ph codeph">INSERT OVERWRITE <var class="keyword varname">table_name</var></code>
+          or <code class="ph codeph">CREATE TABLE AS SELECT</code> to materialize query results into the HDFS directory associated
+          with an Impala table.
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="langref_hiveql_delta__langref_hiveql_semantics">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Semantic Differences Between Impala and HiveQL Features</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        This section covers instances where Impala and Hive have similar functionality, sometimes including the
+        same syntax, but there are differences in the runtime semantics of those features.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Security:</strong>
+      </p>
+
+      <p class="p">
+        Impala utilizes the <a class="xref" href="http://sentry.incubator.apache.org/" target="_blank">Apache
+        Sentry</a> authorization framework, which provides fine-grained role-based access control
+        to protect data against unauthorized access or tampering.
+      </p>
+
+      <p class="p">
+        The Hive component now includes Sentry-enabled <code class="ph codeph">GRANT</code>,
+        <code class="ph codeph">REVOKE</code>, and <code class="ph codeph">CREATE/DROP ROLE</code> statements. Earlier Hive releases had a
+        privilege system with <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements that were primarily
+        intended to prevent accidental deletion of data, rather than a security mechanism to protect against
+        malicious users.
+      </p>
+
+      <p class="p">
+        Impala can make use of privileges set up through Hive <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements.
+        Impala has its own <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Impala 2.0 and higher.
+        See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for the details of authorization in Impala, including
+        how to switch from the original policy file-based privilege model to the Sentry service using privileges
+        stored in the metastore database.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">SQL statements and clauses:</strong>
+      </p>
+
+      <p class="p">
+        The semantics of Impala SQL statements differ from HiveQL in some cases where the two use similar SQL
+        statement and clause names:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Impala uses different syntax and names for query hints, <code class="ph codeph">[SHUFFLE]</code> and
+          <code class="ph codeph">[NOSHUFFLE]</code> rather than <code class="ph codeph">MapJoin</code> or <code class="ph codeph">StreamJoin</code>. See
+          <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for the Impala details.
+        </li>
+
+        <li class="li">
+          Impala does not expose the MapReduce-specific clauses <code class="ph codeph">SORT BY</code>, <code class="ph codeph">DISTRIBUTE
+          BY</code>, or <code class="ph codeph">CLUSTER BY</code>.
+        </li>
+
+        <li class="li">
+          Impala does not require queries to include a <code class="ph codeph">FROM</code> clause.
+        </li>
+      </ul>
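+
+      <p class="p">
+        For example, because the <code class="ph codeph">FROM</code> clause is optional in Impala,
+        expressions and functions can be evaluated directly:
+      </p>
+<pre class="pre codeblock"><code>select 2 + 2;
+select now();
+</code></pre>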
+
+      <p class="p">
+        <strong class="ph b">Data types:</strong>
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Impala supports a limited set of implicit casts. This can help avoid undesired results from unexpected
+          casting behavior.
+          <ul class="ul">
+            <li class="li">
+              Impala does not implicitly cast between string and numeric or Boolean types. Always use
+              <code class="ph codeph">CAST()</code> for these conversions.
+            </li>
+
+            <li class="li">
+              Impala does perform implicit casts among the numeric types, when going from a smaller or less precise
+              type to a larger or more precise one. For example, Impala will implicitly convert a
+              <code class="ph codeph">SMALLINT</code> to a <code class="ph codeph">BIGINT</code> or <code class="ph codeph">FLOAT</code>, but to convert from
+              <code class="ph codeph">DOUBLE</code> to <code class="ph codeph">FLOAT</code> or <code class="ph codeph">INT</code> to <code class="ph codeph">TINYINT</code>
+              requires a call to <code class="ph codeph">CAST()</code> in the query.
+            </li>
+
+            <li class="li">
+              Impala does perform implicit casts from string to timestamp. Impala has a restricted set of literal
+              formats for the <code class="ph codeph">TIMESTAMP</code> data type and the <code class="ph codeph">from_unixtime()</code> format
+              string; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details.
+            </li>
+          </ul>
+          <p class="p">
+            See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for full details on implicit and explicit casting for
+            all types, and <a class="xref" href="impala_conversion_functions.html#conversion_functions">Impala Type Conversion Functions</a> for details about
+            the <code class="ph codeph">CAST()</code> function.
+          </p>
+        </li>
+
+        <li class="li">
+          Impala does not store or interpret timestamps using the local time zone, to avoid undesired results from
+          unexpected time zone issues. Timestamps are stored and interpreted relative to UTC. This difference can
+          produce different results for some calls to similarly named date/time functions between Impala and Hive.
+          See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details about the Impala
+          functions. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for a discussion of how Impala handles
+          time zones, and configuration options you can use to make Impala match the Hive behavior more closely
+          when dealing with Parquet-encoded <code class="ph codeph">TIMESTAMP</code> data or when converting between
+          the local time zone and UTC.
+        </li>
+
+        <li class="li">
+          The Impala <code class="ph codeph">TIMESTAMP</code> type can represent dates ranging from 1400-01-01 to 9999-12-31.
+          This is different from the Hive date range, which is 0000-01-01 to 9999-12-31.
+        </li>
+
+        <li class="li">
+          <p class="p">
+        Impala does not return column overflows as <code class="ph codeph">NULL</code>, so that customers can distinguish
+        between <code class="ph codeph">NULL</code> data and overflow conditions similar to how they do so with traditional
+        database systems. Impala returns the largest or smallest value in the range for the type. For example,
+        valid values for a <code class="ph codeph">tinyint</code> range from -128 to 127. In Impala, a <code class="ph codeph">tinyint</code>
+        with a value of -200 returns -128 rather than <code class="ph codeph">NULL</code>. A <code class="ph codeph">tinyint</code> with a
+        value of 200 returns 127.
+      </p>
+        </li>
+
+      </ul>
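+
+      <p class="p">
+        For example, the casting rules above mean that conversions between string and
+        numeric types always require an explicit <code class="ph codeph">CAST()</code>
+        (a minimal sketch):
+      </p>
+<pre class="pre codeblock"><code>-- Implicit cast from a smaller to a larger numeric type: allowed.
+select cast(5 as smallint) + cast(1000000 as bigint);
+-- String to numeric requires an explicit cast:
+select cast('100' as int) + 1;
+</code></pre>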
+
+      <p class="p">
+        <strong class="ph b">Miscellaneous features:</strong>
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Impala does not provide virtual columns.
+        </li>
+
+        <li class="li">
+          Impala does not expose locking.
+        </li>
+
+        <li class="li">
+          Impala does not expose some configuration properties.
+        </li>
+      </ul>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_ldap.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_ldap.html b/docs/build/html/topics/impala_ldap.html
new file mode 100644
index 0000000..e4aaf52
--- /dev/null
+++ b/docs/build/html/topics/impala_ldap.html
@@ -0,0 +1,294 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ldap"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Enabling LDAP Authentication for Impala</title></head><body id="ldap"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Enabling LDAP Authentication for Impala</h1>
+  
+
+  <div class="body conbody">
+
+
+
+    <p class="p"> Authentication is the process of allowing only specified named users to
+      access the server (in this case, the Impala server). This feature is
+      crucial for any production deployment, to prevent misuse, tampering, or
+      excessive load on the server. Impala uses LDAP for authentication,
+      verifying the credentials of each user who connects through
+        <span class="keyword cmdname">impala-shell</span>, Hue, a Business Intelligence tool, a JDBC
+      or ODBC application, and so on. </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files
+      owned by the same user (typically <code class="ph codeph">impala</code>). To implement user-level access to different
+      databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in
+      <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>.
+    </div>
+
+    <p class="p">
+      An alternative form of authentication you can use is Kerberos, described in
+      <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a>.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="ldap__ldap_prereqs">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Requirements for Using Impala with LDAP</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Authentication against LDAP servers is available in Impala 1.2.2 and higher. Impala 1.4.0 adds support for
+        secure LDAP authentication through SSL and TLS.
+      </p>
+
+      <p class="p">
+        The Impala LDAP support lets you use Impala with systems such as Active Directory that use LDAP behind the
+        scenes.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="ldap__ldap_client_server">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Client-Server Considerations for LDAP</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Only client-&gt;Impala connections can be authenticated by LDAP.
+      </p>
+
+      <p class="p"> You must use the Kerberos authentication mechanism for connections
+        between internal Impala components, such as between the
+          <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">statestored</span>, and
+          <span class="keyword cmdname">catalogd</span> daemons. See <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> on how to set up Kerberos for
+        Impala. </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="ldap__ldap_config">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Server-Side LDAP Setup</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        These requirements apply on the server side when configuring and starting Impala:
+      </p>
+
+      <p class="p">
+        To enable LDAP authentication, set the following startup options for <span class="keyword cmdname">impalad</span>:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <code class="ph codeph">--enable_ldap_auth</code> enables LDAP-based authentication between the client and Impala.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">--ldap_uri</code> sets the URI of the LDAP server to use. Typically, the URI is prefixed with
+          <code class="ph codeph">ldap://</code>. In Impala 1.4.0 and higher, you can specify secure SSL-based LDAP transport by
+          using the prefix <code class="ph codeph">ldaps://</code>. The URI can optionally specify the port, for example:
+          <code class="ph codeph">ldap://ldap_server.example.com:389</code> or
+          <code class="ph codeph">ldaps://ldap_server.example.com:636</code>. (389 and 636 are the default ports for non-SSL and
+          SSL LDAP connections, respectively.)
+        </li>
+
+
+
+        <li class="li">
+          For <code class="ph codeph">ldaps://</code> connections secured by SSL,
+          <code class="ph codeph">--ldap_ca_certificate="<var class="keyword varname">/path/to/certificate/pem</var>"</code> specifies the
+          location of the certificate in standard <code class="ph codeph">.PEM</code> format. Store this certificate on the local
+          filesystem, in a location that only the <code class="ph codeph">impala</code> user and other trusted users can read.
+        </li>
+
+
+      </ul>
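+
+      <p class="p">
+        For example, the following illustrative <span class="keyword cmdname">impalad</span> startup flags (the
+        server name and certificate path are placeholders) enable LDAP authentication with SSL-secured transport:
+      </p>
+
+<pre class="pre codeblock"><code>impalad --enable_ldap_auth \
+    --ldap_uri=ldaps://ldap_server.example.com:636 \
+    --ldap_ca_certificate="/path/to/certificate/pem" \
+    <var class="keyword varname">other_startup_options</var></code></pre>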
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="ldap__ldap_bind_strings">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Support for Custom Bind Strings</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        When Impala connects to LDAP it issues a bind call to the LDAP server to authenticate as the connected
+        user. Impala clients, including the Impala shell, provide the short name of the user to Impala. This is
+        necessary so that Impala can use Sentry for role-based access, which uses short names.
+      </p>
+
+      <p class="p">
+        However, LDAP servers often require more complex, structured usernames for authentication. Impala supports
+        three ways of transforming the short name (for example, <code class="ph codeph">'henry'</code>) to a more complicated
+        string. If necessary, specify one of the following configuration options
+        when starting the <span class="keyword cmdname">impalad</span> daemon on each DataNode:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <code class="ph codeph">--ldap_domain</code>: Replaces the username with a string
+          <code class="ph codeph"><var class="keyword varname">username</var>@<var class="keyword varname">ldap_domain</var></code>.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">--ldap_baseDN</code>: Replaces the username with a <span class="q">"distinguished name"</span> (DN) of the form:
+          <code class="ph codeph">uid=<var class="keyword varname">userid</var>,ldap_baseDN</code>. (This is equivalent to the corresponding Hive option.)
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">--ldap_bind_pattern</code>: This is the most general option, and replaces the username with the
+          string <var class="keyword varname">ldap_bind_pattern</var> where all instances of the string <code class="ph codeph">#UID</code> are
+          replaced with <var class="keyword varname">userid</var>. For example, an <code class="ph codeph">ldap_bind_pattern</code> of
+          <code class="ph codeph">"user=#UID,OU=foo,CN=bar"</code> with a username of <code class="ph codeph">henry</code> will construct a
+          bind name of <code class="ph codeph">"user=henry,OU=foo,CN=bar"</code>.
+        </li>
+      </ul>
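+
+      <p class="p">
+        For example, given the short name <code class="ph codeph">henry</code> and hypothetical domain and
+        base DN values, each option constructs the following bind name:
+      </p>
+
+<pre class="pre codeblock"><code>--ldap_domain=example.com                     -&gt; henry@example.com
+--ldap_baseDN=ou=People,dc=example,dc=com     -&gt; uid=henry,ou=People,dc=example,dc=com
+--ldap_bind_pattern="user=#UID,OU=foo,CN=bar" -&gt; user=henry,OU=foo,CN=bar
+</code></pre>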
+
+      <p class="p">
+        These options are mutually exclusive; Impala does not start if more than one of these options is specified.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="ldap__ldap_security">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Secure LDAP Connections</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        To avoid sending credentials over the wire in cleartext, you must configure secure connections both
+        between the client and Impala, and between Impala and the LDAP server. The secure connection can use
+        either SSL or TLS.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Secure LDAP connections through SSL:</strong>
+      </p>
+
+      <p class="p">
+        For SSL-enabled LDAP connections, specify a prefix of <code class="ph codeph">ldaps://</code> instead of
+        <code class="ph codeph">ldap://</code>. Also, the default port for SSL-enabled LDAP connections is 636 instead of 389.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Secure LDAP connections through TLS:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="http://en.wikipedia.org/wiki/Transport_Layer_Security" target="_blank">TLS</a>,
+        the successor to the SSL protocol, is supported by most modern LDAP servers. Unlike SSL connections, TLS
+        connections can be made on the same server port as non-TLS connections. To secure all connections using
+        TLS, specify the following flags as startup options to the <span class="keyword cmdname">impalad</span> daemon:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <code class="ph codeph">--ldap_tls</code> tells Impala to start a TLS connection to the LDAP server, and to fail
+          authentication if it cannot be done.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">--ldap_ca_certificate="<var class="keyword varname">/path/to/certificate/pem</var>"</code> specifies the
+          location of the certificate in standard <code class="ph codeph">.PEM</code> format. Store this certificate on the local
+          filesystem, in a location that only the <code class="ph codeph">impala</code> user and other trusted users can read.
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="ldap__ldap_impala_shell">
+
+    <h2 class="title topictitle2" id="ariaid-title7">LDAP Authentication for impala-shell Interpreter</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        To connect to Impala using LDAP authentication, you specify command-line options to the
+        <span class="keyword cmdname">impala-shell</span> command interpreter and enter the password when prompted:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <code class="ph codeph">-l</code> enables LDAP authentication.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">-u</code> sets the user. As in Active Directory, the user is the short username, not the full
+          LDAP distinguished name. If your LDAP settings include a search base, use the
+          <code class="ph codeph">--ldap_bind_pattern</code> option on the <span class="keyword cmdname">impalad</span> daemon to translate the short user
+          name from <span class="keyword cmdname">impala-shell</span> automatically to the fully qualified name.
+        </li>
+
+        <li class="li">
+          <span class="keyword cmdname">impala-shell</span> automatically prompts for the password.
+        </li>
+      </ul>
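+
+      <p class="p">
+        For example, the following command (the hostname is a placeholder) connects as user
+        <code class="ph codeph">henry</code> using LDAP authentication; <span class="keyword cmdname">impala-shell</span>
+        then prompts for the password:
+      </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -l -u henry -i impalad_host.example.com:21000
+</code></pre>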
+
+      <p class="p">
+        For the full list of available <span class="keyword cmdname">impala-shell</span> options, see
+        <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a>.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">LDAP authentication for JDBC applications:</strong> See <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a> for the
+        format to use with the JDBC connection string for servers using LDAP authentication.
+      </p>
+    </div>
+  </article>
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="ldap__ldap_impala_hue">
+    <h2 class="title topictitle2" id="ariaid-title8">Enabling LDAP for Impala in Hue</h2>
+    
+    <div class="body conbody">
+      <section class="section" id="ldap_impala_hue__ldap_impala_hue_cmdline"><h3 class="title sectiontitle">Enabling LDAP for Impala in Hue Using the Command Line</h3>
+        
+        <div class="p">LDAP authentication for the Impala app in Hue can be enabled by
+          setting the following properties under the <code class="ph codeph">[impala]</code>
+          section in <code class="ph codeph">hue.ini</code>. <table class="table" id="ldap_impala_hue__ldap_impala_hue_configs"><caption></caption><colgroup><col style="width:33.33333333333333%"><col style="width:66.66666666666666%"></colgroup><tbody class="tbody">
+                <tr class="row">
+                  <td class="entry nocellnorowborder"><code class="ph codeph">auth_username</code></td>
+                  <td class="entry nocellnorowborder">LDAP username of Hue user to be authenticated.</td>
+                </tr>
+                <tr class="row">
+                  <td class="entry nocellnorowborder"><code class="ph codeph">auth_password</code></td>
+                  <td class="entry nocellnorowborder">
+                    <p class="p">LDAP password of Hue user to be authenticated.</p>
+                  </td>
+                </tr>
+              </tbody></table>These login details are only used by Impala to authenticate to
+          LDAP. The Impala service trusts Hue to have already validated the user
+          being impersonated, rather than simply passing on the credentials.</div>
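+        <p class="p">
+          For example, the relevant <code class="ph codeph">hue.ini</code> fragment might look like the
+          following (the username and password values are placeholders):
+        </p>
+<pre class="pre codeblock"><code>[impala]
+  auth_username=&lt;ldap_username&gt;
+  auth_password=&lt;ldap_password&gt;
+</code></pre>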
+      </section>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="ldap__ldap_delegation">
+    <h2 class="title topictitle2" id="ariaid-title9">Enabling Impala Delegation for LDAP Users</h2>
+    <div class="body conbody">
+      <p class="p">
+        See <a class="xref" href="impala_delegation.html#delegation">Configuring Impala Delegation for Hue and BI Tools</a> for details about the delegation feature
+        that lets certain users submit queries using the credentials of other users.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="ldap__ldap_restrictions">
+
+    <h2 class="title topictitle2" id="ariaid-title10">LDAP Restrictions for Impala</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The LDAP support is preliminary. Currently, it has been tested only against Active Directory.
+      </p>
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_limit.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_limit.html b/docs/build/html/topics/impala_limit.html
new file mode 100644
index 0000000..a4e94d0
--- /dev/null
+++ b/docs/build/html/topics/impala_limit.html
@@ -0,0 +1,168 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LIMIT Clause</title></head><body id="limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">LIMIT Clause</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The <code class="ph codeph">LIMIT</code> clause in a <code class="ph codeph">SELECT</code> query sets a maximum number of rows for the
+      result set. Specifying the maximum size of the result set in advance helps Impala to optimize memory
+      usage while processing a distributed query.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>LIMIT <var class="keyword varname">constant_integer_expression</var></code></pre>
+
+    <p class="p">
+      The argument to the <code class="ph codeph">LIMIT</code> clause must evaluate to a constant value. It can be a numeric
+      literal, or another kind of numeric expression involving operators, casts, and function return values. You
+      cannot refer to a column or use a subquery.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      This clause is useful in contexts such as:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        To return exactly N items from a top-N query, such as the 10 highest-rated items in a shopping category or
+        the 50 hostnames that refer the most traffic to a web site.
+      </li>
+
+      <li class="li">
+        To demonstrate some sample values from a table or a particular query. (To display some arbitrary items, use
+        a query with no <code class="ph codeph">ORDER BY</code> clause. An <code class="ph codeph">ORDER BY</code> clause causes additional
+        memory and/or disk usage during the query.)
+      </li>
+
+      <li class="li">
+        To keep queries from returning huge result sets by accident if a table is larger than expected, or a
+        <code class="ph codeph">WHERE</code> clause matches more rows than expected.
+      </li>
+    </ul>
+
+    <p class="p">
+      Originally, the value for the <code class="ph codeph">LIMIT</code> clause had to be a numeric literal. In Impala 1.2.1 and
+      higher, it can be a numeric expression.
+    </p>
+
+    <p class="p">
+        Prior to Impala 1.4.0, Impala required any query including an
+        <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_order_by.html#order_by">ORDER BY</a></code> clause to also use a
+        <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_limit.html#limit">LIMIT</a></code> clause. In Impala 1.4.0 and
+        higher, the <code class="ph codeph">LIMIT</code> clause is optional for <code class="ph codeph">ORDER BY</code> queries. In cases where
+        sorting a huge result set requires enough memory to exceed the Impala memory limit for a particular node,
+        Impala automatically uses a temporary disk work area to perform the sort operation.
+      </p>
+
+    <p class="p">
+      See <a class="xref" href="impala_order_by.html#order_by">ORDER BY Clause</a> for details.
+    </p>
+
+    <p class="p">
+        In Impala 1.2.1 and higher, you can combine a <code class="ph codeph">LIMIT</code> clause with an <code class="ph codeph">OFFSET</code>
+        clause to produce a small result set that is different from a top-N query, for example, to return items 11
+        through 20. This technique can be used to simulate <span class="q">"paged"</span> results. Because Impala queries typically
+        involve substantial amounts of I/O, use this technique only for compatibility in cases where you cannot
+        rewrite the application logic. For best performance and scalability, wherever practical, query as many
+        items as you expect to need, cache them on the application side, and display small groups of results to
+        users using application logic.
+      </p>
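+
+    <p class="p">
+      For example, the following hypothetical queries (the table and column names are illustrative) retrieve
+      the first and second <span class="q">"pages"</span> of 10 results each. An <code class="ph codeph">ORDER BY</code>
+      clause is needed to make the paging order deterministic:
+    </p>
+
+<pre class="pre codeblock"><code>-- Page 1: items 1-10.
+SELECT item_name FROM catalog ORDER BY item_name LIMIT 10 OFFSET 0;
+-- Page 2: items 11-20.
+SELECT item_name FROM catalog ORDER BY item_name LIMIT 10 OFFSET 10;
+</code></pre>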
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+        Correlated subqueries used in <code class="ph codeph">EXISTS</code> and <code class="ph codeph">IN</code> operators cannot include a
+        <code class="ph codeph">LIMIT</code> clause.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example shows how the <code class="ph codeph">LIMIT</code> clause caps the size of the result set, with the
+      limit being applied after any other clauses such as <code class="ph codeph">WHERE</code>.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create database limits;
+[localhost:21000] &gt; use limits;
+[localhost:21000] &gt; create table numbers (x int);
+[localhost:21000] &gt; insert into numbers values (1), (3), (4), (5), (2);
+Inserted 5 rows in 1.34s
+[localhost:21000] &gt; select x from numbers limit 100;
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 4 |
+| 5 |
+| 2 |
++---+
+Returned 5 row(s) in 0.26s
+[localhost:21000] &gt; select x from numbers limit 3;
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 4 |
++---+
+Returned 3 row(s) in 0.27s
+[localhost:21000] &gt; select x from numbers where x &gt; 2 limit 2;
++---+
+| x |
++---+
+| 3 |
+| 4 |
++---+
+Returned 2 row(s) in 0.27s</code></pre>
+
+    <p class="p">
+      For top-N and bottom-N queries, you use the <code class="ph codeph">ORDER BY</code> and <code class="ph codeph">LIMIT</code> clauses
+      together:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select x as "Top 3" from numbers order by x desc limit 3;
++-------+
+| top 3 |
++-------+
+| 5     |
+| 4     |
+| 3     |
++-------+
+[localhost:21000] &gt; select x as "Bottom 3" from numbers order by x limit 3;
++----------+
+| bottom 3 |
++----------+
+| 1        |
+| 2        |
+| 3        |
++----------+
+</code></pre>
+
+    <p class="p">
+      You can use constant values besides integer literals as the <code class="ph codeph">LIMIT</code> argument:
+    </p>
+
+<pre class="pre codeblock"><code>-- Other expressions that yield constant integer values work too.
+SELECT x FROM t1 LIMIT 1e6;                        -- Limit is one million.
+SELECT x FROM t1 LIMIT length('hello world');      -- Limit is 11.
+SELECT x FROM t1 LIMIT 2+2;                        -- Limit is 4.
+SELECT x FROM t1 LIMIT cast(truncate(9.9) AS INT); -- Limit is 9.
+</code></pre>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_lineage.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_lineage.html b/docs/build/html/topics/impala_lineage.html
new file mode 100644
index 0000000..c3581e5
--- /dev/null
+++ b/docs/build/html/topics/impala_lineage.html
@@ -0,0 +1,91 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="lineage"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Viewing Lineage Information for Impala Data</title></head><body id="lineage"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Viewing Lineage Information for Impala Data</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      
+      <dfn class="term">Lineage</dfn> is a feature that helps you track where data originated, and how
+      data propagates through the system through SQL statements such as
+        <code class="ph codeph">SELECT</code>, <code class="ph codeph">INSERT</code>, and <code class="ph codeph">CREATE
+        TABLE AS SELECT</code>.
+    </p>
+    <p class="p">
+      This type of tracking is important in high-security configurations, especially in
+      highly regulated industries such as healthcare, pharmaceuticals, financial services, and
+      intelligence. For this kind of sensitive data, it is important to know all
+      the places in the system that contain that data or other data derived from it; to verify who has accessed
+      that data; and to be able to double-check that the data used to make a decision was processed correctly and
+      not tampered with.
+    </p>
+
+    <section class="section" id="lineage__column_lineage"><h2 class="title sectiontitle">Column Lineage</h2>
+
+      
+
+      <p class="p">
+        <dfn class="term">Column lineage</dfn> tracks information in fine detail, at the level of
+        particular columns rather than entire tables.
+      </p>
+
+      <p class="p">
+        For example, if you have a table with information derived from web logs, you might copy that data into
+        other tables as part of the ETL process. The ETL operations might involve transformations through
+        expressions and function calls, and rearranging the columns into more or fewer tables
+        (<dfn class="term">normalizing</dfn> or <dfn class="term">denormalizing</dfn> the data). Then for reporting, you might issue
+        queries against multiple tables and views. In this example, column lineage helps you determine that data
+        that entered the system as <code class="ph codeph">RAW_LOGS.FIELD1</code> was then turned into
+        <code class="ph codeph">WEBSITE_REPORTS.IP_ADDRESS</code> through an <code class="ph codeph">INSERT ... SELECT</code> statement. Or,
+        conversely, you could start with a reporting query against a view, and trace the origin of the data in a
+        field such as <code class="ph codeph">TOP_10_VISITORS.USER_ID</code> back to the underlying table and even further back
+        to the point where the data was first loaded into Impala.
+      </p>
+
+      <p class="p">
+        When you have tables where you need to track or control access to sensitive information at the column
+        level, see <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to implement column-level
+        security. You set up authorization using the Sentry framework, create views that refer to specific sets of
+        columns, and then assign authorization privileges to those views rather than the underlying tables.
+      </p>
+
+    </section>
+
+    <section class="section" id="lineage__lineage_data"><h2 class="title sectiontitle">Lineage Data for Impala</h2>
+
+      
+
+      <p class="p">
+        The lineage feature is enabled by default. When lineage logging is enabled, the serialized column lineage
+        graph is computed for each query and stored in a specialized log file in JSON format.
+      </p>
+
+      <p class="p">
+        Impala records queries in the lineage log if they complete successfully, or if they fail due to authorization
+        errors. For write operations such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+        the statement is recorded in the lineage log only if it completes successfully. Therefore, the lineage
+        feature tracks data that was accessed by successful queries, or that unsuccessful queries attempted to
+        access before being blocked by an authorization failure. These kinds of queries represent data
+        that really was accessed, or attempted accesses that could indicate malicious activity.
+      </p>
+
+      <p class="p">
+        Impala does not record in the lineage log queries that fail due to syntax errors or that fail or are
+        cancelled before they reach the stage of requesting rows from the result set.
+      </p>
+
+      <p class="p">
+        To enable or disable this feature, set or remove the <code class="ph codeph">-lineage_event_log_dir</code>
+        configuration option for the <span class="keyword cmdname">impalad</span> daemon.
+      </p>
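+
+      <p class="p">
+        For example, the following setting (the directory path is a placeholder) enables lineage logging:
+      </p>
+
+<pre class="pre codeblock"><code>-lineage_event_log_dir=/var/log/impala/lineage
+</code></pre>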
+
+    </section>
+
+  </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_literals.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_literals.html b/docs/build/html/topics/impala_literals.html
new file mode 100644
index 0000000..cd16389
--- /dev/null
+++ b/docs/build/html/topics/impala_literals.html
@@ -0,0 +1,427 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="literals"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Literals</title></head><body id="literals"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Literals</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Each of the Impala data types has corresponding notation for literal values of that type. You specify literal
+      values in SQL statements, such as in the <code class="ph codeph">SELECT</code> list or <code class="ph codeph">WHERE</code> clause of a
+      query, or as an argument to a function call. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for a complete
+      list of types, ranges, and conversion rules.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="literals__numeric_literals">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Numeric Literals</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        To write literals for the integer types (<code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>,
+        <code class="ph codeph">INT</code>, and <code class="ph codeph">BIGINT</code>), use a sequence of digits with optional leading zeros.
+      </p>
+
+      <p class="p">
+        To write literals for the floating-point types (<code class="ph codeph">DECIMAL</code>,
+        <code class="ph codeph">FLOAT</code>, and <code class="ph codeph">DOUBLE</code>), use a sequence of digits with an optional decimal
+        point (<code class="ph codeph">.</code> character). To preserve accuracy during arithmetic expressions, Impala interprets
+        floating-point literals as the <code class="ph codeph">DECIMAL</code> type with the smallest appropriate precision and
+        scale, until required by the context to convert the result to <code class="ph codeph">FLOAT</code> or
+        <code class="ph codeph">DOUBLE</code>.
+      </p>
+
+      <p class="p">
+        Integer values are promoted to floating-point when necessary, based on the context.
+      </p>
+
+      <p class="p">
+        You can also use exponential notation by including an <code class="ph codeph">e</code> character. For example,
+        <code class="ph codeph">1e6</code> is 1 times 10 to the power of 6 (1 million). A number in exponential notation is
+        always interpreted as floating-point.
+      </p>
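+
+      <p class="p">
+        For example:
+      </p>
+
+<pre class="pre codeblock"><code>-- Both expressions represent 1 million as a floating-point value.
+SELECT 1e6, 1.0e6;
+</code></pre>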
+
+      <p class="p">
+        When Impala encounters a numeric literal, it considers the type to be the <span class="q">"smallest"</span> that can
+        accurately represent the value. The type is promoted to larger or more accurate types if necessary, based
+        on subsequent parts of an expression.
+      </p>
+      <p class="p">
+        For example, you can see by the types Impala defines for the following table columns
+        how it interprets the corresponding numeric literals:
+      </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table ten as select 10 as x;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] &gt; desc ten;
++------+---------+---------+
+| name | type    | comment |
++------+---------+---------+
+| x    | tinyint |         |
++------+---------+---------+
+
+[localhost:21000] &gt; create table four_k as select 4096 as x;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] &gt; desc four_k;
++------+----------+---------+
+| name | type     | comment |
++------+----------+---------+
+| x    | smallint |         |
++------+----------+---------+
+
+[localhost:21000] &gt; create table one_point_five as select 1.5 as x;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] &gt; desc one_point_five;
++------+--------------+---------+
+| name | type         | comment |
++------+--------------+---------+
+| x    | decimal(2,1) |         |
++------+--------------+---------+
+
+[localhost:21000] &gt; create table one_point_three_three_three as select 1.333 as x;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] &gt; desc one_point_three_three_three;
++------+--------------+---------+
+| name | type         | comment |
++------+--------------+---------+
+| x    | decimal(4,3) |         |
++------+--------------+---------+
+</code></pre>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="literals__string_literals">
+
+    <h2 class="title topictitle2" id="ariaid-title3">String Literals</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        String literals are quoted using either single or double quotation marks. You can use either kind of quotes
+        for string literals, even both kinds for different literals within the same statement.
+      </p>
+
+      <p class="p">
+        Quoted literals are considered to be of type <code class="ph codeph">STRING</code>. To use quoted literals in contexts
+        requiring a <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> value, <code class="ph codeph">CAST()</code> the literal to
+        a <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> of the appropriate length.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Escaping special characters:</strong>
+      </p>
+
+      <p class="p">
+        To encode special characters within a string literal, precede them with the backslash (<code class="ph codeph">\</code>)
+        escape character:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <code class="ph codeph">\t</code> represents a tab.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">\n</code> represents a newline or linefeed. This might cause extra line breaks in
+          <span class="keyword cmdname">impala-shell</span> output.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">\r</code> represents a carriage return. This might cause unusual formatting (making it appear
+          that some content is overwritten) in <span class="keyword cmdname">impala-shell</span> output.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">\b</code> represents a backspace. This might cause unusual formatting (making it appear that
+          some content is overwritten) in <span class="keyword cmdname">impala-shell</span> output.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">\0</code> represents an ASCII <code class="ph codeph">nul</code> character (not the same as a SQL
+          <code class="ph codeph">NULL</code>). This might not be visible in <span class="keyword cmdname">impala-shell</span> output.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">\Z</code> represents a DOS end-of-file character. This might not be visible in
+          <span class="keyword cmdname">impala-shell</span> output.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">\%</code> and <code class="ph codeph">\_</code> can be used to escape wildcard characters within the string
+          passed to the <code class="ph codeph">LIKE</code> operator.
+        </li>
+
+        <li class="li">
+          <code class="ph codeph">\</code> followed by 3 octal digits represents the ASCII code of a single character; for
+          example, <code class="ph codeph">\101</code> is ASCII 65, the character <code class="ph codeph">A</code>.
+        </li>
+
+        <li class="li">
+          Use two consecutive backslashes (<code class="ph codeph">\\</code>) to prevent the backslash from being interpreted as
+          an escape character.
+        </li>
+
+        <li class="li">
+          Use the backslash to escape single or double quotation mark characters within a string literal, if the
+          literal is enclosed by the same type of quotation mark.
+        </li>
+
+        <li class="li">
+          If the character following the <code class="ph codeph">\</code> does not represent the start of a recognized escape
+          sequence, the character is passed through unchanged.
+        </li>
+      </ul>
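+      <p class="p">
+        For example, the following queries (illustrative sketches, not tied to any sample schema)
+        show several of these escape sequences in string literals:
+      </p>
+
+<pre class="pre codeblock"><code>-- \t produces a tab character between the two words.
+select 'one\ttwo';
+-- \101, \102, and \103 are octal codes for the ASCII characters A, B, and C.
+select '\101\102\103';
+-- Two consecutive backslashes produce a single literal backslash.
+select 'a\\b';</code></pre>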
+
+      <p class="p">
+        <strong class="ph b">Quotes within quotes:</strong>
+      </p>
+
+      <p class="p">
+        To include a single quotation character within a string value, enclose the literal with either single or
+        double quotation marks, and optionally escape the single quote as a <code class="ph codeph">\'</code> sequence. Earlier
+        releases required escaping a single quote inside double quotes. Continue using escape sequences in this
+        case if you also need to run your SQL code on older versions of Impala.
+      </p>
+
+      <p class="p">
+        To include a double quotation character within a string value, enclose the literal with single quotation
+        marks; no escaping is necessary in this case. Alternatively, enclose the literal with double quotation
+        marks and escape the double quote as a <code class="ph codeph">\"</code> sequence.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select "What\'s happening?" as single_within_double,
+                  &gt;        'I\'m not sure.' as single_within_single,
+                  &gt;        "Homer wrote \"The Iliad\"." as double_within_double,
+                  &gt;        'Homer also wrote "The Odyssey".' as double_within_single;
++----------------------+----------------------+--------------------------+---------------------------------+
+| single_within_double | single_within_single | double_within_double     | double_within_single            |
++----------------------+----------------------+--------------------------+---------------------------------+
+| What's happening?    | I'm not sure.        | Homer wrote "The Iliad". | Homer also wrote "The Odyssey". |
++----------------------+----------------------+--------------------------+---------------------------------+
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Field terminator character in CREATE TABLE:</strong>
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        The <code class="ph codeph">CREATE TABLE</code> clauses <code class="ph codeph">FIELDS TERMINATED BY</code>, <code class="ph codeph">ESCAPED
+        BY</code>, and <code class="ph codeph">LINES TERMINATED BY</code> have special rules for the string literal used for
+        their argument, because they all require a single character. You can use a regular character surrounded by
+        single or double quotation marks, an octal sequence such as <code class="ph codeph">'\054'</code> (representing a comma),
+        or an integer in the range '-127'..'128' (with quotation marks but no backslash), which is interpreted as a
+        single-byte ASCII character. Negative values are subtracted from 256; for example, <code class="ph codeph">FIELDS
+        TERMINATED BY '-2'</code> sets the field delimiter to ASCII code 254, the <span class="q">"Icelandic Thorn"</span>
+        character used as a delimiter by some data formats.
+      </div>
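+      <p class="p">
+        As an illustrative example (the table names are hypothetical), the following statements set the
+        field delimiter to a comma in three equivalent ways: as a regular quoted character, as an octal
+        sequence, and as a quoted integer ASCII code (comma is ASCII 44 decimal, 054 octal):
+      </p>
+
+<pre class="pre codeblock"><code>create table csv_a (c1 string, c2 string) row format delimited fields terminated by ',';
+create table csv_b (c1 string, c2 string) row format delimited fields terminated by '\054';
+create table csv_c (c1 string, c2 string) row format delimited fields terminated by '44';</code></pre>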
+
+      <p class="p">
+        <strong class="ph b">impala-shell considerations:</strong>
+      </p>
+
+      <p class="p">
+        When dealing with output that includes non-ASCII or non-printable characters such as linefeeds and
+        backspaces, use the <span class="keyword cmdname">impala-shell</span> options to save to a file, turn off pretty printing, or
+        both rather than relying on how the output appears visually. See
+        <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a> for a list of <span class="keyword cmdname">impala-shell</span>
+        options.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="literals__boolean_literals">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Boolean Literals</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        For <code class="ph codeph">BOOLEAN</code> values, the literals are <code class="ph codeph">TRUE</code> and <code class="ph codeph">FALSE</code>,
+        written without quotation marks and matched case-insensitively.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>select true;
+select * from t1 where assertion = false;
+select case bool_col when true then 'yes' when false then 'no' else 'null' end from t1;</code></pre>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="literals__timestamp_literals">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Timestamp Literals</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Impala automatically converts <code class="ph codeph">STRING</code> literals of the correct format into
+        <code class="ph codeph">TIMESTAMP</code> values. Timestamp values are accepted in the format
+        <code class="ph codeph">"yyyy-MM-dd HH:mm:ss.SSSSSS"</code>, and can consist of just the date, or just the time, with or
+        without the fractional second portion. For example, you can specify <code class="ph codeph">TIMESTAMP</code> values such as
+        <code class="ph codeph">'1966-07-30'</code>, <code class="ph codeph">'08:30:00'</code>, or <code class="ph codeph">'1985-09-25 17:45:30.005'</code>.
+        <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to
+        <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+        date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+        If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+        the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span>
+      </p>
+
+      <p class="p">
+        You can also use <code class="ph codeph">INTERVAL</code> expressions to add or subtract from timestamp literal values,
+        such as <code class="ph codeph">'1966-07-30' + INTERVAL 5 YEARS + INTERVAL 3 DAYS</code>. See
+        <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details.
+      </p>
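+      <p class="p">
+        For example, the following queries (illustrative sketches, not tied to any sample schema)
+        show a string literal converted to a <code class="ph codeph">TIMESTAMP</code>, an integer cast to a
+        <code class="ph codeph">TIMESTAMP</code> relative to the start of the epoch, and <code class="ph codeph">INTERVAL</code>
+        arithmetic applied to a timestamp value:
+      </p>
+
+<pre class="pre codeblock"><code>-- A string literal in a recognized format becomes a TIMESTAMP.
+select cast('1985-09-25 17:45:30.005' as timestamp);
+-- 60 seconds past the start of the epoch; by default, interpreted in the UTC time zone.
+select cast(60 as timestamp);
+-- INTERVAL arithmetic on a timestamp value.
+select cast('1966-07-30' as timestamp) + interval 5 years + interval 3 days;</code></pre>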
+
+      <p class="p">
+        Depending on your data pipeline, you might receive date and time data as text, in notation that does not
+        exactly match the format for Impala <code class="ph codeph">TIMESTAMP</code> literals.
+        See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for functions that can convert
+        between a variety of string literals (including different field order, separators, and timezone notation)
+        and equivalent <code class="ph codeph">TIMESTAMP</code> or numeric values.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="literals__null">
+
+    <h2 class="title topictitle2" id="ariaid-title6">NULL</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        The notion of <code class="ph codeph">NULL</code> values is familiar from all kinds of database systems, but each SQL
+        dialect can have its own behavior and restrictions on <code class="ph codeph">NULL</code> values. For Big Data
+        processing, the precise semantics of <code class="ph codeph">NULL</code> values are significant: any misunderstanding
+        could lead to inaccurate results or misformatted data, which could be time-consuming to correct for large
+        data sets.
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <code class="ph codeph">NULL</code> is a different value than an empty string. The empty string is represented by a
+          string literal with nothing inside, <code class="ph codeph">""</code> or <code class="ph codeph">''</code>.
+        </li>
+
+        <li class="li">
+          In a delimited text file, the <code class="ph codeph">NULL</code> value is represented by the special token
+          <code class="ph codeph">\N</code>.
+        </li>
+
+        <li class="li">
+          When Impala inserts data into a partitioned table, and the value of one of the partitioning columns is
+          <code class="ph codeph">NULL</code> or the empty string, the data is placed in a special partition that holds only
+          these two kinds of values. When these values are returned in a query, the result is <code class="ph codeph">NULL</code>
+          whether the value was originally <code class="ph codeph">NULL</code> or an empty string. This behavior is compatible
+          with the way Hive treats <code class="ph codeph">NULL</code> values in partitioned tables. Hive does not allow empty
+          strings as partition keys, and it returns a string value such as
+          <code class="ph codeph">__HIVE_DEFAULT_PARTITION__</code> instead of <code class="ph codeph">NULL</code> when such values are
+          returned from a query. For example:
+<pre class="pre codeblock"><code>create table t1 (i int) partitioned by (x int, y string);
+-- Select an INT column from another table, with all rows going into a special HDFS subdirectory
+-- named __HIVE_DEFAULT_PARTITION__. Depending on whether one or both of the partitioning keys
+-- are null, this special directory name occurs at different levels of the physical data directory
+-- for the table.
+insert into t1 partition(x=NULL, y=NULL) select c1 from some_other_table;
+insert into t1 partition(x, y=NULL) select c1, c2 from some_other_table;
+insert into t1 partition(x=NULL, y) select c1, c3  from some_other_table;</code></pre>
+        </li>
+
+        <li class="li">
+          There is no <code class="ph codeph">NOT NULL</code> clause when defining a column to prevent <code class="ph codeph">NULL</code>
+          values in that column.
+        </li>
+
+        <li class="li">
+          There is no <code class="ph codeph">DEFAULT</code> clause to specify a non-<code class="ph codeph">NULL</code> default value.
+        </li>
+
+        <li class="li">
+          If an <code class="ph codeph">INSERT</code> operation mentions some columns but not others, the unmentioned columns
+          contain <code class="ph codeph">NULL</code> for all inserted rows.
+        </li>
+
+        <li class="li">
+          <p class="p">
+        In Impala 1.2.1 and higher, all <code class="ph codeph">NULL</code> values come at the end of the result set for
+        <code class="ph codeph">ORDER BY ... ASC</code> queries, and at the beginning of the result set for <code class="ph codeph">ORDER BY ...
+        DESC</code> queries. In effect, <code class="ph codeph">NULL</code> is considered greater than all other values for
+        sorting purposes. The original Impala behavior always put <code class="ph codeph">NULL</code> values at the end, even for
+        <code class="ph codeph">ORDER BY ... DESC</code> queries. The new behavior in Impala 1.2.1 makes Impala more compatible
+        with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting
+        behavior for <code class="ph codeph">NULL</code> by adding the clause <code class="ph codeph">NULLS FIRST</code> or <code class="ph codeph">NULLS
+        LAST</code> at the end of the <code class="ph codeph">ORDER BY</code> clause.
+      </p>
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            
+            Because the <code class="ph codeph">NULLS FIRST</code> and <code class="ph codeph">NULLS LAST</code> keywords are not currently
+            available in Hive queries, any views you create using those keywords will not be available through
+            Hive.
+          </div>
+        </li>
+
+        <li class="li">
+          In all other contexts besides sorting with <code class="ph codeph">ORDER BY</code>, comparing a <code class="ph codeph">NULL</code>
+          to anything else returns <code class="ph codeph">NULL</code>, making the comparison meaningless. For example,
+          <code class="ph codeph">10 &gt; NULL</code> produces <code class="ph codeph">NULL</code>, <code class="ph codeph">10 &lt; NULL</code> also produces
+          <code class="ph codeph">NULL</code>, <code class="ph codeph">5 BETWEEN 1 AND NULL</code> produces <code class="ph codeph">NULL</code>, and so on.
+        </li>
+      </ul>
+
+      <p class="p">
+        Several built-in functions serve as shorthand for evaluating expressions and returning
+        <code class="ph codeph">NULL</code>, 0, or some other substitution value depending on the expression result:
+        <code class="ph codeph">ifnull()</code>, <code class="ph codeph">isnull()</code>, <code class="ph codeph">nvl()</code>, <code class="ph codeph">nullif()</code>,
+        <code class="ph codeph">nullifzero()</code>, and <code class="ph codeph">zeroifnull()</code>. See
+        <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+      </p>
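+      <p class="p">
+        For example, the following queries (illustrative sketches) show some of these functions
+        substituting for or producing <code class="ph codeph">NULL</code> values:
+      </p>
+
+<pre class="pre codeblock"><code>-- Returns 0 in place of a NULL value.
+select zeroifnull(cast(null as int));
+-- Returns the second argument when the first argument is NULL.
+select ifnull(cast(null as int), 999);
+-- Returns NULL when the two arguments are equal, otherwise the first argument.
+select nullif(1, 1);</code></pre>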
+
+      <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+      <p class="p">
+        Columns in Kudu tables have an attribute that specifies whether or not they can contain
+        <code class="ph codeph">NULL</code> values. A column with a <code class="ph codeph">NULL</code> attribute can contain
+        nulls. A column with a <code class="ph codeph">NOT NULL</code> attribute cannot contain any nulls, and
+        an <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, or <code class="ph codeph">UPSERT</code> statement
+        will skip any row that attempts to store a null in a column designated as <code class="ph codeph">NOT NULL</code>.
+        Kudu tables default to the <code class="ph codeph">NULL</code> setting for each column, except columns that
+        are part of the primary key.
+      </p>
+      <p class="p">
+        In addition to columns with the <code class="ph codeph">NOT NULL</code> attribute, Kudu tables also have
+        restrictions on <code class="ph codeph">NULL</code> values in columns that are part of the primary key for
+        a table. No column that is part of the primary key in a Kudu table can contain any
+        <code class="ph codeph">NULL</code> values.
+      </p>
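+      <p class="p">
+        For example, the following sketch (using hypothetical table and column names, and the Kudu
+        table syntax available in <span class="keyword">Impala 2.8</span>) shows the nullability attributes in a
+        <code class="ph codeph">CREATE TABLE</code> statement for a Kudu table:
+      </p>
+
+<pre class="pre codeblock"><code>create table kudu_example
+(
+  id bigint,             -- Primary key column: cannot contain NULL values.
+  name string not null,  -- Explicitly disallows NULL values.
+  comment string null,   -- The NULL attribute is the default for non-key columns.
+  primary key (id)
+)
+partition by hash (id) partitions 3
+stored as kudu;</code></pre>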
+
+    </div>
+  </article>
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_live_progress.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_live_progress.html b/docs/build/html/topics/impala_live_progress.html
new file mode 100644
index 0000000..40d6631
--- /dev/null
+++ b/docs/build/html/topics/impala_live_progress.html
@@ -0,0 +1,131 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="live_progress"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</title></head><body id="live_progress"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">LIVE_PROGRESS Query Option (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      For queries submitted through the <span class="keyword cmdname">impala-shell</span> command,
+      displays an interactive progress bar showing roughly what percentage of
+      processing has been completed. When the query finishes, the progress bar is erased
+      from the <span class="keyword cmdname">impala-shell</span> console output.
+    </p>
+
+    <p class="p">
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value is interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Command-line equivalent:</strong>
+      </p>
+    <p class="p">
+      You can enable this query option within <span class="keyword cmdname">impala-shell</span>
+      by starting the shell with the <code class="ph codeph">--live_progress</code>
+      command-line option.
+      You can still turn this setting off and on again within the shell through the
+      <code class="ph codeph">SET</code> command.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+    <p class="p">
+        The output from this query option is printed to standard error. The output is only displayed in interactive mode,
+        that is, not when the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options are used.
+      </p>
+    <p class="p">
+      For a more detailed way of tracking the progress of an interactive query through
+      all phases of processing, see <a class="xref" href="impala_live_summary.html#live_summary">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+    <p class="p">
+      Because the percentage complete figure is calculated using the number of
+      issued and completed <span class="q">"scan ranges"</span>, which occur while reading the table
+      data, the progress bar might reach 100% before the query is entirely finished.
+      For example, the query might do work to perform aggregations after all the
+      table data has been read. If many of your queries fall into this category,
+      consider using the <code class="ph codeph">LIVE_SUMMARY</code> option instead for
+      more granular progress reporting.
+    </p>
+    <p class="p">
+        The <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+        currently do not produce any output during <code class="ph codeph">COMPUTE STATS</code> operations.
+      </p>
+    <div class="p">
+        Because the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+        are available only within the <span class="keyword cmdname">impala-shell</span> interpreter:
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              You cannot change these query options through the SQL <code class="ph codeph">SET</code>
+              statement using the JDBC or ODBC interfaces. The <code class="ph codeph">SET</code>
+              command in <span class="keyword cmdname">impala-shell</span> recognizes these names as
+              shell-only options.
+            </p>
+          </li>
+          <li class="li">
+            <p class="p">
+              Be careful when using <span class="keyword cmdname">impala-shell</span> on a pre-<span class="keyword">Impala 2.3</span>
+              system to connect to a system running <span class="keyword">Impala 2.3</span> or higher.
+              The older <span class="keyword cmdname">impala-shell</span> does not recognize these
+              query option names. Upgrade <span class="keyword cmdname">impala-shell</span> on the
+              systems where you intend to use these query options.
+            </p>
+          </li>
+          <li class="li">
+            <p class="p">
+              Likewise, the <span class="keyword cmdname">impala-shell</span> command relies on
+              some information only available in <span class="keyword">Impala 2.3</span> and higher
+              to prepare live progress reports and query summaries. The
+              <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code>
+              query options have no effect when <span class="keyword cmdname">impala-shell</span> connects
+              to a cluster running an older version of Impala.
+            </p>
+          </li>
+        </ul>
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set live_progress=true;
+LIVE_PROGRESS set to true
+[localhost:21000] &gt; select count(*) from customer;
++----------+
+| count(*) |
++----------+
+| 150000   |
++----------+
+[localhost:21000] &gt; select count(*) from customer t1 cross join customer t2;
+[###################################                                   ] 50%
+[######################################################################] 100%
+
+
+</code></pre>
+
+    <p class="p">
+        To see how the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+        work in real time, see <a class="xref" href="https://asciinema.org/a/1rv7qippo0fe7h5k1b6k4nexk" target="_blank">this animated demo</a>.
+      </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_distinct.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_distinct.html b/docs/build/html/topics/impala_distinct.html
new file mode 100644
index 0000000..dbdac24
--- /dev/null
+++ b/docs/build/html/topics/impala_distinct.html
@@ -0,0 +1,81 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="distinct"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISTINCT Operator</title></head><body id="distinct"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DISTINCT Operator</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">DISTINCT</code> operator in a <code class="ph codeph">SELECT</code> statement filters the result set to
+      remove duplicates:
+    </p>
+
+<pre class="pre codeblock"><code>-- Returns the unique values from one column.
+-- NULL is included in the set of values if any rows have a NULL in this column.
+select distinct c_birth_country from customer;
+-- Returns the unique combinations of values from multiple columns.
+select distinct c_salutation, c_last_name from customer;</code></pre>
+
+    <p class="p">
+      You can use <code class="ph codeph">DISTINCT</code> in combination with an aggregation function, typically
+      <code class="ph codeph">COUNT()</code>, to find how many different values a column contains:
+    </p>
+
+<pre class="pre codeblock"><code>-- Counts the unique values from one column.
+-- NULL is not included as a distinct value in the count.
+select count(distinct c_birth_country) from customer;
+-- Counts the unique combinations of values from multiple columns.
+select count(distinct c_salutation, c_last_name) from customer;</code></pre>
+
+    <p class="p">
+      One construct that Impala SQL does <em class="ph i">not</em> support is using <code class="ph codeph">DISTINCT</code> in more than one
+      aggregation function in the same query. For example, you could not have a single query with both
+      <code class="ph codeph">COUNT(DISTINCT c_first_name)</code> and <code class="ph codeph">COUNT(DISTINCT c_last_name)</code> in the
+      <code class="ph codeph">SELECT</code> list.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Zero-length strings:</strong> For purposes of clauses such as <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">GROUP
+        BY</code>, Impala considers zero-length strings (<code class="ph codeph">""</code>), <code class="ph codeph">NULL</code>, and space
+        to all be different values.
+      </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          By default, Impala only allows a single <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">columns</var>)</code>
+          expression in each query.
+        </p>
+        <p class="p">
+          If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by
+          specifying <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>; a query can contain multiple instances of
+          <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>. To make Impala automatically rewrite
+          <code class="ph codeph">COUNT(DISTINCT)</code> expressions to <code class="ph codeph">NDV()</code>, enable the
+          <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option.
+        </p>
+        <p class="p">
+          To produce the same result as multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions, you can use the
+          following technique for queries involving a single table:
+        </p>
+<pre class="pre codeblock"><code>select v1.c1 result1, v2.c1 result2 from
+  (select count(distinct col1) as c1 from t1) v1
+    cross join
+  (select count(distinct col2) as c1 from t1) v2;
+</code></pre>
+        <p class="p">
+          Because <code class="ph codeph">CROSS JOIN</code> is an expensive operation, prefer to use the <code class="ph codeph">NDV()</code>
+          technique wherever practical.
+        </p>
+      </div>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        In contrast with some database systems that always return <code class="ph codeph">DISTINCT</code> values in sorted order,
+        Impala does not do any ordering of <code class="ph codeph">DISTINCT</code> values. Always include an <code class="ph codeph">ORDER
+        BY</code> clause if you need the values in alphabetical or numeric sorted order.
+      </p>
+    </div>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_dml.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_dml.html b/docs/build/html/topics/impala_dml.html
new file mode 100644
index 0000000..71b7158
--- /dev/null
+++ b/docs/build/html/topics/impala_dml.html
@@ -0,0 +1,82 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="dml"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DML Statements</title></head><body id="dml"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DML Statements</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      DML refers to <span class="q">"Data Manipulation Language"</span>, a subset of SQL statements that modify the data stored in
+      tables. Because Impala focuses on query performance and leverages the append-only nature of HDFS storage,
+      Impala currently supports only a small set of DML statements:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <a class="xref" href="impala_delete.html">DELETE Statement (Impala 2.8 or higher only)</a>. Works for Kudu tables only.
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_insert.html">INSERT Statement</a>.
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_load_data.html">LOAD DATA Statement</a>. Does not apply for HBase or Kudu tables.
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_update.html">UPDATE Statement (Impala 2.8 or higher only)</a>. Works for Kudu tables only.
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_upsert.html">UPSERT Statement (Impala 2.8 or higher only)</a>. Works for Kudu tables only.
+      </li>
+    </ul>
+
+    <p class="p">
+      <code class="ph codeph">INSERT</code> in Impala is primarily optimized for inserting large volumes of data in a single
+      statement, to make effective use of the multi-megabyte HDFS blocks. This is the way in Impala to create new
+      data files. If you intend to insert one or a few rows at a time, such as with the <code class="ph codeph">INSERT ...
+      VALUES</code> syntax, that technique is much more efficient for Impala tables stored in HBase than for
+      tables stored on HDFS. See
+      <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a> for details.
+    </p>
+
+    <p class="p">
+      <code class="ph codeph">LOAD DATA</code> moves existing data files into the directory for an Impala table, making them
+      immediately available for Impala queries. This is one way in Impala to work with data files produced by other
+      Hadoop components. (<code class="ph codeph">CREATE EXTERNAL TABLE</code> is the other alternative; with external tables,
+      you can query existing data files, while the files remain in their original location.)
+    </p>
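+    <p class="p">
+      For example, the following statement (a sketch with a hypothetical HDFS path and table name) moves
+      staged data files into the directory for an Impala table:
+    </p>
+<pre class="pre codeblock"><code>-- Hypothetical staging path and table name, for illustration only.
+LOAD DATA INPATH '/user/impala/staging/sales_files' INTO TABLE sales;
+</code></pre>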
+
+    <p class="p">
+      In <span class="keyword">Impala 2.8</span> and higher, Impala does support the <code class="ph codeph">UPDATE</code>, <code class="ph codeph">DELETE</code>,
+      and <code class="ph codeph">UPSERT</code> statements for Kudu tables.
+      For HDFS or S3 tables, to simulate the effects of an <code class="ph codeph">UPDATE</code> or <code class="ph codeph">DELETE</code> statement
+      in other database systems, typically you use <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> to copy data
+      from one table to another, filtering out or changing the appropriate rows during the copy operation.
+    </p>
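+    <p class="p">
+      For example, the following sketch (with hypothetical table and column names) simulates a
+      <code class="ph codeph">DELETE</code> on an HDFS-backed table by copying only the rows to keep:
+    </p>
+<pre class="pre codeblock"><code>-- Keep only the rows that would survive the simulated DELETE.
+CREATE TABLE sales_current AS SELECT * FROM sales WHERE year &gt;= 2016;
+</code></pre>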
+
+    <p class="p">
+      You can also achieve a result similar to <code class="ph codeph">UPDATE</code> by using Impala tables stored in HBase.
+      When you insert a row into an HBase table and the table already contains a row with the same value for the
+      key column, the older row is hidden, which is effectively the same as a single-row
+      <code class="ph codeph">UPDATE</code>.
+    </p>
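+    <p class="p">
+      For example, with a hypothetical HBase-backed table whose key column is <code class="ph codeph">id</code>,
+      reusing a key value hides the earlier row:
+    </p>
+<pre class="pre codeblock"><code>INSERT INTO hbase_users VALUES (42, 'old name');
+INSERT INTO hbase_users VALUES (42, 'new name');
+-- Returns only the row with 'new name', as in a single-row UPDATE.
+SELECT * FROM hbase_users WHERE id = 42;
+</code></pre>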
+
+    <p class="p">
+      Impala can perform DML operations for tables or partitions stored in the Amazon S3 filesystem
+      with <span class="keyword">Impala 2.6</span> and higher. See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      The other major classifications of SQL statements are data definition language (see
+      <a class="xref" href="impala_ddl.html#ddl">DDL Statements</a>) and queries (see <a class="xref" href="impala_select.html#select">SELECT Statement</a>).
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_double.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_double.html b/docs/build/html/topics/impala_double.html
new file mode 100644
index 0000000..b87994c
--- /dev/null
+++ b/docs/build/html/topics/impala_double.html
@@ -0,0 +1,144 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="double"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DOUBLE Data Type</title></head><body id="double"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DOUBLE Data Type</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A double precision floating-point data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER
+      TABLE</code> statements.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> DOUBLE</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Range:</strong> 4.94065645841246544e-324 .. 1.79769313486231570e+308, positive or negative
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Precision:</strong> 15 to 17 significant digits, depending on usage. The number of significant digits does
+      not depend on the position of the decimal point.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Representation:</strong> The values are stored in 8 bytes, using
+      <a class="xref" href="https://en.wikipedia.org/wiki/Double-precision_floating-point_format" target="_blank">IEEE 754 Double Precision Binary Floating Point</a> format.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Conversions:</strong> Impala does not automatically convert <code class="ph codeph">DOUBLE</code> to any other type. You can
+      use <code class="ph codeph">CAST()</code> to convert <code class="ph codeph">DOUBLE</code> values to <code class="ph codeph">FLOAT</code>,
+      <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>,
+      <code class="ph codeph">STRING</code>, <code class="ph codeph">TIMESTAMP</code>, or <code class="ph codeph">BOOLEAN</code>. You can use exponential
+      notation in <code class="ph codeph">DOUBLE</code> literals or when casting from <code class="ph codeph">STRING</code>, for example
+      <code class="ph codeph">1.0e6</code> to represent one million.
+      <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to
+        <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+        date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+        If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+        the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span>
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      The data type <code class="ph codeph">REAL</code> is an alias for <code class="ph codeph">DOUBLE</code>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x DOUBLE);
+SELECT CAST(1000.5 AS DOUBLE);
+</code></pre>
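+    <p class="p">
+      The following additional casts (a brief sketch) show exponential notation and an explicit conversion
+      from <code class="ph codeph">DOUBLE</code> to another type:
+    </p>
+<pre class="pre codeblock"><code>SELECT CAST('1.0e6' AS DOUBLE);  -- One million, cast from a STRING literal.
+SELECT CAST(1.0e6 AS INT);       -- Explicit cast from DOUBLE to INT.
+</code></pre>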
+
+    <p class="p">
+        <strong class="ph b">Partitioning:</strong> Because fractional values of this type are not always represented precisely, when this
+        type is used for a partition key column, the underlying HDFS directories might not be named exactly as you
+        expect. Prefer to partition on a <code class="ph codeph">DECIMAL</code> column instead.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+        using Parquet or other binary formats.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong> Represented in memory as an 8-byte value.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+        fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+        statement.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+
+
+    <p class="p">
+        Because arithmetic on <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns uses
+        high-performance hardware instructions, and because distributed queries can perform these operations in a
+        different order for each query, results can vary slightly for aggregate function calls such as <code class="ph codeph">SUM()</code>
+        and <code class="ph codeph">AVG()</code> on <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns, particularly on
+        large data sets where millions or billions of values are summed or averaged. For perfect consistency and
+        repeatability, use the <code class="ph codeph">DECIMAL</code> data type for such operations instead of
+        <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+      </p>
+
+    <p class="p">
+        The inability to exactly represent certain floating-point values means that
+        <code class="ph codeph">DECIMAL</code> is sometimes a better choice than <code class="ph codeph">DOUBLE</code>
+        or <code class="ph codeph">FLOAT</code> when precision is critical, particularly when
+        transferring data from other database systems that use different representations
+        or file formats.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Currently, the data types <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">FLOAT</code>,
+        and <code class="ph codeph">DOUBLE</code> cannot be used for primary key columns in Kudu tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>,
+      <a class="xref" href="impala_float.html#float">FLOAT Data Type</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_drop_database.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_drop_database.html b/docs/build/html/topics/impala_drop_database.html
new file mode 100644
index 0000000..1e974df
--- /dev/null
+++ b/docs/build/html/topics/impala_drop_database.html
@@ -0,0 +1,193 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_database"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP DATABASE Statement</title></head><body id="drop_database"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DROP DATABASE Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Removes a database from the system. The physical operations involve removing the metadata for the database
+      from the metastore, and deleting the corresponding <code class="ph codeph">*.db</code> directory from HDFS.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>DROP (DATABASE|SCHEMA) [IF EXISTS] <var class="keyword varname">database_name</var> <span class="ph">[RESTRICT | CASCADE]</span>;</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      By default, the database must be empty before it can be dropped, to avoid losing any data.
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.3</span> and higher, you can include the <code class="ph codeph">CASCADE</code>
+      clause to make Impala drop all tables and other objects in the database before dropping the database itself.
+      The <code class="ph codeph">RESTRICT</code> clause enforces the original requirement that the database be empty
+      before being dropped. Because the <code class="ph codeph">RESTRICT</code> behavior is still the default, this
+      clause is optional.
+    </p>
+
+    <p class="p">
+      The automatic dropping resulting from the <code class="ph codeph">CASCADE</code> clause follows the same rules as the
+      corresponding <code class="ph codeph">DROP TABLE</code>, <code class="ph codeph">DROP VIEW</code>, and <code class="ph codeph">DROP FUNCTION</code> statements.
+      In particular, the HDFS directories and data files for any external tables are left behind when the
+      tables are removed.
+    </p>
+
+    <p class="p">
+      When you do not use the <code class="ph codeph">CASCADE</code> clause, drop or move all the objects inside the database manually
+      before dropping the database itself:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          Use the <code class="ph codeph">SHOW TABLES</code> statement to locate all tables and views in the database,
+          and issue <code class="ph codeph">DROP TABLE</code> and <code class="ph codeph">DROP VIEW</code> statements to remove them all.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          Use the <code class="ph codeph">SHOW FUNCTIONS</code> and <code class="ph codeph">SHOW AGGREGATE FUNCTIONS</code> statements
+          to locate all user-defined functions in the database, and issue <code class="ph codeph">DROP FUNCTION</code>
+          and <code class="ph codeph">DROP AGGREGATE FUNCTION</code> statements to remove them all.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          To keep tables or views contained by a database while removing the database itself, use
+          <code class="ph codeph">ALTER TABLE</code> and <code class="ph codeph">ALTER VIEW</code> to move the relevant
+          objects to a different database before dropping the original database.
+        </p>
+      </li>
+    </ul>
+
+    <p class="p">
+      You cannot drop the current database, that is, the database your session connected to
+      either through the <code class="ph codeph">USE</code> statement or the <code class="ph codeph">-d</code> option of <span class="keyword cmdname">impala-shell</span>.
+      Issue a <code class="ph codeph">USE</code> statement to switch to a different database first.
+      Because the <code class="ph codeph">default</code> database is always available, issuing
+      <code class="ph codeph">USE default</code> is a convenient way to leave the current database
+      before dropping it.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Hive considerations:</strong>
+      </p>
+
+    <p class="p">
+      When you drop a database in Impala, the database can no longer be used by Hive.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+
+
+    <p class="p">
+      See <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a> for examples covering <code class="ph codeph">CREATE
+      DATABASE</code>, <code class="ph codeph">USE</code>, and <code class="ph codeph">DROP DATABASE</code>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+
+    <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+        <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+        <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+        as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+        Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
+        See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, must have write
+      permission for the directory associated with the database.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <pre class="pre codeblock"><code>create database first_db;
+use first_db;
+create table t1 (x int);
+
+create database second_db;
+use second_db;
+-- Each database has its own namespace for tables.
+-- You can reuse the same table names in each database.
+create table t1 (s string);
+
+create database temp;
+
+-- You can either USE a database after creating it,
+-- or qualify all references to the table name with the name of the database.
+-- Here, tables T2 and T3 are both created in the TEMP database.
+
+create table temp.t2 (x int, y int);
+use temp;
+create table t3 (s string);
+
+-- You cannot drop a database while it is selected by the USE statement.
+drop database temp;
+<em class="ph i">ERROR: AnalysisException: Cannot drop current default database: temp</em>
+
+-- The always-available database 'default' is a convenient one to USE
+-- before dropping a database you created.
+use default;
+
+-- Before dropping a database, first drop all the tables inside it,
+<span class="ph">-- or in <span class="keyword">Impala 2.3</span> and higher use the CASCADE clause.</span>
+drop database temp;
+ERROR: ImpalaRuntimeException: Error making 'dropDatabase' RPC to Hive Metastore:
+CAUSED BY: InvalidOperationException: Database temp is not empty
+show tables in temp;
++------+
+| name |
++------+
+| t3   |
++------+
+
+<span class="ph">-- <span class="keyword">Impala 2.3</span> and higher:</span>
+<span class="ph">drop database temp cascade;</span>
+
+-- Earlier releases:
+drop table temp.t3;
+drop database temp;
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_databases.html#databases">Overview of Impala Databases</a>, <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a>,
+      <a class="xref" href="impala_use.html#use">USE Statement</a>, <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>, <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_drop_function.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_drop_function.html b/docs/build/html/topics/impala_drop_function.html
new file mode 100644
index 0000000..fd9f839
--- /dev/null
+++ b/docs/build/html/topics/impala_drop_function.html
@@ -0,0 +1,136 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_function"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP FUNCTION Statement</title></head><body id="drop_function"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DROP FUNCTION Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Removes a user-defined function (UDF), so that it is not available for execution during Impala
+      <code class="ph codeph">SELECT</code> or <code class="ph codeph">INSERT</code> operations.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      To drop C++ UDFs and UDAs:
+    </p>
+
+<pre class="pre codeblock"><code>DROP [AGGREGATE] FUNCTION [IF EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var>(<var class="keyword varname">type</var>[, <var class="keyword varname">type</var>...])</code></pre>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        The preceding syntax, which includes the function signature, also applies to Java UDFs that were created
+        using the corresponding <code class="ph codeph">CREATE FUNCTION</code> syntax that includes the argument and return types.
+        After upgrading to <span class="keyword">Impala 2.5</span> or higher, consider re-creating all Java UDFs with the
+        <code class="ph codeph">CREATE FUNCTION</code> syntax that does not include the function signature. Java UDFs created this
+        way are now persisted in the metastore database and do not need to be re-created after an Impala restart.
+      </p>
+    </div>
+
+    <p class="p">
+      To drop Java UDFs (created using the <code class="ph codeph">CREATE FUNCTION</code> syntax with no function signature):
+    </p>
+
+<pre class="pre codeblock"><code>DROP FUNCTION [IF EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var></code></pre>
+
+
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Because the same function name could be overloaded with different argument signatures, you specify the
+      argument types to identify the exact function to drop.
+    </p>
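+    <p class="p">
+      For example, the following statements (a sketch with a hypothetical function name) drop individual
+      overloads by including the argument types:
+    </p>
+<pre class="pre codeblock"><code>-- Drops only the overload that takes a single BIGINT argument.
+DROP FUNCTION IF EXISTS my_udf(BIGINT);
+-- A different overload of the same name must be dropped separately.
+DROP FUNCTION IF EXISTS my_udf(STRING, STRING);
+</code></pre>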
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+        In <span class="keyword">Impala 2.5</span> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
+        Java UDFs are also persisted, if they were created with the new <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs,
+        where the Java function argument and return types are omitted.
+        Java-based UDFs created with the old <code class="ph codeph">CREATE FUNCTION</code> syntax do not persist across restarts
+        because they are held in the memory of the <span class="keyword cmdname">catalogd</span> daemon.
+        Until you re-create such Java UDFs using the new <code class="ph codeph">CREATE FUNCTION</code> syntax,
+        you must reload those Java-based UDFs by running the original <code class="ph codeph">CREATE FUNCTION</code> statements again each time
+        you restart the <span class="keyword cmdname">catalogd</span> daemon.
+        Prior to <span class="keyword">Impala 2.5</span> the requirement to reload functions after a restart applied to both C++ and Java functions.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, does not need any
+      particular HDFS permissions to perform this statement.
+      All read and write operations are on the metastore database,
+      not HDFS files and directories.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+    <p class="p">
+      The following example shows how to drop Java functions created with the signatureless
+      <code class="ph codeph">CREATE FUNCTION</code> syntax in <span class="keyword">Impala 2.5</span> and higher.
+      Issuing <code class="ph codeph">DROP FUNCTION <var class="keyword varname">function_name</var></code> removes all the
+      overloaded functions under that name.
+      (See <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a> for a longer example
+      showing how to set up such functions in the first place.)
+    </p>
+<pre class="pre codeblock"><code>
+create function my_func location '/user/impala/udfs/udf-examples.jar'
+  symbol='org.apache.impala.TestUdf';
+
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature                             | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT      | my_func(BIGINT)                       | JAVA        | true          |
+| BOOLEAN     | my_func(BOOLEAN)                      | JAVA        | true          |
+| BOOLEAN     | my_func(BOOLEAN, BOOLEAN)             | JAVA        | true          |
+...
+| BIGINT      | testudf(BIGINT)                       | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN)                      | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN, BOOLEAN)             | JAVA        | true          |
+...
+
+drop function my_func;
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature                             | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT      | testudf(BIGINT)                       | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN)                      | JAVA        | true          |
+| BOOLEAN     | testudf(BOOLEAN, BOOLEAN)             | JAVA        | true          |
+...
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>, <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_drop_role.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_drop_role.html b/docs/build/html/topics/impala_drop_role.html
new file mode 100644
index 0000000..addaf76
--- /dev/null
+++ b/docs/build/html/topics/impala_drop_role.html
@@ -0,0 +1,71 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_role"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP ROLE Statement (Impala 2.0 or higher only)</title></head><body id="drop_role"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DROP ROLE Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+
+      The <code class="ph codeph">DROP ROLE</code> statement removes a role from the metastore database. Once dropped, the role
+      is revoked for all users to whom it was previously assigned, and all privileges granted to that role are
+      revoked. Queries that are already executing are not affected. Impala verifies the role information
+      approximately every 60 seconds, so the effect of <code class="ph codeph">DROP ROLE</code> might not be visible to new
+      Impala queries for a brief period.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>DROP ROLE <var class="keyword varname">role_name</var>
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Required privileges:</strong>
+      </p>
+
+    <p class="p">
+      Only administrative users (initially, a predefined set of users specified in the Sentry service configuration
+      file) can use this statement.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Compatibility:</strong>
+      </p>
+
+    <p class="p">
+      Impala makes use of any roles and privileges specified by the <code class="ph codeph">GRANT</code> and
+      <code class="ph codeph">REVOKE</code> statements in Hive, and Hive makes use of any roles and privileges specified by the
+      <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Impala. The Impala <code class="ph codeph">GRANT</code>
+      and <code class="ph codeph">REVOKE</code> statements for privileges do not require the <code class="ph codeph">ROLE</code> keyword to be
+      repeated before each role name, unlike the equivalent Hive statements.
+    </p>
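+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following hypothetical sequence (the role, group, and database names are
+      illustrative) shows a role being created, granted, and then dropped:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE ROLE analyst_role;
+GRANT ROLE analyst_role TO GROUP analysts;
+GRANT SELECT ON DATABASE sales_db TO ROLE analyst_role;
+-- Dropping the role revokes it from the group and discards its privileges.
+DROP ROLE analyst_role;
+</code></pre>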
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>,
+      <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>,
+      <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+    </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_drop_stats.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_drop_stats.html b/docs/build/html/topics/impala_drop_stats.html
new file mode 100644
index 0000000..f023867
--- /dev/null
+++ b/docs/build/html/topics/impala_drop_stats.html
@@ -0,0 +1,285 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP STATS Statement</title></head><body id="drop_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DROP STATS Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Removes the specified statistics from a table or partition. The statistics were originally created by the
+      <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>DROP STATS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var>
+DROP INCREMENTAL STATS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var> PARTITION (<var class="keyword varname">partition_spec</var>)
+
+<var class="keyword varname">partition_spec</var> ::= <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var> [, <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var> ...]
+</code></pre>
+
+    <p class="p">
+        The <code class="ph codeph">PARTITION</code> clause is only allowed in combination with the <code class="ph codeph">INCREMENTAL</code>
+        clause. It is optional for <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, and required for <code class="ph codeph">DROP
+        INCREMENTAL STATS</code>. Whenever you specify partitions through the <code class="ph codeph">PARTITION
+        (<var class="keyword varname">partition_spec</var>)</code> clause in a <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or
+        <code class="ph codeph">DROP INCREMENTAL STATS</code> statement, you must include all the partitioning columns in the
+        specification, and specify constant values for all the partition key columns.
+      </p>
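+
+    <p class="p">
+      For example, for a hypothetical table partitioned by both <code class="ph codeph">year</code>
+      and <code class="ph codeph">month</code>, the <code class="ph codeph">PARTITION</code> clause must
+      name both partition key columns with constant values:
+    </p>
+
+<pre class="pre codeblock"><code>-- Both partition key columns are required in the specification.
+DROP INCREMENTAL STATS sales_by_month PARTITION (year=2017, month=1);
+</code></pre>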
+
+    <p class="p">
+      <code class="ph codeph">DROP STATS</code> removes all statistics from the table, whether created by <code class="ph codeph">COMPUTE
+      STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+    </p>
+
+    <p class="p">
+      <code class="ph codeph">DROP INCREMENTAL STATS</code> only affects incremental statistics for a single partition, specified
+      through the <code class="ph codeph">PARTITION</code> clause. The incremental stats are marked as outdated, so that they are
+      recomputed by the next <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement.
+    </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      You typically use this statement when the statistics for a table or a partition have become stale due to data
+      files being added to or removed from the associated HDFS data directories, whether by manual HDFS operations,
+      by <code class="ph codeph">INSERT</code>, <code class="ph codeph">INSERT OVERWRITE</code>, or <code class="ph codeph">LOAD DATA</code> statements, or
+      by adding or dropping partitions.
+    </p>
+
+    <p class="p">
+      When a table or partition has no associated statistics, Impala treats it as essentially zero-sized when
+      constructing the execution plan for a query. In particular, the statistics influence the order in which
+      tables are joined in a join query. To ensure proper query planning and good query performance and
+      scalability, make sure to run <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on
+      the table or partition after removing any stale statistics.
+    </p>
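+
+    <p class="p">
+      For example (the table name is illustrative), to replace stale statistics
+      rather than leaving the table without any:
+    </p>
+
+<pre class="pre codeblock"><code>DROP STATS t1;
+COMPUTE STATS t1;
+</code></pre>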
+
+    <p class="p">
+      Dropping the statistics is not required for an unpartitioned table or a partitioned table covered by the
+      original type of statistics. A subsequent <code class="ph codeph">COMPUTE STATS</code> statement replaces any existing
+      statistics with new ones, for all partitions, regardless of whether the old ones were outdated. Therefore,
+      this statement was rarely used before the introduction of incremental statistics.
+    </p>
+
+    <p class="p">
+      Dropping the statistics is required for a partitioned table containing incremental statistics, to make a
+      subsequent <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement rescan an existing partition. See
+      <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for information about incremental statistics, a new feature
+      available in Impala 2.1.0 and higher.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, does not need any
+      particular HDFS permissions to perform this statement.
+      All read and write operations are on the metastore database,
+      not HDFS files and directories.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example shows a partitioned table that has associated statistics produced by the
+      <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement, and how the situation evolves as statistics are dropped
+      from specific partitions, then the entire table.
+    </p>
+
+    <p class="p">
+      Initially, all table and column statistics are filled in.
+    </p>
+
+
+
+<pre class="pre codeblock"><code>show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+-----------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
++-------------+-------+--------+----------+--------------+---------+-----------------
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
+| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
+| Total       | 17957 | 10     | 2.25MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+-----------------
+show column stats item_partitioned;
++------------------+-----------+------------------+--------+----------+--------------
+| Column           | Type      | #Distinct Values | #Nulls | Max Size | Avg Size
++------------------+-----------+------------------+--------+----------+--------------
+| i_item_sk        | INT       | 19443            | -1     | 4        | 4
+| i_item_id        | STRING    | 9025             | -1     | 16       | 16
+| i_rec_start_date | TIMESTAMP | 4                | -1     | 16       | 16
+| i_rec_end_date   | TIMESTAMP | 3                | -1     | 16       | 16
+| i_item_desc      | STRING    | 13330            | -1     | 200      | 100.302803039
+| i_current_price  | FLOAT     | 2807             | -1     | 4        | 4
+| i_wholesale_cost | FLOAT     | 2105             | -1     | 4        | 4
+| i_brand_id       | INT       | 965              | -1     | 4        | 4
+| i_brand          | STRING    | 725              | -1     | 22       | 16.1776008605
+| i_class_id       | INT       | 16               | -1     | 4        | 4
+| i_class          | STRING    | 101              | -1     | 15       | 7.76749992370
+| i_category_id    | INT       | 10               | -1     | 4        | 4
+| i_manufact_id    | INT       | 1857             | -1     | 4        | 4
+| i_manufact       | STRING    | 1028             | -1     | 15       | 11.3295001983
+| i_size           | STRING    | 8                | -1     | 11       | 4.33459997177
+| i_formulation    | STRING    | 12884            | -1     | 20       | 19.9799995422
+| i_color          | STRING    | 92               | -1     | 10       | 5.38089990615
+| i_units          | STRING    | 22               | -1     | 7        | 4.18690013885
+| i_container      | STRING    | 2                | -1     | 7        | 6.99259996414
+| i_manager_id     | INT       | 105              | -1     | 4        | 4
+| i_product_name   | STRING    | 19094            | -1     | 25       | 18.0233001708
+| i_category       | STRING    | 10               | 0      | -1       | -1
++------------------+-----------+------------------+--------+----------+--------------
+</code></pre>
+
+    <p class="p">
+      To remove statistics for particular partitions, use the <code class="ph codeph">DROP INCREMENTAL STATS</code> statement.
+      After removing statistics for two partitions, the table-level statistics reflect that change in the
+      <code class="ph codeph">#Rows</code> and <code class="ph codeph">Incremental stats</code> fields. The counts, maximums, and averages of
+      the column-level statistics are unaffected.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      The row count might be preserved after a <code class="ph codeph">DROP INCREMENTAL
+      STATS</code> statement in a future release. Check the resolution of
+      <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1615" target="_blank">IMPALA-1615</a>.
+    </div>
+
+<pre class="pre codeblock"><code>drop incremental stats item_partitioned partition (i_category='Sports');
+drop incremental stats item_partitioned partition (i_category='Electronics');
+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
++-------------+-------+--------+----------+--------------+---------+-----------------
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true
+| Electronics | -1    | 1      | 232.67KB | NOT CACHED   | PARQUET | false
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true
+| Sports      | -1    | 1      | 227.97KB | NOT CACHED   | PARQUET | false
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true
+| Total       | 17957 | 10     | 2.25MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+-----------------
+show column stats item_partitioned;
++------------------+-----------+------------------+--------+----------+--------------
+| Column           | Type      | #Distinct Values | #Nulls | Max Size | Avg Size
++------------------+-----------+------------------+--------+----------+--------------
+| i_item_sk        | INT       | 19443            | -1     | 4        | 4
+| i_item_id        | STRING    | 9025             | -1     | 16       | 16
+| i_rec_start_date | TIMESTAMP | 4                | -1     | 16       | 16
+| i_rec_end_date   | TIMESTAMP | 3                | -1     | 16       | 16
+| i_item_desc      | STRING    | 13330            | -1     | 200      | 100.302803039
+| i_current_price  | FLOAT     | 2807             | -1     | 4        | 4
+| i_wholesale_cost | FLOAT     | 2105             | -1     | 4        | 4
+| i_brand_id       | INT       | 965              | -1     | 4        | 4
+| i_brand          | STRING    | 725              | -1     | 22       | 16.1776008605
+| i_class_id       | INT       | 16               | -1     | 4        | 4
+| i_class          | STRING    | 101              | -1     | 15       | 7.76749992370
+| i_category_id    | INT       | 10               | -1     | 4        | 4
+| i_manufact_id    | INT       | 1857             | -1     | 4        | 4
+| i_manufact       | STRING    | 1028             | -1     | 15       | 11.3295001983
+| i_size           | STRING    | 8                | -1     | 11       | 4.33459997177
+| i_formulation    | STRING    | 12884            | -1     | 20       | 19.9799995422
+| i_color          | STRING    | 92               | -1     | 10       | 5.38089990615
+| i_units          | STRING    | 22               | -1     | 7        | 4.18690013885
+| i_container      | STRING    | 2                | -1     | 7        | 6.99259996414
+| i_manager_id     | INT       | 105              | -1     | 4        | 4
+| i_product_name   | STRING    | 19094            | -1     | 25       | 18.0233001708
+| i_category       | STRING    | 10               | 0      | -1       | -1
++------------------+-----------+------------------+--------+----------+--------------
+</code></pre>
+
+    <p class="p">
+      To remove all statistics from the table, whether produced by <code class="ph codeph">COMPUTE STATS</code> or
+      <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, use the <code class="ph codeph">DROP STATS</code> statement without the
+      <code class="ph codeph">INCREMENTAL</code> clause. Now, both table-level and column-level statistics are reset.
+    </p>
+
+<pre class="pre codeblock"><code>drop stats item_partitioned;
+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books       | -1    | 1      | 223.74KB | NOT CACHED   | PARQUET | false
+| Children    | -1    | 1      | 230.05KB | NOT CACHED   | PARQUET | false
+| Electronics | -1    | 1      | 232.67KB | NOT CACHED   | PARQUET | false
+| Home        | -1    | 1      | 232.56KB | NOT CACHED   | PARQUET | false
+| Jewelry     | -1    | 1      | 223.72KB | NOT CACHED   | PARQUET | false
+| Men         | -1    | 1      | 231.25KB | NOT CACHED   | PARQUET | false
+| Music       | -1    | 1      | 237.90KB | NOT CACHED   | PARQUET | false
+| Shoes       | -1    | 1      | 234.90KB | NOT CACHED   | PARQUET | false
+| Sports      | -1    | 1      | 227.97KB | NOT CACHED   | PARQUET | false
+| Women       | -1    | 1      | 226.27KB | NOT CACHED   | PARQUET | false
+| Total       | -1    | 10     | 2.25MB   | 0B           |         |
++-------------+-------+--------+----------+--------------+---------+------------------
+show column stats item_partitioned;
++------------------+-----------+------------------+--------+----------+----------+
+| Column           | Type      | #Distinct Values | #Nulls | Max Size | Avg Size |
++------------------+-----------+------------------+--------+----------+----------+
+| i_item_sk        | INT       | -1               | -1     | 4        | 4        |
+| i_item_id        | STRING    | -1               | -1     | -1       | -1       |
+| i_rec_start_date | TIMESTAMP | -1               | -1     | 16       | 16       |
+| i_rec_end_date   | TIMESTAMP | -1               | -1     | 16       | 16       |
+| i_item_desc      | STRING    | -1               | -1     | -1       | -1       |
+| i_current_price  | FLOAT     | -1               | -1     | 4        | 4        |
+| i_wholesale_cost | FLOAT     | -1               | -1     | 4        | 4        |
+| i_brand_id       | INT       | -1               | -1     | 4        | 4        |
+| i_brand          | STRING    | -1               | -1     | -1       | -1       |
+| i_class_id       | INT       | -1               | -1     | 4        | 4        |
+| i_class          | STRING    | -1               | -1     | -1       | -1       |
+| i_category_id    | INT       | -1               | -1     | 4        | 4        |
+| i_manufact_id    | INT       | -1               | -1     | 4        | 4        |
+| i_manufact       | STRING    | -1               | -1     | -1       | -1       |
+| i_size           | STRING    | -1               | -1     | -1       | -1       |
+| i_formulation    | STRING    | -1               | -1     | -1       | -1       |
+| i_color          | STRING    | -1               | -1     | -1       | -1       |
+| i_units          | STRING    | -1               | -1     | -1       | -1       |
+| i_container      | STRING    | -1               | -1     | -1       | -1       |
+| i_manager_id     | INT       | -1               | -1     | 4        | 4        |
+| i_product_name   | STRING    | -1               | -1     | -1       | -1       |
+| i_category       | STRING    | 10               | 0      | -1       | -1       |
++------------------+-----------+------------------+--------+----------+----------+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>, <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>,
+      <a class="xref" href="impala_show.html#show_column_stats">SHOW COLUMN STATS Statement</a>, <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_drop_table.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_drop_table.html b/docs/build/html/topics/impala_drop_table.html
new file mode 100644
index 0000000..b9d21bc
--- /dev/null
+++ b/docs/build/html/topics/impala_drop_table.html
@@ -0,0 +1,192 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_table"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP TABLE Statement</title></head><body id="drop_table"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DROP TABLE Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Removes an Impala table. Also removes the underlying HDFS data files for internal tables, although not for
+      external tables.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>DROP TABLE [IF EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> <span class="ph">[PURGE]</span></code></pre>
+
+    <p class="p">
+      <strong class="ph b">IF EXISTS clause:</strong>
+    </p>
+
+    <p class="p">
+      The optional <code class="ph codeph">IF EXISTS</code> clause makes the statement succeed whether or not the table exists.
+      If the table does exist, it is dropped; if it does not exist, the statement has no effect. This capability is
+      useful in standardized setup scripts that remove existing schema objects and create new ones. By using some
+      combination of <code class="ph codeph">IF EXISTS</code> for the <code class="ph codeph">DROP</code> statements and <code class="ph codeph">IF NOT
+      EXISTS</code> clauses for the <code class="ph codeph">CREATE</code> statements, the script can run successfully the first
+      time you run it (when the objects do not exist yet) and subsequent times (when some or all of the objects do
+      already exist).
+    </p>
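+
+    <p class="p">
+      For example, a setup script along these lines (the table name is illustrative)
+      runs successfully whether or not the table already exists:
+    </p>
+
+<pre class="pre codeblock"><code>-- Succeeds even on the first run, when the table does not exist yet.
+DROP TABLE IF EXISTS staging_events;
+CREATE TABLE IF NOT EXISTS staging_events (id BIGINT, payload STRING);
+</code></pre>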
+
+    <p class="p">
+      <strong class="ph b">PURGE clause:</strong>
+    </p>
+
+    <p class="p"> The optional <code class="ph codeph">PURGE</code> keyword, available in
+      <span class="keyword">Impala 2.3</span> and higher, causes Impala to remove the associated
+      HDFS data files immediately, rather than going through the HDFS trashcan
+      mechanism. Use this keyword when dropping a table if it is crucial to
+      remove the data as quickly as possible to free up space, or if there is a
+      problem with the trashcan, such as the trashcan not being configured, or
+      being in a different HDFS encryption zone than the data files. </p>
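+
+    <p class="p">
+      For example (the table name is illustrative), to reclaim the space immediately
+      instead of moving the data files to the HDFS trashcan:
+    </p>
+
+<pre class="pre codeblock"><code>DROP TABLE large_staging_table PURGE;
+</code></pre>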
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      By default, Impala removes the associated HDFS directory and data files for the table. If you issue a
+      <code class="ph codeph">DROP TABLE</code> and the data files are not deleted, it might be for the following reasons:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        If the table was created with the
+        <code class="ph codeph"><a class="xref" href="impala_tables.html#external_tables">EXTERNAL</a></code> clause, Impala leaves all
+        files and directories untouched. Use external tables when the data is under the control of other Hadoop
+        components, and Impala is only used to query the data files from their original locations.
+      </li>
+
+      <li class="li">
+        Impala might leave the data files behind unintentionally, if there is no HDFS location available to hold
+        the HDFS trashcan for the <code class="ph codeph">impala</code> user. See
+        <a class="xref" href="impala_prereqs.html#prereqs_account">User Account Requirements</a> for the procedure to set up the required HDFS home
+        directory.
+      </li>
+    </ul>
+
+    <p class="p">
+      Make sure that you are in the correct database before dropping a table, either by issuing a
+      <code class="ph codeph">USE</code> statement first or by using a fully qualified name
+      <code class="ph codeph"><var class="keyword varname">db_name</var>.<var class="keyword varname">table_name</var></code>.
+    </p>
+
+    <p class="p">
+      If you intend to issue a <code class="ph codeph">DROP DATABASE</code> statement, first issue <code class="ph codeph">DROP TABLE</code>
+      statements to remove all the tables in that database.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>create database temporary;
+use temporary;
+create table unimportant (x int);
+create table trivial (s string);
+-- Drop a table in the current database.
+drop table unimportant;
+-- Switch to a different database.
+use default;
+-- To drop a table in a different database...
+drop table trivial;
+<em class="ph i">ERROR: AnalysisException: Table does not exist: default.trivial</em>
+-- ...use a fully qualified name.
+drop table temporary.trivial;</code></pre>
+
+    <p class="p">
+        For other tips about managing and reclaiming Impala disk space, see
+        <a class="xref" href="../shared/../topics/impala_disk_space.html#disk_space">Managing Disk Space for Impala Data</a>.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+    <p class="p">
+      The <code class="ph codeph">DROP TABLE</code> statement can remove data files from S3
+      if the associated S3 table is an internal table.
+      In <span class="keyword">Impala 2.6</span> and higher, as part of improved support for writing
+      to S3, Impala also removes the associated folder when dropping an internal table
+      that resides on S3.
+      See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about working with S3 tables.
+    </p>
+
+    <div class="p">
+        For best compatibility with the S3 write support in <span class="keyword">Impala 2.6</span>
+        and higher:
+        <ul class="ul">
+        <li class="li">Use native Hadoop techniques to create data files in S3 for querying through Impala.</li>
+        <li class="li">Use the <code class="ph codeph">PURGE</code> clause of <code class="ph codeph">DROP TABLE</code> when dropping internal (managed) tables.</li>
+        </ul>
+        By default, when you drop an internal (managed) table, the data files are
+        moved to the HDFS trashcan. This operation is expensive for tables that
+        reside on the Amazon S3 filesystem. Therefore, for S3 tables, prefer to use
+        <code class="ph codeph">DROP TABLE <var class="keyword varname">table_name</var> PURGE</code> rather than the default <code class="ph codeph">DROP TABLE</code> statement.
+        The <code class="ph codeph">PURGE</code> clause makes Impala delete the data files immediately,
+        skipping the HDFS trashcan.
+        For the <code class="ph codeph">PURGE</code> clause to work effectively, you must originally create the
+        data files on S3 using one of the tools from the Hadoop ecosystem, such as
+        <code class="ph codeph">hadoop fs -cp</code>, or <code class="ph codeph">INSERT</code> in Impala or Hive.
+      </div>
+
+    <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+        <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+        <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+        as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+        Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
+        See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      For an internal table, the user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, must have write
+      permission for all the files and directories that make up the table.
+    </p>
+    <p class="p">
+      For an external table, dropping the table only involves changes to metadata in the metastore database.
+      Because Impala does not remove any HDFS files or directories when external tables are dropped,
+      no particular permissions are needed for the associated HDFS files or directories.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+      Kudu tables can be managed or external, the same as with HDFS-based
+      tables. For a managed table, the underlying Kudu table and its data
+      are removed by <code class="ph codeph">DROP TABLE</code>. For an external table,
+      the underlying Kudu table and its data remain after a
+      <code class="ph codeph">DROP TABLE</code>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>,
+      <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+      <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>, <a class="xref" href="impala_tables.html#internal_tables">Internal Tables</a>,
+      <a class="xref" href="impala_tables.html#external_tables">External Tables</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_drop_view.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_drop_view.html b/docs/build/html/topics/impala_drop_view.html
new file mode 100644
index 0000000..123c376
--- /dev/null
+++ b/docs/build/html/topics/impala_drop_view.html
@@ -0,0 +1,80 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_view"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP VIEW Statement</title></head><body id="drop_view"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">DROP VIEW Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Removes the specified view, which was originally created by the <code class="ph codeph">CREATE VIEW</code> statement.
+      Because a view is purely a logical construct (an alias for a query) with no physical data behind it,
+      <code class="ph codeph">DROP VIEW</code> only involves changes to metadata in the metastore database, not any data files in
+      HDFS.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>DROP VIEW [IF EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">view_name</var></code></pre>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <div class="p">
+        The following example creates a series of views and then drops them. These examples illustrate how views
+        are associated with a particular database, and both the view definitions and the view names for
+        <code class="ph codeph">CREATE VIEW</code> and <code class="ph codeph">DROP VIEW</code> can refer to a view in the current database or
+        a fully qualified view name.
+<pre class="pre codeblock"><code>
+-- Create and drop a view in the current database.
+CREATE VIEW few_rows_from_t1 AS SELECT * FROM t1 LIMIT 10;
+DROP VIEW few_rows_from_t1;
+
+-- Create and drop a view referencing a table in a different database.
+CREATE VIEW table_from_other_db AS SELECT x FROM db1.foo WHERE x IS NOT NULL;
+DROP VIEW table_from_other_db;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Switch into the other database and drop the view.
+USE db2;
+DROP VIEW v1;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Drop a view in the other database.
+DROP VIEW db2.v1;
+</code></pre>
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_views.html#views">Overview of Impala Views</a>, <a class="xref" href="impala_create_view.html#create_view">CREATE VIEW Statement</a>,
+      <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_exec_single_node_rows_threshold.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_exec_single_node_rows_threshold.html b/docs/build/html/topics/impala_exec_single_node_rows_threshold.html
new file mode 100644
index 0000000..aa43a0c
--- /dev/null
+++ b/docs/build/html/topics/impala_exec_single_node_rows_threshold.html
@@ -0,0 +1,89 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="exec_single_node_rows_threshold"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (Impala 2.1 or higher only)</title></head><body id="exec_single_node_rows_threshold"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (<span class="keyword">Impala 2.1</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      This setting controls the cutoff point (in terms of number of rows scanned) below which Impala treats a query
+      as a <span class="q">"small"</span> query, turning off optimizations such as parallel execution and native code generation. The
+      overhead of these optimizations is worthwhile for queries involving substantial amounts of data, but it
+      makes sense to skip them for queries involving tiny amounts of data. Reducing the overhead for small queries
+      allows Impala to complete them more quickly, keeping YARN resources, admission control slots, and so on
+      available for data-intensive queries.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=<var class="keyword varname">number_of_rows</var></code></pre>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 100
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Usage notes:</strong> Typically, you increase the default value to make this optimization apply to more queries.
+      If incorrect or corrupted table and column statistics cause Impala to apply this optimization
+      incorrectly to queries that actually involve substantial work, you might see the queries being slower as a
+      result of remote reads. In that case, recompute statistics with the <code class="ph codeph">COMPUTE STATS</code>
+      or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement. If there is a problem collecting accurate
+      statistics, you can turn this feature off by setting the value to -1.
+    </p>
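+
+    <p class="p">
+      For example, to turn off the small-query optimization entirely for a session:
+    </p>
+
+<pre class="pre codeblock"><code>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=-1;</code></pre>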
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong>
+      </p>
+
+    <p class="p">
+      This setting applies to query fragments where the amount of data to scan can be accurately determined, either
+      through table and column statistics, or by the presence of a <code class="ph codeph">LIMIT</code> clause. If Impala cannot
+      accurately estimate the size of the input data, this setting does not apply.
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.3</span> and higher, where Impala supports the complex data types <code class="ph codeph">STRUCT</code>,
+      <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code>, if a query refers to any column of those types,
+      the small-query optimization is turned off for that query regardless of the
+      <code class="ph codeph">EXEC_SINGLE_NODE_ROWS_THRESHOLD</code> setting.
+    </p>
+
+    <p class="p">
+      For a query that is determined to be <span class="q">"small"</span>, all work is performed on the coordinator node. This might
+      result in some I/O being performed by remote reads. The savings from not distributing the query work and not
+      generating native code are expected to outweigh any overhead from the remote reads.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.1.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      A common use case is to query just a few rows from a table to inspect typical data values. In this example,
+      Impala does not parallelize the query or perform native code generation because the result set is guaranteed
+      to be smaller than the threshold value from this query option:
+    </p>
+
+<pre class="pre codeblock"><code>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=500;
+SELECT * FROM enormous_table LIMIT 300;
+</code></pre>
+
+
+
+  </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[02/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_troubleshooting.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_troubleshooting.html b/docs/build/html/topics/impala_troubleshooting.html
new file mode 100644
index 0000000..7728ee4
--- /dev/null
+++ b/docs/build/html/topics/impala_troubleshooting.html
@@ -0,0 +1,370 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_webui.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_breakpad.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="troubleshooting"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Troubleshooting Impala</title></head><body id="troubleshooting"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Troubleshooting Impala</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Troubleshooting for Impala requires being able to diagnose and debug problems
+      with performance, network connectivity, out-of-memory conditions, disk space usage,
+      and crash or hang conditions in any of the Impala-related daemons.
+    </p>
+
+    <p class="p toc inpage">
+      The following sections describe the general troubleshooting procedures to diagnose
+      different kinds of problems:
+    </p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_webui.html">Impala Web User Interface for Debugging</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_breakpad.html">Breakpad Minidumps for Impala (Impala 2.6 or higher only)</a></strong><br></li></ul></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="troubleshooting__trouble_sql">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Troubleshooting Impala SQL Syntax Issues</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        In general, if queries issued against Impala fail, you can try running these same queries against Hive.
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          If a query fails against both Impala and Hive, it is likely that there is a problem with your query or
+          other elements of your environment:
+          <ul class="ul">
+            <li class="li">
+              Review the <a class="xref" href="impala_langref.html#langref">Language Reference</a> to ensure your query is
+              valid.
+            </li>
+
+            <li class="li">
+              Check <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> to see if any database, table,
+              column, or other object names in your query conflict with Impala reserved words.
+              Quote those names with backticks (<code class="ph codeph">``</code>) if so.
+            </li>
+
+            <li class="li">
+              Check <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a> to confirm whether Impala supports all the
+              built-in functions being used by your query, and whether argument and return types are the
+              same as you expect.
+            </li>
+
+            <li class="li">
+              Review the <a class="xref" href="impala_logging.html#logs_debug">contents of the Impala logs</a> for any information that may be useful in identifying the
+              source of the problem.
+            </li>
+          </ul>
+        </li>
+
+        <li class="li">
+          If a query fails against Impala but not Hive, it is likely that there is a problem with your Impala
+          installation.
+        </li>
+      </ul>
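+
+      <p class="p">
+        For example, if a hypothetical table used the reserved word <code class="ph codeph">location</code>
+        as a column name, a query would quote that name with backticks:
+      </p>
+
+<pre class="pre codeblock"><code>SELECT `location` FROM t1;</code></pre>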
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="troubleshooting__trouble_io">
+    <h2 class="title topictitle2" id="ariaid-title3">Troubleshooting I/O Capacity Problems</h2>
+    <div class="body conbody">
+      <p class="p">
+        Impala queries are typically I/O-intensive. If there is an I/O problem with storage devices,
+        or with HDFS itself, Impala queries could show slow response times with no obvious cause
+        on the Impala side. Slow I/O on even a single DataNode could result in an overall slowdown, because
+        queries involving clauses such as <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP BY</code>, or <code class="ph codeph">JOIN</code>
+        do not start returning results until all DataNodes have finished their work.
+      </p>
+      <p class="p">
+        To test whether the Linux I/O system itself is performing as expected, run Linux commands like
+        the following on each DataNode:
+      </p>
+<pre class="pre codeblock"><code>
+$ sudo sysctl -w vm.drop_caches=3 vm.drop_caches=0
+vm.drop_caches = 3
+vm.drop_caches = 0
+$ sudo dd if=/dev/sda bs=1M of=/dev/null count=1k 
+1024+0 records in
+1024+0 records out
+1073741824 bytes (1.1 GB) copied, 5.60373 s, 192 MB/s
+$ sudo dd if=/dev/sdb bs=1M of=/dev/null count=1k
+1024+0 records in
+1024+0 records out
+1073741824 bytes (1.1 GB) copied, 5.51145 s, 195 MB/s
+$ sudo dd if=/dev/sdc bs=1M of=/dev/null count=1k
+1024+0 records in
+1024+0 records out
+1073741824 bytes (1.1 GB) copied, 5.58096 s, 192 MB/s
+$ sudo dd if=/dev/sdd bs=1M of=/dev/null count=1k
+1024+0 records in
+1024+0 records out
+1073741824 bytes (1.1 GB) copied, 5.43924 s, 197 MB/s
+</code></pre>
+      <p class="p">
+        On modern hardware, a throughput rate of less than 100 MB/s typically indicates
+        a performance issue with the storage device. Correct the hardware problem before
+        continuing with Impala tuning or benchmarking.
+      </p>
+    </div>
+  </article>
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="troubleshooting__trouble_cookbook">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Impala Troubleshooting Quick Reference</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following table lists common problems and potential solutions.
+      </p>
+
+      <table class="table"><caption></caption><colgroup><col style="width:14.285714285714285%"><col style="width:42.857142857142854%"><col style="width:42.857142857142854%"></colgroup><thead class="thead">
+            <tr class="row">
+              <th class="entry nocellnorowborder" id="trouble_cookbook__entry__1">
+                Symptom
+              </th>
+              <th class="entry nocellnorowborder" id="trouble_cookbook__entry__2">
+                Explanation
+              </th>
+              <th class="entry nocellnorowborder" id="trouble_cookbook__entry__3">
+                Recommendation
+              </th>
+            </tr>
+          </thead><tbody class="tbody">
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__1 ">
+                Impala takes a long time to start.
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__2 ">
+                Impala instances with large numbers of tables, partitions, or data files take longer to start
+                because the metadata for these objects is broadcast to all <span class="keyword cmdname">impalad</span> nodes and
+                cached.
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__3 ">
+                Adjust timeout and synchronicity settings.
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__1 ">
+                <p class="p">
+                  Joins fail to complete.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__2 ">
+                <p class="p">
+                  There may be insufficient memory. During a join, data from the second, third, and so on sets to
+                  be joined is loaded into memory. If Impala chooses an inefficient join order or join mechanism,
+                  the query could exceed the total memory available.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__3 ">
+                <p class="p">
+                  Start by gathering statistics with the <code class="ph codeph">COMPUTE STATS</code> statement for each table
+                  involved in the join. Consider specifying the <code class="ph codeph">[SHUFFLE]</code> hint so that data from
+                  the joined tables is split up between nodes rather than broadcast to each node. If tuning at the
+                  SQL level is not sufficient, add more memory to your system or join smaller data sets.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__1 ">
+                <p class="p">
+                  Queries return incorrect results.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__2 ">
+                <p class="p">
+                  Impala metadata may be outdated after changes are performed in Hive.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__3 ">
+                <p class="p">
+                  Where possible, use the appropriate Impala statement (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD
+                  DATA</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">COMPUTE
+                  STATS</code>, and so on) rather than switching back and forth between Impala and Hive. Impala
+                  automatically broadcasts the results of DDL and DML operations to all Impala nodes in the
+                  cluster, but does not automatically recognize when such changes are made through Hive. After
+                  inserting data, adding a partition, or other operation in Hive, refresh the metadata for the
+                  table as described in <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a>.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__1 ">
+                <p class="p">
+                  Queries are slow to return results.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__2 ">
+                <p class="p">
+                  Some <code class="ph codeph">impalad</code> instances may not have started. Using a browser, connect to the
+                  host running the Impala state store. Connect using an address of the form
+                  <code class="ph codeph">http://<var class="keyword varname">hostname</var>:<var class="keyword varname">port</var>/metrics</code>.
+                </p>
+
+                <div class="p">
+                  <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+                    Replace <var class="keyword varname">hostname</var> and <var class="keyword varname">port</var> with the hostname and port of
+                    your Impala state store host machine and web server port. The default port is 25010.
+                  </div>
+                  The number of <code class="ph codeph">impalad</code> instances listed should match the expected number of
+                  <code class="ph codeph">impalad</code> instances installed in the cluster. There should also be one
+                  <code class="ph codeph">impalad</code> instance installed on each DataNode.
+                </div>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__3 ">
+                <p class="p">
+                  Ensure Impala is installed on all DataNodes. Start any <code class="ph codeph">impalad</code> instances that
+                  are not running.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__1 ">
+                <p class="p">
+                  Queries are slow to return results.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__2 ">
+                <p class="p">
+                  Impala may not be configured to use native checksumming. Native checksumming uses
+                  machine-specific instructions to compute checksums over HDFS data very quickly. Review Impala
+                  logs. If you find instances of "<code class="ph codeph">Unable to load native-hadoop
+                  library</code>" messages, native checksumming is not enabled.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__3 ">
+                <p class="p">
+                  Ensure Impala is configured to use native checksumming as described in
+                  <a class="xref" href="impala_config_performance.html#config_performance">Post-Installation Configuration for Impala</a>.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__1 ">
+                <p class="p">
+                  Queries are slow to return results.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__2 ">
+                <p class="p">
+                  Impala may not be configured to use data locality tracking.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__3 ">
+                <p class="p">
+                  Test Impala for data locality tracking and make configuration changes as necessary. Information
+                  on this process can be found in <a class="xref" href="impala_config_performance.html#config_performance">Post-Installation Configuration for Impala</a>.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__1 ">
+                <p class="p">
+                  Attempts to complete Impala tasks such as executing INSERT-SELECT actions fail. The Impala logs
+                  include notes that files could not be opened due to permission denied.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__2 ">
+                <p class="p">
+                  This is typically a permissions issue. For example, you might create a table in the Hive
+                  shell as the <code class="ph codeph">hive</code> user, then attempt some action on that
+                  table, such as an INSERT-SELECT. Because the table was created by one user and
+                  the INSERT-SELECT is attempted by another, the operation can fail due to insufficient permissions.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__3 ">
+                <p class="p">
+                  In general, ensure the Impala user has sufficient permissions. In the preceding example, ensure
+                  the Impala user has sufficient permissions to the table that the Hive user created.
+                </p>
+              </td>
+            </tr>
+            <tr class="row">
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__1 ">
+                <p class="p">
+                  Impala fails to start up, with the <span class="keyword cmdname">impalad</span> logs referring to errors connecting
+                  to the statestore service and attempts to re-register.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__2 ">
+                <p class="p">
+                  A large number of databases, tables, partitions, and so on can require metadata synchronization,
+                  particularly on startup, that takes longer than the default timeout for the statestore service.
+                </p>
+              </td>
+              <td class="entry nocellnorowborder" headers="trouble_cookbook__entry__3 ">
+                <p class="p">
+                  Configure the statestore timeout value and possibly other settings related to the frequency of
+                  statestore updates and metadata loading. See
+                  <a class="xref" href="impala_timeouts.html#statestore_timeout">Increasing the Statestore Timeout</a> and
+                  <a class="xref" href="impala_scalability.html#statestore_scalability">Scalability Considerations for the Impala Statestore</a>.
+                </p>
+              </td>
+            </tr>
+          </tbody></table>
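+
+      <p class="p">
+        For example, after adding new data files to a table through Hive, the following statement
+        (with a hypothetical table name) makes the new data visible to Impala:
+      </p>
+
+<pre class="pre codeblock"><code>REFRESH sales_data;</code></pre>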
+
+      
+    </div>
+  </article>
+
+  
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="troubleshooting__webui_snippet">
+    <h2 class="title topictitle2" id="ariaid-title5">Impala Web User Interface for Debugging</h2>
+    <div class="body conbody">
+      <div class="p">
+      
+      
+      Each of the Impala daemons (<span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">statestored</span>,
+      and <span class="keyword cmdname">catalogd</span>) includes a built-in web server that displays
+      diagnostic and status information:
+      <ul class="ul">
+      <li class="li">
+        <p class="p">
+          The <span class="keyword cmdname">impalad</span> web UI (default port: 25000) includes
+          information about configuration settings, running and completed queries, and associated performance and
+          resource usage for queries. In particular, the <span class="ph uicontrol">Details</span> link for each query displays
+          alternative views of the query including a graphical representation of the plan, and the
+          output of the <code class="ph codeph">EXPLAIN</code>, <code class="ph codeph">SUMMARY</code>, and <code class="ph codeph">PROFILE</code>
+          statements from <span class="keyword cmdname">impala-shell</span>.
+          Each host that runs the <span class="keyword cmdname">impalad</span> daemon has
+          its own instance of the web UI, with details about those queries for which that
+          host served as the coordinator. The <span class="keyword cmdname">impalad</span> web UI is mainly
+          for diagnosing query problems that can be traced to a particular node.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The <span class="keyword cmdname">statestored</span> web UI (default port: 25010) includes
+          information about memory usage, configuration settings, and ongoing health checks
+          performed by this daemon. Because there is only a single instance of this
+          daemon within any cluster, you view the web UI only on the particular host
+          that serves as the Impala Statestore.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          The <span class="keyword cmdname">catalogd</span> web UI (default port: 25020) includes
+          information about the databases, tables, and other objects managed by Impala,
+          in addition to the resource usage and configuration settings of the daemon itself.
+          The catalog information is represented as the underlying Thrift data structures.
+          Because there is only a single instance of this daemon within any cluster, you view the
+          web UI only on the particular host that serves as the Impala Catalog Server.
+        </p>
+      </li>
+      </ul>
+    </div>
+      <p class="p">
+        For full details, see <a class="xref" href="impala_webui.html#webui">Impala Web User Interface for Debugging</a>.
+      </p>
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_truncate_table.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_truncate_table.html b/docs/build/html/topics/impala_truncate_table.html
new file mode 100644
index 0000000..9e5b530
--- /dev/null
+++ b/docs/build/html/topics/impala_truncate_table.html
@@ -0,0 +1,200 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="truncate_table"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>TRUNCATE TABLE Statement (Impala 2.3 or higher only)</title></head><body id="truncate_table"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">TRUNCATE TABLE Statement (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Removes the data from an Impala table while leaving the table itself in place.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>TRUNCATE TABLE <span class="ph">[IF EXISTS]</span> [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var></code></pre>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Often used to empty tables that are used during ETL cycles, after the data has been copied to another
+      table for the next stage of processing. This statement is a low-overhead alternative to dropping and
+      recreating the table, or using <code class="ph codeph">INSERT OVERWRITE</code> to replace the data during the
+      next ETL cycle.
+    </p>
+
+    <p class="p">
+      This statement removes all the data and associated data files in the table. It can remove data files from internal tables,
+      external tables, partitioned tables, and tables mapped to HBase or the Amazon Simple Storage Service (S3).
+      The data removal applies to the entire table, including all partitions of a partitioned table.
+    </p>
+
+    <p class="p">
+      Any statistics produced by the <code class="ph codeph">COMPUTE STATS</code> statement are reset when the data is removed.
+    </p>
+
+    <p class="p">
+      Make sure that you are in the correct database before truncating a table, either by issuing a
+      <code class="ph codeph">USE</code> statement first or by using a fully qualified name
+      <code class="ph codeph"><var class="keyword varname">db_name</var>.<var class="keyword varname">table_name</var></code>.
+    </p>
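+
+    <p class="p">
+      For example, either of the following forms truncates the same table.
+      (The database and table names here are hypothetical, for illustration only.)
+    </p>
+
+<pre class="pre codeblock"><code>USE etl_staging;
+TRUNCATE TABLE daily_events;
+
+-- Equivalent, without switching databases first:
+TRUNCATE TABLE etl_staging.daily_events;</code></pre>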
+
+    <p class="p">
+      The optional <code class="ph codeph">IF EXISTS</code> clause makes the statement succeed whether or not the table exists.
+      If the table does exist, it is truncated; if it does not exist, the statement has no effect. This capability is
+      useful in standardized setup scripts that might be run both before and after some of the tables exist.
+      This clause is available in <span class="keyword">Impala 2.5</span> and higher.
+    </p>
+
+    <p class="p">
+      Any HDFS data files removed by this statement go into the HDFS trashcan, from which you can recover them
+      within a defined time interval if this operation turns out to be a mistake.
+    </p>
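+
+    <p class="p">
+      For example, you might check whether the removed files are still recoverable with a
+      command along these lines. (The exact trash location depends on your HDFS configuration
+      and the user that performed the operation; the path shown here is illustrative.)
+    </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -ls /user/impala/.Trash/Current/</code></pre>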
+
+    <p class="p">
+        For other tips about managing and reclaiming Impala disk space, see
+        <a class="xref" href="../shared/../topics/impala_disk_space.html#disk_space">Managing Disk Space for Impala Data</a>.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+    <p class="p">
+      Although Impala cannot write new data to a table stored in the Amazon
+      S3 filesystem, the <code class="ph codeph">TRUNCATE TABLE</code> statement can remove data files from S3.
+      See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about working with S3 tables.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, must have write
+      permission for all the files and directories that make up the table.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Currently, the <code class="ph codeph">TRUNCATE TABLE</code> statement cannot be used with Kudu tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example shows a table containing some data and with table and column statistics.
+      After the <code class="ph codeph">TRUNCATE TABLE</code> statement, the data is removed and the statistics
+      are reset.
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE truncate_demo (x INT);
+INSERT INTO truncate_demo VALUES (1), (2), (4), (8);
+SELECT COUNT(*) FROM truncate_demo;
++----------+
+| count(*) |
++----------+
+| 4        |
++----------+
+COMPUTE STATS truncate_demo;
++-----------------------------------------+
+| summary                                 |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+SHOW TABLE STATS truncate_demo;
++-------+--------+------+--------------+-------------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+--------+------+--------------+-------------------+--------+-------------------+
+| 4     | 1      | 8B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
++-------+--------+------+--------------+-------------------+--------+-------------------+
+SHOW COLUMN STATS truncate_demo;
++--------+------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+------+------------------+--------+----------+----------+
+| x      | INT  | 4                | -1     | 4        | 4        |
++--------+------+------------------+--------+----------+----------+
+
+-- After this statement, the data and the table/column stats will be gone.
+TRUNCATE TABLE truncate_demo;
+
+SELECT COUNT(*) FROM truncate_demo;
++----------+
+| count(*) |
++----------+
+| 0        |
++----------+
+SHOW TABLE STATS truncate_demo;
++-------+--------+------+--------------+-------------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+--------+------+--------------+-------------------+--------+-------------------+
+| -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
++-------+--------+------+--------------+-------------------+--------+-------------------+
+SHOW COLUMN STATS truncate_demo;
++--------+------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+------+------------------+--------+----------+----------+
+| x      | INT  | -1               | -1     | 4        | 4        |
++--------+------+------------------+--------+----------+----------+
+</code></pre>
+
+    <p class="p">
+      The following example shows how the <code class="ph codeph">IF EXISTS</code> clause allows the <code class="ph codeph">TRUNCATE TABLE</code>
+      statement to be run without error whether or not the table exists:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE staging_table1 (x INT, s STRING);
+Fetched 0 row(s) in 0.33s
+
+SHOW TABLES LIKE 'staging*';
++----------------+
+| name           |
++----------------+
+| staging_table1 |
++----------------+
+Fetched 1 row(s) in 0.25s
+
+-- Our ETL process involves removing all data from several staging tables
+-- even though some might be already dropped, or not created yet.
+
+TRUNCATE TABLE IF EXISTS staging_table1;
+Fetched 0 row(s) in 5.04s
+
+TRUNCATE TABLE IF EXISTS staging_table2;
+Fetched 0 row(s) in 0.25s
+
+TRUNCATE TABLE IF EXISTS staging_table3;
+Fetched 0 row(s) in 0.25s
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>,
+      <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+      <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>, <a class="xref" href="impala_tables.html#internal_tables">Internal Tables</a>,
+      <a class="xref" href="impala_tables.html#external_tables">External Tables</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[24/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_live_summary.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_live_summary.html b/docs/build/html/topics/impala_live_summary.html
new file mode 100644
index 0000000..cb41693
--- /dev/null
+++ b/docs/build/html/topics/impala_live_summary.html
@@ -0,0 +1,177 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="live_summary"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</title></head><body id="live_summary"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">LIVE_SUMMARY Query Option (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      For queries submitted through the <span class="keyword cmdname">impala-shell</span> command,
+      displays the same output as the <code class="ph codeph">SUMMARY</code> command,
+      with the measurements updated in real time as the query progresses.
+      When the query finishes, the final <code class="ph codeph">SUMMARY</code> output remains
+      visible in the <span class="keyword cmdname">impala-shell</span> console output.
+    </p>
+
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value is interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Command-line equivalent:</strong>
+      </p>
+    <p class="p">
+      You can enable this query option within <span class="keyword cmdname">impala-shell</span>
+      by starting the shell with the <code class="ph codeph">--live_summary</code>
+      command-line option.
+      You can still turn this setting off and on again within the shell through the
+      <code class="ph codeph">SET</code> command.
+    </p>
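+
+    <p class="p">
+      For example, a hypothetical session might start the shell with the option enabled,
+      then toggle it off and on again:
+    </p>
+
+<pre class="pre codeblock"><code>$ impala-shell --live_summary
+[localhost:21000] &gt; set live_summary=false;
+LIVE_SUMMARY set to false
+[localhost:21000] &gt; set live_summary=true;
+LIVE_SUMMARY set to true</code></pre>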
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+    <p class="p">
+      The live summary output can be useful for evaluating long-running queries,
+      to determine which phase of execution takes up the most time, or whether some hosts
+      take much longer than others for certain operations, dragging overall performance down.
+      By making the information available in real time, this feature lets you decide what
+      action to take even before you cancel a query that is taking much longer than normal.
+    </p>
+    <p class="p">
+      For example, you might see the HDFS scan phase taking a long time, and therefore revisit
+      performance-related aspects of your schema design such as constructing a partitioned table,
+      switching to the Parquet file format, running the <code class="ph codeph">COMPUTE STATS</code> statement
+      for the table, and so on.
+      Or you might see a wide variation between the average and maximum times for all hosts to
+      perform some phase of the query, and therefore investigate if one particular host
+      needed more memory or was experiencing a network problem.
+    </p>
+    <p class="p">
+        The output from this query option is printed to standard error. The output is only displayed in interactive mode,
+        that is, not when the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options are used.
+      </p>
+    <p class="p">
+      For a simple and concise way of tracking the progress of an interactive query, see
+      <a class="xref" href="impala_live_progress.html#live_progress">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+    <p class="p">
+        The <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+        currently do not produce any output during <code class="ph codeph">COMPUTE STATS</code> operations.
+      </p>
+    <div class="p">
+        Because the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+        are available only within the <span class="keyword cmdname">impala-shell</span> interpreter:
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              You cannot change these query options through the SQL <code class="ph codeph">SET</code>
+              statement using the JDBC or ODBC interfaces. The <code class="ph codeph">SET</code>
+              command in <span class="keyword cmdname">impala-shell</span> recognizes these names as
+              shell-only options.
+            </p>
+          </li>
+          <li class="li">
+            <p class="p">
+              Be careful when using <span class="keyword cmdname">impala-shell</span> on a pre-<span class="keyword">Impala 2.3</span>
+              system to connect to a system running <span class="keyword">Impala 2.3</span> or higher.
+              The older <span class="keyword cmdname">impala-shell</span> does not recognize these
+              query option names. Upgrade <span class="keyword cmdname">impala-shell</span> on the
+              systems where you intend to use these query options.
+            </p>
+          </li>
+          <li class="li">
+            <p class="p">
+              Likewise, the <span class="keyword cmdname">impala-shell</span> command relies on
+              some information only available in <span class="keyword">Impala 2.3</span> and higher
+              to prepare live progress reports and query summaries. The
+              <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code>
+              query options have no effect when <span class="keyword cmdname">impala-shell</span> connects
+              to a cluster running an older version of Impala.
+            </p>
+          </li>
+        </ul>
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example shows a series of <code class="ph codeph">LIVE_SUMMARY</code> reports that
+      are displayed during the course of a query, showing how the numbers increase to
+      show the progress of different phases of the distributed query. When you do the same
+      in <span class="keyword cmdname">impala-shell</span>, only a single report is displayed at any one time,
+      with each update overwriting the previous numbers.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set live_summary=true;
+LIVE_SUMMARY set to true
+[localhost:21000] &gt; select count(*) from customer t1 cross join customer t2;
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator            | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE        | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | -1 B          | FINALIZE              |
+| 05:EXCHANGE         | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | -1 B          | UNPARTITIONED         |
+| 03:AGGREGATE        | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | 10.00 MB      |                       |
+| 02:NESTED LOOP JOIN | 0      | 0ns      | 0ns      | 0       | 22.50B     | 0 B      | 0 B           | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE      | 0      | 0ns      | 0ns      | 0       | 150.00K    | 0 B      | 0 B           | BROADCAST             |
+| |  01:SCAN HDFS     | 1      | 503.57ms | 503.57ms | 150.00K | 150.00K    | 24.09 MB | 64.00 MB      | tpch.customer t2      |
+| 00:SCAN HDFS        | 0      | 0ns      | 0ns      | 0       | 150.00K    | 0 B      | 64.00 MB      | tpch.customer t1      |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator            | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE        | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | -1 B          | FINALIZE              |
+| 05:EXCHANGE         | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | -1 B          | UNPARTITIONED         |
+| 03:AGGREGATE        | 1      | 0ns      | 0ns      | 0       | 1          | 20.00 KB | 10.00 MB      |                       |
+| 02:NESTED LOOP JOIN | 1      | 17.62s   | 17.62s   | 81.14M  | 22.50B     | 3.23 MB  | 0 B           | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE      | 1      | 26.29ms  | 26.29ms  | 150.00K | 150.00K    | 0 B      | 0 B           | BROADCAST             |
+| |  01:SCAN HDFS     | 1      | 503.57ms | 503.57ms | 150.00K | 150.00K    | 24.09 MB | 64.00 MB      | tpch.customer t2      |
+| 00:SCAN HDFS        | 1      | 247.53ms | 247.53ms | 1.02K   | 150.00K    | 24.39 MB | 64.00 MB      | tpch.customer t1      |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator            | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE        | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | -1 B          | FINALIZE              |
+| 05:EXCHANGE         | 0      | 0ns      | 0ns      | 0       | 1          | 0 B      | -1 B          | UNPARTITIONED         |
+| 03:AGGREGATE        | 1      | 0ns      | 0ns      | 0       | 1          | 20.00 KB | 10.00 MB      |                       |
+| 02:NESTED LOOP JOIN | 1      | 61.85s   | 61.85s   | 283.43M | 22.50B     | 3.23 MB  | 0 B           | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE      | 1      | 26.29ms  | 26.29ms  | 150.00K | 150.00K    | 0 B      | 0 B           | BROADCAST             |
+| |  01:SCAN HDFS     | 1      | 503.57ms | 503.57ms | 150.00K | 150.00K    | 24.09 MB | 64.00 MB      | tpch.customer t2      |
+| 00:SCAN HDFS        | 1      | 247.59ms | 247.59ms | 2.05K   | 150.00K    | 24.39 MB | 64.00 MB      | tpch.customer t1      |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
+</code></pre>
+
+
+
+
+    <p class="p">
+        To see how the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+        work in real time, see <a class="xref" href="https://asciinema.org/a/1rv7qippo0fe7h5k1b6k4nexk" target="_blank">this animated demo</a>.
+      </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_load_data.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_load_data.html b/docs/build/html/topics/impala_load_data.html
new file mode 100644
index 0000000..e49408b
--- /dev/null
+++ b/docs/build/html/topics/impala_load_data.html
@@ -0,0 +1,306 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="load_data"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LOAD DATA Statement</title></head><body id="load_data"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">LOAD DATA Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">LOAD DATA</code> statement streamlines the ETL process for an internal Impala table by moving a
+      data file or all the data files in a directory from an HDFS location into the Impala data directory for that
+      table.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>LOAD DATA INPATH '<var class="keyword varname">hdfs_file_or_directory_path</var>' [OVERWRITE] INTO TABLE <var class="keyword varname">tablename</var>
+  [PARTITION (<var class="keyword varname">partcol1</var>=<var class="keyword varname">val1</var>, <var class="keyword varname">partcol2</var>=<var class="keyword varname">val2</var> ...)]</code></pre>
+
+    <p class="p">
+      When the <code class="ph codeph">LOAD DATA</code> statement operates on a partitioned table,
+      it always operates on one partition at a time. Specify the <code class="ph codeph">PARTITION</code> clauses
+      and list all the partition key columns, with a constant value specified for each.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DML (but still affected by
+        <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query option)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <ul class="ul">
+      <li class="li">
+        The loaded data files are moved, not copied, into the Impala data directory.
+      </li>
+
+      <li class="li">
+        You can specify the HDFS path of a single file to be moved, or the HDFS path of a directory to move all the
+        files inside that directory. You cannot specify any sort of wildcard to take only some of the files from a
+        directory. When loading a directory full of data files, keep all the data files at the top level, with no
+        nested directories underneath.
+      </li>
+
+      <li class="li">
+        Currently, the Impala <code class="ph codeph">LOAD DATA</code> statement only imports files from HDFS, not from the local
+        filesystem. It does not support the <code class="ph codeph">LOCAL</code> keyword of the Hive <code class="ph codeph">LOAD DATA</code>
+        statement. You must specify a path, not an <code class="ph codeph">hdfs://</code> URI.
+      </li>
+
+      <li class="li">
+        In the interest of speed, only limited error checking is done. If the loaded files have the wrong file
+        format, different columns than the destination table, or some other kind of mismatch, Impala does not raise any
+        error for the <code class="ph codeph">LOAD DATA</code> statement. Querying the table afterward could produce a runtime
+        error or unexpected results. Currently, the only checking the <code class="ph codeph">LOAD DATA</code> statement does is
+        to avoid mixing together uncompressed and LZO-compressed text files in the same table.
+      </li>
+
+      <li class="li">
+        When you specify an HDFS directory name as the <code class="ph codeph">LOAD DATA</code> argument, any hidden files in
+        that directory (files whose names start with a <code class="ph codeph">.</code>) are not moved to the Impala data
+        directory.
+      </li>
+
+      <li class="li">
+        The operation fails if the source directory contains any non-hidden directories.
+        Prior to <span class="keyword">Impala 2.5</span>, if the source directory contained any subdirectory, even a hidden one such as
+        <span class="ph filepath">_impala_insert_staging</span>, the <code class="ph codeph">LOAD DATA</code> statement would fail.
+        In <span class="keyword">Impala 2.5</span> and higher, <code class="ph codeph">LOAD DATA</code> ignores hidden subdirectories in the
+        source directory, and only fails if any of the subdirectories are non-hidden.
+      </li>
+
+      <li class="li">
+        The loaded data files retain their original names in the new location, unless a name conflicts with an
+        existing data file, in which case the name of the new file is modified slightly to be unique. (The
+        name-mangling is a slight difference from the Hive <code class="ph codeph">LOAD DATA</code> statement, which replaces
+        identically named files.)
+      </li>
+
+      <li class="li">
+        By providing an easy way to transport files from known locations in HDFS into the Impala data directory
+        structure, the <code class="ph codeph">LOAD DATA</code> statement lets you avoid memorizing the locations and layout of
+        HDFS directory tree containing the Impala databases and tables. (For a quick way to check the location of
+        the data files for an Impala table, issue the statement <code class="ph codeph">DESCRIBE FORMATTED
+        <var class="keyword varname">table_name</var></code>.)
+      </li>
+
+      <li class="li">
+        The <code class="ph codeph">PARTITION</code> clause is especially convenient for ingesting new data for a partitioned
+        table. As you receive new data for a time period, geographic region, or other division that corresponds to
+        one or more partitioning columns, you can load that data straight into the appropriate Impala data
+        directory, which might be nested several levels down if the table is partitioned by multiple columns. When
+        the table is partitioned, you must specify constant values for all the partitioning columns.
+      </li>
+    </ul>
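+
+    <p class="p">
+      For example, a table partitioned by date columns might be loaded one day at a time.
+      (The table name and HDFS path below are hypothetical.)
+    </p>
+
+<pre class="pre codeblock"><code>-- Moves all data files from the specified HDFS directory into the
+-- partition directory for year=2017, month=4, day=1.
+LOAD DATA INPATH '/user/etl/incoming/2017-04-01'
+  INTO TABLE sales_data PARTITION (year=2017, month=4, day=1);</code></pre>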
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      Because Impala currently cannot create Parquet data files containing complex types
+      (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>), the
+      <code class="ph codeph">LOAD DATA</code> statement is especially important when working with
+      tables containing complex type columns. You create the Parquet data files outside
+      Impala, then use either <code class="ph codeph">LOAD DATA</code>, an external table, or HDFS-level
+      file operations followed by <code class="ph codeph">REFRESH</code> to associate the data files with
+      the corresponding table.
+      See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types.
+    </p>
+
+    <p class="p">
+        If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+        load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+        statement wait before returning, until the new or changed metadata has been received by all the Impala
+        nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+      </p>
+
+    <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+        After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+        STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+        table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+        SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+        <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+        are very large, used in join queries, or both.
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      First, we use a trivial Python script to write different numbers of strings (one per line) into files stored
+      in the <code class="ph codeph">doc_demo</code> HDFS user account. (Substitute the path for your own HDFS user account when
+      doing <span class="keyword cmdname">hdfs dfs</span> operations like these.)
+    </p>
+
+<pre class="pre codeblock"><code>$ random_strings.py 1000 | hdfs dfs -put - /user/doc_demo/thousand_strings.txt
+$ random_strings.py 100 | hdfs dfs -put - /user/doc_demo/hundred_strings.txt
+$ random_strings.py 10 | hdfs dfs -put - /user/doc_demo/ten_strings.txt</code></pre>
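The `random_strings.py` script itself is not shown in the tutorial; a minimal sketch of what such a script might look like (writing N random lowercase strings, one per line, to standard output so it can be piped into `hdfs dfs -put -`) is:

```python
#!/usr/bin/env python
# Hypothetical sketch of a random_strings.py-style helper: writes N random
# strings, one per line, to standard output. Illustrative only.
import random
import string
import sys

def random_strings(n, length=12, rng=random):
    """Yield n random lowercase strings of the given length."""
    for _ in range(n):
        yield "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

if __name__ == "__main__":
    # First command-line argument is the number of strings to emit.
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 10
    for s in random_strings(count):
        print(s)
```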
+
+    <p class="p">
+      Next, we create a table and load an initial set of data into it. Remember, unless you specify a
+      <code class="ph codeph">STORED AS</code> clause, Impala tables default to <code class="ph codeph">TEXTFILE</code> format with Ctrl-A (hex
+      01) as the field delimiter. This example uses a single-column table, so the delimiter is not significant. For
+      large-scale ETL jobs, you would typically use binary format data files such as Parquet or Avro, and load them
+      into Impala tables that use the corresponding file format.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table t1 (s string);
+[localhost:21000] &gt; load data inpath '/user/doc_demo/thousand_strings.txt' into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary                                                  |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.61s
+[localhost:21000] &gt; select count(*) from t1;
+Query finished, fetching results ...
++------+
+| _c0  |
++------+
+| 1000 |
++------+
+Returned 1 row(s) in 0.67s
+[localhost:21000] &gt; load data inpath '/user/doc_demo/thousand_strings.txt' into table t1;
+ERROR: AnalysisException: INPATH location '/user/doc_demo/thousand_strings.txt' does not exist. </code></pre>
+
+    <p class="p">
+      As indicated by the message at the end of the previous example, the data file was moved from its original
+      location. The following example illustrates how the data file was moved into the Impala data directory for
+      the destination table, keeping its original filename:
+    </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -ls /user/hive/warehouse/load_data_testing.db/t1
+Found 1 items
+-rw-r--r--   1 doc_demo doc_demo      13926 2013-06-26 15:40 /user/hive/warehouse/load_data_testing.db/t1/thousand_strings.txt</code></pre>
+
+    <p class="p">
+      The following example demonstrates the difference between the <code class="ph codeph">INTO TABLE</code> and
+      <code class="ph codeph">OVERWRITE TABLE</code> clauses. The table already contains 1000 rows. After issuing the
+      <code class="ph codeph">LOAD DATA</code> statement with the <code class="ph codeph">INTO TABLE</code> clause, the table contains 100 more
+      rows, for a total of 1100. After issuing the <code class="ph codeph">LOAD DATA</code> statement with the <code class="ph codeph">OVERWRITE
+      INTO TABLE</code> clause, the former contents are gone, and now the table only contains the 10 rows from
+      the just-loaded data file.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; load data inpath '/user/doc_demo/hundred_strings.txt' into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary                                                  |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 2 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.24s
+[localhost:21000] &gt; select count(*) from t1;
+Query finished, fetching results ...
++------+
+| _c0  |
++------+
+| 1100 |
++------+
+Returned 1 row(s) in 0.55s
+[localhost:21000] &gt; load data inpath '/user/doc_demo/ten_strings.txt' overwrite into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary                                                  |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.26s
+[localhost:21000] &gt; select count(*) from t1;
+Query finished, fetching results ...
++-----+
+| _c0 |
++-----+
+| 10  |
++-----+
+Returned 1 row(s) in 0.62s</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+    <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+        and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+        Amazon Simple Storage Service (S3).
+        The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and
+        partitions is specified by an <code class="ph codeph">s3a://</code> prefix in the
+        <code class="ph codeph">LOCATION</code> attribute of
+        <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+        If you bring data into S3 using the normal S3 transfer mechanisms instead of Impala DML statements,
+        issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the S3 data.
+      </p>
+    <p class="p">
+        Because of differences between S3 and traditional filesystems, DML operations
+        for S3 tables can take longer than for tables on HDFS. For example, both the
+        <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+        and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+        to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+        the files are moved from a temporary staging directory to the final destination directory.)
+        Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+        actually copies the data files from one location to another and then removes the original files.
+        In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+        to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+        that a problem during statement execution could leave data in an inconsistent state.
+        It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+        See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+      </p>
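The copy-then-delete behavior described above can be pictured with a small local-filesystem sketch. This illustrates the access pattern an object store without rename forces on a client; it is not Impala's actual S3 code:

```python
# Illustration only: mimics the "no rename, so copy then delete" pattern
# that makes S3 file moves slower than an HDFS directory rename.
import os
import shutil
import tempfile

def move_without_rename(src, dst):
    """Move a file by copying its bytes and then deleting the original,
    the way a store without rename support forces a client to."""
    shutil.copyfile(src, dst)   # full data copy (expensive for large files)
    os.remove(src)              # only then remove the original object

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "staging.parq")
dst = os.path.join(tmp, "final.parq")
with open(src, "w") as f:
    f.write("row data")
move_without_rename(src, dst)
print(os.path.exists(src), os.path.exists(dst))  # False True
```

On HDFS the equivalent move is a metadata-only rename, which is why the final stage of these DML statements is noticeably cheaper there.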
+    <p class="p">See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.</p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, must have read and write
+      permissions for the files in the source directory, and write
+      permission for the destination directory.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        The <code class="ph codeph">LOAD DATA</code> statement cannot be used with Kudu tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong>
+      </p>
+    <p class="p">
+        The <code class="ph codeph">LOAD DATA</code> statement cannot be used with HBase tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      The <code class="ph codeph">LOAD DATA</code> statement is an alternative to the
+      <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement.
+      Use <code class="ph codeph">LOAD DATA</code>
+      when you have the data files in HDFS but outside of any Impala table.
+    </p>
+    <p class="p">
+      The <code class="ph codeph">LOAD DATA</code> statement is also an alternative
+      to the <code class="ph codeph">CREATE EXTERNAL TABLE</code> statement. Use
+      <code class="ph codeph">LOAD DATA</code> when it is appropriate to move the
+      data files under Impala control rather than querying them
+      from their original location. See <a class="xref" href="impala_tables.html#external_tables">External Tables</a>
+      for information about working with external tables.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_logging.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_logging.html b/docs/build/html/topics/impala_logging.html
new file mode 100644
index 0000000..049f370
--- /dev/null
+++ b/docs/build/html/topics/impala_logging.html
@@ -0,0 +1,416 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impa
 la 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="logging"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala Logging</title></head><body id="logging"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using Impala Logging</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The Impala logs record information about:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Any errors Impala encountered. If Impala experienced a serious error during startup, you must diagnose and
+        troubleshoot that problem before you can do anything further with Impala.
+      </li>
+
+      <li class="li">
+        How Impala is configured.
+      </li>
+
+      <li class="li">
+        Jobs Impala has completed.
+      </li>
+    </ul>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        Formerly, the logs contained the query profile for each query, showing low-level details of how the work is
+        distributed among nodes and how intermediate and final results are transmitted across the network. To save
+        space, those query profiles are now stored in zlib-compressed files in
+        <span class="ph filepath">/var/log/impala/profiles</span>. You can access them through the Impala web user interface.
+        For example, at <code class="ph codeph">http://<var class="keyword varname">impalad-node-hostname</var>:25000/queries</code>, each query
+        is followed by a <code class="ph codeph">Profile</code> link leading to a page showing extensive analytical data for the
+        query execution.
+      </p>
+
+      <p class="p">
+        The auditing feature introduced in Impala 1.1.1 produces a separate set of audit log files when
+        enabled. See <a class="xref" href="impala_auditing.html#auditing">Auditing Impala Operations</a> for details.
+      </p>
+
+      <p class="p">
+        The lineage feature introduced in Impala 2.2.0 produces a separate lineage log file when
+        enabled. See <a class="xref" href="impala_lineage.html#lineage">Viewing Lineage Information for Impala Data</a> for details.
+      </p>
+    </div>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="logging__logs_details">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Locations and Names of Impala Log Files</h2>
+
+    <div class="body conbody">
+
+      <ul class="ul">
+        <li class="li">
+          By default, the log files are under the directory <span class="ph filepath">/var/log/impala</span>.
+          To change log file locations, modify the defaults file described in
+          <a class="xref" href="impala_processes.html#processes">Starting Impala</a>.
+        </li>
+
+        <li class="li">
+          The significant files for the <code class="ph codeph">impalad</code> process are <span class="ph filepath">impalad.INFO</span>,
+          <span class="ph filepath">impalad.WARNING</span>, and <span class="ph filepath">impalad.ERROR</span>. You might also see a file
+          <span class="ph filepath">impalad.FATAL</span>, although this is only present in rare conditions.
+        </li>
+
+        <li class="li">
+          The significant files for the <code class="ph codeph">statestored</code> process are
+          <span class="ph filepath">statestored.INFO</span>, <span class="ph filepath">statestored.WARNING</span>, and
+          <span class="ph filepath">statestored.ERROR</span>. You might also see a file <span class="ph filepath">statestored.FATAL</span>,
+          although this is only present in rare conditions.
+        </li>
+
+        <li class="li">
+          The significant files for the <code class="ph codeph">catalogd</code> process are <span class="ph filepath">catalogd.INFO</span>,
+          <span class="ph filepath">catalogd.WARNING</span>, and <span class="ph filepath">catalogd.ERROR</span>. You might also see a file
+          <span class="ph filepath">catalogd.FATAL</span>, although this is only present in rare conditions.
+        </li>
+
+        <li class="li">
+          Examine the <code class="ph codeph">.INFO</code> files to see configuration settings for the processes.
+        </li>
+
+        <li class="li">
+          Examine the <code class="ph codeph">.WARNING</code> files to see all kinds of problem information, including such
+          things as suboptimal settings and also serious runtime errors.
+        </li>
+
+        <li class="li">
+          Examine the <code class="ph codeph">.ERROR</code> and/or <code class="ph codeph">.FATAL</code> files to see only the most serious
+          errors, if the processes crash, or queries fail to complete. These messages are also in the
+          <code class="ph codeph">.WARNING</code> file.
+        </li>
+
+        <li class="li">
+          A new set of log files is produced each time the associated daemon is restarted. These log files have
+          long names including a timestamp. The <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and
+          <code class="ph codeph">.ERROR</code> files are physically represented as symbolic links to the latest applicable log
+          files.
+        </li>
+
+        <li class="li">
+          The init script for the <code class="ph codeph">impala-server</code> service also produces a consolidated log file
+          <code class="ph codeph">/var/log/impalad/impala-server.log</code>, with all the same information as the
+          corresponding <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code> files.
+        </li>
+
+        <li class="li">
+          The init script for the <code class="ph codeph">impala-state-store</code> service also produces a consolidated log file
+          <code class="ph codeph">/var/log/impalad/impala-state-store.log</code>, with all the same information as the
+          corresponding <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code> files.
+        </li>
+      </ul>
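Because the `.INFO`, `.WARNING`, and `.ERROR` names are symbolic links to timestamped files, a script that archives or ships logs should resolve the link first. A short sketch (the file names and directory below are hypothetical stand-ins for a real log directory):

```python
# Sketch: resolve an impalad.INFO-style symlink to the timestamped log
# file it currently points at. Paths here are illustrative.
import os
import tempfile

def current_log_target(link_path):
    """Return the real file behind a glog-style symlink such as impalad.INFO."""
    return os.path.realpath(link_path)

# Demonstrate with a throwaway directory standing in for the log directory.
log_dir = tempfile.mkdtemp()
real = os.path.join(log_dir, "impalad.host.log.INFO.20170412-101500.1234")
open(real, "w").close()
link = os.path.join(log_dir, "impalad.INFO")
os.symlink(real, link)
print(current_log_target(link) == os.path.realpath(real))  # True
```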
+
+      <p class="p">
+        Impala records log information using the <code class="ph codeph">glog</code> logging library, so you will see some messages
+        referring to C++ file names. Logging is affected by:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The <code class="ph codeph">GLOG_v</code> environment variable specifies which types of messages are logged. See
+          <a class="xref" href="#log_levels">Setting Logging Levels</a> for details.
+        </li>
+
+        <li class="li">
+          The <code class="ph codeph">-logbuflevel</code> startup flag for the <span class="keyword cmdname">impalad</span> daemon specifies how
+          often the log information is written to disk. The default is 0, meaning that the log is immediately
+          flushed to disk when Impala outputs an important message such as a warning or an error, while less
+          important messages such as informational ones are buffered in memory rather than being flushed to disk
+          immediately.
+        </li>
+      </ul>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="logging__logs_managing">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Managing Impala Logs</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        When you have traced an issue back to a specific system, review the Impala log files on that host.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="logging__logs_rotate">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Rotating Impala Logs</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Impala periodically switches the physical files representing the current log files, after which it is safe
+        to remove the old files if they are no longer needed.
+      </p>
+
+      <p class="p">
+        Impala can automatically remove older unneeded log files, a feature known as <dfn class="term">log rotation</dfn>.
+
+      </p>
+
+      <p class="p">
+        In Impala 2.2 and higher, the <code class="ph codeph">-max_log_files</code> configuration option specifies how many log
+        files to keep at each severity level. You can specify an appropriate setting for each Impala-related daemon
+        (<span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">statestored</span>, and <span class="keyword cmdname">catalogd</span>). The default
+        value is 10, meaning that Impala preserves the latest 10 log files for each severity level
+        (<code class="ph codeph">INFO</code>, <code class="ph codeph">WARNING</code>, <code class="ph codeph">ERROR</code>, and <code class="ph codeph">FATAL</code>).
+        Impala checks to see if any old logs need to be removed based on the interval specified in the
+        <code class="ph codeph">logbufsecs</code> setting, every 5 seconds by default.
+      </p>
+
+
+
+      <p class="p">
+        A value of 0 preserves all log files, in which case you would set up manual log rotation using your
+        Linux tool or technique of choice. A value of 1 preserves only the very latest log file.
+      </p>
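The pruning behavior controlled by `-max_log_files` can be sketched as follows. This is a simplified stand-in for what the daemon does, not Impala's implementation:

```python
# Simplified sketch of max_log_files-style pruning: keep the newest N log
# files for one severity level and delete the rest. Not Impala's actual code.

def prune_logs(paths, max_log_files):
    """Return (kept, removed) lists of file names; 0 means keep everything."""
    if max_log_files <= 0:
        return list(paths), []
    # glog file names embed a timestamp, so lexical order is age order.
    ordered = sorted(paths)
    kept = ordered[-max_log_files:]
    removed = ordered[:-max_log_files]
    return kept, removed

names = ["impalad.INFO.20170410-000000.1",
         "impalad.INFO.20170411-000000.2",
         "impalad.INFO.20170412-000000.3"]
kept, removed = prune_logs(names, 2)
print(kept)     # the two newest files
print(removed)  # ['impalad.INFO.20170410-000000.1']
```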
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="logging__logs_debug">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Reviewing Impala Logs</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        By default, the Impala log is stored at <code class="ph codeph">/var/log/impalad/</code>. The most comprehensive log,
+        showing informational, warning, and error messages, is in the file named <span class="ph filepath">impalad.INFO</span>.
+        View log file contents by using the web interface or by examining the contents of the log file. (When you
+        examine the logs through the file system, you can troubleshoot problems by reading the
+        <span class="ph filepath">impalad.WARNING</span> and/or <span class="ph filepath">impalad.ERROR</span> files, which contain the
+        subsets of messages indicating potential problems.)
+      </p>
+
+      <p class="p">
+        On a machine named <code class="ph codeph">impala.example.com</code> with default settings, you could view the Impala
+        logs on that machine by using a browser to access <code class="ph codeph">http://impala.example.com:25000/logs</code>.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          The web interface limits the amount of logging information displayed. To view every log entry, access the
+          log files directly through the file system.
+        </p>
+      </div>
+
+      <p class="p">
+        You can view the contents of the <code class="ph codeph">impalad.INFO</code> log file in the file system. With the
+        default configuration settings, the start of the log file appears as follows:
+      </p>
+
+<pre class="pre codeblock"><code>[user@example impalad]$ pwd
+/var/log/impalad
+[user@example impalad]$ more impalad.INFO
+Log file created at: 2013/01/07 08:42:12
+Running on machine: impala.example.com
+Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
+I0107 08:42:12.292155 14876 daemon.cc:34] impalad version 0.4 RELEASE (build 9d7fadca0461ab40b9e9df8cdb47107ec6b27cff)
+Built on Fri, 21 Dec 2012 12:55:19 PST
+I0107 08:42:12.292484 14876 daemon.cc:35] Using hostname: impala.example.com
+I0107 08:42:12.292706 14876 logging.cc:76] Flags (see also /varz are on debug webserver):
+--dump_ir=false
+--module_output=
+--be_port=22000
+--classpath=
+--hostname=impala.example.com</code></pre>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        The preceding example shows only a small part of the log file. Impala log files are often several megabytes
+        in size.
+      </div>
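The `Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg` header shown above describes every entry in the file, so the entries are easy to process programmatically. A sketch of parsing one such line with a regular expression (illustrative, not an official parser):

```python
# Sketch: parse a glog-format log line of the form
#   [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
# as seen in impalad.INFO. Illustrative only.
import re

GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])"          # I=INFO, W=WARNING, E=ERROR, F=FATAL
    r"(?P<mmdd>\d{4}) "               # month and day
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) "
    r"(?P<thread>\d+) "
    r"(?P<source>[^:]+:\d+)\] "       # C++ file name and line number
    r"(?P<msg>.*)$"
)

line = ("I0107 08:42:12.292484 14876 daemon.cc:35] "
        "Using hostname: impala.example.com")
m = GLOG_LINE.match(line)
print(m.group("severity"), m.group("source"))  # I daemon.cc:35
print(m.group("msg"))                          # Using hostname: impala.example.com
```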
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="logging__log_format">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Understanding Impala Log Contents</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The logs store information about Impala startup options. This information appears once for each time Impala
+        is started and may include:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Machine name.
+        </li>
+
+        <li class="li">
+          Impala version number.
+        </li>
+
+        <li class="li">
+          Flags used to start Impala.
+        </li>
+
+        <li class="li">
+          CPU information.
+        </li>
+
+        <li class="li">
+          The number of available disks.
+        </li>
+      </ul>
+
+      <p class="p">
+        There is information about each job Impala has run. Because each Impala job creates an additional set of
+        data about queries, the amount of job-specific data may be very large. Logs may contain detailed
+        information on jobs. These detailed log entries may include:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The composition of the query.
+        </li>
+
+        <li class="li">
+          The degree of data locality.
+        </li>
+
+        <li class="li">
+          Statistics on data throughput and response times.
+        </li>
+      </ul>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="logging__log_levels">
+
+    <h2 class="title topictitle2" id="ariaid-title7">Setting Logging Levels</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Impala uses the GLOG system, which supports three logging levels. You can adjust logging levels
+        by exporting variable settings. To change logging settings manually, use a command
+        similar to the following on each node before starting <code class="ph codeph">impalad</code>:
+      </p>
+
+<pre class="pre codeblock"><code>export GLOG_v=1</code></pre>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        For performance reasons, do not enable the most verbose logging level of 3 unless there is
+        no other alternative for troubleshooting.
+      </div>
+
+      <p class="p">
+        For more information on how to configure GLOG, including how to set variable logging levels for different
+        system components, see
+        <a class="xref" href="https://github.com/google/glog" target="_blank">How
+        To Use Google Logging Library (glog)</a>.
+      </p>
+
+      <section class="section" id="log_levels__loglevels_details"><h3 class="title sectiontitle">Understanding What is Logged at Different Logging Levels</h3>
+
+        
+
+        <p class="p">
+          As logging levels increase, the categories of information logged are cumulative. For example, GLOG_v=2
+          records everything GLOG_v=1 records, as well as additional information.
+        </p>
+
+        <p class="p">
+          Increasing logging levels imposes performance overhead and increases log size. Where practical, use
+          GLOG_v=1 for most cases: this level has minimal performance impact but still captures useful
+          troubleshooting information.
+        </p>
+
+        <p class="p">
+          Additional information logged at each level is as follows:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            GLOG_v=1 - The default level. Logs information about each connection and query that is initiated to an
+            <code class="ph codeph">impalad</code> instance, including runtime profiles.
+          </li>
+
+          <li class="li">
+            GLOG_v=2 - Everything from the previous level plus information for each RPC initiated. This level also
+            records query execution progress information, including details on each file that is read.
+          </li>
+
+          <li class="li">
+            GLOG_v=3 - Everything from the previous level plus logging of every row that is read. This level is
+            only applicable for the most serious troubleshooting and tuning scenarios, because it can produce
+            exceptionally large and detailed log files, potentially leading to its own set of performance and
+            capacity problems.
+          </li>
+        </ul>
+
+      </section>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="logging__redaction">
+
+    <h2 class="title topictitle2" id="ariaid-title8">Redacting Sensitive Information from Impala Log Files</h2>
+    
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        <dfn class="term">Log redaction</dfn> is a security feature that prevents sensitive information from being displayed in
+        locations used by administrators for monitoring and troubleshooting, such as log files and the Impala debug web
+        user interface. You configure regular expressions that match sensitive types of information processed by your
+        system, such as credit card numbers or tax IDs, and literals matching these patterns are obfuscated wherever
+        they would normally be recorded in log files or displayed in administration or debugging user interfaces.
+      </p>
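The pattern-based obfuscation described above can be sketched with a plain regular-expression substitution. The credit-card-style pattern and replacement below are illustrative examples, not the product's actual configuration format or rule set:

```python
# Illustrative sketch of regex-based log redaction: literals matching a
# sensitive pattern are replaced before the text is written to a log or
# shown in a debug UI. One example rule only; real deployments define many.
import re

REDACTION_RULES = [
    # (compiled pattern, replacement)
    (re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b"), "XXXX-XXXX-XXXX-XXXX"),
]

def redact(text):
    """Apply every redaction rule to the given log or query text."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text

sql = "SELECT * FROM orders WHERE card = '1234-5678-9012-3456'"
print(redact(sql))
# SELECT * FROM orders WHERE card = 'XXXX-XXXX-XXXX-XXXX'
```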
+
+      <p class="p">
+        In a security context, the log redaction feature is complementary to the Sentry authorization framework.
+        Sentry prevents unauthorized users from being able to directly access table data. Redaction prevents
+        administrators or support personnel from seeing the smaller amounts of sensitive or personally identifying
+        information (PII) that might appear in queries issued by those authorized users.
+      </p>
+
+      <p class="p">
+        See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details about how to enable this feature and set
+        up the regular expressions to detect and redact sensitive information within SQL statement text.
+      </p>
+
+    </div>
+
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_map.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_map.html b/docs/build/html/topics/impala_map.html
new file mode 100644
index 0000000..31d91f0
--- /dev/null
+++ b/docs/build/html/topics/impala_map.html
@@ -0,0 +1,331 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="map"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAP Complex Type (Impala 2.3 or higher only)</title></head><body id="map"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+    <h1 class="title topictitle1" id="ariaid-title1">MAP Complex Type (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+    
+
+    <div class="body conbody">
+
+      <p class="p">
+        A complex data type representing an arbitrary set of key-value pairs.
+        The key part is a scalar type, while the value part can be a scalar or
+        another complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+        or <code class="ph codeph">MAP</code>).
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> MAP &lt; <var class="keyword varname">primitive_type</var>, <var class="keyword varname">type</var> &gt;
+
+type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword varname">complex_type</var>
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+      <p class="p">
+        Because complex types are often used in combination,
+        for example an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+        elements, if you are unfamiliar with the Impala complex types,
+        start with <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for
+        background information and usage examples.
+      </p>
+
+      <p class="p">
+        The <code class="ph codeph">MAP</code> complex data type represents a set of key-value pairs.
+        Each element of the map is indexed by a primitive type such as <code class="ph codeph">BIGINT</code> or
+        <code class="ph codeph">STRING</code>, letting you define sequences that are not continuous or categories with arbitrary names.
+        You might find it convenient for modeling data produced in other languages, such as a
+        Python dictionary or Java HashMap, where a single scalar value serves as the lookup key.
+      </p>
+
+      <p class="p">
+        In a big data context, the keys in a map column might represent a numeric sequence of events during a
+        manufacturing process, or <code class="ph codeph">TIMESTAMP</code> values corresponding to sensor observations.
+        The map itself is inherently unordered, so you choose whether to make the key values significant
+        (such as a recorded <code class="ph codeph">TIMESTAMP</code>) or synthetic (such as a randomly generated globally unique ID).
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        Behind the scenes, the <code class="ph codeph">MAP</code> type is implemented in a similar way to the
+        <code class="ph codeph">ARRAY</code> type. Impala does not enforce any uniqueness constraint on the
+        <code class="ph codeph">KEY</code> values, and the <code class="ph codeph">KEY</code> values are processed by
+        looping through the elements of the <code class="ph codeph">MAP</code> rather than by a constant-time lookup.
+        Therefore, this type is primarily for ease of understanding when importing data and
+        algorithms from non-SQL contexts, rather than optimizing the performance of key lookups.
+      </div>
+
+      <p class="p">
+        You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+        to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+        column and visualize its structure as if it were a table.
+        For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+        <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+        If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+        and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+        you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+        An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+        <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+        A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+        representing a column in the table.
+        A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+        <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Columns with this data type can only be used in tables or partitions with the Parquet file format.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Columns with this data type cannot be used as partition key columns in a partitioned table.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">COMPUTE STATS</code> statement does not produce any statistics for columns of this data type.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p" id="map__d6e2889">
+            The maximum length of the column definition for any complex type, including declarations for any nested types,
+            is 4000 characters.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations and Restrictions for Complex Types</a> for a full list of limitations
+            and associated guidelines about complex type columns.
+          </p>
+        </li>
+      </ul>
+
+      <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+      <p class="p">
+        Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+        <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Many of the complex type examples refer to tables
+      such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+      adapted from the tables used in the TPC-H benchmark.
+      See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+      for the table definitions.
+      </div>
+
+      <p class="p">
+        The following example shows a table with various kinds of <code class="ph codeph">MAP</code> columns,
+        both at the top level and nested within other complex types.
+        Each row represents information about a specific country, with complex type fields
+        of various levels of nesting to represent different information associated
+        with the country: factual measurements such as area and population,
+        notable people in different categories, geographic features such as
+        cities, points of interest within each city, and mountains with associated facts.
+        Practice the <code class="ph codeph">CREATE TABLE</code> and query notation for complex type columns
+        using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably.
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE map_demo
+(
+  country_id BIGINT,
+
+-- Numeric facts about each country, looked up by name.
+-- For example, 'Area':1000, 'Population':999999.
+-- Using a MAP instead of a STRUCT because there could be
+-- a different set of facts for each country.
+  metrics MAP &lt;STRING, BIGINT&gt;,
+
+-- MAP whose value part is an ARRAY.
+-- For example, the key 'Famous Politicians' could represent an array of 10 elements,
+-- while the key 'Famous Actors' could represent an array of 20 elements.
+  notables MAP &lt;STRING, ARRAY &lt;STRING&gt;&gt;,
+
+-- MAP that is a field within a STRUCT.
+-- (The STRUCT is inside another ARRAY, because it is rare
+-- for a STRUCT to be a top-level column.)
+-- For example, city #1 might have points of interest with key 'Zoo',
+-- representing an array of 3 different zoos.
+-- City #2 might have completely different kinds of points of interest.
+-- Because the set of field names is potentially large, and most entries could be blank,
+-- a MAP makes more sense than a STRUCT to represent such a sparse data structure.
+  cities ARRAY &lt; STRUCT &lt;
+    name: STRING,
+    points_of_interest: MAP &lt;STRING, ARRAY &lt;STRING&gt;&gt;
+  &gt;&gt;,
+
+-- MAP that is an element within an ARRAY. The MAP is inside a STRUCT field to associate
+-- the mountain name with all the facts about the mountain.
+-- The "key" of the map (the first STRING field) represents the name of some fact whose value
+-- can be expressed as an integer, such as 'Height', 'Year First Climbed', and so on.
+  mountains ARRAY &lt; STRUCT &lt; name: STRING, facts: MAP &lt;STRING, INT &gt; &gt; &gt;
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+<pre class="pre codeblock"><code>DESCRIBE map_demo;
++------------+------------------------------------------------+
+| name       | type                                           |
++------------+------------------------------------------------+
+| country_id | bigint                                         |
+| metrics    | map&lt;string,bigint&gt;                             |
+| notables   | map&lt;string,array&lt;string&gt;&gt;                      |
+| cities     | array&lt;struct&lt;                                  |
+|            |   name:string,                                 |
+|            |   points_of_interest:map&lt;string,array&lt;string&gt;&gt; |
+|            | &gt;&gt;                                             |
+| mountains  | array&lt;struct&lt;                                  |
+|            |   name:string,                                 |
+|            |   facts:map&lt;string,int&gt;                        |
+|            | &gt;&gt;                                             |
++------------+------------------------------------------------+
+
+DESCRIBE map_demo.metrics;
++-------+--------+
+| name  | type   |
++-------+--------+
+| key   | string |
+| value | bigint |
++-------+--------+
+
+DESCRIBE map_demo.notables;
++-------+---------------+
+| name  | type          |
++-------+---------------+
+| key   | string        |
+| value | array&lt;string&gt; |
++-------+---------------+
+
+DESCRIBE map_demo.notables.value;
++------+--------+
+| name | type   |
++------+--------+
+| item | string |
+| pos  | bigint |
++------+--------+
+
+DESCRIBE map_demo.cities;
++------+------------------------------------------------+
+| name | type                                           |
++------+------------------------------------------------+
+| item | struct&lt;                                        |
+|      |   name:string,                                 |
+|      |   points_of_interest:map&lt;string,array&lt;string&gt;&gt; |
+|      | &gt;                                              |
+| pos  | bigint                                         |
++------+------------------------------------------------+
+
+DESCRIBE map_demo.cities.item.points_of_interest;
++-------+---------------+
+| name  | type          |
++-------+---------------+
+| key   | string        |
+| value | array&lt;string&gt; |
++-------+---------------+
+
+DESCRIBE map_demo.cities.item.points_of_interest.value;
++------+--------+
+| name | type   |
++------+--------+
+| item | string |
+| pos  | bigint |
++------+--------+
+
+DESCRIBE map_demo.mountains;
++------+-------------------------+
+| name | type                    |
++------+-------------------------+
+| item | struct&lt;                 |
+|      |   name:string,          |
+|      |   facts:map&lt;string,int&gt; |
+|      | &gt;                       |
+| pos  | bigint                  |
++------+-------------------------+
+
+DESCRIBE map_demo.mountains.item.facts;
++-------+--------+
+| name  | type   |
++-------+--------+
+| key   | string |
+| value | int    |
++-------+--------+
+
+</code></pre>
+
+      <p class="p">
+        The following example shows a table that uses a variety of data types for the <code class="ph codeph">MAP</code>
+        <span class="q">"key"</span> field. Typically, you use <code class="ph codeph">BIGINT</code> or <code class="ph codeph">STRING</code> for
+        numeric or character-based keys, without worrying about exceeding any size or length constraints.
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE map_demo_obscure
+(
+  id BIGINT,
+  m1 MAP &lt;INT, INT&gt;,
+  m2 MAP &lt;SMALLINT, INT&gt;,
+  m3 MAP &lt;TINYINT, INT&gt;,
+  m4 MAP &lt;TIMESTAMP, INT&gt;,
+  m5 MAP &lt;BOOLEAN, INT&gt;,
+  m6 MAP &lt;CHAR(5), INT&gt;,
+  m7 MAP &lt;VARCHAR(25), INT&gt;,
+  m8 MAP &lt;FLOAT, INT&gt;,
+  m9 MAP &lt;DOUBLE, INT&gt;,
+  m10 MAP &lt;DECIMAL(12,2), INT&gt;
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+<pre class="pre codeblock"><code>CREATE TABLE celebrities (name STRING, birth_year MAP &lt; STRING, SMALLINT &gt;) STORED AS PARQUET;
+-- A typical row might represent values with 2 different birth years, such as:
+-- ("Joe Movie Star", { "real": 1972, "claimed": 1977 })
+
+CREATE TABLE countries (name STRING, famous_leaders MAP &lt; INT, STRING &gt;) STORED AS PARQUET;
+-- A typical row might represent values with different leaders, with key values corresponding to their numeric sequence, such as:
+-- ("United States", { 1: "George Washington", 3: "Thomas Jefferson", 16: "Abraham Lincoln" })
+
+CREATE TABLE airlines (name STRING, special_meals MAP &lt; STRING, MAP &lt; STRING, STRING &gt; &gt;) STORED AS PARQUET;
+-- A typical row might represent values with multiple kinds of meals, each with several components:
+-- ("Elegant Airlines",
+--   {
+--     "vegetarian": { "breakfast": "pancakes", "snack": "cookies", "dinner": "rice pilaf" },
+--     "gluten free": { "breakfast": "oatmeal", "snack": "fruit", "dinner": "chicken" }
+--   } )
+</code></pre>
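+      <p class="p">
+        To retrieve the data in a <code class="ph codeph">MAP</code> column, join the table with the
+        map column in the <code class="ph codeph">FROM</code> clause and refer to the
+        <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> pseudocolumns.
+        The following is a brief sketch of the query notation, assuming the
+        <code class="ph codeph">CELEBRITIES</code> table above is populated:
+      </p>
+
+<pre class="pre codeblock"><code>-- Each key-value pair in the map becomes a separate result row.
+SELECT c.name, m.key, m.value
+  FROM celebrities c, c.birth_year m;
+-- Might return rows such as:
+-- Joe Movie Star | real    | 1972
+-- Joe Movie Star | claimed | 1977
+</code></pre>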
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a>,
+        <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>,
+        <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>
+        
+      </p>
+
+    </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_optimize_partition_key_scans.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_optimize_partition_key_scans.html b/docs/build/html/topics/impala_optimize_partition_key_scans.html
new file mode 100644
index 0000000..07bfbb1
--- /dev/null
+++ b/docs/build/html/topics/impala_optimize_partition_key_scans.html
@@ -0,0 +1,188 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="optimize_partition_key_scans"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</title></head><body id="optimize_partition_key_scans"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">OPTIMIZE_PARTITION_KEY_SCANS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Enables a fast code path for queries that apply simple aggregate functions to partition key
+      columns: <code class="ph codeph">MIN(<var class="keyword varname">key_column</var>)</code>, <code class="ph codeph">MAX(<var class="keyword varname">key_column</var>)</code>,
+      or <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">key_column</var>)</code>.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value is interpreted as <code class="ph codeph">false</code>.
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        In <span class="keyword">Impala 2.5.0</span>, only the value 1 enables the option, and the value
+        <code class="ph codeph">true</code> is not recognized. This limitation is
+        tracked by the issue
+        <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3334" target="_blank">IMPALA-3334</a>,
+        which shows the releases where the problem is fixed.
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      This optimization speeds up common <span class="q">"introspection"</span> operations when using queries
+      to calculate the cardinality and range for partition key columns.
+    </p>
+
+    <p class="p">
+      This optimization does not apply if the queries contain any <code class="ph codeph">WHERE</code>,
+      <code class="ph codeph">GROUP BY</code>, or <code class="ph codeph">HAVING</code> clause. The relevant queries
+      should only compute the minimum, maximum, or number of distinct values for the
+      partition key columns across the whole table.
+    </p>
+
+    <p class="p">
+      This optimization is enabled by a query option because it skips some consistency checks
+      and therefore can return slightly different partition values if partitions are in the
+      process of being added, dropped, or loaded outside of Impala. Queries might exhibit different
+      behavior depending on the setting of this option in the following cases:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          If files are removed from a partition using HDFS or other non-Impala operations,
+          there is a period until the next <code class="ph codeph">REFRESH</code> of the table where regular
+          queries fail at run time because they detect the missing files. With this optimization
+          enabled, queries that evaluate only the partition key column values (not the contents of
+          the partition itself) succeed, and treat the partition as if it still exists.
+        </p>
+      </li>
+      <li class="li">
+        <p class="p">
+          If a partition contains any data files, but the data files do not contain any rows,
+          a regular query considers that the partition does not exist. With this optimization
+          enabled, the partition is treated as if it exists.
+        </p>
+        <p class="p">
+          If the partition includes no files at all, this optimization does not change the query
+          behavior: the partition is considered to not exist whether or not this optimization is enabled.
+        </p>
+      </li>
+    </ul>
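+    <p class="p">
+      For example, the second case can be observed with a sketch like the following
+      (assuming a partitioned table <code class="ph codeph">T1</code> where one partition
+      contains data files with zero rows):
+    </p>
+
+<pre class="pre codeblock"><code>-- With the option disabled, the partition whose files contain no rows
+-- is treated as if it did not exist, so its key value is not returned.
+set optimize_partition_key_scans=0;
+select distinct year from t1;
+
+-- With the option enabled, the answer comes from the partition metadata,
+-- so that partition's key value does appear in the result set.
+set optimize_partition_key_scans=1;
+select distinct year from t1;
+</code></pre>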
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example shows initial schema setup and the default behavior of queries that
+      return just the partition key column for a table:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Make a partitioned table with 3 partitions.
+create table t1 (s string) partitioned by (year int);
+insert into t1 partition (year=2015) values ('last year');
+insert into t1 partition (year=2016) values ('this year');
+insert into t1 partition (year=2017) values ('next year');
+
+-- Regardless of the option setting, this query must read the
+-- data files to know how many rows to return for each year value.
+explain select year from t1;
++-----------------------------------------------------+
+| Explain String                                      |
++-----------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+|                                                     |
+| F00:PLAN FRAGMENT [UNPARTITIONED]                   |
+|   00:SCAN HDFS [key_cols.t1]                        |
+|      partitions=3/3 files=4 size=40B                |
+|      table stats: 3 rows total                      |
+|      column stats: all                              |
+|      hosts=3 per-host-mem=unavailable               |
+|      tuple-ids=0 row-size=4B cardinality=3          |
++-----------------------------------------------------+
+
+-- The aggregation operation means the query does not need to read
+-- the data within each partition: the result set contains exactly 1 row
+-- per partition, derived from the partition key column value.
+-- By default, Impala still includes a 'scan' operation in the query.
+explain select distinct year from t1;
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0                                |
+|                                                                                    |
+| 01:AGGREGATE [FINALIZE]                                                            |
+| |  group by: year                                                                  |
+| |                                                                                  |
+| 00:SCAN HDFS [key_cols.t1]                                                         |
+|    partitions=0/0 files=0 size=0B                                                  |
++------------------------------------------------------------------------------------+
+</code></pre>
+
+    <p class="p">
+      The following examples show how the plan is made more efficient when the
+      <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code> option is enabled:
+    </p>
+
+<pre class="pre codeblock"><code>
+set optimize_partition_key_scans=1;
+OPTIMIZE_PARTITION_KEY_SCANS set to 1
+
+-- The aggregation operation is turned into a UNION internally,
+-- with constant values known in advance based on the metadata
+-- for the partitioned table.
+explain select distinct year from t1;
++-----------------------------------------------------+
+| Explain String                                      |
++-----------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+|                                                     |
+| F00:PLAN FRAGMENT [UNPARTITIONED]                   |
+|   01:AGGREGATE [FINALIZE]                           |
+|   |  group by: year                                 |
+|   |  hosts=1 per-host-mem=unavailable               |
+|   |  tuple-ids=1 row-size=4B cardinality=3          |
+|   |                                                 |
+|   00:UNION                                          |
+|      constant-operands=3                            |
+|      hosts=1 per-host-mem=unavailable               |
+|      tuple-ids=0 row-size=4B cardinality=3          |
++-----------------------------------------------------+
+
+-- The same optimization applies to other aggregation queries
+-- that only return values based on partition key columns:
+-- MIN, MAX, COUNT(DISTINCT), and so on.
+explain select min(year) from t1;
++-----------------------------------------------------+
+| Explain String                                      |
++-----------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+|                                                     |
+| F00:PLAN FRAGMENT [UNPARTITIONED]                   |
+|   01:AGGREGATE [FINALIZE]                           |
+|   |  output: min(year)                              |
+|   |  hosts=1 per-host-mem=unavailable               |
+|   |  tuple-ids=1 row-size=4B cardinality=1          |
+|   |                                                 |
+|   00:UNION                                          |
+|      constant-operands=3                            |
+|      hosts=1 per-host-mem=unavailable               |
+|      tuple-ids=0 row-size=4B cardinality=3          |
++-----------------------------------------------------+
+</code></pre>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_order_by.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_order_by.html b/docs/build/html/topics/impala_order_by.html
new file mode 100644
index 0000000..c3f5105
--- /dev/null
+++ b/docs/build/html/topics/impala_order_by.html
@@ -0,0 +1,407 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="order_by"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ORDER BY Clause</title></head><body id="order_by"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">ORDER BY Clause</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      The familiar <code class="ph codeph">ORDER BY</code> clause of a <code class="ph codeph">SELECT</code> statement sorts the result set
+      based on the values from one or more columns.
+    </p>
+
+    <p class="p">
+      For distributed queries, this is a relatively expensive operation, because the entire result set must be
+      produced and transferred to one node before the sorting can happen. This can require more memory capacity
+      than a query without <code class="ph codeph">ORDER BY</code>. Even if the query takes approximately the same time to finish
+      with or without the <code class="ph codeph">ORDER BY</code> clause, subjectively it can appear slower because no results
+      are available until all processing is finished, rather than results coming back gradually as rows matching
+      the <code class="ph codeph">WHERE</code> clause are found. Therefore, if you only need the first N results from the sorted
+      result set, also include the <code class="ph codeph">LIMIT</code> clause, which reduces network overhead and the memory
+      requirement on the coordinator node.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        In Impala 1.4.0 and higher, the <code class="ph codeph">LIMIT</code> clause is now optional (rather than required) for
+        queries that use the <code class="ph codeph">ORDER BY</code> clause. Impala automatically uses a temporary disk work area
+        to perform the sort if the sort operation would otherwise exceed the Impala memory limit for a particular
+        DataNode.
+      </p>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      The full syntax for the <code class="ph codeph">ORDER BY</code> clause is:
+    </p>
+
+<pre class="pre codeblock"><code>ORDER BY <var class="keyword varname">col_ref</var> [, <var class="keyword varname">col_ref</var> ...] [ASC | DESC] [NULLS FIRST | NULLS LAST]
+
+col_ref ::= <var class="keyword varname">column_name</var> | <var class="keyword varname">integer_literal</var>
+</code></pre>
+
+    <p class="p">
+      Although the most common usage is <code class="ph codeph">ORDER BY <var class="keyword varname">column_name</var></code>, you can also
+      specify <code class="ph codeph">ORDER BY 1</code> to sort by the first column of the result set, <code class="ph codeph">ORDER BY
+      2</code> to sort by the second column, and so on. The number must be a numeric literal, not some other kind
+      of constant expression. (If the argument is some other expression, even a <code class="ph codeph">STRING</code> value, the
+      query succeeds but the order of results is undefined.)
+    </p>
+
+    <p class="p">
+      <code class="ph codeph">ORDER BY <var class="keyword varname">column_number</var></code> can only be used when the query explicitly lists
+      the columns in the <code class="ph codeph">SELECT</code> list, not with <code class="ph codeph">SELECT *</code> queries.
+    </p>
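+    <p class="p">
+      For example, the following two queries are equivalent, assuming a hypothetical table
+      <code class="ph codeph">t1</code> with columns <code class="ph codeph">c1</code> and <code class="ph codeph">c2</code>:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT c1, c2 FROM t1 ORDER BY c2;
+SELECT c1, c2 FROM t1 ORDER BY 2;
+</code></pre>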
+
+    <p class="p">
+      <strong class="ph b">Ascending and descending sorts:</strong>
+    </p>
+
+    <p class="p">
+      The default sort order (the same as using the <code class="ph codeph">ASC</code> keyword) puts the smallest values at the
+      start of the result set, and the largest values at the end. Specifying the <code class="ph codeph">DESC</code> keyword
+      reverses that order.
+    </p>
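+    <p class="p">
+      For example, assuming a hypothetical table <code class="ph codeph">t1</code> with a numeric column
+      <code class="ph codeph">x</code>, the first two queries below are equivalent, and the third reverses the order:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT x FROM t1 ORDER BY x;      -- Default: smallest values first.
+SELECT x FROM t1 ORDER BY x ASC;  -- Same as the default.
+SELECT x FROM t1 ORDER BY x DESC; -- Largest values first.
+</code></pre>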
+
+    <p class="p">
+      <strong class="ph b">Sort order for NULL values:</strong>
+    </p>
+
+    <p class="p">
+      See <a class="xref" href="impala_literals.html#null">NULL</a> for details about how <code class="ph codeph">NULL</code> values are positioned
+      in the sorted result set, and how to use the <code class="ph codeph">NULLS FIRST</code> and <code class="ph codeph">NULLS LAST</code>
+      clauses. (The sort position for <code class="ph codeph">NULL</code> values in <code class="ph codeph">ORDER BY ... DESC</code> queries is
+      changed in Impala 1.2.1 and higher to be more standards-compliant, and the <code class="ph codeph">NULLS FIRST</code> and
+      <code class="ph codeph">NULLS LAST</code> keywords are new in Impala 1.2.1.)
+    </p>
+
+    <p class="p">
+        Prior to Impala 1.4.0, Impala required any query including an
+        <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_order_by.html#order_by">ORDER BY</a></code> clause to also use a
+        <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_limit.html#limit">LIMIT</a></code> clause. In Impala 1.4.0 and
+        higher, the <code class="ph codeph">LIMIT</code> clause is optional for <code class="ph codeph">ORDER BY</code> queries. In cases where
+        sorting a huge result set requires enough memory to exceed the Impala memory limit for a particular node,
+        Impala automatically uses a temporary disk work area to perform the sort operation.
+      </p>
+
+    
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.3</span> and higher, the complex data types <code class="ph codeph">STRUCT</code>,
+      <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code> are available. These columns cannot
+      be referenced directly in the <code class="ph codeph">ORDER BY</code> clause.
+      When you query a complex type column, you use join notation to <span class="q">"unpack"</span> the elements
+      of the complex type, and within the join query you can include an <code class="ph codeph">ORDER BY</code>
+      clause to control the order in the result set of the scalar elements from the complex type.
+      See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types.
+    </p>
+
+    <p class="p">
+      The following query shows how a complex type column cannot be directly used in an <code class="ph codeph">ORDER BY</code> clause:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE games (id BIGINT, score ARRAY &lt;BIGINT&gt;) STORED AS PARQUET;
+...use LOAD DATA to load externally created Parquet files into the table...
+SELECT id FROM games ORDER BY score DESC;
+ERROR: AnalysisException: ORDER BY expression 'score' with complex type 'ARRAY&lt;BIGINT&gt;' is not supported.
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following query retrieves the user ID and score, only for scores greater than one million,
+      with the highest scores for each user listed first.
+      Because the individual array elements are now represented as separate rows in the result set,
+      they can be used in the <code class="ph codeph">ORDER BY</code> clause, referenced using the <code class="ph codeph">ITEM</code>
+      pseudocolumn that represents each array element.
+    </p>
+
+<pre class="pre codeblock"><code>SELECT id, item FROM games, games.score
+  WHERE item &gt; 1000000
+ORDER BY id, item DESC;
+</code></pre>
+
+    <p class="p">
+      The following queries use similar <code class="ph codeph">ORDER BY</code> techniques with variations of the <code class="ph codeph">GAMES</code>
+      table, where the complex type is an <code class="ph codeph">ARRAY</code> containing <code class="ph codeph">STRUCT</code> or <code class="ph codeph">MAP</code>
+      elements to represent additional details about each game that was played.
+      For an array of structures, the fields of the structure are referenced as <code class="ph codeph">ITEM.<var class="keyword varname">field_name</var></code>.
+      For an array of maps, the keys and values within each array element are referenced as <code class="ph codeph">ITEM.KEY</code>
+      and <code class="ph codeph">ITEM.VALUE</code>.
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE games2 (id BIGINT, play ARRAY &lt; STRUCT &lt;game_name: STRING, score: BIGINT, high_score: BOOLEAN&gt; &gt;) STORED AS PARQUET;
+...use LOAD DATA to load externally created Parquet files into the table...
+SELECT id, item.game_name, item.score FROM games2, games2.play
+  WHERE item.score &gt; 1000000
+ORDER BY id, item.score DESC;
+
+CREATE TABLE games3 (id BIGINT, play ARRAY &lt; MAP &lt;STRING, BIGINT&gt; &gt;) STORED AS PARQUET;
+...use LOAD DATA to load externally created Parquet files into the table...
+SELECT id, info.key AS k, info.value AS v FROM games3, games3.play AS plays, games3.play.item AS info
+  WHERE info.key = 'score' AND info.value &gt; 1000000
+ORDER BY id, info.value DESC;
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Although the <code class="ph codeph">LIMIT</code> clause is now optional on <code class="ph codeph">ORDER BY</code> queries, if your
+      query only needs some number of rows that you can predict in advance, use the <code class="ph codeph">LIMIT</code> clause
+      to reduce unnecessary processing. For example, if the query has a clause <code class="ph codeph">LIMIT 10</code>, each data
+      node sorts its portion of the relevant result set and only returns 10 rows to the coordinator node. The
+      coordinator node picks the 10 highest or lowest row values out of this small intermediate result set.
+    </p>
+
+    <p class="p">
+      If an <code class="ph codeph">ORDER BY</code> clause is applied to an early phase of query processing, such as a subquery
+      or a view definition, Impala ignores the <code class="ph codeph">ORDER BY</code> clause. To get ordered results from a
+      subquery or view, apply an <code class="ph codeph">ORDER BY</code> clause to the outermost or final <code class="ph codeph">SELECT</code>
+      level.
+    </p>
+
+    <p class="p">
+      <code class="ph codeph">ORDER BY</code> is often used in combination with <code class="ph codeph">LIMIT</code> to perform <span class="q">"top-N"</span>
+      queries:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT user_id AS "Top 10 Visitors", SUM(page_views) FROM web_stats
+  GROUP BY user_id
+  ORDER BY SUM(page_views) DESC LIMIT 10;
+</code></pre>
+
+    <p class="p">
+      <code class="ph codeph">ORDER BY</code> is sometimes used in combination with <code class="ph codeph">OFFSET</code> and
+      <code class="ph codeph">LIMIT</code> to paginate query results, although it is relatively inefficient to issue multiple
+      queries like this against the large tables typically used with Impala:
+    </p>
+
+<pre class="pre codeblock"><code>SELECT page_title AS "Page 1 of search results", page_url FROM search_content
+  WHERE LOWER(page_title) LIKE '%game%'
+  ORDER BY page_title LIMIT 10 OFFSET 0;
+SELECT page_title AS "Page 2 of search results", page_url FROM search_content
+  WHERE LOWER(page_title) LIKE '%game%'
+  ORDER BY page_title LIMIT 10 OFFSET 10;
+SELECT page_title AS "Page 3 of search results", page_url FROM search_content
+  WHERE LOWER(page_title) LIKE '%game%'
+  ORDER BY page_title LIMIT 10 OFFSET 20;
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong>
+      </p>
+
+    <p class="p">
+      Impala sorts the intermediate results of an <code class="ph codeph">ORDER BY</code> clause in memory whenever practical. In
+      a cluster of N DataNodes, each node sorts roughly 1/Nth of the result set, the exact proportion varying
+      depending on how the data matching the query is distributed in HDFS.
+    </p>
+
+    <p class="p">
+      If the size of the sorted intermediate result set on any DataNode would cause the query to exceed the Impala
+      memory limit, Impala sorts as much as practical in memory, then writes partially sorted data to disk. (This
+      technique is known in industry terminology as <span class="q">"external sorting"</span> and <span class="q">"spilling to disk"</span>.) As each
+      8 MB batch of data is written to disk, Impala frees the corresponding memory to sort a new 8 MB batch of
+      data. When all the data has been processed, a final merge sort operation is performed to correctly order the
+      in-memory and on-disk results as the result set is transmitted back to the coordinator node. When external
+      sorting becomes necessary, Impala requires approximately 60 MB of RAM at a minimum for the buffers needed to
+      read, write, and sort the intermediate results. If more RAM is available on the DataNode, Impala will use
+      the additional RAM to minimize the amount of disk I/O for sorting.
+    </p>
+
+    <p class="p">
+      This external sort technique is used as appropriate on each DataNode (possibly including the coordinator
+      node) to sort the portion of the result set that is processed on that node. When the sorted intermediate
+      results are sent back to the coordinator node to produce the final result set, the coordinator node uses a
+      merge sort technique to produce a final sorted result set without using any extra resources on the
+      coordinator node.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Configuration for disk usage:</strong>
+    </p>
+
+    <p class="p">
+        By default, intermediate files used during large sort, join, aggregation, or analytic function operations
+        are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span>. These files are removed when the
+        operation finishes. (Multiple concurrent queries can perform operations that use the <span class="q">"spill to disk"</span>
+        technique, without any name conflicts for these temporary files.) You can specify a different location by
+        starting the <span class="keyword cmdname">impalad</span> daemon with the
+        <code class="ph codeph">--scratch_dirs="<var class="keyword varname">path_to_directory</var>"</code> configuration option.
+        You can specify a single directory, or a comma-separated list of directories. The scratch directories must
+        be on the local filesystem, not in HDFS. You might specify different directory paths for different hosts,
+        depending on the capacity and speed
+        of the available storage devices. In <span class="keyword">Impala 2.3</span> or higher, Impala successfully
+        starts (with a warning written to the log) if it cannot create or read and write files
+        in one of the scratch directories. If there is less than 1 GB free on the filesystem where that directory resides,
+        Impala still runs, but writes a warning message to its log.  If Impala encounters an error reading or writing
+        files in a scratch directory during a query, Impala logs the error and the query fails.
+      </p>
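+    <p class="p">
+      For example, to spread the scratch files across two storage devices (the paths here are
+      hypothetical), you might start the daemon with:
+    </p>
+
+<pre class="pre codeblock"><code>impalad --scratch_dirs="/data0/impala-scratch,/data1/impala-scratch"
+</code></pre>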
+
+
+
+
+
+    <p class="p">
+        <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an
+        <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the
+        results are not necessarily sorted. An <code class="ph codeph">INSERT ... SELECT</code> operation potentially creates
+        many different data files, prepared on different data nodes, and therefore the notion of the data being
+        stored in sorted order is impractical.
+      </p>
+
+    <div class="p">
+        An <code class="ph codeph">ORDER BY</code> clause without an additional <code class="ph codeph">LIMIT</code> clause is ignored in any
+        view definition. If you need to sort the entire result set from a view, use an <code class="ph codeph">ORDER BY</code>
+        clause in the <code class="ph codeph">SELECT</code> statement that queries the view. You can still make a simple <span class="q">"top
+        10"</span> report by combining the <code class="ph codeph">ORDER BY</code> and <code class="ph codeph">LIMIT</code> clauses in the same
+        view definition:
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table unsorted (x bigint);
+[localhost:21000] &gt; insert into unsorted values (1), (9), (3), (7), (5), (8), (4), (6), (2);
+[localhost:21000] &gt; create view sorted_view as select x from unsorted order by x;
+[localhost:21000] &gt; select x from sorted_view; -- ORDER BY clause in view has no effect.
++---+
+| x |
++---+
+| 1 |
+| 9 |
+| 3 |
+| 7 |
+| 5 |
+| 8 |
+| 4 |
+| 6 |
+| 2 |
++---+
+[localhost:21000] &gt; select x from sorted_view order by x; -- View query requires ORDER BY at outermost level.
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
+| 6 |
+| 7 |
+| 8 |
+| 9 |
++---+
+[localhost:21000] &gt; create view top_3_view as select x from unsorted order by x limit 3;
+[localhost:21000] &gt; select x from top_3_view; -- ORDER BY and LIMIT together in view definition are preserved.
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
++---+
+</code></pre>
+      </div>
+
+    <p class="p">
+      With the lifting of the requirement to include a <code class="ph codeph">LIMIT</code> clause in every <code class="ph codeph">ORDER
+      BY</code> query (in Impala 1.4 and higher):
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+          Now the use of scratch disk space raises the possibility of an <span class="q">"out of disk space"</span> error on a
+          particular DataNode, as opposed to the previous possibility of an <span class="q">"out of memory"</span> error. Make sure
+          to keep at least 1 GB free on the filesystem used for temporary sorting work.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+          The query options
+          <a class="xref" href="impala_default_order_by_limit.html#default_order_by_limit">DEFAULT_ORDER_BY_LIMIT</a> and
+          <a class="xref" href="impala_abort_on_default_limit_exceeded.html#abort_on_default_limit_exceeded">ABORT_ON_DEFAULT_LIMIT_EXCEEDED</a>,
+          which formerly controlled the behavior of <code class="ph codeph">ORDER BY</code> queries with no limit specified, are
+          now ignored.
+        </p>
+      </li>
+    </ul>
+
+    <p class="p">
+        In Impala 1.2.1 and higher, all <code class="ph codeph">NULL</code> values come at the end of the result set for
+        <code class="ph codeph">ORDER BY ... ASC</code> queries, and at the beginning of the result set for <code class="ph codeph">ORDER BY ...
+        DESC</code> queries. In effect, <code class="ph codeph">NULL</code> is considered greater than all other values for
+        sorting purposes. The original Impala behavior always put <code class="ph codeph">NULL</code> values at the end, even for
+        <code class="ph codeph">ORDER BY ... DESC</code> queries. The new behavior in Impala 1.2.1 makes Impala more compatible
+        with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting
+        behavior for <code class="ph codeph">NULL</code> by adding the clause <code class="ph codeph">NULLS FIRST</code> or <code class="ph codeph">NULLS
+        LAST</code> at the end of the <code class="ph codeph">ORDER BY</code> clause.
+      </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table numbers (x int);
+[localhost:21000] &gt; insert into numbers values (1), (null), (2), (null), (3);
+[localhost:21000] &gt; select x from numbers order by x nulls first;
++------+
+| x    |
++------+
+| NULL |
+| NULL |
+| 1    |
+| 2    |
+| 3    |
++------+
+[localhost:21000] &gt; select x from numbers order by x desc nulls first;
++------+
+| x    |
++------+
+| NULL |
+| NULL |
+| 3    |
+| 2    |
+| 1    |
++------+
+[localhost:21000] &gt; select x from numbers order by x nulls last;
++------+
+| x    |
++------+
+| 1    |
+| 2    |
+| 3    |
+| NULL |
+| NULL |
++------+
+[localhost:21000] &gt; select x from numbers order by x desc nulls last;
++------+
+| x    |
++------+
+| 3    |
+| 2    |
+| 1    |
+| NULL |
+| NULL |
++------+
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for further examples of queries with the <code class="ph codeph">ORDER
+      BY</code> clause.
+    </p>
+
+    <p class="p">
+      Analytic functions use the <code class="ph codeph">ORDER BY</code> clause in a different context to define the sequence in
+      which rows are analyzed. See <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a> for details.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[03/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_tables.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_tables.html b/docs/build/html/topics/impala_tables.html
new file mode 100644
index 0000000..f636f20
--- /dev/null
+++ b/docs/build/html/topics/impala_tables.html
@@ -0,0 +1,446 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="tables"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Tables</title></head><body id="tables"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Tables</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p"></p>
+
+    <p class="p">
+      Tables are the primary containers for data in Impala. They have the familiar row and column layout similar to
+      other database systems, plus some features such as partitioning often associated with higher-end data
+      warehouse systems.
+    </p>
+
+    <p class="p">
+      Logically, each table has a structure based on the definition of its columns, partitions, and other
+      properties.
+    </p>
+
+    <p class="p">
+      Physically, each table that uses HDFS storage is associated with a directory in HDFS. The table data consists of all the data files
+      underneath that directory:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <a class="xref" href="impala_tables.html#internal_tables">Internal tables</a> are managed by Impala, and use directories
+        inside the designated Impala work area.
+      </li>
+
+      <li class="li">
+        <a class="xref" href="impala_tables.html#external_tables">External tables</a> use arbitrary HDFS directories, where
+        the data files are typically shared between different Hadoop components.
+      </li>
+
+      <li class="li">
+        Large-scale data is usually handled by partitioned tables, where the data files are divided among different
+        HDFS subdirectories.
+      </li>
+    </ul>
+
+    <p class="p">
+      Impala tables can also represent data that is stored in HBase, or in the Amazon S3 filesystem (<span class="keyword">Impala 2.2</span> or higher),
+      or on Isilon storage devices (<span class="keyword">Impala 2.2.3</span> or higher).  See <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>,
+      <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a>, and <a class="xref" href="impala_isilon.html#impala_isilon">Using Impala with Isilon Storage</a>
+      for details about those special kinds of tables.
+    </p>
+
+    <p class="p">
+        Impala queries ignore files with extensions commonly used for temporary work files by Hadoop tools. Any
+        files with extensions <code class="ph codeph">.tmp</code> or <code class="ph codeph">.copying</code> are not considered part of the
+        Impala table. The suffix matching is case-insensitive, so for example Impala ignores both
+        <code class="ph codeph">.copying</code> and <code class="ph codeph">.COPYING</code> suffixes.
+      </p>
+
+    <p class="p toc inpage"></p>
+
+    <p class="p">
+      <strong class="ph b">Related statements:</strong> <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+      <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>, <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>,
+      <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>, <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a>,
+      <a class="xref" href="impala_select.html#select">SELECT Statement</a>
+    </p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="tables__internal_tables">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Internal Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        The default kind of table produced by the <code class="ph codeph">CREATE TABLE</code> statement is known as an internal
+        table. (Its counterpart is the external table, produced by the <code class="ph codeph">CREATE EXTERNAL TABLE</code>
+        syntax.)
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Impala creates a directory in HDFS to hold the data files.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            You can create data in internal tables by issuing <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code>
+            statements.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you add or replace data using HDFS operations, issue the <code class="ph codeph">REFRESH</code> command in
+            <span class="keyword cmdname">impala-shell</span> so that Impala recognizes the changes in data files, block locations,
+            and so on.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            When you issue a <code class="ph codeph">DROP TABLE</code> statement, Impala physically removes all the data files
+            from the directory.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+        To see whether a table is internal or external, and its associated HDFS location, issue the statement
+        <code class="ph codeph">DESCRIBE FORMATTED <var class="keyword varname">table_name</var></code>. The <code class="ph codeph">Table Type</code> field
+        displays <code class="ph codeph">MANAGED_TABLE</code> for internal tables and <code class="ph codeph">EXTERNAL_TABLE</code> for
+        external tables. The <code class="ph codeph">Location</code> field displays the path of the table directory as an HDFS
+        URI.
+      </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            When you issue an <code class="ph codeph">ALTER TABLE</code> statement to rename an internal table, all data files
+            are moved into the new HDFS directory for the table. The files are moved even if they were formerly in
+            a directory outside the Impala data directory, for example in an internal table with a
+            <code class="ph codeph">LOCATION</code> attribute pointing to an outside HDFS directory.
+          </p>
+        </li>
+      </ul>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <div class="p">
+        You can switch a table from internal to external, or from external to internal, by using the <code class="ph codeph">ALTER
+        TABLE</code> statement:
+<pre class="pre codeblock"><code>
+-- Switch a table from internal to external.
+ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='TRUE');
+
+-- Switch a table from external to internal.
+ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='FALSE');
+</code></pre>
+      </div>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_tables.html#external_tables">External Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+        <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>, <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>,
+        <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="tables__external_tables">
+
+    <h2 class="title topictitle2" id="ariaid-title3">External Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        The syntax <code class="ph codeph">CREATE EXTERNAL TABLE</code> sets up an Impala table that points at existing data
+        files, potentially in HDFS locations outside the normal Impala data directories. This operation saves the
+        expense of importing the data into a new table when you already have the data files in a known location in
+        HDFS, in the desired file format.
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            You can use Impala to query the data in this table.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            You can create data in external tables by issuing <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code>
+            statements.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you add or replace data using HDFS operations, issue the <code class="ph codeph">REFRESH</code> command in
+            <span class="keyword cmdname">impala-shell</span> so that Impala recognizes the changes in data files, block locations,
+            and so on.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            When you issue a <code class="ph codeph">DROP TABLE</code> statement in Impala, that removes the connection that
+            Impala has with the associated data files, but does not physically remove the underlying data. You can
+            continue to use the data files with other Hadoop components and HDFS operations.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+        To see whether a table is internal or external, and its associated HDFS location, issue the statement
+        <code class="ph codeph">DESCRIBE FORMATTED <var class="keyword varname">table_name</var></code>. The <code class="ph codeph">Table Type</code> field
+        displays <code class="ph codeph">MANAGED_TABLE</code> for internal tables and <code class="ph codeph">EXTERNAL_TABLE</code> for
+        external tables. The <code class="ph codeph">Location</code> field displays the path of the table directory as an HDFS
+        URI.
+      </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            When you issue an <code class="ph codeph">ALTER TABLE</code> statement to rename an external table, all data files
+            are left in their original locations.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            You can point multiple external tables at the same HDFS directory by using the same
+            <code class="ph codeph">LOCATION</code> attribute for each one. The tables could have different column definitions,
+            as long as the number and types of columns are compatible with the schema evolution considerations for
+            the underlying file type. For example, for text data files, one table might define a certain column as
+            a <code class="ph codeph">STRING</code> while another defines the same column as a <code class="ph codeph">BIGINT</code>.
+          </p>
+        </li>
+      </ul>
+
+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+      <div class="p">
+        You can switch a table from internal to external, or from external to internal, by using the <code class="ph codeph">ALTER
+        TABLE</code> statement:
+<pre class="pre codeblock"><code>
+-- Switch a table from internal to external.
+ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='TRUE');
+
+-- Switch a table from external to internal.
+ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='FALSE');
+</code></pre>
+      </div>
+
+      <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+      <p class="p">
+        <a class="xref" href="impala_tables.html#internal_tables">Internal Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+        <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>, <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>,
+        <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="tables__table_file_formats">
+    <h2 class="title topictitle2" id="ariaid-title4">File Formats</h2>
+
+    <div class="body conbody">
+      <p class="p">
+        Each table has an associated file format, which determines how Impala interprets the
+        associated data files. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details.
+      </p>
+      <p class="p">
+        You set the file format during the <code class="ph codeph">CREATE TABLE</code> statement,
+        or change it later using the <code class="ph codeph">ALTER TABLE</code> statement.
+        Partitioned tables can have a different file format for individual partitions,
+        allowing you to change the file format used in your ETL process for new data
+        without going back and reconverting all the existing data in the same table.
+      </p>
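+
+      <p class="p">
+        As an illustrative sketch (the table and partition names here are hypothetical), new data
+        could begin using Parquet while older partitions stay in the original text format:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Add a partition for the newest data and switch only that partition to Parquet.
+-- Existing partitions keep their current file format and data files.
+ALTER TABLE logs ADD PARTITION (year=2017, month=4);
+ALTER TABLE logs PARTITION (year=2017, month=4) SET FILEFORMAT PARQUET;
+</code></pre>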
+      <p class="p">
+        Any <code class="ph codeph">INSERT</code> statements produce new data files with the current file format of the table.
+        For existing data files, changing the file format of the table does not automatically do any data conversion.
+        You must use <code class="ph codeph">TRUNCATE TABLE</code> or <code class="ph codeph">INSERT OVERWRITE</code> to remove any previous data
+        files that use the old file format.
+        Then you use the <code class="ph codeph">LOAD DATA</code> statement, <code class="ph codeph">INSERT ... SELECT</code>, or other mechanism
+        to put data files of the correct format into the table.
+      </p>
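+
+      <p class="p">
+        A sketch of this conversion workflow, using hypothetical table names, might look like:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Changing the format does not convert the existing text data files.
+ALTER TABLE sales_data SET FILEFORMAT PARQUET;
+
+-- Remove the old-format data files...
+TRUNCATE TABLE sales_data;
+
+-- ...then repopulate the table; the INSERT writes new files in Parquet format.
+INSERT INTO sales_data SELECT * FROM sales_data_staging;
+</code></pre>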
+      <p class="p">
+        The default file format, text, is the most flexible and easy to produce when you are just getting started with
+        Impala. The Parquet file format offers the highest query performance and uses compression to reduce storage
+        requirements; therefore, where practical, use Parquet for Impala tables with substantial amounts of data.
+        <span class="ph">Also, the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>)
+        available in <span class="keyword">Impala 2.3</span> and higher are currently only supported with the Parquet file type.</span>
+        Based on your existing ETL workflow, you might use other file formats such as Avro, possibly doing a final
+        conversion step to Parquet to take advantage of its performance for analytic queries.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="tables__kudu_tables">
+    <h2 class="title topictitle2" id="ariaid-title5">Kudu Tables</h2>
+    
+
+    <div class="body conbody">
+      <p class="p">
+        Tables stored in Apache Kudu are treated specially, because Kudu manages its data independently of HDFS files.
+        Some information about the table is stored in the metastore database for use by Impala. Other table metadata is
+        managed internally by Kudu.
+      </p>
+
+      <p class="p">
+        When you create a Kudu table through Impala, it is assigned an internal Kudu table name of the form
+        <code class="ph codeph">impala::<var class="keyword varname">db_name</var>.<var class="keyword varname">table_name</var></code>. You can see the Kudu-assigned name
+        in the output of <code class="ph codeph">DESCRIBE FORMATTED</code>, in the <code class="ph codeph">kudu.table_name</code> field of the table properties.
+        The Kudu-assigned name remains the same even if you use <code class="ph codeph">ALTER TABLE</code> to rename the Impala table
+        or move it to a different Impala database. If you issue the statement
+        <code class="ph codeph">ALTER TABLE <var class="keyword varname">impala_name</var> SET TBLPROPERTIES('kudu.table_name' = '<var class="keyword varname">different_kudu_table_name</var>')</code>,
+        the effect is different depending on whether the Impala table was created with a regular <code class="ph codeph">CREATE TABLE</code>
+        statement (that is, if it is an internal or managed table), or if it was created with a
+        <code class="ph codeph">CREATE EXTERNAL TABLE</code> statement (and therefore is an external table). Changing the <code class="ph codeph">kudu.table_name</code>
+        property of an internal table physically renames the underlying Kudu table to match the new name.
+        Changing the <code class="ph codeph">kudu.table_name</code> property of an external table switches which underlying Kudu table
+        the Impala table refers to; the underlying Kudu table must already exist.
+      </p>
+
+      <p class="p">
+        The following example shows what happens with both internal and external Kudu tables as the <code class="ph codeph">kudu.table_name</code>
+        property is changed. In practice, external tables are typically used to access underlying Kudu tables that were created
+        outside of Impala, that is, through the Kudu API.
+      </p>
+
+<pre class="pre codeblock"><code>
+-- This is an internal table that we will create and then rename.
+create table old_name (id bigint primary key, s string)
+  partition by hash(id) partitions 2 stored as kudu;
+
+-- Initially, the name OLD_NAME is the same on the Impala and Kudu sides.
+describe formatted old_name;
+...
+| Location:          | hdfs://host.example.com:8020/path/user.db/old_name
+| Table Type:        | MANAGED_TABLE         | NULL
+| Table Parameters:  | NULL                  | NULL
+|                    | DO_NOT_UPDATE_STATS   | true
+|                    | kudu.master_addresses | vd0342.example.com
+|                    | kudu.table_name       | impala::user.old_name
+
+-- ALTER TABLE RENAME TO changes the Impala name but not the underlying Kudu name.
+alter table old_name rename to new_name;
+
+describe formatted new_name;
+| Location:          | hdfs://host.example.com:8020/path/user.db/new_name
+| Table Type:        | MANAGED_TABLE         | NULL
+| Table Parameters:  | NULL                  | NULL
+|                    | DO_NOT_UPDATE_STATS   | true
+|                    | kudu.master_addresses | vd0342.example.com
+|                    | kudu.table_name       | impala::user.old_name
+
+-- Setting TBLPROPERTIES changes the underlying Kudu name.
+alter table new_name
+  set tblproperties('kudu.table_name' = 'impala::user.new_name');
+
+describe formatted new_name;
+| Location:          | hdfs://host.example.com:8020/path/user.db/new_name
+| Table Type:        | MANAGED_TABLE         | NULL
+| Table Parameters:  | NULL                  | NULL
+|                    | DO_NOT_UPDATE_STATS   | true
+|                    | kudu.master_addresses | vd0342.example.com
+|                    | kudu.table_name       | impala::user.new_name
+
+-- Put some data in the table to demonstrate how external tables can map to
+-- different underlying Kudu tables.
+insert into new_name values (0, 'zero'), (1, 'one'), (2, 'two');
+
+-- This external table points to the same underlying Kudu table, NEW_NAME,
+-- as we created above. No need to declare columns or other table aspects.
+create external table kudu_table_alias stored as kudu
+  tblproperties('kudu.table_name' = 'impala::user.new_name');
+
+-- The external table can fetch data from the NEW_NAME table that already
+-- existed and already had data.
+select * from kudu_table_alias limit 100;
++----+------+
+| id | s    |
++----+------+
+| 1  | one  |
+| 0  | zero |
+| 2  | two  |
++----+------+
+
+-- We cannot re-point the external table at a different underlying Kudu table
+-- unless that other underlying Kudu table already exists.
+alter table kudu_table_alias
+  set tblproperties('kudu.table_name' = 'impala::user.yet_another_name');
+ERROR:
+TableLoadingException: Error opening Kudu table 'impala::user.yet_another_name',
+  Kudu error: The table does not exist: table_name: "impala::user.yet_another_name"
+
+-- Once the underlying Kudu table exists, we can re-point the external table to it.
+create table yet_another_name (id bigint primary key, x int, y int, s string)
+  partition by hash(id) partitions 2 stored as kudu;
+
+alter table kudu_table_alias
+  set tblproperties('kudu.table_name' = 'impala::user.yet_another_name');
+
+-- Now no data is returned because this other table is empty.
+select * from kudu_table_alias limit 100;
+
+-- The Impala table automatically recognizes the table schema of the new table,
+-- for example the extra X and Y columns not present in the original table.
+describe kudu_table_alias;
++------+--------+---------+-------------+----------+...
+| name | type   | comment | primary_key | nullable |...
++------+--------+---------+-------------+----------+...
+| id   | bigint |         | true        | false    |...
+| x    | int    |         | false       | true     |...
+| y    | int    |         | false       | true     |...
+| s    | string |         | false       | true     |...
++------+--------+---------+-------------+----------+...
+</code></pre>
+
+      <p class="p">
+        The <code class="ph codeph">SHOW TABLE STATS</code> output for a Kudu table shows Kudu-specific details about the layout of the table.
+        Instead of information about the number and sizes of files, the information is divided by the Kudu tablets.
+        For each tablet, the output includes the fields
+        <code class="ph codeph"># Rows</code> (although this number is not currently computed), <code class="ph codeph">Start Key</code>, <code class="ph codeph">Stop Key</code>, <code class="ph codeph">Leader Replica</code>, and <code class="ph codeph"># Replicas</code>.
+        The output of <code class="ph codeph">SHOW COLUMN STATS</code>, illustrating the distribution of values within each column, is the same for Kudu tables
+        as for HDFS-backed tables.
+      </p>
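+
+      <p class="p">
+        For example, you could examine the layout and column statistics of the
+        <code class="ph codeph">NEW_NAME</code> table defined earlier (output not shown here):
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Tablet-level layout details: # Rows, Start Key, Stop Key, Leader Replica, # Replicas.
+SHOW TABLE STATS new_name;
+
+-- Column-level value distribution, in the same format as for HDFS-backed tables.
+SHOW COLUMN STATS new_name;
+</code></pre>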
+
+      <div class="p">
+        The distinction between internal and external tables has some special
+        details for Kudu tables. Tables created entirely through Impala are
+        internal tables. The table name as represented within Kudu includes
+        notation such as an <code class="ph codeph">impala::</code> prefix and the Impala
+        database name. External Kudu tables are those created by a non-Impala
+        mechanism, such as a user application calling the Kudu APIs. For
+        these tables, the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax lets
+        you establish a mapping from Impala to the existing Kudu table:
+<pre class="pre codeblock"><code>
+CREATE EXTERNAL TABLE impala_name STORED AS KUDU
+  TBLPROPERTIES('kudu.table_name' = 'original_kudu_name');
+</code></pre>
+        External Kudu tables differ in one important way from other external
+        tables: adding or dropping a column or range partition changes the
+        data in the underlying Kudu table, in contrast to an HDFS-backed
+        external table where existing data files are left untouched.
+      </div>
+    </div>
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_timeouts.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_timeouts.html b/docs/build/html/topics/impala_timeouts.html
new file mode 100644
index 0000000..2005c7d
--- /dev/null
+++ b/docs/build/html/topics/impala_timeouts.html
@@ -0,0 +1,168 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="timeouts"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Setting Timeout Periods for Daemons, Queries, and Sessions</title></head><body id="timeouts"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Setting Timeout Periods for Daemons, Queries, and Sessions</h1>
+
+  
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Depending on how busy your cluster is, you might increase or decrease various timeout
+      values. Increase timeouts if Impala is cancelling operations prematurely, when the system
+      is responding slower than usual but the operations are still successful if given extra
+      time. Decrease timeouts if operations are idle or hanging for long periods, and the idle
+      or hung operations are consuming resources and reducing concurrency.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="timeouts__statestore_timeout">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Increasing the Statestore Timeout</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        If you have an extensive Impala schema, for example with hundreds of databases, tens of
+        thousands of tables, and so on, you might encounter timeout errors during startup as the
+        Impala catalog service broadcasts metadata to all the Impala nodes using the statestore
+        service. To avoid such timeout errors on startup, increase the statestore timeout value
+        from its default of 10 seconds. Specify the timeout value using the
+        <code class="ph codeph">-statestore_subscriber_timeout_seconds</code> option for the statestore
+        service, using the configuration instructions in
+        <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>. The symptom of this problem is
+        messages in the <code class="ph codeph">impalad</code> log such as:
+      </p>
+
+<pre class="pre codeblock"><code>Connection with state-store lost
+Trying to re-register with state-store</code></pre>
+
+      <p class="p">
+        See <a class="xref" href="impala_scalability.html#statestore_scalability">Scalability Considerations for the Impala Statestore</a> for more details about
+        statestore operation and settings on clusters with a large number of Impala-related
+        objects such as tables and partitions.
+      </p>
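+
+      <p class="p">
+        For example, you might add a setting such as the following to the daemon startup options
+        (the value of 100 seconds here is an arbitrary illustration; choose a value based on the
+        size of your schema):
+      </p>
+
+<pre class="pre codeblock"><code>
+-statestore_subscriber_timeout_seconds=100
+</code></pre>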
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="timeouts__impalad_timeout">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Setting the Idle Query and Idle Session Timeouts for impalad</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        To keep long-running queries or idle sessions from tying up cluster resources, you can
+        set timeout intervals for both individual queries, and entire sessions.
+      </p>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          The timeout clock for queries and sessions only starts ticking when the query or session is idle.
+          For queries, this means the query has results ready but is waiting for a client to fetch the data. A
+          query can run for an arbitrary time without triggering a timeout, because the query is computing results
+          rather than sitting idle waiting for the results to be fetched. The timeout period is intended to prevent
+          unclosed queries from consuming resources and taking up slots in the admission count of running queries,
+          potentially preventing other queries from starting.
+        </p>
+        <p class="p">
+          For sessions, this means that no query has been submitted for some period of time.
+        </p>
+      </div>
+
+      <p class="p">
+        Specify the following startup options for the <span class="keyword cmdname">impalad</span> daemon:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          The <code class="ph codeph">--idle_query_timeout</code> option specifies the time in seconds after
+          which an idle query is cancelled. This could be a query whose results were all fetched
+          but was never closed, or one whose results were partially fetched and then the client
+          program stopped requesting further results. This condition is most likely to occur in
+          a client program using the JDBC or ODBC interfaces, rather than in the interactive
+          <span class="keyword cmdname">impala-shell</span> interpreter. Once the query is cancelled, the client
+          program cannot retrieve any further results.
+        </li>
+
+        <li class="li">
+          The <code class="ph codeph">--idle_session_timeout</code> option specifies the time in seconds after
+          which an idle session is expired. A session is idle when no activity is occurring for
+          any of the queries in that session, and the session has not started any new queries.
+          Once a session is expired, you cannot issue any new query requests to it. The session
+          remains open, but the only operation you can perform is to close it. The default value
+          of 0 means that sessions never expire.
+        </li>
+      </ul>
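+
+      <p class="p">
+        For example, a sketch of starting <span class="keyword cmdname">impalad</span> with both
+        timeouts enabled (the values shown are arbitrary illustrations, expressed in seconds):
+      </p>
+
+<pre class="pre codeblock"><code>
+impalad --idle_query_timeout=600 --idle_session_timeout=3600 ...
+</code></pre>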
+
+      <p class="p">
+        For instructions on changing <span class="keyword cmdname">impalad</span> startup options, see
+        <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>.
+      </p>
+
+      <p class="p">
+        You can reduce the idle query timeout by using the <code class="ph codeph">QUERY_TIMEOUT_S</code>
+        query option. Any value specified for the <code class="ph codeph">--idle_query_timeout</code> startup
+        option serves as an upper limit for the <code class="ph codeph">QUERY_TIMEOUT_S</code> query option.
+        See <a class="xref" href="impala_query_timeout_s.html#query_timeout_s">QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</a> for details.
+      </p>
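+
+      <p class="p">
+        For example, within a session you might shorten the timeout for a particular workload
+        (60 seconds here is an arbitrary illustration):
+      </p>
+
+<pre class="pre codeblock"><code>
+SET QUERY_TIMEOUT_S=60;
+</code></pre>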
+
+    </div>
+
+  </article>
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="timeouts__concept_rfy_jl1_rx">
+    <h2 class="title topictitle2" id="ariaid-title4">Setting Timeout and Retries for Thrift Connections to the Backend
+      Client</h2>
+    <div class="body conbody">
+      <p class="p">Impala connections to the backend client are subject to failure in
+        cases when the network is momentarily overloaded. To avoid failed
+        queries due to transient network problems, you can configure the number
+        of Thrift connection retries using the following option: </p>
+      <ul class="ul" id="concept_rfy_jl1_rx__ul_bj3_ql1_rx">
+        <li class="li">The <code class="ph codeph">--backend_client_connection_num_retries</code> option
+          specifies the number of times Impala will try connecting to the
+          backend client after the first connection attempt fails. By default,
+            <span class="keyword cmdname">impalad</span> will attempt three re-connections before
+          it returns a failure. </li>
+      </ul>
+      <p class="p">You can also configure timeouts for sending and receiving data to and from
+        the backend client, so that if a query hangs, Impala terminates the
+        connection after a configurable timeout instead of waiting
+        indefinitely for a response.</p>
+      <ul class="ul" id="concept_rfy_jl1_rx__ul_vm2_2v1_rx">
+        <li class="li">The <code class="ph codeph">--backend_client_rpc_timeout_ms</code> option can be
+          used to specify the number of milliseconds Impala should wait for a
+          response from the backend client before it terminates the connection
+          and signals a failure. The default value for this property is 300000
+          milliseconds, or 5 minutes. </li>
+      </ul>
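+
+      <p class="p">
+        For example, a sketch of startup options that allow more connection retries and use a
+        shorter RPC timeout (the values shown are arbitrary illustrations):
+      </p>
+
+<pre class="pre codeblock"><code>
+--backend_client_connection_num_retries=5 --backend_client_rpc_timeout_ms=60000
+</code></pre>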
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="timeouts__cancel_query">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Cancelling a Query</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Sometimes, an Impala query might run for an unexpectedly long time, tying up resources
+        in the cluster. You can cancel the query explicitly, independent of the timeout period,
+        by going into the web UI for the <span class="keyword cmdname">impalad</span> host (on port 25000 by
+        default), and using the link on the <code class="ph codeph">/queries</code> tab to cancel the running
+        query. You can also cancel a query from within <span class="keyword cmdname">impala-shell</span> by pressing <code class="ph codeph">^C</code>.
+      </p>
+
+    </div>
+
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_timestamp.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_timestamp.html b/docs/build/html/topics/impala_timestamp.html
new file mode 100644
index 0000000..02f86fc
--- /dev/null
+++ b/docs/build/html/topics/impala_timestamp.html
@@ -0,0 +1,514 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="timestamp"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>TIMESTAMP Data Type</title></head><body id="timestamp"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">TIMESTAMP Data Type</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements, representing a
+      point in time.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> TIMESTAMP</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Range:</strong> Allowed date values range from 1400-01-01 to 9999-12-31; this range is different from the Hive
+      <code class="ph codeph">TIMESTAMP</code> type. Internally, the resolution of the time portion of a
+      <code class="ph codeph">TIMESTAMP</code> value is in nanoseconds.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">INTERVAL expressions:</strong>
+    </p>
+
+    <p class="p">
+      You can perform date arithmetic by adding or subtracting a specified number of time units, using the
+      <code class="ph codeph">INTERVAL</code> keyword and the <code class="ph codeph">+</code> and <code class="ph codeph">-</code> operators or
+      <code class="ph codeph">date_add()</code> and <code class="ph codeph">date_sub()</code> functions. You can specify units as
+      <code class="ph codeph">YEAR[S]</code>, <code class="ph codeph">MONTH[S]</code>, <code class="ph codeph">WEEK[S]</code>, <code class="ph codeph">DAY[S]</code>,
+      <code class="ph codeph">HOUR[S]</code>, <code class="ph codeph">MINUTE[S]</code>, <code class="ph codeph">SECOND[S]</code>,
+      <code class="ph codeph">MILLISECOND[S]</code>, <code class="ph codeph">MICROSECOND[S]</code>, and <code class="ph codeph">NANOSECOND[S]</code>. You can
+      only specify one time unit in each interval expression, for example <code class="ph codeph">INTERVAL 3 DAYS</code> or
+      <code class="ph codeph">INTERVAL 25 HOURS</code>, but you can produce any granularity by adding together successive
+      <code class="ph codeph">INTERVAL</code> values, such as <code class="ph codeph"><var class="keyword varname">timestamp_value</var> + INTERVAL 3 WEEKS -
+      INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS</code>.
+    </p>
+
+    <p class="p">
+      For example:
+    </p>
+
+<pre class="pre codeblock"><code>select now() + interval 1 day;
+select date_sub(now(), interval 5 minutes);
+insert into auction_details
+  select auction_id, auction_start_time, auction_start_time + interval 2 days + interval 12 hours
+  from new_auctions;</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Time zones:</strong>
+    </p>
+
+    <p class="p">
+      By default, Impala does not store timestamps using the local timezone, to avoid undesired results from
+      unexpected time zone issues. Timestamps are stored and interpreted relative to UTC, both when written to or
+      read from data files, or when converted to or from Unix time values through functions such as
+      <code class="ph codeph">from_unixtime()</code> or <code class="ph codeph">unix_timestamp()</code>. To convert such a
+      <code class="ph codeph">TIMESTAMP</code> value to one that represents the date and time in a specific time zone, convert
+      the original value with the <code class="ph codeph">from_utc_timestamp()</code> function.
+    </p>
+
+    <p class="p">
+      Because Impala does not assume that <code class="ph codeph">TIMESTAMP</code> values are in any particular time zone, you
+      must be conscious of the time zone aspects of data that you query, insert, or convert.
+    </p>
+
+    <p class="p">
+      For consistency with Unix system calls, the <code class="ph codeph">TIMESTAMP</code> returned by the <code class="ph codeph">now()</code>
+      function represents the local time in the system time zone, rather than in UTC. To store values relative to
+      the current time in a portable way, convert any <code class="ph codeph">now()</code> return values using the
+      <code class="ph codeph">to_utc_timestamp()</code> function first. For example, the following example shows that the current
+      time in California (where this Impala cluster is located) is shortly after 2 PM. If that value was written to a data
+      file, and shipped off to a distant server to be analyzed alongside other data from far-flung locations, the
+      dates and times would not match up precisely because of time zone differences. Therefore, the
+      <code class="ph codeph">to_utc_timestamp()</code> function converts it using a common reference point, the UTC time zone
+      (descended from the old Greenwich Mean Time standard). The <code class="ph codeph">'PDT'</code> argument indicates that the
+      original value is from the Pacific time zone with Daylight Saving Time in effect. When servers in all
+      geographic locations run the same transformation on any local date and time values (with the appropriate time
+      zone argument), the stored data uses a consistent representation. Impala queries can use functions such as
+      <code class="ph codeph">EXTRACT()</code>, <code class="ph codeph">MIN()</code>, <code class="ph codeph">AVG()</code>, and so on to do time-series
+      analysis on those timestamps.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select now();
++-------------------------------+
+| now()                         |
++-------------------------------+
+| 2015-04-09 14:07:46.580465000 |
++-------------------------------+
+[localhost:21000] &gt; select to_utc_timestamp(now(), 'PDT');
++--------------------------------+
+| to_utc_timestamp(now(), 'pdt') |
++--------------------------------+
+| 2015-04-09 21:08:07.664547000  |
++--------------------------------+
+</code></pre>
+
+    <p class="p">
+      The converse function, <code class="ph codeph">from_utc_timestamp()</code>, lets you take stored <code class="ph codeph">TIMESTAMP</code>
+      data or calculated results and convert back to local date and time for processing on the application side.
+      The following example shows how you might represent some future date (such as the ending date and time of an
+      auction) in UTC, and then convert back to local time when convenient for reporting or other processing. The
+      final query in the example tests whether this arbitrary UTC date and time has passed yet, by converting it
+      back to the local time zone and comparing it against the current date and time.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select to_utc_timestamp(now() + interval 2 weeks, 'PDT');
++---------------------------------------------------+
+| to_utc_timestamp(now() + interval 2 weeks, 'pdt') |
++---------------------------------------------------+
+| 2015-04-23 21:08:34.152923000                     |
++---------------------------------------------------+
+[localhost:21000] &gt; select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT');
++------------------------------------------------------------+
+| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') |
++------------------------------------------------------------+
+| 2015-04-23 14:08:34.152923000                              |
++------------------------------------------------------------+
+[localhost:21000] &gt; select from_utc_timestamp('2015-04-23 21:08:34.152923000','PDT') &lt; now();
++--------------------------------------------------------------------+
+| from_utc_timestamp('2015-04-23 21:08:34.152923000', 'pdt') &lt; now() |
++--------------------------------------------------------------------+
+| false                                                              |
++--------------------------------------------------------------------+
+</code></pre>
+
+    <p class="p">
+      If you have data files written by Hive, those <code class="ph codeph">TIMESTAMP</code> values represent the local timezone
+      of the host where the data was written, potentially leading to inconsistent results when processed by Impala.
+      To avoid compatibility problems or having to code workarounds, you can specify one or both of these
+      <span class="keyword cmdname">impalad</span> startup flags: <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> and
+      <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps=true</code>. Although
+      <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps</code> is turned off by default to avoid performance overhead, where practical
+      turn it on when processing <code class="ph codeph">TIMESTAMP</code> columns in Parquet files written by Hive, to avoid unexpected behavior.
+    </p>
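+
+    <p class="p">
+      For example, a sketch of the relevant <span class="keyword cmdname">impalad</span> startup
+      flags when processing <code class="ph codeph">TIMESTAMP</code> data written by Hive:
+    </p>
+
+<pre class="pre codeblock"><code>
+-use_local_tz_for_unix_timestamp_conversions=true -convert_legacy_hive_parquet_utc_timestamps=true
+</code></pre>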
+
+    <p class="p">
+      The <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions</code> setting affects conversions from
+      <code class="ph codeph">TIMESTAMP</code> to <code class="ph codeph">BIGINT</code>, or from <code class="ph codeph">BIGINT</code>
+      to <code class="ph codeph">TIMESTAMP</code>. By default, Impala treats all <code class="ph codeph">TIMESTAMP</code> values as UTC,
+      to simplify analysis of time-series data from different geographic regions. When you enable the
+      <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions</code> setting, these operations
+      treat the input values as if they are in the local time zone of the host doing the processing.
+      See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for the list of functions
+      affected by the <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions</code> setting.
+    </p>
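+    <p class="p">
+      For example, casting between <code class="ph codeph">BIGINT</code> and <code class="ph codeph">TIMESTAMP</code>
+      might behave as follows. (The second result depends on the time zone of the host doing the
+      processing; the US Pacific value shown is illustrative.)
+    </p>
+
+<pre class="pre codeblock"><code>-- With the default setting, the integer is interpreted as seconds past
+-- the epoch in UTC.
+select cast(0 as timestamp);   -- Returns 1970-01-01 00:00:00.
+
+-- With -use_local_tz_for_unix_timestamp_conversions=true, the same cast
+-- is interpreted in the local time zone of the host; on a US Pacific host,
+-- the result would be 1969-12-31 16:00:00.
+select cast(0 as timestamp);
+</code></pre>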
+
+    <p class="p">
+      The following sequence of examples shows how the interpretation of <code class="ph codeph">TIMESTAMP</code> values in
+      Parquet tables is affected by the <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps</code>
+      setting.
+    </p>
+
+    <p class="p">
+      Regardless of the <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps</code> setting,
+      <code class="ph codeph">TIMESTAMP</code> columns in text tables can be written and read interchangeably by Impala and Hive:
+    </p>
+
+<pre class="pre codeblock"><code>Impala DDL and queries for text table:
+
+[localhost:21000] &gt; create table t1 (x timestamp);
+[localhost:21000] &gt; insert into t1 values (now()), (now() + interval 1 day);
+[localhost:21000] &gt; select x from t1;
++-------------------------------+
+| x                             |
++-------------------------------+
+| 2015-04-07 15:43:02.892403000 |
+| 2015-04-08 15:43:02.892403000 |
++-------------------------------+
+[localhost:21000] &gt; select to_utc_timestamp(x, 'PDT') from t1;
++-------------------------------+
+| to_utc_timestamp(x, 'pdt')    |
++-------------------------------+
+| 2015-04-07 22:43:02.892403000 |
+| 2015-04-08 22:43:02.892403000 |
++-------------------------------+
+
+Hive query for text table:
+
+hive&gt; select * from t1;
+OK
+2015-04-07 15:43:02.892403
+2015-04-08 15:43:02.892403
+Time taken: 1.245 seconds, Fetched: 2 row(s)
+</code></pre>
+
+    <p class="p">
+      When the table uses Parquet format, Impala expects any time zone adjustment to be applied prior to writing,
+      while <code class="ph codeph">TIMESTAMP</code> values written by Hive are adjusted to be in the UTC time zone. When Hive
+      queries Parquet data files that it wrote, it adjusts the <code class="ph codeph">TIMESTAMP</code> values back to the local
+      time zone, while Impala does no conversion. Hive does no time zone conversion when it queries Impala-written
+      Parquet files.
+    </p>
+
+<pre class="pre codeblock"><code>Impala DDL and queries for Parquet table:
+
+[localhost:21000] &gt; create table p1 stored as parquet as select x from t1;
++-------------------+
+| summary           |
++-------------------+
+| Inserted 2 row(s) |
++-------------------+
+[localhost:21000] &gt; select x from p1;
++-------------------------------+
+| x                             |
++-------------------------------+
+| 2015-04-07 15:43:02.892403000 |
+| 2015-04-08 15:43:02.892403000 |
++-------------------------------+
+
+Hive DDL and queries for Parquet table:
+
+hive&gt; create table h1 (x timestamp) stored as parquet;
+OK
+hive&gt; insert into h1 select * from p1;
+...
+OK
+Time taken: 35.573 seconds
+hive&gt; select x from p1;
+OK
+2015-04-07 15:43:02.892403
+2015-04-08 15:43:02.892403
+Time taken: 0.324 seconds, Fetched: 2 row(s)
+hive&gt; select x from h1;
+OK
+2015-04-07 15:43:02.892403
+2015-04-08 15:43:02.892403
+Time taken: 0.197 seconds, Fetched: 2 row(s)
+</code></pre>
+
+    <p class="p">
+      The discrepancy arises when Impala queries the Hive-created Parquet table. The underlying values in the
+      <code class="ph codeph">TIMESTAMP</code> column are different from the ones written by Impala, even though they were copied
+      from one table to another by an <code class="ph codeph">INSERT ... SELECT</code> statement in Hive. Hive did an implicit
+      conversion from the local time zone to UTC as it wrote the values to Parquet.
+    </p>
+
+<pre class="pre codeblock"><code>Impala query for TIMESTAMP values from Impala-written and Hive-written data:
+
+[localhost:21000] &gt; select * from p1;
++-------------------------------+
+| x                             |
++-------------------------------+
+| 2015-04-07 15:43:02.892403000 |
+| 2015-04-08 15:43:02.892403000 |
++-------------------------------+
+Fetched 2 row(s) in 0.29s
+[localhost:21000] &gt; select * from h1;
++-------------------------------+
+| x                             |
++-------------------------------+
+| 2015-04-07 22:43:02.892403000 |
+| 2015-04-08 22:43:02.892403000 |
++-------------------------------+
+Fetched 2 row(s) in 0.41s
+
+Underlying integer values for Impala-written and Hive-written data:
+
+[localhost:21000] &gt; select cast(x as bigint) from p1;
++-------------------+
+| cast(x as bigint) |
++-------------------+
+| 1428421382        |
+| 1428507782        |
++-------------------+
+Fetched 2 row(s) in 0.38s
+[localhost:21000] &gt; select cast(x as bigint) from h1;
++-------------------+
+| cast(x as bigint) |
++-------------------+
+| 1428446582        |
+| 1428532982        |
++-------------------+
+Fetched 2 row(s) in 0.20s
+</code></pre>
+
+    <p class="p">
+      When the <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps</code> setting is enabled, Impala recognizes
+      the Parquet data files written by Hive, and applies the same UTC-to-local-timezone conversion logic during
+      the query as Hive uses, making the contents of the Impala-written <code class="ph codeph">P1</code> table and the
+      Hive-written <code class="ph codeph">H1</code> table appear identical, whether represented as <code class="ph codeph">TIMESTAMP</code>
+      values or the underlying <code class="ph codeph">BIGINT</code> integers:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select x from p1;
++-------------------------------+
+| x                             |
++-------------------------------+
+| 2015-04-07 15:43:02.892403000 |
+| 2015-04-08 15:43:02.892403000 |
++-------------------------------+
+Fetched 2 row(s) in 0.37s
+[localhost:21000] &gt; select x from h1;
++-------------------------------+
+| x                             |
++-------------------------------+
+| 2015-04-07 15:43:02.892403000 |
+| 2015-04-08 15:43:02.892403000 |
++-------------------------------+
+Fetched 2 row(s) in 0.19s
+[localhost:21000] &gt; select cast(x as bigint) from p1;
++-------------------+
+| cast(x as bigint) |
++-------------------+
+| 1428446582        |
+| 1428532982        |
++-------------------+
+Fetched 2 row(s) in 0.29s
+[localhost:21000] &gt; select cast(x as bigint) from h1;
++-------------------+
+| cast(x as bigint) |
++-------------------+
+| 1428446582        |
+| 1428532982        |
++-------------------+
+Fetched 2 row(s) in 0.22s
+</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Conversions:</strong>
+    </p>
+
+    <p class="p">
+        Impala automatically converts <code class="ph codeph">STRING</code> literals of the correct format into
+        <code class="ph codeph">TIMESTAMP</code> values. Timestamp values are accepted in the format
+        <code class="ph codeph">"yyyy-MM-dd HH:mm:ss.SSSSSS"</code>, and can consist of just the date, or just the time, with or
+        without the fractional second portion. For example, you can specify <code class="ph codeph">TIMESTAMP</code> values such as
+        <code class="ph codeph">'1966-07-30'</code>, <code class="ph codeph">'08:30:00'</code>, or <code class="ph codeph">'1985-09-25 17:45:30.005'</code>.
+        <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to
+        <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+        date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+        If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+        the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span>
+      </p>
+
+    <p class="p">
+      In Impala 1.3 and higher, the <code class="ph codeph">FROM_UNIXTIME()</code> and <code class="ph codeph">UNIX_TIMESTAMP()</code>
+      functions allow a wider range of format strings, with more flexibility in element order, repetition of letter
+      placeholders, and separator characters. In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">UNIX_TIMESTAMP()</code>
+      function also allows a numeric timezone offset to be specified as part of the input string.
+      See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details.
+    </p>
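+    <p class="p">
+      For example, the following calls illustrate the flexible format strings. (Treat the exact
+      input and return values as illustrative; results depend on your Impala version and time
+      zone settings.)
+    </p>
+
+<pre class="pre codeblock"><code>-- Custom separators and element order in the format string (Impala 1.3 and higher).
+select from_unixtime(1428420000, 'yyyy/MM/dd');
+select unix_timestamp('07-04-2015', 'dd-MM-yyyy');
+</code></pre>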
+
+    <p class="p">
+        In Impala 2.2.0 and higher, built-in functions that accept or return integers representing <code class="ph codeph">TIMESTAMP</code> values
+        use the <code class="ph codeph">BIGINT</code> type for parameters and return values, rather than <code class="ph codeph">INT</code>.
+        This change lets the date and time functions avoid an overflow error that would otherwise occur
+        on January 19th, 2038 (known as the
+        <a class="xref" href="http://en.wikipedia.org/wiki/Year_2038_problem" target="_blank"><span class="q">"Year 2038 problem"</span> or <span class="q">"Y2K38 problem"</span></a>).
+        This change affects the <code class="ph codeph">from_unixtime()</code> and <code class="ph codeph">unix_timestamp()</code> functions.
+        You might need to change application code that interacts with these functions, change the types of
+        columns that store the return values, or add <code class="ph codeph">CAST()</code> calls to SQL statements that
+        call these functions.
+      </p>
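+    <p class="p">
+      For example, in Impala 2.2.0 and higher these calls succeed even though the values exceed
+      the 32-bit signed integer range (treat the exact results as illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>select from_unixtime(2147483648);              -- One second past the 32-bit limit; a date in 2038.
+select unix_timestamp('2040-01-01 00:00:00');  -- Returns a BIGINT larger than 2147483647.
+</code></pre>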
+
+    <p class="p">
+      <strong class="ph b">Partitioning:</strong>
+    </p>
+
+    <p class="p">
+      Although you cannot use a <code class="ph codeph">TIMESTAMP</code> column as a partition key, you can extract the
+      individual years, months, days, hours, and so on and partition based on those columns. Because the partition
+      key column values are represented in HDFS directory names, rather than as fields in the data files
+      themselves, you can also keep the original <code class="ph codeph">TIMESTAMP</code> values if desired, without duplicating
+      data or wasting storage space. See <a class="xref" href="impala_partitioning.html#partition_key_columns">Partition Key Columns</a> for more
+      details on partitioning with date and time values.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table timeline (event string) partitioned by (happened timestamp);
+ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column type in column: happened
+</code></pre>
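+    <p class="p">
+      Instead, you can extract the date fields into separate partition key columns. The following
+      schema and dynamic-partition insert are illustrative (the source table name is hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>create table timeline (event string, happened timestamp)
+  partitioned by (happened_year int, happened_month int, happened_day int);
+insert into timeline partition (happened_year, happened_month, happened_day)
+  select event, happened, year(happened), month(happened), day(happened)
+    from some_source_table;
+</code></pre>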
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>select cast('1966-07-30' as timestamp);
+select cast('1985-09-25 17:45:30.005' as timestamp);
+select cast('08:30:00' as timestamp);
+select hour('1970-01-01 15:30:00');         -- Succeeds, returns 15.
+select hour('1970-01-01 15:30');            -- Returns NULL because seconds field required.
+select hour('1970-01-01 27:30:00');         -- Returns NULL because hour value out of range.
+select dayofweek('2004-06-13');             -- Returns 1, representing Sunday.
+select dayname('2004-06-13');               -- Returns 'Sunday'.
+select date_add('2004-06-13', 365);         -- Returns 2005-06-13 with zeros for hh:mm:ss fields.
+select day('2004-06-13');                   -- Returns 13.
+select datediff('1989-12-31','1984-09-01'); -- How many days between these 2 dates?
+select now();                               -- Returns current date and time in local timezone.
+
+create table dates_and_times (t timestamp);
+insert into dates_and_times values
+  ('1966-07-30'), ('1985-09-25 17:45:30.005'), ('08:30:00'), (now());
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">NULL considerations:</strong> Casting any unrecognized <code class="ph codeph">STRING</code> value to this type produces a
+        <code class="ph codeph">NULL</code> value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Partitioning:</strong> Because this type potentially has so many distinct values, it is often not a sensible
+        choice for a partition key column. For example, events 1 millisecond apart would be stored in different
+        partitions. Consider using the <code class="ph codeph">TRUNC()</code> function to condense the number of distinct values,
+        and partition on a new column with the truncated values.
+      </p>
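+    <p class="p">
+      For example, using the <code class="ph codeph">dates_and_times</code> table from the earlier
+      examples, <code class="ph codeph">TRUNC()</code> can condense the values to one distinct value
+      per hour:
+    </p>
+
+<pre class="pre codeblock"><code>-- trunc() with the 'HH' unit zeroes out the minutes, seconds, and fractional seconds.
+select t, trunc(t, 'HH') from dates_and_times;
+</code></pre>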
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+        using Parquet or other binary formats.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong> Represented in memory as a 16-byte value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+        fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+        statement.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Sqoop considerations:</strong>
+      </p>
+
+    <p class="p"> If you use Sqoop to
+        convert RDBMS data to Parquet, be careful with interpreting any
+        resulting values from <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>,
+        or <code class="ph codeph">TIMESTAMP</code> columns. The underlying values are
+        represented as the Parquet <code class="ph codeph">INT64</code> type, which is
+        represented as <code class="ph codeph">BIGINT</code> in the Impala table. The Parquet
+        values represent the time in milliseconds, while Impala interprets
+          <code class="ph codeph">BIGINT</code> as the time in seconds. Therefore, if you have
+        a <code class="ph codeph">BIGINT</code> column in a Parquet table that was imported
+        this way from Sqoop, divide the values by 1000 when interpreting as the
+          <code class="ph codeph">TIMESTAMP</code> type.</p>
+
+    <p class="p">
+      For example, for a <code class="ph codeph">BIGINT</code> column <code class="ph codeph">t_millis</code>
+      imported this way (the table and column names are illustrative), divide by 1000 before casting:
+    </p>
+
+<pre class="pre codeblock"><code>-- The raw value is in milliseconds; cast(t_millis as timestamp) alone
+-- would produce a date far in the future.
+select cast(t_millis / 1000 as timestamp) from sqoop_imported_table;
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+      If you cast a <code class="ph codeph">STRING</code> with an unrecognized format to a <code class="ph codeph">TIMESTAMP</code>, the result
+      is <code class="ph codeph">NULL</code> rather than an error. Make sure to test your data pipeline to be sure any textual
+      date and time values are in a format that Impala <code class="ph codeph">TIMESTAMP</code> can recognize.
+    </p>
+
+    <p class="p">
+        Currently, Avro tables cannot contain <code class="ph codeph">TIMESTAMP</code> columns. If you need to store date and
+        time values in Avro tables, as a workaround you can use a <code class="ph codeph">STRING</code> representation of the
+        values, convert the values to <code class="ph codeph">BIGINT</code> with the <code class="ph codeph">UNIX_TIMESTAMP()</code> function,
+        or create separate numeric columns for individual date and time fields using the <code class="ph codeph">EXTRACT()</code>
+        function.
+      </p>
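+    <p class="p">
+      For example, an Avro table might use either of these workaround representations (the table
+      and column names are illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>-- Store as a STRING and convert at query time.
+create table avro_events (event_time string) stored as avro;
+select cast(event_time as timestamp) from avro_events;
+
+-- Or store as a BIGINT number of seconds past the epoch.
+create table avro_events2 (event_time bigint) stored as avro;
+select cast(event_time as timestamp) from avro_events2;
+</code></pre>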
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Currently, the data types <code class="ph codeph">DECIMAL</code>, <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+        <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <ul class="ul">
+      <li class="li">
+
+        <a class="xref" href="impala_literals.html#timestamp_literals">Timestamp Literals</a>.
+      </li>
+
+      <li class="li">
+        To convert to or from different date formats, or perform date arithmetic, use the date and time functions
+        described in <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>. In particular, the
+        <code class="ph codeph">from_unixtime()</code> function requires a case-sensitive format string such as
+        <code class="ph codeph">"yyyy-MM-dd HH:mm:ss.SSSS"</code>, matching one of the allowed variations of a
+        <code class="ph codeph">TIMESTAMP</code> value (date plus time, only date, only time, optional fractional seconds).
+      </li>
+
+      <li class="li">
+        See <a class="xref" href="impala_langref_unsupported.html#langref_hiveql_delta">SQL Differences Between Impala and Hive</a> for details about differences in
+        <code class="ph codeph">TIMESTAMP</code> handling between Impala and Hive.
+      </li>
+    </ul>
+
+  </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_tinyint.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_tinyint.html b/docs/build/html/topics/impala_tinyint.html
new file mode 100644
index 0000000..9efc098
--- /dev/null
+++ b/docs/build/html/topics/impala_tinyint.html
@@ -0,0 +1,131 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="tinyint"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>TINYINT Data Type</title></head><body id="tinyint"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">TINYINT Data Type</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      A 1-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p">
+      In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+    </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> TINYINT</code></pre>
+
+    <p class="p">
+      <strong class="ph b">Range:</strong> -128 .. 127. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Conversions:</strong> Impala automatically converts to a larger integer type (<code class="ph codeph">SMALLINT</code>,
+      <code class="ph codeph">INT</code>, or <code class="ph codeph">BIGINT</code>) or a floating-point type (<code class="ph codeph">FLOAT</code> or
+      <code class="ph codeph">DOUBLE</code>). Use <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">STRING</code> or
+      <code class="ph codeph">TIMESTAMP</code>.
+      <span class="ph">Casting an integer or floating-point value <code class="ph codeph">N</code> to
+        <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+        date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+        If the setting <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+        the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.</span>
+    </p>
+
+    <p class="p">
+        Impala does not return column overflows as <code class="ph codeph">NULL</code>, so that users can distinguish
+        between <code class="ph codeph">NULL</code> data and overflow conditions, as they can with traditional
+        database systems. Impala returns the largest or smallest value in the range for the type. For example,
+        valid values for a <code class="ph codeph">tinyint</code> range from -128 to 127. In Impala, a <code class="ph codeph">tinyint</code>
+        with a value of -200 returns -128 rather than <code class="ph codeph">NULL</code>. A <code class="ph codeph">tinyint</code> with a
+        value of 200 returns 127.
+      </p>
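+    <p class="p">
+      For example:
+    </p>
+
+<pre class="pre codeblock"><code>select cast(-200 as tinyint);  -- Returns -128, the low end of the range.
+select cast(200 as tinyint);   -- Returns 127, the high end of the range.
+</code></pre>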
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      For a convenient and automated way to check the bounds of the <code class="ph codeph">TINYINT</code> type, call the
+      functions <code class="ph codeph">MIN_TINYINT()</code> and <code class="ph codeph">MAX_TINYINT()</code>.
+    </p>
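+    <p class="p">
+      A quick check of the bounds:
+    </p>
+
+<pre class="pre codeblock"><code>select min_tinyint(), max_tinyint();  -- Returns -128 and 127.
+</code></pre>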
+
+    <p class="p">
+      If an integer value is too large to be represented as a <code class="ph codeph">TINYINT</code>, use a
+      <code class="ph codeph">SMALLINT</code> instead.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+        value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x TINYINT);
+SELECT CAST(100 AS TINYINT);
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Parquet considerations:</strong>
+      </p>
+
+
+
+    <p class="p">
+      Physically, Parquet files represent <code class="ph codeph">TINYINT</code> and <code class="ph codeph">SMALLINT</code> values as 32-bit
+      integers. Although Impala rejects attempts to insert out-of-range values into such columns, if you create a
+      new table with the <code class="ph codeph">CREATE TABLE ... LIKE PARQUET</code> syntax, any <code class="ph codeph">TINYINT</code> or
+      <code class="ph codeph">SMALLINT</code> columns in the original table turn into <code class="ph codeph">INT</code> columns in the new
+      table.
+    </p>
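+    <p class="p">
+      For example, assuming a Parquet data file written from a table with a
+      <code class="ph codeph">TINYINT</code> column (the file path is illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>-- The TINYINT column in the original table becomes an INT column in T2.
+create table t2 like parquet '/user/hive/warehouse/t1/datafile.parquet';
+describe t2;
+</code></pre>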
+
+
+
+    <p class="p">
+        <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+        using Parquet or other binary formats.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Internal details:</strong> Represented in memory as a 1-byte value.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+        fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+        statement.
+      </p>
+
+
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+      <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+      <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>,
+      <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[10/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_s3.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_s3.html b/docs/build/html/topics/impala_s3.html
new file mode 100644
index 0000000..79a4a69
--- /dev/null
+++ b/docs/build/html/topics/impala_s3.html
@@ -0,0 +1,775 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="s3"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala with the Amazon S3 Filesystem</title></head><body id="s3"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using Impala with the Amazon S3 Filesystem</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+        <p class="p">
+          In <span class="keyword">Impala 2.6</span> and higher, Impala supports both queries (<code class="ph codeph">SELECT</code>)
+          and DML (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, <code class="ph codeph">CREATE TABLE AS SELECT</code>)
+          for data residing on Amazon S3. With the inclusion of write support,
+          
+          the Impala support for S3 is now considered ready for production use.
+        </p>
+      </div>
+
+    <p class="p">
+      
+
+      
+      You can use Impala to query data residing on the Amazon S3 filesystem. This capability allows convenient
+      access to a storage system that is remotely managed, accessible from anywhere, and integrated with various
+      cloud-based services. Impala can query files in any supported file format from S3. The S3 storage location
+      can be for an entire table, or individual partitions in a partitioned table.
+    </p>
+
+    <p class="p">
+      The default Impala tables use data files stored on HDFS, which are ideal for bulk loads and queries using
+      full-table scans. In contrast, queries against S3 data are typically slower, making S3 suitable for holding
+      <span class="q">"cold"</span> data that is only queried occasionally, while more frequently accessed <span class="q">"hot"</span> data resides in
+      HDFS. In a partitioned table, you can set the <code class="ph codeph">LOCATION</code> attribute for individual partitions
+      to put some partitions on HDFS and others on S3, typically depending on the age of the data.
+    </p>
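+    <p class="p">
+      For example, a hypothetical partitioned table might keep recent data on HDFS and move older
+      partitions to S3 (the bucket name and paths are illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>create table sales (id bigint, amount double) partitioned by (year int);
+alter table sales add partition (year=2017);                -- Recent partition stays on HDFS.
+alter table sales add partition (year=2014);
+alter table sales partition (year=2014)
+  set location 's3a://impala-demo-bucket/sales/year=2014';  -- Cold partition moves to S3.
+</code></pre>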
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="s3__s3_sql">
+    <h2 class="title topictitle2" id="ariaid-title2">How Impala SQL Statements Work with S3</h2>
+    <div class="body conbody">
+      <p class="p">
+        Impala SQL statements work with data on S3 as follows:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            The <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>
+            or <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> statements
+            can specify that a table resides on the S3 filesystem by
+            encoding an <code class="ph codeph">s3a://</code> prefix for the <code class="ph codeph">LOCATION</code>
+            property. <code class="ph codeph">ALTER TABLE</code> can also set the <code class="ph codeph">LOCATION</code>
+            property for an individual partition, so that some data in a table resides on
+            S3 and other data in the same table resides on HDFS.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Once a table or partition is designated as residing on S3, the <a class="xref" href="impala_select.html#select">SELECT Statement</a>
+            statement transparently accesses the data files from the appropriate storage layer.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            If the S3 table is an internal table, the <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> statement
+            removes the corresponding data files from S3 when the table is dropped.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <a class="xref" href="impala_truncate_table.html#truncate_table">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a> statement always removes the corresponding
+            data files from S3 when the table is truncated.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> can move data files residing in HDFS into
+            an S3 table.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> statement, or the <code class="ph codeph">CREATE TABLE AS SELECT</code>
+            form of the <code class="ph codeph">CREATE TABLE</code> statement, can copy data from an HDFS table or another S3
+            table into an S3 table. The <a class="xref" href="impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a>
+            query option chooses whether or not to use a fast code path for these write operations to S3,
+            with the tradeoff of potential inconsistency in the case of a failure during the statement.
+          </p>
+        </li>
+      </ul>
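+      <p class="p">
+        For example, the statements above can be combined as follows (the bucket name and paths
+        are illustrative):
+      </p>
+
+<pre class="pre codeblock"><code>create external table logs_s3 (msg string)
+  location 's3a://impala-demo-bucket/logs/';
+select count(*) from logs_s3;
+create table summary_s3 stored as parquet
+  location 's3a://impala-demo-bucket/summary/'
+  as select msg, count(*) as cnt from logs_s3 group by msg;
+</code></pre>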
+      <p class="p">
+        For usage information about Impala SQL statements with S3 tables, see <a class="xref" href="impala_s3.html#s3_ddl">Creating Impala Databases, Tables, and Partitions for Data Stored on S3</a>
+        and <a class="xref" href="impala_s3.html#s3_dml">Using Impala DML Statements for S3 Data</a>.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="s3__s3_creds">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Specifying Impala Credentials to Access Data in S3</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        To allow Impala to access data in S3, specify values for the following configuration settings in your
+        <span class="ph filepath">core-site.xml</span> file:
+      </p>
+
+
+<pre class="pre codeblock"><code>
+&lt;property&gt;
+&lt;name&gt;fs.s3a.access.key&lt;/name&gt;
+&lt;value&gt;<var class="keyword varname">your_access_key</var>&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+&lt;name&gt;fs.s3a.secret.key&lt;/name&gt;
+&lt;value&gt;<var class="keyword varname">your_secret_key</var>&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>
+
+      <p class="p">
+        After specifying the credentials, restart both the Impala and
+        Hive services. (Restarting Hive is required because Impala queries, <code class="ph codeph">CREATE TABLE</code> statements, and so on go
+        through the Hive metastore.)
+      </p>
+
+      <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+
+          <p class="p">
+            Although you can specify the access key ID and secret key as part of the <code class="ph codeph">s3a://</code> URL in the
+            <code class="ph codeph">LOCATION</code> attribute, doing so makes this sensitive information visible in many places, such
+            as <code class="ph codeph">DESCRIBE FORMATTED</code> output and Impala log files. Therefore, specify this information
+            centrally in the <span class="ph filepath">core-site.xml</span> file, and restrict read access to that file to only
+            trusted users.
+          </p>
+
+        
+
+      </div>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="s3__s3_etl">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Loading Data into S3 for Impala Queries</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        If your ETL pipeline involves moving data into S3 and then querying through Impala,
+        you can either use Impala DML statements to create, move, or copy the data, or
+        use the same data loading techniques as you would for non-Impala data.
+      </p>
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="s3_etl__s3_dml">
+      <h3 class="title topictitle3" id="ariaid-title5">Using Impala DML Statements for S3 Data</h3>
+      <div class="body conbody">
+        <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+        and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+        Amazon Simple Storage Service (S3).
+        The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and
+        partitions is specified by an <code class="ph codeph">s3a://</code> prefix in the
+        <code class="ph codeph">LOCATION</code> attribute of
+        <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+        If you bring data into S3 using the normal S3 transfer mechanisms instead of Impala DML statements,
+        issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the S3 data.
+      </p>
+        <p class="p">
+        Because of differences between S3 and traditional filesystems, DML operations
+        for S3 tables can take longer than for tables on HDFS. For example, both the
+        <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+        and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+        to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+        the files are moved from a temporary staging directory to the final destination directory.)
+        Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+        actually copies the data files from one location to another and then removes the original files.
+        In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+        to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+        that a problem during statement execution could leave data in an inconsistent state.
+        It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+        See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+      </p>
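+        <p class="p">
+          For example, this hypothetical <span class="keyword cmdname">impala-shell</span> session
+          (the table names are illustrative only) enables the option for a set of
+          <code class="ph codeph">INSERT</code> statements, then turns it off again:
+        </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set s3_skip_insert_staging=true;
+[localhost:21000] &gt; insert into s3_events partition (year=2015) select * from staged_events;
+[localhost:21000] &gt; set s3_skip_insert_staging=false;
+</code></pre>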
+      </div>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="s3_etl__s3_manual_etl">
+      <h3 class="title topictitle3" id="ariaid-title6">Manually Loading Data into Impala Tables on S3</h3>
+      <div class="body conbody">
+        <p class="p">
+          As an alternative, or on earlier Impala releases without DML support for S3,
+          you can use the Amazon-provided methods to bring data files into S3 for querying through Impala. See
+          <a class="xref" href="http://aws.amazon.com/s3/" target="_blank">the Amazon S3 web site</a> for
+          details.
+        </p>
+
+        <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+          <div class="p">
+        For best compatibility with the S3 write support in <span class="keyword">Impala 2.6</span>
+        and higher:
+        <ul class="ul">
+        <li class="li">Use native Hadoop techniques to create data files in S3 for querying through Impala.</li>
+        <li class="li">Use the <code class="ph codeph">PURGE</code> clause of <code class="ph codeph">DROP TABLE</code> when dropping internal (managed) tables.</li>
+        </ul>
+        By default, when you drop an internal (managed) table, the data files are
+        moved to the HDFS trashcan. This operation is expensive for tables that
+        reside on the Amazon S3 filesystem. Therefore, for S3 tables, prefer to use
+        <code class="ph codeph">DROP TABLE <var class="keyword varname">table_name</var> PURGE</code> rather than the default <code class="ph codeph">DROP TABLE</code> statement.
+        The <code class="ph codeph">PURGE</code> clause makes Impala delete the data files immediately,
+        skipping the HDFS trashcan.
+        For the <code class="ph codeph">PURGE</code> clause to work effectively, you must originally create the
+        data files on S3 using one of the tools from the Hadoop ecosystem, such as
+        <code class="ph codeph">hadoop fs -cp</code>, or <code class="ph codeph">INSERT</code> in Impala or Hive.
+      </div>
+        </div>
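+        <p class="p">
+          For example, to discard an internal S3 table immediately, bypassing the trashcan
+          (the table name is illustrative only):
+        </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; drop table scratch_results_s3 purge;
+</code></pre>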
+
+        <p class="p">
+          Alternative file creation techniques (less compatible with the <code class="ph codeph">PURGE</code> clause) include:
+        </p>
+
+        <ul class="ul">
+          <li class="li">
+            The <a class="xref" href="https://console.aws.amazon.com/s3/home" target="_blank">Amazon AWS / S3
+            web interface</a> to upload from a web browser.
+          </li>
+
+          <li class="li">
+            The <a class="xref" href="http://aws.amazon.com/cli/" target="_blank">Amazon AWS CLI</a> to
+            manipulate files from the command line.
+          </li>
+
+          <li class="li">
+            Other S3-enabled software, such as
+            <a class="xref" href="http://s3tools.org/s3cmd" target="_blank">the S3Tools client software</a>.
+          </li>
+        </ul>
+
+        <p class="p">
+          After you upload data files to a location already mapped to an Impala table or partition, or if you delete
+          files in S3 from such a location, issue the <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+          statement to make Impala aware of the new set of data files.
+        </p>
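+        <p class="p">
+          For example, the following hypothetical sketch (the bucket, path, file, and table names
+          are illustrative only) uploads a file with the Amazon AWS CLI and then makes Impala
+          aware of the new data file:
+        </p>
+<pre class="pre codeblock"><code>$ aws s3 cp /tmp/more_cities.csv s3://impala-demo/usa_cities/
+$ impala-shell -q 'refresh usa_cities_s3'
+</code></pre>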
+
+      </div>
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="s3__s3_ddl">
+
+    <h2 class="title topictitle2" id="ariaid-title7">Creating Impala Databases, Tables, and Partitions for Data Stored on S3</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Impala reads data for a table or partition from S3 based on the <code class="ph codeph">LOCATION</code> attribute for the
+        table or partition. Specify the S3 details in the <code class="ph codeph">LOCATION</code> clause of a <code class="ph codeph">CREATE
+        TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statement. The notation for the <code class="ph codeph">LOCATION</code>
+        clause is <code class="ph codeph">s3a://<var class="keyword varname">bucket_name</var>/<var class="keyword varname">path/to/file</var></code>. The
+        filesystem prefix is always <code class="ph codeph">s3a://</code> because Impala does not support the <code class="ph codeph">s3://</code> or
+        <code class="ph codeph">s3n://</code> prefixes.
+      </p>
+
+      <p class="p">
+        For a partitioned table, either specify a separate <code class="ph codeph">LOCATION</code> clause for each new partition,
+        or specify a base <code class="ph codeph">LOCATION</code> for the table and set up a directory structure in S3 to mirror
+        the way Impala partitioned tables are structured in HDFS. Although, strictly speaking, S3 filenames do not
+        have directory paths, Impala treats S3 filenames with <code class="ph codeph">/</code> characters the same as HDFS
+        pathnames that include directories.
+      </p>
+
+      <p class="p">
+        You point a nonpartitioned table or an individual partition at S3 by specifying a single directory
+        path in S3, which could be any arbitrary directory. To replicate the structure of an entire Impala
+        partitioned table or database in S3 requires more care, with directories and subdirectories nested and
+        named to match the equivalent directory tree in HDFS. Consider setting up an empty staging area if
+        necessary in HDFS, and recording the complete directory structure so that you can replicate it in S3.
+        
+      </p>
+
+      <p class="p">
+        For convenience when working with multiple tables with data files stored in S3, you can create a database
+        with a <code class="ph codeph">LOCATION</code> attribute pointing to an S3 path.
+        Specify a URL of the form <code class="ph codeph">s3a://<var class="keyword varname">bucket</var>/<var class="keyword varname">root/path/for/database</var></code>
+        for the <code class="ph codeph">LOCATION</code> attribute of the database.
+        Any tables created inside that database
+        automatically create directories underneath the one specified by the database
+        <code class="ph codeph">LOCATION</code> attribute.
+      </p>
+
+      <p class="p">
+        For example, the following session creates a partitioned table where only a single partition resides on S3.
+        The partitions for years 2013 and 2014 are located on HDFS. The partition for year 2015 includes a
+        <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">s3a://</code> URL, and so refers to data residing on
+        S3, under a specific path underneath the bucket <code class="ph codeph">impala-demo</code>.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create database db_on_hdfs;
+[localhost:21000] &gt; use db_on_hdfs;
+[localhost:21000] &gt; create table mostly_on_hdfs (x int) partitioned by (year int);
+[localhost:21000] &gt; alter table mostly_on_hdfs add partition (year=2013);
+[localhost:21000] &gt; alter table mostly_on_hdfs add partition (year=2014);
+[localhost:21000] &gt; alter table mostly_on_hdfs add partition (year=2015)
+                  &gt;   location 's3a://impala-demo/dir1/dir2/dir3/t1';
+</code></pre>
+
+      <p class="p">
+        The following session creates a database and two partitioned tables residing entirely on S3, one
+        partitioned by a single column and the other partitioned by multiple columns. Because a
+        <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">s3a://</code> URL is specified for the database, the
+        tables inside that database are automatically created on S3 underneath the database directory. To see the
+        names of the associated subdirectories, including the partition key values, we use an S3 client tool to
+        examine how the directory structure is organized on S3. For example, Impala partition directories such as
+        <code class="ph codeph">month=1</code> do not include leading zeroes, which sometimes appear in partition directories created
+        through Hive.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create database db_on_s3 location 's3a://impala-demo/dir1/dir2/dir3';
+[localhost:21000] &gt; use db_on_s3;
+
+[localhost:21000] &gt; create table partitioned_on_s3 (x int) partitioned by (year int);
+[localhost:21000] &gt; alter table partitioned_on_s3 add partition (year=2013);
+[localhost:21000] &gt; alter table partitioned_on_s3 add partition (year=2014);
+[localhost:21000] &gt; alter table partitioned_on_s3 add partition (year=2015);
+
+[localhost:21000] &gt; !aws s3 ls s3://impala-demo/dir1/dir2/dir3 --recursive;
+2015-03-17 13:56:34          0 dir1/dir2/dir3/
+2015-03-17 16:43:28          0 dir1/dir2/dir3/partitioned_on_s3/
+2015-03-17 16:43:49          0 dir1/dir2/dir3/partitioned_on_s3/year=2013/
+2015-03-17 16:43:53          0 dir1/dir2/dir3/partitioned_on_s3/year=2014/
+2015-03-17 16:43:58          0 dir1/dir2/dir3/partitioned_on_s3/year=2015/
+
+[localhost:21000] &gt; create table partitioned_multiple_keys (x int)
+                  &gt;   partitioned by (year smallint, month tinyint, day tinyint);
+[localhost:21000] &gt; alter table partitioned_multiple_keys
+                  &gt;   add partition (year=2015,month=1,day=1);
+[localhost:21000] &gt; alter table partitioned_multiple_keys
+                  &gt;   add partition (year=2015,month=1,day=31);
+[localhost:21000] &gt; alter table partitioned_multiple_keys
+                  &gt;   add partition (year=2015,month=2,day=28);
+
+[localhost:21000] &gt; !aws s3 ls s3://impala-demo/dir1/dir2/dir3 --recursive;
+2015-03-17 13:56:34          0 dir1/dir2/dir3/
+2015-03-17 16:47:13          0 dir1/dir2/dir3/partitioned_multiple_keys/
+2015-03-17 16:47:44          0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=1/day=1/
+2015-03-17 16:47:50          0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=1/day=31/
+2015-03-17 16:47:57          0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=2/day=28/
+2015-03-17 16:43:28          0 dir1/dir2/dir3/partitioned_on_s3/
+2015-03-17 16:43:49          0 dir1/dir2/dir3/partitioned_on_s3/year=2013/
+2015-03-17 16:43:53          0 dir1/dir2/dir3/partitioned_on_s3/year=2014/
+2015-03-17 16:43:58          0 dir1/dir2/dir3/partitioned_on_s3/year=2015/
+</code></pre>
+
+      <p class="p">
+        The <code class="ph codeph">CREATE DATABASE</code> and <code class="ph codeph">CREATE TABLE</code> statements create the associated
+        directory paths if they do not already exist. You can specify multiple levels of directories, and the
+        <code class="ph codeph">CREATE</code> statement creates all appropriate levels, similar to using <code class="ph codeph">mkdir
+        -p</code>.
+      </p>
+
+      <p class="p">
+        Use the standard S3 file upload methods to actually put the data files into the right locations. You can
+        also put the directory paths and data files in place before creating the associated Impala databases or
+        tables, and Impala automatically uses the data from the appropriate location after the associated databases
+        and tables are created.
+      </p>
+
+      <p class="p">
+        You can switch whether an existing table or partition points to data in HDFS or S3. For example, if you
+        have an Impala table or partition pointing to data files in HDFS or S3, and you later transfer those data
+        files to the other filesystem, use an <code class="ph codeph">ALTER TABLE</code> statement to adjust the
+        <code class="ph codeph">LOCATION</code> attribute of the corresponding table or partition to reflect that change. Because
+        Impala does not have an <code class="ph codeph">ALTER DATABASE</code> statement, this location-switching technique is not
+        practical for entire databases that have a custom <code class="ph codeph">LOCATION</code> attribute.
+      </p>
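+      <p class="p">
+        For example, after transferring the data files for the <code class="ph codeph">year=2014</code>
+        partition of the <code class="ph codeph">mostly_on_hdfs</code> table shown earlier to S3
+        (the destination path here is illustrative only), you could point that partition at the
+        new location:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; alter table mostly_on_hdfs partition (year=2014)
+                  &gt;   set location 's3a://impala-demo/dir1/dir2/dir3/t1_2014';
+</code></pre>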
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="s3__s3_internal_external">
+
+    <h2 class="title topictitle2" id="ariaid-title8">Internal and External Tables Located on S3</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Just as with tables located on HDFS storage, you can designate S3-based tables as either internal (managed
+        by Impala) or external, by using the syntax <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">CREATE EXTERNAL
+        TABLE</code> respectively. When you drop an internal table, the files associated with the table are
+        removed, even if they are on S3 storage. When you drop an external table, the files associated with the
+        table are left alone, and are still available for access by other tools or components. See
+        <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> for details.
+      </p>
+
+      <p class="p">
+        If the data on S3 is intended to be long-lived and accessed by other tools in addition to Impala, create
+        any associated S3 tables with the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax, so that the files are not
+        deleted from S3 when the table is dropped.
+      </p>
+
+      <p class="p">
+        If the data on S3 is only needed for querying by Impala and can be safely discarded once the Impala
+        workflow is complete, create the associated S3 tables using the <code class="ph codeph">CREATE TABLE</code> syntax, so
+        that dropping the table also deletes the corresponding data files on S3.
+      </p>
+
+      <p class="p">
+        For example, this session creates a table in S3 with the same column layout as a table in HDFS, then
+        examines the S3 table and queries some data from it. The table in S3 works the same as a table in HDFS as
+        far as the expected file format of the data, table and column statistics, and other table properties. The
+        only indication that it is not an HDFS table is the <code class="ph codeph">s3a://</code> URL in the
+        <code class="ph codeph">LOCATION</code> property. Many data files can reside in the S3 directory, and their combined
+        contents form the table data. Because the data in this example is uploaded after the table is created, a
+        <code class="ph codeph">REFRESH</code> statement prompts Impala to update its cached information about the data files.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table usa_cities_s3 like usa_cities location 's3a://impala-demo/usa_cities';
+[localhost:21000] &gt; desc usa_cities_s3;
++-------+----------+---------+
+| name  | type     | comment |
++-------+----------+---------+
+| id    | smallint |         |
+| city  | string   |         |
+| state | string   |         |
++-------+----------+---------+
+
+-- Now from a web browser, upload the same data file(s) to S3 as in the HDFS table,
+-- under the relevant bucket and path. If you already have the data in S3, you would
+-- point the table LOCATION at an existing path.
+
+[localhost:21000] &gt; refresh usa_cities_s3;
+[localhost:21000] &gt; select count(*) from usa_cities_s3;
++----------+
+| count(*) |
++----------+
+| 289      |
++----------+
+[localhost:21000] &gt; select distinct state from usa_cities_s3 limit 5;
++----------------------+
+| state                |
++----------------------+
+| Louisiana            |
+| Minnesota            |
+| Georgia              |
+| Alaska               |
+| Ohio                 |
++----------------------+
+[localhost:21000] &gt; desc formatted usa_cities_s3;
++------------------------------+------------------------------+---------+
+| name                         | type                         | comment |
++------------------------------+------------------------------+---------+
+| # col_name                   | data_type                    | comment |
+|                              | NULL                         | NULL    |
+| id                           | smallint                     | NULL    |
+| city                         | string                       | NULL    |
+| state                        | string                       | NULL    |
+|                              | NULL                         | NULL    |
+| # Detailed Table Information | NULL                         | NULL    |
+| Database:                    | s3_testing                   | NULL    |
+| Owner:                       | jrussell                     | NULL    |
+| CreateTime:                  | Mon Mar 16 11:36:25 PDT 2015 | NULL    |
+| LastAccessTime:              | UNKNOWN                      | NULL    |
+| Protect Mode:                | None                         | NULL    |
+| Retention:                   | 0                            | NULL    |
+| Location:                    | s3a://impala-demo/usa_cities | NULL    |
+| Table Type:                  | MANAGED_TABLE                | NULL    |
+...
++------------------------------+------------------------------+---------+
+</code></pre>
+
+
+
+      <p class="p">
+        In this case, we have already uploaded a Parquet file with a million rows of data to the
+        <code class="ph codeph">sample_data</code> directory underneath the <code class="ph codeph">impala-demo</code> bucket on S3. This
+        session creates a table with matching column settings pointing to the corresponding location in S3, then
+        queries the table. Because the data is already in place on S3 when the table is created, no
+        <code class="ph codeph">REFRESH</code> statement is required.
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table sample_data_s3
+                  &gt; (id bigint, val int, zerofill string,
+                  &gt; name string, assertion boolean, city string, state string)
+                  &gt; stored as parquet location 's3a://impala-demo/sample_data';
+[localhost:21000] &gt; select count(*) from sample_data_s3;
++----------+
+| count(*) |
++----------+
+| 1000000  |
++----------+
+[localhost:21000] &gt; select count(*) howmany, assertion from sample_data_s3 group by assertion;
++---------+-----------+
+| howmany | assertion |
++---------+-----------+
+| 667149  | true      |
+| 332851  | false     |
++---------+-----------+
+</code></pre>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="s3__s3_queries">
+
+    <h2 class="title topictitle2" id="ariaid-title9">Running and Tuning Impala Queries for Data Stored on S3</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Once the appropriate <code class="ph codeph">LOCATION</code> attributes are set up at the table or partition level, you
+        query data stored in S3 exactly the same as data stored on HDFS or in HBase:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Queries against S3 data support all the same file formats as for HDFS data.
+        </li>
+
+        <li class="li">
+          Tables can be unpartitioned or partitioned. For partitioned tables, either manually construct paths in S3
+          corresponding to the HDFS directories representing partition key values, or use <code class="ph codeph">ALTER TABLE ...
+          ADD PARTITION</code> to set up the appropriate paths in S3.
+        </li>
+
+        <li class="li">
+          HDFS and HBase tables can be joined to S3 tables, or S3 tables can be joined with each other.
+        </li>
+
+        <li class="li">
+          Authorization using the Sentry framework to control access to databases, tables, or columns works the
+          same whether the data is in HDFS or in S3.
+        </li>
+
+        <li class="li">
+          The <span class="keyword cmdname">catalogd</span> daemon caches metadata for both HDFS and S3 tables. Use
+          <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> for S3 tables in the same situations
+          where you would issue those statements for HDFS tables.
+        </li>
+
+        <li class="li">
+          Queries against S3 tables are subject to the same kinds of admission control and resource management as
+          HDFS tables.
+        </li>
+
+        <li class="li">
+          Metadata about S3 tables is stored in the same metastore database as for HDFS tables.
+        </li>
+
+        <li class="li">
+          You can set up views referring to S3 tables, the same as for HDFS tables.
+        </li>
+
+        <li class="li">
+          The <code class="ph codeph">COMPUTE STATS</code>, <code class="ph codeph">SHOW TABLE STATS</code>, and <code class="ph codeph">SHOW COLUMN
+          STATS</code> statements work for S3 tables also.
+        </li>
+      </ul>
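+      <p class="p">
+        For example, the statistics statements operate on the S3 table created earlier exactly as
+        they would on an HDFS table:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats usa_cities_s3;
+[localhost:21000] &gt; show table stats usa_cities_s3;
+[localhost:21000] &gt; show column stats usa_cities_s3;
+</code></pre>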
+
+    </div>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="s3_queries__s3_performance">
+
+      <h3 class="title topictitle3" id="ariaid-title10">Understanding and Tuning Impala Query Performance for S3 Data</h3>
+  
+
+      <div class="body conbody">
+
+        <p class="p">
+          Although Impala queries for data stored in S3 might be less performant than queries against the
+          equivalent data stored in HDFS, you can still do some tuning. Here are techniques you can use to
+          interpret explain plans and profiles for queries against S3 data, and tips to achieve the best
+          performance possible for such queries.
+        </p>
+
+        <p class="p">
+          All else being equal, performance is expected to be lower for queries running against data on S3 rather
+          than HDFS. The actual mechanics of the <code class="ph codeph">SELECT</code> statement are somewhat different when the
+          data is in S3. Although the work is still distributed across the datanodes of the cluster, Impala might
+          parallelize the work for a distributed query differently for data on HDFS and S3. S3 does not have the
+          same block notion as HDFS, so Impala uses heuristics to determine how to split up large S3 files for
+          processing in parallel. Because all hosts can access any S3 data file with equal efficiency, the
+          distribution of work might be different than for HDFS data, where the data blocks are physically read
+          using short-circuit local reads by hosts that contain the appropriate block replicas. Although the I/O to
+          read the S3 data might be spread evenly across the hosts of the cluster, the fact that all data is
+          initially retrieved across the network means that the overall query performance is likely to be lower for
+          S3 data than for HDFS data.
+        </p>
+
+        <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+        For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+        Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+        in the <span class="ph filepath">core-site.xml</span> configuration file determines
+        how Impala divides the I/O work of reading the data files. This configuration
+        setting is specified in bytes. By default, this
+        value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+        as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+        Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+        to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+        Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+        to 268435456 (256 MB) to match the row group size produced by Impala.
+      </p>
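+        <p class="p">
+          For example, to match the 128 MB Parquet row group size of files written by MapReduce or
+          Hive, set the following in <span class="ph filepath">core-site.xml</span> and restart the
+          Impala service:
+        </p>
+
+<pre class="pre codeblock"><code>
+&lt;property&gt;
+&lt;name&gt;fs.s3a.block.size&lt;/name&gt;
+&lt;value&gt;134217728&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>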
+
+        <p class="p">
+        Because of differences between S3 and traditional filesystems, DML operations
+        for S3 tables can take longer than for tables on HDFS. For example, both the
+        <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+        and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+        to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+        the files are moved from a temporary staging directory to the final destination directory.)
+        Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+        actually copies the data files from one location to another and then removes the original files.
+        In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+        to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+        that a problem during statement execution could leave data in an inconsistent state.
+        It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+        See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+      </p>
+
+        <p class="p">
+          When optimizing aspects of complex queries such as the join order, Impala treats tables on HDFS and
+          S3 the same way. Therefore, follow all the same tuning recommendations for S3 tables as for HDFS ones,
+          such as using the <code class="ph codeph">COMPUTE STATS</code> statement to help Impala construct accurate estimates of
+          row counts and cardinality. See <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> for details.
+        </p>
+
+        <p class="p">
+          In query profile reports, the numbers for <code class="ph codeph">BytesReadLocal</code>,
+          <code class="ph codeph">BytesReadShortCircuit</code>, <code class="ph codeph">BytesReadDataNodeCached</code>, and
+          <code class="ph codeph">BytesReadRemoteUnexpected</code> are blank because those metrics come from HDFS.
+          If you do see any indications that a query against an S3 table performed <span class="q">"remote read"</span>
+          operations, do not be alarmed. That is expected because, by definition, all the I/O for S3 tables involves
+          remote reads.
+        </p>
+
+      </div>
+
+    </article>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="s3__s3_restrictions">
+
+    <h2 class="title topictitle2" id="ariaid-title11">Restrictions on Impala Support for S3</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        Impala requires that the default filesystem for the cluster be HDFS. You cannot use S3 as the only
+        filesystem in the cluster.
+      </p>
+
+      <p class="p">
+        Prior to <span class="keyword">Impala 2.6</span>, Impala could not perform DML operations (<code class="ph codeph">INSERT</code>,
+        <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS SELECT</code>) where the destination is a table
+        or partition located on an S3 filesystem. This restriction is lifted in <span class="keyword">Impala 2.6</span> and higher.
+      </p>
+
+      <p class="p">
+        Impala does not support the old <code class="ph codeph">s3://</code> block-based and <code class="ph codeph">s3n://</code> filesystem
+        schemes, only <code class="ph codeph">s3a://</code>.
+      </p>
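+
+      <p class="p">
+        For example, a table location must use the <code class="ph codeph">s3a://</code> scheme, as in
+        the following sketch. (The bucket and path names are illustrative only.)
+      </p>
+
+<pre class="pre codeblock"><code>create external table s3_sales (id bigint, amount decimal(10,2))
+  location 's3a://example-bucket/sales/';</code></pre>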
+
+      <p class="p">
+        Although S3 is often used to store JSON-formatted data, the current Impala support for S3 does not include
+        directly querying JSON data. For Impala queries, use data files in one of the file formats listed in
+        <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>. If you have data in JSON format, you can prepare a
+        flattened version of that data for querying by Impala as part of your ETL cycle.
+      </p>
+
+      <p class="p">
+        You cannot use the <code class="ph codeph">ALTER TABLE ... SET CACHED</code> statement for tables or partitions that are
+        located in S3.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="s3__s3_best_practices">
+    <h2 class="title topictitle2" id="ariaid-title12">Best Practices for Using Impala with S3</h2>
+    
+    <div class="body conbody">
+      <p class="p">
+        The following guidelines represent best practices derived from testing and field experience with Impala on S3:
+      </p>
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Any reference to an S3 location must be fully qualified. (This rule applies when
+            S3 is not designated as the default filesystem.)
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Set the safety valve <code class="ph codeph">fs.s3a.connection.maximum</code> to 1500 for <span class="keyword cmdname">impalad</span>.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Set safety valve <code class="ph codeph">fs.s3a.block.size</code> to 134217728
+            (128 MB in bytes) if most Parquet files queried by Impala were written by Hive
+            or ParquetMR jobs. Set the block size to 268435456 (256 MB in bytes) if most Parquet
+            files queried by Impala were written by Impala.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">DROP TABLE ... PURGE</code> is much faster than the default <code class="ph codeph">DROP TABLE</code>.
+            The same applies to <code class="ph codeph">ALTER TABLE ... DROP PARTITION PURGE</code>
+            versus the default <code class="ph codeph">DROP PARTITION</code> operation.
+            However, due to the eventually consistent nature of S3, the files for that
+            table or partition could remain for some unbounded time when using <code class="ph codeph">PURGE</code>.
+            The default <code class="ph codeph">DROP TABLE/PARTITION</code> is slow because Impala copies the files to the HDFS trash folder,
+            and Impala waits until all the data is moved. <code class="ph codeph">DROP TABLE/PARTITION ... PURGE</code> is a
+            fast delete operation, and the Impala statement finishes quickly even though the change might not
+            have propagated fully throughout S3.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            <code class="ph codeph">INSERT</code> statements are faster than <code class="ph codeph">INSERT OVERWRITE</code> for S3.
+            The query option <code class="ph codeph">S3_SKIP_INSERT_STAGING</code>, which is set to <code class="ph codeph">true</code> by default,
+            skips the staging step for regular <code class="ph codeph">INSERT</code> (but not <code class="ph codeph">INSERT OVERWRITE</code>).
+            This makes the operation much faster, but consistency is not guaranteed: if a node fails during execution, the
+            table could end up with inconsistent data. Set this option to <code class="ph codeph">false</code> if stronger
+            consistency is required; however, that setting makes <code class="ph codeph">INSERT</code> operations slower.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Too many files in a table can make metadata loading and updating slow on S3.
+            When it receives too many requests, S3 applies a back-off mechanism and
+            responds more slowly than usual. You might have many small files because of:
+          </p>
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                Too many partitions due to over-granular partitioning. Prefer partitions with
+                many megabytes of data, so that even a query against a single partition can
+                be parallelized effectively.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Many small <code class="ph codeph">INSERT</code> queries. Prefer bulk
+                <code class="ph codeph">INSERT</code>s so that more data is written to fewer
+                files.
+              </p>
+            </li>
+          </ul>
+        </li>
+      </ul>
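+
+      <p class="p">
+        As a sketch, the two <code class="ph codeph">fs.s3a</code> safety valve settings above could
+        appear as follows, typically in <span class="ph filepath">core-site.xml</span>. (The exact
+        mechanism for applying them depends on your cluster management tooling; adjust the block
+        size to match how most of your Parquet files were written.)
+      </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;fs.s3a.connection.maximum&lt;/name&gt;
+  &lt;value&gt;1500&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+  &lt;name&gt;fs.s3a.block.size&lt;/name&gt;
+  &lt;value&gt;134217728&lt;/value&gt;
+&lt;/property&gt;</code></pre>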
+
+    </div>
+  </article>
+
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_s3_skip_insert_staging.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_s3_skip_insert_staging.html b/docs/build/html/topics/impala_s3_skip_insert_staging.html
new file mode 100644
index 0000000..53cf4e9
--- /dev/null
+++ b/docs/build/html/topics/impala_s3_skip_insert_staging.html
@@ -0,0 +1,78 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="s3_skip_insert_staging"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</title></head><body id="s3_skip_insert_staging"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">S3_SKIP_INSERT_STAGING Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+    </p>
+
+    <p class="p">
+      Speeds up <code class="ph codeph">INSERT</code> operations on tables or partitions residing on the
+      Amazon S3 filesystem. The tradeoff is the possibility of inconsistent data left behind
+      if an error occurs partway through the operation.
+    </p>
+
+    <p class="p">
+      By default, Impala write operations to S3 tables and partitions involve a two-stage process.
+      Impala writes intermediate files to S3, then (because S3 does not provide a <span class="q">"rename"</span>
+      operation) those intermediate files are copied to their final location, making the process
+      more expensive than on a filesystem that supports renaming or moving files.
+      This query option makes Impala skip the intermediate files, and instead write the
+      new data directly to the final destination.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+      <p class="p">
+        If a host that is participating in the <code class="ph codeph">INSERT</code> operation fails partway through
+        the query, you might be left with a table or partition that contains some but not all of the
+        expected data files. Therefore, this option is most appropriate for a development or test
+        environment where you have the ability to reconstruct the table if a problem during
+        <code class="ph codeph">INSERT</code> leaves the data in an inconsistent state.
+      </p>
+    </div>
+
+    <p class="p">
+      The timing of file deletion during an <code class="ph codeph">INSERT OVERWRITE</code> operation
+      makes it impractical to write new files to S3 and delete the old files in a single operation.
+      Therefore, this query option only affects regular <code class="ph codeph">INSERT</code> statements that add
+      to the existing data in a table, not <code class="ph codeph">INSERT OVERWRITE</code> statements.
+      Use <code class="ph codeph">TRUNCATE TABLE</code> if you need to remove all contents from an S3 table
+      before performing a fast <code class="ph codeph">INSERT</code> with this option enabled.
+    </p>
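+
+    <p class="p">
+      For example, to replace all the data in an S3 table quickly with this option enabled,
+      you could run the following sequence. (The table names are illustrative only.)
+    </p>
+
+<pre class="pre codeblock"><code>truncate table s3_sales;
+insert into s3_sales select * from staging_sales;</code></pre>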
+
+    <p class="p">
+      Performance improvements with this option enabled can be substantial. The speed increase
+      might be more noticeable for non-partitioned tables than for partitioned tables.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value is interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">true</code> (shown as 1 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[41/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_config_options.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_config_options.html b/docs/build/html/topics/impala_config_options.html
new file mode 100644
index 0000000..5bf3ff2
--- /dev/null
+++ b/docs/build/html/topics/impala_config_options.html
@@ -0,0 +1,361 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_processes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="config_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Modifying Impala Startup Options</title></head><body id="config_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Modifying Impala Startup Options</h1>
+
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+
+      
+
+      
+
+      
+
+      
+
+      
+
+      
+
+      
+
+      
+
+      
+
+      
+
+      
+      The configuration options for the Impala-related daemons let you choose which hosts and
+      ports to use for the services that run on a single host, specify directories for logging,
+      control resource usage and security, and specify other aspects of the Impala software.
+    </p>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_processes.html">Starting Impala</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="config_options__config_options_noncm">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Configuring Impala Startup Options through the Command Line</h2>
+
+    <div class="body conbody">
+
+      <p class="p"> The Impala server, statestore, and catalog services start up using values provided in a
+        defaults file, <span class="ph filepath">/etc/default/impala</span>. </p>
+
+      <p class="p">
+        This file includes information about many resources used by Impala. Most of the defaults
+        included in this file are suitable for most deployments. For example, typically you
+        would not change the definition of the <code class="ph codeph">CLASSPATH</code> variable, but you
+        would always set the address used by the statestore server. Some of the content you
+        might modify includes:
+      </p>
+
+
+
+<pre class="pre codeblock"><code>IMPALA_STATE_STORE_HOST=127.0.0.1
+IMPALA_STATE_STORE_PORT=24000
+IMPALA_BACKEND_PORT=22000
+IMPALA_LOG_DIR=/var/log/impala
+IMPALA_CATALOG_SERVICE_HOST=...
+IMPALA_STATE_STORE_HOST=...
+
+export IMPALA_STATE_STORE_ARGS=${IMPALA_STATE_STORE_ARGS:- \
+    -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}}
+IMPALA_SERVER_ARGS=" \
+-log_dir=${IMPALA_LOG_DIR} \
+-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
+-state_store_port=${IMPALA_STATE_STORE_PORT} \
+-use_statestore \
+-state_store_host=${IMPALA_STATE_STORE_HOST} \
+-be_port=${IMPALA_BACKEND_PORT}"
+export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</code></pre>
+
+      <p class="p">
+        To use alternate values, edit the defaults file, then restart all the Impala-related
+        services so that the changes take effect. Restart the Impala server using the following
+        commands:
+      </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-server restart
+Stopping Impala Server:                                    [  OK  ]
+Starting Impala Server:                                    [  OK  ]</code></pre>
+
+      <p class="p">
+        Restart the Impala statestore using the following commands:
+      </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-state-store restart
+Stopping Impala State Store Server:                        [  OK  ]
+Starting Impala State Store Server:                        [  OK  ]</code></pre>
+
+      <p class="p">
+        Restart the Impala catalog service using the following commands:
+      </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-catalog restart
+Stopping Impala Catalog Server:                            [  OK  ]
+Starting Impala Catalog Server:                            [  OK  ]</code></pre>
+
+      <p class="p">
+        Some common settings to change include:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Statestore address. Where practical, put the statestore on a separate host not
+            running the <span class="keyword cmdname">impalad</span> daemon. In that recommended configuration,
+            the <span class="keyword cmdname">impalad</span> daemon cannot refer to the statestore server using
+            the loopback address. If the statestore is hosted on a machine with an IP address of
+            192.168.0.27, change:
+          </p>
+<pre class="pre codeblock"><code>IMPALA_STATE_STORE_HOST=127.0.0.1</code></pre>
+          <p class="p">
+            to:
+          </p>
+<pre class="pre codeblock"><code>IMPALA_STATE_STORE_HOST=192.168.0.27</code></pre>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Catalog server address (including both the hostname and the port number). Update the
+            value of the <code class="ph codeph">IMPALA_CATALOG_SERVICE_HOST</code> variable. Where
+            practical, run the catalog server on the same host as the statestore. In that
+            recommended configuration, the <span class="keyword cmdname">impalad</span> daemon cannot refer to the
+            catalog server using the loopback address. If the catalog service is hosted on a
+            machine with an IP address of 192.168.0.27, add the following line:
+          </p>
+<pre class="pre codeblock"><code>IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000</code></pre>
+          <p class="p">
+            The <span class="ph filepath">/etc/default/impala</span> defaults file currently does not define
+            an <code class="ph codeph">IMPALA_CATALOG_ARGS</code> environment variable, but if you add one it
+            will be recognized by the service startup/shutdown script. Add a definition for this
+            variable to <span class="ph filepath">/etc/default/impala</span> and add the option
+            <code class="ph codeph">-catalog_service_host=<var class="keyword varname">hostname</var></code>. If the port is
+            different than the default 26000, also add the option
+            <code class="ph codeph">-catalog_service_port=<var class="keyword varname">port</var></code>.
+          </p>
+        </li>
+
+        <li class="li" id="config_options_noncm__mem_limit">
+          <p class="p">
+            Memory limits. You can limit the amount of memory available to Impala. For example,
+            to allow Impala to use no more than 70% of system memory, change:
+          </p>
+
+<pre class="pre codeblock"><code>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
+    -log_dir=${IMPALA_LOG_DIR} \
+    -state_store_port=${IMPALA_STATE_STORE_PORT} \
+    -use_statestore -state_store_host=${IMPALA_STATE_STORE_HOST} \
+    -be_port=${IMPALA_BACKEND_PORT}}</code></pre>
+          <p class="p">
+            to:
+          </p>
+<pre class="pre codeblock"><code>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
+    -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \
+    -use_statestore -state_store_host=${IMPALA_STATE_STORE_HOST} \
+    -be_port=${IMPALA_BACKEND_PORT} -mem_limit=70%}</code></pre>
+          <p class="p">
+            You can specify the memory limit using absolute notation such as
+            <code class="ph codeph">500m</code> or <code class="ph codeph">2G</code>, or as a percentage of physical memory
+            such as <code class="ph codeph">60%</code>.
+          </p>
+
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            Queries that exceed the specified memory limit are aborted. Percentage limits are
+            based on the physical memory of the machine and do not consider cgroups.
+          </div>
+        </li>
+
+        <li class="li">
+          <p class="p"> Core dump enablement. To enable core dumps, change: </p>
+<pre class="pre codeblock"><code>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</code></pre>
+          <p class="p">
+            to:
+          </p>
+<pre class="pre codeblock"><code>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-true}</code></pre>
+
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            The location of core dump files may vary according to your operating system configuration.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Other security settings may prevent Impala from writing core dumps even when this option is enabled.
+          </p>
+        </li>
+      </ul>
+      </div>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Authorization using the open source Sentry plugin. Specify the
+            <code class="ph codeph">-server_name</code> and <code class="ph codeph">-authorization_policy_file</code>
+            options as part of the <code class="ph codeph">IMPALA_SERVER_ARGS</code> and
+            <code class="ph codeph">IMPALA_STATE_STORE_ARGS</code> settings to enable the core Impala support
+            for authentication. See <a class="xref" href="impala_authorization.html#secure_startup">Starting the impalad Daemon with Sentry Authorization Enabled</a> for
+            details.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Auditing for successful or blocked Impala queries, another aspect of security.
+            Specify the <code class="ph codeph">-audit_event_log_dir=<var class="keyword varname">directory_path</var></code>
+            option and optionally the
+            <code class="ph codeph">-max_audit_event_log_file_size=<var class="keyword varname">number_of_queries</var></code>
+            and <code class="ph codeph">-abort_on_failed_audit_event</code> options as part of the
+            <code class="ph codeph">IMPALA_SERVER_ARGS</code> settings, for each Impala node, to enable and
+            customize auditing. See <a class="xref" href="impala_auditing.html#auditing">Auditing Impala Operations</a> for details.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Password protection for the Impala web UI, which listens on port 25000 by default.
+            This feature involves adding some or all of the
+            <code class="ph codeph">--webserver_password_file</code>,
+            <code class="ph codeph">--webserver_authentication_domain</code>, and
+            <code class="ph codeph">--webserver_certificate_file</code> options to the
+            <code class="ph codeph">IMPALA_SERVER_ARGS</code> and <code class="ph codeph">IMPALA_STATE_STORE_ARGS</code>
+            settings. See <a class="xref" href="impala_security_guidelines.html#security_guidelines">Security Guidelines for Impala</a> for
+            details.
+          </p>
+        </li>
+
+        <li class="li" id="config_options_noncm__default_query_options">
+          <div class="p">
+            Another setting you might add to <code class="ph codeph">IMPALA_SERVER_ARGS</code> is a
+            comma-separated list of query options and values:
+<pre class="pre codeblock"><code>-default_query_options='<var class="keyword varname">option</var>=<var class="keyword varname">value</var>,<var class="keyword varname">option</var>=<var class="keyword varname">value</var>,...'
+</code></pre>
+            These options control the behavior of queries performed by this
+            <span class="keyword cmdname">impalad</span> instance. The option values you specify here override the
+            default values for <a class="xref" href="impala_query_options.html#query_options">Impala query
+            options</a>, as shown by the <code class="ph codeph">SET</code> statement in
+            <span class="keyword cmdname">impala-shell</span>.
+          </div>
+        </li>
+
+
+
+        <li class="li">
+          <p class="p">
+            During troubleshooting, <span class="keyword">the appropriate support channel</span> might direct you to change other values,
+            particularly for <code class="ph codeph">IMPALA_SERVER_ARGS</code>, to work around issues or
+            gather debugging information.
+          </p>
+        </li>
+      </ul>
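+
+      <p class="p">
+        For example, the <code class="ph codeph">IMPALA_CATALOG_ARGS</code> definition described above
+        might look like the following when added to <span class="ph filepath">/etc/default/impala</span>.
+        (The host address is illustrative only.)
+      </p>
+
+<pre class="pre codeblock"><code>export IMPALA_CATALOG_ARGS=${IMPALA_CATALOG_ARGS:- \
+    -log_dir=${IMPALA_LOG_DIR} \
+    -catalog_service_host=192.168.0.27 \
+    -catalog_service_port=26000}</code></pre>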
+
+
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          These startup options for the <span class="keyword cmdname">impalad</span> daemon are different from the
+          command-line options for the <span class="keyword cmdname">impala-shell</span> command. For the
+          <span class="keyword cmdname">impala-shell</span> options, see
+          <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a>.
+        </p>
+      </div>
+
+      
+
+    </div>
+
+    
+
+    
+
+    
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="config_options__config_options_checking">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Checking the Values of Impala Configuration Options</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        You can check the current runtime value of all these settings through the Impala web
+        interface, available by default at
+        <code class="ph codeph">http://<var class="keyword varname">impala_hostname</var>:25000/varz</code> for the
+        <span class="keyword cmdname">impalad</span> daemon,
+        <code class="ph codeph">http://<var class="keyword varname">impala_hostname</var>:25010/varz</code> for the
+        <span class="keyword cmdname">statestored</span> daemon, or
+        <code class="ph codeph">http://<var class="keyword varname">impala_hostname</var>:25020/varz</code> for the
+        <span class="keyword cmdname">catalogd</span> daemon.
+      </p>
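+
+      <p class="p">
+        For example, from a shell you could retrieve the current <span class="keyword cmdname">impalad</span>
+        settings with a command such as the following. (The hostname is illustrative only.)
+      </p>
+
+<pre class="pre codeblock"><code>$ curl http://impala-host.example.com:25000/varz</code></pre>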
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="config_options__config_options_impalad">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Startup Options for impalad Daemon</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <code class="ph codeph">impalad</code> daemon implements the main Impala service, which performs
+        query processing and reads and writes the data files.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="config_options__config_options_statestored">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Startup Options for statestored Daemon</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <span class="keyword cmdname">statestored</span> daemon implements the Impala statestore service,
+        which monitors the availability of Impala services across the cluster, and handles
+        situations such as nodes becoming unavailable or becoming available again.
+      </p>
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="config_options__config_options_catalogd">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Startup Options for catalogd Daemon</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The <span class="keyword cmdname">catalogd</span> daemon implements the Impala catalog service, which
+        broadcasts metadata changes to all the Impala nodes when Impala creates a table, inserts
+        data, or performs other kinds of DDL and DML operations.
+      </p>
+
+      <p class="p">
+        By default, the metadata loading and caching on startup happens asynchronously, so Impala can begin
+        accepting requests promptly. To enable the original behavior, where Impala waited until all metadata was
+        loaded before accepting any requests, set the <span class="keyword cmdname">catalogd</span> configuration option
+        <code class="ph codeph">--load_catalog_in_background=false</code>.
+      </p>
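+
+      <p class="p">
+        As a sketch, you could pass that option through an <code class="ph codeph">IMPALA_CATALOG_ARGS</code>
+        definition in <span class="ph filepath">/etc/default/impala</span>. (This variable is not
+        defined there by default, as described earlier in this topic.)
+      </p>
+
+<pre class="pre codeblock"><code>export IMPALA_CATALOG_ARGS=${IMPALA_CATALOG_ARGS:- \
+    -log_dir=${IMPALA_LOG_DIR} --load_catalog_in_background=false}</code></pre>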
+
+    </div>
+
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_config_performance.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_config_performance.html b/docs/build/html/topics/impala_config_performance.html
new file mode 100644
index 0000000..61de174
--- /dev/null
+++ b/docs/build/html/topics/impala_config_performance.html
@@ -0,0 +1,149 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="config_performance"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Post-Installation Configuration for Impala</title></head><body id="config_performance"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Post-Installation Configuration for Impala</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p" id="config_performance__p_24">
+      This section describes the mandatory and recommended configuration settings for Impala. If Impala is
+      installed using cluster management software, some of these configurations might be completed automatically; you must still
+      configure short-circuit reads manually. If you want to customize your environment, consider making the changes described in this topic.
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        You must enable short-circuit reads, whether or not Impala was installed with cluster
+        management software. This setting goes in the Impala configuration settings, not the Hadoop-wide settings.
+      </li>
+
+      <li class="li">
+        You must enable block location tracking, and you can optionally enable native checksumming for optimal performance.
+      </li>
+    </ul>
+
+    <section class="section" id="config_performance__section_fhq_wyv_ls"><h2 class="title sectiontitle">Mandatory: Short-Circuit Reads</h2>
+      
+      <p class="p"> Enabling short-circuit reads allows Impala to read local data directly
+        from the file system. This removes the need to communicate through the
+        DataNodes, improving performance. This setting also minimizes the number
+        of additional copies of data. Short-circuit reads require
+          <code class="ph codeph">libhadoop.so</code>
+        (the Hadoop Native Library) to be accessible to both the server and the
+        client. <code class="ph codeph">libhadoop.so</code> is not available if you have
+        installed from a tarball. You must install from an
+        <code class="ph codeph">.rpm</code>, <code class="ph codeph">.deb</code>, or parcel to use
+        short-circuit local reads.
+      </p>
+      <p class="p">
+        <strong class="ph b">To configure DataNodes for short-circuit reads:</strong>
+      </p>
+      <ol class="ol" id="config_performance__ol_qlq_wyv_ls">
+        <li class="li" id="config_performance__copy_config_files"> Copy the client
+            <code class="ph codeph">core-site.xml</code> and <code class="ph codeph">hdfs-site.xml</code>
+          configuration files from the Hadoop configuration directory to the
+          Impala configuration directory. The default Impala configuration
+          location is <code class="ph codeph">/etc/impala/conf</code>. </li>
+        <li class="li">
+          
+          
+          
+          On all Impala nodes, configure the following properties in 
+          
+          Impala's copy of <code class="ph codeph">hdfs-site.xml</code> as shown: <pre class="pre codeblock"><code>&lt;property&gt;
+    &lt;name&gt;dfs.client.read.shortcircuit&lt;/name&gt;
+    &lt;value&gt;true&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+    &lt;name&gt;dfs.domain.socket.path&lt;/name&gt;
+    &lt;value&gt;/var/run/hdfs-sockets/dn&lt;/value&gt;
+&lt;/property&gt;
+
+&lt;property&gt;
+    &lt;name&gt;dfs.client.file-block-storage-locations.timeout.millis&lt;/name&gt;
+    &lt;value&gt;10000&lt;/value&gt;
+&lt;/property&gt;</code></pre>
+          
+          
+        </li>
+        <li class="li">
+          <p class="p"> If <code class="ph codeph">/var/run/hadoop-hdfs/</code> is group-writable, make
+            sure its group is <code class="ph codeph">root</code>. </p>
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span>  If you are also going to enable block location tracking, you
+            can skip copying configuration files and restarting DataNodes and go
+            straight to <a class="xref" href="#config_performance__block_location_tracking">Optional: Block Location Tracking</a>.
+            Configuring short-circuit reads and block location tracking require
+            the same process of copying files and restarting services, so you
+            can complete that process once when you have completed all
+            configuration changes. Whether you copy files and restart services
+            now or during configuring block location tracking, short-circuit
+            reads are not enabled until you complete those final steps. </div>
+        </li>
+        <li class="li" id="config_performance__restart_all_datanodes"> After applying these changes, restart
+          all DataNodes. </li>
+      </ol>
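+      <p class="p">
+        The directory named in <code class="ph codeph">dfs.domain.socket.path</code> must exist before the
+        DataNodes are restarted. As a sketch (the path matches the example above, but the ownership and
+        group shown are illustrative; adjust them for your environment), you might prepare it as follows:
+      </p>
+<pre class="pre codeblock"><code># Illustrative commands; run as root on each DataNode host.
+# The directory must be owned by the user that runs the DataNode process (often 'hdfs').
+mkdir -p /var/run/hdfs-sockets
+chown hdfs:hadoop /var/run/hdfs-sockets
+chmod 755 /var/run/hdfs-sockets</code></pre>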
+    </section>
+
+    <section class="section" id="config_performance__block_location_tracking"><h2 class="title sectiontitle">Mandatory: Block Location Tracking</h2>
+
+      
+
+      <p class="p">
+        Enabling block location metadata allows Impala to know on which disks data blocks are located, allowing
+        better utilization of the underlying disks. Impala will not start unless this setting is enabled.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">To enable block location tracking:</strong>
+      </p>
+
+      <ol class="ol">
+        <li class="li">
+          For each DataNode, add the following to the&nbsp;<code class="ph codeph">hdfs-site.xml</code> file:
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;dfs.datanode.hdfs-blocks-metadata.enabled&lt;/name&gt;
+  &lt;value&gt;true&lt;/value&gt;
+&lt;/property&gt; </code></pre>
+        </li>
+
+        <li class="li"> Copy the client
+            <code class="ph codeph">core-site.xml</code> and <code class="ph codeph">hdfs-site.xml</code>
+          configuration files from the Hadoop configuration directory to the
+          Impala configuration directory. The default Impala configuration
+          location is <code class="ph codeph">/etc/impala/conf</code>. </li>
+
+        <li class="li"> After applying these changes, restart
+          all DataNodes. </li>
+      </ol>
+    </section>
+
+    <section class="section" id="config_performance__native_checksumming"><h2 class="title sectiontitle">Optional: Native Checksumming</h2>
+
+      
+
+      <p class="p">
+        Enabling native checksumming causes Impala to use an optimized native library for computing checksums, if
+        that library is available.
+      </p>
+
+      <p class="p" id="config_performance__p_29">
+        <strong class="ph b">To enable native checksumming:</strong>
+      </p>
+
+      <p class="p">
+        If you installed Impala from packages, the native checksumming library is installed and set up correctly. In
+        that case, no additional steps are required. Conversely, if you installed by other means, such as with
+        tarballs, native checksumming may not be available due to missing shared objects. Finding the message
+        "<code class="ph codeph">Unable to load native-hadoop library for your platform... using builtin-java classes where
+        applicable</code>" in the Impala logs indicates native checksumming may be unavailable. To enable native
+        checksumming, you must build and install <code class="ph codeph">libhadoop.so</code> (the
+        
+        
+        Hadoop Native Library).
+      </p>
+    </section>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_config.html">Managing Impala</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_connecting.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_connecting.html b/docs/build/html/topics/impala_connecting.html
new file mode 100644
index 0000000..e48d850
--- /dev/null
+++ b/docs/build/html/topics/impala_connecting.html
@@ -0,0 +1,190 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="connecting"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Connecting to impalad through impala-shell</title></head><body id="connecting"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Connecting to impalad through impala-shell</h1>
+  
+  
+
+  <div class="body conbody">
+
+
+
+    <div class="p">
+      Within an <span class="keyword cmdname">impala-shell</span> session, you can only issue queries while connected to an instance
+      of the <span class="keyword cmdname">impalad</span> daemon. You can specify the connection information:
+      <ul class="ul">
+        <li class="li">
+          Through command-line options when you run the <span class="keyword cmdname">impala-shell</span> command.
+        </li>
+        <li class="li">
+          Through a configuration file that is read when you run the <span class="keyword cmdname">impala-shell</span> command.
+        </li>
+        <li class="li">
+          During an <span class="keyword cmdname">impala-shell</span> session, by issuing a <code class="ph codeph">CONNECT</code> command.
+        </li>
+      </ul>
+      See <a class="xref" href="impala_shell_options.html">impala-shell Configuration Options</a> for the command-line and configuration file options you can use.
+    </div>
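+    <p class="p">
+      For example, rather than typing the host on every invocation, you could record it in the
+      <span class="keyword cmdname">impala-shell</span> configuration file (by default
+      <code class="ph codeph">$HOME/.impalarc</code>). A minimal sketch, where the hostname is
+      illustrative:
+    </p>
+<pre class="pre codeblock"><code>[impala]
+impalad=impala-host.example.com:21000
+</code></pre>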
+
+    <p class="p">
+      You can connect to any DataNode where an instance of <span class="keyword cmdname">impalad</span> is running,
+      and that host coordinates the execution of all queries sent to it.
+    </p>
+
+    <p class="p">
+      For simplicity during development, you might always connect to the same host, perhaps running <span class="keyword cmdname">impala-shell</span> on
+      the same host as <span class="keyword cmdname">impalad</span> and specifying the hostname as <code class="ph codeph">localhost</code>.
+    </p>
+
+    <p class="p">
+      In a production environment, you might enable load balancing, in which you connect to a specific host/port combination
+      but queries are forwarded to arbitrary hosts. This technique spreads the overhead of acting as the coordinator
+      node among all the DataNodes in the cluster. See <a class="xref" href="impala_proxy.html">Using Impala through a Proxy for High Availability</a> for details.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">To connect the Impala shell during shell startup:</strong>
+    </p>
+
+    <ol class="ol">
+      <li class="li">
+        Locate the hostname of a DataNode within the cluster that is running an instance of the
+        <span class="keyword cmdname">impalad</span> daemon. If that DataNode uses a non-default port (something
+        other than port 21000) for <span class="keyword cmdname">impala-shell</span> connections, find out the
+        port number also.
+      </li>
+
+      <li class="li">
+        Use the <code class="ph codeph">-i</code> option to the
+        <span class="keyword cmdname">impala-shell</span> interpreter to specify the connection information for
+        that instance of <span class="keyword cmdname">impalad</span>:
+<pre class="pre codeblock"><code>
+# When you are logged into the same machine running impalad.
+# The prompt will reflect the current hostname.
+$ impala-shell
+
+# When you are logged into the same machine running impalad.
+# The prompt will reflect the hostname 'localhost'.
+$ impala-shell -i localhost
+
+# When you are logged onto a different host, perhaps a client machine
+# outside the Hadoop cluster.
+$ impala-shell -i <var class="keyword varname">some.other.hostname</var>
+
+# When you are logged onto a different host, and impalad is listening
+# on a non-default port. Perhaps a load balancer is forwarding requests
+# to a different host/port combination behind the scenes.
+$ impala-shell -i <var class="keyword varname">some.other.hostname</var>:<var class="keyword varname">port_number</var>
+</code></pre>
+      </li>
+    </ol>
+
+    <p class="p">
+      <strong class="ph b">To connect the Impala shell after shell startup:</strong>
+    </p>
+
+    <ol class="ol">
+      <li class="li">
+        Start the Impala shell with no connection:
+<pre class="pre codeblock"><code>$ impala-shell</code></pre>
+        <p class="p">
+          You should see a prompt like the following:
+        </p>
+<pre class="pre codeblock"><code>Welcome to the Impala shell. Press TAB twice to see a list of available commands.
+...
+<span class="ph">(Shell
+      build version: Impala Shell v2.8.x (<var class="keyword varname">hash</var>) built on
+      <var class="keyword varname">date</var>)</span>
+[Not connected] &gt; </code></pre>
+      </li>
+
+      <li class="li">
+        Locate the hostname of a DataNode within the cluster that is running an instance of the
+        <span class="keyword cmdname">impalad</span> daemon. If that DataNode uses a non-default port (something
+        other than port 21000) for <span class="keyword cmdname">impala-shell</span> connections, find out the
+        port number also.
+      </li>
+
+      <li class="li">
+        Use the <code class="ph codeph">connect</code> command to connect to an Impala instance. Enter a command of the form:
+<pre class="pre codeblock"><code>[Not connected] &gt; connect <var class="keyword varname">impalad-host</var>
+[<var class="keyword varname">impalad-host</var>:21000] &gt;</code></pre>
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+          Replace <var class="keyword varname">impalad-host</var> with the hostname you have configured for any DataNode running
+          Impala in your environment. The changed prompt indicates a successful connection.
+        </div>
+      </li>
+    </ol>
+
+    <p class="p">
+      <strong class="ph b">To start <span class="keyword cmdname">impala-shell</span> in a specific database:</strong>
+    </p>
+
+    <p class="p">
+      You can use all the same connection options as in previous examples.
+      For simplicity, these examples assume that you are logged into one of
+      the DataNodes that is running the <span class="keyword cmdname">impalad</span> daemon.
+    </p>
+
+    <ol class="ol">
+      <li class="li">
+        Find the name of the database containing the relevant tables, views, and so
+        on that you want to operate on.
+      </li>
+
+      <li class="li">
+        Use the <code class="ph codeph">-d</code> option to the
+        <span class="keyword cmdname">impala-shell</span> interpreter to connect and immediately
+        switch to the specified database, without the need for a <code class="ph codeph">USE</code>
+        statement or fully qualified names:
+<pre class="pre codeblock"><code>
+# Subsequent queries with unqualified names operate on
+# tables, views, and so on inside the database named 'staging'.
+$ impala-shell -i localhost -d staging
+
+# It is common during development, ETL, benchmarking, and so on
+# to have different databases containing the same table names
+# but with different contents or layouts.
+$ impala-shell -i localhost -d parquet_snappy_compression
+$ impala-shell -i localhost -d parquet_gzip_compression
+</code></pre>
+      </li>
+    </ol>
+
+    <p class="p">
+      <strong class="ph b">To run one or several statements in non-interactive mode:</strong>
+    </p>
+
+    <p class="p">
+      You can use all the same connection options as in previous examples.
+      For simplicity, these examples assume that you are logged into one of
+      the DataNodes that is running the <span class="keyword cmdname">impalad</span> daemon.
+    </p>
+
+    <ol class="ol">
+      <li class="li">
+        Construct a statement, or a file containing a sequence of statements,
+        that you want to run in an automated way, without typing or copying
+        and pasting each time.
+      </li>
+
+      <li class="li">
+        Invoke <span class="keyword cmdname">impala-shell</span> with the <code class="ph codeph">-q</code> option to run a single statement, or
+        the <code class="ph codeph">-f</code> option to run a sequence of statements from a file.
+        The <span class="keyword cmdname">impala-shell</span> command returns immediately, without going into
+        the interactive interpreter.
+<pre class="pre codeblock"><code>
+# A utility command that you might run while developing shell scripts
+# to manipulate HDFS files.
+$ impala-shell -i localhost -d database_of_interest -q 'show tables'
+
+# A sequence of CREATE TABLE, CREATE VIEW, and similar DDL statements
+# can go into a file to make the setup process repeatable.
+$ impala-shell -i localhost -d database_of_interest -f recreate_tables.sql
+</code></pre>
+      </li>
+    </ol>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_conversion_functions.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_conversion_functions.html b/docs/build/html/topics/impala_conversion_functions.html
new file mode 100644
index 0000000..d49cef8
--- /dev/null
+++ b/docs/build/html/topics/impala_conversion_functions.html
@@ -0,0 +1,288 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="conversion_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Type Conversion Functions</title></head><body id="conversion_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Type Conversion Functions</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      Conversion functions are usually used in combination with other functions, to explicitly pass the expected
+      data types. Impala has strict rules regarding data types for function parameters. For example, Impala does
+      not automatically convert a <code class="ph codeph">DOUBLE</code> value to <code class="ph codeph">FLOAT</code>, a
+      <code class="ph codeph">BIGINT</code> value to <code class="ph codeph">INT</code>, or other conversion where precision could be lost or
+      overflow could occur. Also, for reporting or dealing with loosely defined schemas in big data contexts,
+      you might frequently need to convert values to or from the <code class="ph codeph">STRING</code> type.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      Although in <span class="keyword">Impala 2.3</span>, the <code class="ph codeph">SHOW FUNCTIONS</code> output for
+      database <code class="ph codeph">_IMPALA_BUILTINS</code> contains some function signatures
+      matching the pattern <code class="ph codeph">castto*</code>, these functions are not intended
+for public use and are expected to be hidden in the future.
+    </div>
+
+    <p class="p">
+      <strong class="ph b">Function reference:</strong>
+    </p>
+
+    <p class="p">
+      Impala supports the following type conversion functions:
+    </p>
+
+<dl class="dl">
+
+
+<dt class="dt dlterm" id="conversion_functions__cast">
+<code class="ph codeph">cast(<var class="keyword varname">expr</var> AS <var class="keyword varname">type</var>)</code>
+</dt>
+
+<dd class="dd">
+
+<strong class="ph b">Purpose:</strong> Converts the value of an expression to any other type.
+If the expression value is of a type that cannot be converted to the target type, the result is <code class="ph codeph">NULL</code>.
+<p class="p"><strong class="ph b">Usage notes:</strong>
+Use <code class="ph codeph">CAST</code> when passing a column value or literal to a function that
+expects a parameter with a different type.
+Frequently used in SQL operations such as <code class="ph codeph">CREATE TABLE AS SELECT</code>
+and <code class="ph codeph">INSERT ... VALUES</code> to ensure that values from various sources
+are of the appropriate type for the destination columns.
+Where practical, do a one-time <code class="ph codeph">CAST()</code> operation during the ingestion process
+to make each column into the appropriate type, rather than using many <code class="ph codeph">CAST()</code>
+operations in each query; doing type conversions for each row during each query can be expensive
+for tables with millions or billions of rows.
+</p>
+    <p class="p">
+        The way this function deals with time zones when converting to or from <code class="ph codeph">TIMESTAMP</code>
+        values is affected by the <code class="ph codeph">-use_local_tz_for_unix_timestamp_conversions</code> startup flag for the
+        <span class="keyword cmdname">impalad</span> daemon. See <a class="xref" href="../shared/../topics/impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details about
+        how Impala handles time zone considerations for the <code class="ph codeph">TIMESTAMP</code> data type.
+      </p>
+
+<p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>select concat('Here are the first ',10,' results.'); -- Fails
+select concat('Here are the first ',cast(10 as string),' results.'); -- Succeeds
+</code></pre>
+<p class="p">
+The following example starts with a text table where every column has a type of <code class="ph codeph">STRING</code>,
+which might be how you ingest data of unknown schema until you can verify the cleanliness of the underlying values.
+Then it uses <code class="ph codeph">CAST()</code> to create a new Parquet table with the same data, but using specific
+numeric data types for the columns with numeric data. Using numeric types of appropriate sizes can result in
+substantial space savings on disk and in memory, and performance improvements in queries,
+over using strings or larger-than-necessary numeric types.
+</p>
+<pre class="pre codeblock"><code>create table t1 (name string, x string, y string, z string);
+
+create table t2 stored as parquet
+as select
+  name,
+  cast(x as bigint) x,
+  cast(y as timestamp) y,
+  cast(z as smallint) z
+from t1;
+
+describe t2;
++------+-----------+---------+
+| name | type      | comment |
++------+-----------+---------+
+| name | string    |         |
+| x    | bigint    |         |
+| y    | timestamp |         |
+| z    | smallint  |         |
++------+-----------+---------+
+</code></pre>
+<p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+<p class="p">
+
+  For details of casts from each kind of data type, see the description of
+  the appropriate type:
+  <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+  <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>,
+  <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+  <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>,
+  <a class="xref" href="impala_float.html#float">FLOAT Data Type</a>,
+  <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a>,
+  <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 1.4 or higher only)</a>,
+  <a class="xref" href="impala_string.html#string">STRING Data Type</a>,
+  <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>,
+  <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>,
+  <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>,
+  <a class="xref" href="impala_boolean.html#boolean">BOOLEAN Data Type</a>
+</p>
+</dd>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+<dt class="dt dlterm" id="conversion_functions__typeof">
+<code class="ph codeph">typeof(type value)</code>
+</dt>
+<dd class="dd">
+
+<strong class="ph b">Purpose:</strong> Returns the name of the data type corresponding to an expression. For types with
+extra attributes, such as length for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>,
+or precision and scale for <code class="ph codeph">DECIMAL</code>, includes the full specification of the type.
+
+<p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p>
+<p class="p"><strong class="ph b">Usage notes:</strong> Typically used in interactive exploration of a schema, or in application code that programmatically generates schema definitions such as <code class="ph codeph">CREATE TABLE</code> statements.
+For example, previously, to understand the type of an expression such as
+<code class="ph codeph">col1 / col2</code> or <code class="ph codeph">concat(col1, col2, col3)</code>,
+you might have created a dummy table with a single row, using syntax such as <code class="ph codeph">CREATE TABLE foo AS SELECT 5 / 3.0</code>,
+and then doing a <code class="ph codeph">DESCRIBE</code> to see the type of the row.
+Or you might have done a <code class="ph codeph">CREATE TABLE AS SELECT</code> operation to create a table and
+copy data into it, only learning the types of the columns by doing a <code class="ph codeph">DESCRIBE</code> afterward.
+This technique is especially useful for arithmetic expressions involving <code class="ph codeph">DECIMAL</code> types,
+because the precision and scale of the result is typically different than that of the operands.
+</p>
+<p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+      </p>
+<p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<p class="p">
+These examples show how to check the type of a simple literal or function value.
+Notice how adding even tiny integers together changes the data type of the result to
+avoid overflow, and how the results of arithmetic operations on <code class="ph codeph">DECIMAL</code> values
+have specific precision and scale attributes.
+</p>
+<pre class="pre codeblock"><code>select typeof(2)
++-----------+
+| typeof(2) |
++-----------+
+| TINYINT   |
++-----------+
+
+select typeof(2+2)
++---------------+
+| typeof(2 + 2) |
++---------------+
+| SMALLINT      |
++---------------+
+
+select typeof('xyz')
++---------------+
+| typeof('xyz') |
++---------------+
+| STRING        |
++---------------+
+
+select typeof(now())
++---------------+
+| typeof(now()) |
++---------------+
+| TIMESTAMP     |
++---------------+
+
+select typeof(5.3 / 2.1)
++-------------------+
+| typeof(5.3 / 2.1) |
++-------------------+
+| DECIMAL(6,4)      |
++-------------------+
+
+select typeof(5.30001 / 2342.1);
++--------------------------+
+| typeof(5.30001 / 2342.1) |
++--------------------------+
+| DECIMAL(13,11)           |
++--------------------------+
+
+select typeof(typeof(2+2))
++-----------------------+
+| typeof(typeof(2 + 2)) |
++-----------------------+
+| STRING                |
++-----------------------+
+</code></pre>
+
+<p class="p">
+This example shows how even if you do not have a record of the type of a column,
+for example because the type was changed by <code class="ph codeph">ALTER TABLE</code> after the
+original <code class="ph codeph">CREATE TABLE</code>, you can still find out the type in a
+more compact form than examining the full <code class="ph codeph">DESCRIBE</code> output.
+Remember to use <code class="ph codeph">LIMIT 1</code> in such cases, to avoid an identical
+result value for every row in the table.
+</p>
+<pre class="pre codeblock"><code>create table typeof_example (a int, b tinyint, c smallint, d bigint);
+
+/* Empty result set if there is no data in the table. */
+select typeof(a) from typeof_example;
+
+/* OK, now we have some data but the type of column A is being changed. */
+insert into typeof_example values (1, 2, 3, 4);
+alter table typeof_example change a a bigint;
+
+/* We can always find out the current type of that column without doing a full DESCRIBE. */
+select typeof(a) from typeof_example limit 1;
++-----------+
+| typeof(a) |
++-----------+
+| BIGINT    |
++-----------+
+</code></pre>
+<p class="p">
+This example shows how you might programmatically generate a <code class="ph codeph">CREATE TABLE</code> statement
+with the appropriate column definitions to hold the result values of arbitrary expressions.
+The <code class="ph codeph">typeof()</code> function lets you construct a detailed <code class="ph codeph">CREATE TABLE</code> statement
+without actually creating the table, as opposed to <code class="ph codeph">CREATE TABLE AS SELECT</code> operations
+where you create the destination table but only learn the column data types afterward through <code class="ph codeph">DESCRIBE</code>.
+</p>
+<pre class="pre codeblock"><code>describe typeof_example;
++------+----------+---------+
+| name | type     | comment |
++------+----------+---------+
+| a    | bigint   |         |
+| b    | tinyint  |         |
+| c    | smallint |         |
+| d    | bigint   |         |
++------+----------+---------+
+
+/* An ETL or business intelligence tool might create variations on a table with different file formats,
+   different sets of columns, and so on. TYPEOF() lets an application introspect the types of the original columns. */
+select concat('create table derived_table (a ', typeof(a), ', b ', typeof(b), ', c ',
+    typeof(c), ', d ', typeof(d), ') stored as parquet;')
+  as 'create table statement'
+from typeof_example limit 1;
++-------------------------------------------------------------------------------------------+
+| create table statement                                                                    |
++-------------------------------------------------------------------------------------------+
+| create table derived_table (a BIGINT, b TINYINT, c SMALLINT, d BIGINT) stored as parquet; |
++-------------------------------------------------------------------------------------------+
+</code></pre>
+</dd>
+
+
+</dl>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_count.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_count.html b/docs/build/html/topics/impala_count.html
new file mode 100644
index 0000000..c1b961a
--- /dev/null
+++ b/docs/build/html/topics/impala_count.html
@@ -0,0 +1,353 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="count"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COUNT Function</title></head><body id="count"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">COUNT Function</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      An aggregate function that returns the number of rows, or the number of non-<code class="ph codeph">NULL</code> rows.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>COUNT([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre>
+
+    <p class="p">
+      Depending on the argument, <code class="ph codeph">COUNT()</code> considers rows that meet certain conditions:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        The notation <code class="ph codeph">COUNT(*)</code> includes <code class="ph codeph">NULL</code> values in the total.
+      </li>
+
+      <li class="li">
+        The notation <code class="ph codeph">COUNT(<var class="keyword varname">column_name</var>)</code> only considers rows where the column
+        contains a non-<code class="ph codeph">NULL</code> value.
+      </li>
+
+      <li class="li">
+        You can also combine <code class="ph codeph">COUNT</code> with the <code class="ph codeph">DISTINCT</code> operator to eliminate
+        duplicates before counting, and to count the combinations of values across multiple columns.
+      </li>
+    </ul>
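+    <p class="p">
+      For example, assuming a table <code class="ph codeph">t1</code> in which column <code class="ph codeph">c1</code>
+      contains some <code class="ph codeph">NULL</code> values and some duplicate values, the three forms
+      can return different totals:
+    </p>
+<pre class="pre codeblock"><code>-- COUNT(*) counts every row; COUNT(c1) skips NULLs;
+-- COUNT(DISTINCT c1) skips NULLs and duplicates.
+select count(*), count(c1), count(distinct c1) from t1;
+</code></pre>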
+
+    <p class="p">
+      When the query contains a <code class="ph codeph">GROUP BY</code> clause, <code class="ph codeph">COUNT()</code> returns one value
+      for each combination of grouping values.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Return type:</strong> <code class="ph codeph">BIGINT</code>
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+        If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+        <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+        query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+        See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+        for the kinds of queries that this option applies to, and slight differences in how partitions are
+        evaluated when this query option is enabled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+        To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+        in an aggregation function, you unpack the individual elements using join notation in the query,
+        and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+        See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+      </p>
+
+    <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric aggregates such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+aggregates are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array&lt;struct&lt;           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+  from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name      | item.n_nationkey |
++-------------+------------------+
+| AFRICA      | 0                |
+| AFRICA      | 5                |
+| AFRICA      | 14               |
+| AFRICA      | 15               |
+| AFRICA      | 16               |
+| AMERICA     | 1                |
+| AMERICA     | 2                |
+| AMERICA     | 3                |
+| AMERICA     | 17               |
+| AMERICA     | 24               |
+| ASIA        | 8                |
+| ASIA        | 9                |
+| ASIA        | 12               |
+| ASIA        | 18               |
+| ASIA        | 21               |
+| EUROPE      | 6                |
+| EUROPE      | 7                |
+| EUROPE      | 19               |
+| EUROPE      | 22               |
+| EUROPE      | 23               |
+| MIDDLE EAST | 4                |
+| MIDDLE EAST | 10               |
+| MIDDLE EAST | 11               |
+| MIDDLE EAST | 13               |
+| MIDDLE EAST | 20               |
++-------------+------------------+
+
+select
+  r_name,
+  count(r_nations.item.n_nationkey) as count,
+  sum(r_nations.item.n_nationkey) as sum,
+  avg(r_nations.item.n_nationkey) as avg,
+  min(r_nations.item.n_name) as minimum,
+  max(r_nations.item.n_name) as maximum,
+  ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+  region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name      | count | sum | avg  | minimum   | maximum        | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA      | 5     | 50  | 10   | ALGERIA   | MOZAMBIQUE     | 5             |
+| AMERICA     | 5     | 47  | 9.4  | ARGENTINA | UNITED STATES  | 5             |
+| ASIA        | 5     | 68  | 13.6 | CHINA     | VIETNAM        | 5             |
+| EUROPE      | 5     | 77  | 15.4 | FRANCE    | UNITED KINGDOM | 5             |
+| MIDDLE EAST | 5     | 58  | 11.6 | EGYPT     | SAUDI ARABIA   | 5             |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>-- How many rows total are in the table, regardless of NULL values?
+select count(*) from t1;
+-- How many rows are in the table with non-NULL values for a column?
+select count(c1) from t1;
+-- Count the rows that meet certain conditions.
+-- Again, * includes NULLs, so COUNT(*) might be greater than COUNT(col).
+select count(*) from t1 where x &gt; 10;
+select count(c1) from t1 where x &gt; 10;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Combine COUNT and DISTINCT to find the number of unique values.
+-- Must use column names rather than * with COUNT(DISTINCT ...) syntax.
+-- Rows with NULL values are not counted.
+select count(distinct c1) from t1;
+-- Rows with a NULL value in _either_ column are not counted.
+select count(distinct c1, c2) from t1;
+-- Return more than one result.
+select month, year, count(distinct visitor_id) from web_stats group by month, year;
+</code></pre>
+
+    <div class="p">
+      The following examples show how to use <code class="ph codeph">COUNT()</code> in an analytic context. They use a table
+      containing integers from 1 to 10. Notice how the <code class="ph codeph">COUNT()</code> is reported for each input row,
+      as opposed to a query with a <code class="ph codeph">GROUP BY</code> clause, which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, count(x) over (partition by property) as count from int_t where property in ('odd','even');
++----+----------+-------+
+| x  | property | count |
++----+----------+-------+
+| 2  | even     | 5     |
+| 4  | even     | 5     |
+| 6  | even     | 5     |
+| 8  | even     | 5     |
+| 10 | even     | 5     |
+| 1  | odd      | 5     |
+| 3  | odd      | 5     |
+| 5  | odd      | 5     |
+| 7  | odd      | 5     |
+| 9  | odd      | 5     |
++----+----------+-------+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">COUNT()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to produce a running count of all the even values,
+then a running count of all the odd values. The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+therefore all of these examples produce the same results:
+<pre class="pre codeblock"><code>select x, property,
+  count(x) over (partition by property <strong class="ph b">order by x</strong>) as 'cumulative count'
+  from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative count |
++----+----------+------------------+
+| 2  | even     | 1                |
+| 4  | even     | 2                |
+| 6  | even     | 3                |
+| 8  | even     | 4                |
+| 10 | even     | 5                |
+| 1  | odd      | 1                |
+| 3  | odd      | 2                |
+| 5  | odd      | 3                |
+| 7  | odd      | 4                |
+| 9  | odd      | 5                |
++----+----------+------------------+
+
+select x, property,
+  count(x) over
+  (
+    partition by property
+    <strong class="ph b">order by x</strong>
+    <strong class="ph b">range between unbounded preceding and current row</strong>
+  ) as 'cumulative count'
+from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative count |
++----+----------+------------------+
+| 2  | even     | 1                |
+| 4  | even     | 2                |
+| 6  | even     | 3                |
+| 8  | even     | 4                |
+| 10 | even     | 5                |
+| 1  | odd      | 1                |
+| 3  | odd      | 2                |
+| 5  | odd      | 3                |
+| 7  | odd      | 4                |
+| 9  | odd      | 5                |
++----+----------+------------------+
+
+select x, property,
+  count(x) over
+  (
+    partition by property
+    <strong class="ph b">order by x</strong>
+    <strong class="ph b">rows between unbounded preceding and current row</strong>
+  ) as 'cumulative count'
+  from int_t where property in ('odd','even');
++----+----------+------------------+
+| x  | property | cumulative count |
++----+----------+------------------+
+| 2  | even     | 1                |
+| 4  | even     | 2                |
+| 6  | even     | 3                |
+| 8  | even     | 4                |
+| 10 | even     | 5                |
+| 1  | odd      | 1                |
+| 3  | odd      | 2                |
+| 5  | odd      | 3                |
+| 7  | odd      | 4                |
+| 9  | odd      | 5                |
++----+----------+------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running count taking into account 1 row before
+and 1 row after the current row, within the same partition (all the even values or all the odd values).
+Therefore, the count is consistently 3 for rows in the middle of the partition, and 2 for
+rows at the ends of the partition, where there is no preceding or no following row.
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code>
+clause:
+<pre class="pre codeblock"><code>select x, property,
+  count(x) over
+  (
+    partition by property
+    <strong class="ph b">order by x</strong>
+    <strong class="ph b">rows between 1 preceding and 1 following</strong>
+  ) as 'moving total'
+  from int_t where property in ('odd','even');
++----+----------+--------------+
+| x  | property | moving total |
++----+----------+--------------+
+| 2  | even     | 2            |
+| 4  | even     | 3            |
+| 6  | even     | 3            |
+| 8  | even     | 3            |
+| 10 | even     | 2            |
+| 1  | odd      | 2            |
+| 3  | odd      | 3            |
+| 5  | odd      | 3            |
+| 7  | odd      | 3            |
+| 9  | odd      | 2            |
++----+----------+--------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+  count(x) over
+  (
+    partition by property
+    <strong class="ph b">order by x</strong>
+    <strong class="ph b">range between 1 preceding and 1 following</strong>
+  ) as 'moving total'
+from int_t where property in ('odd','even');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+    </div>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          By default, Impala only allows a single <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">columns</var>)</code>
+          expression in each query.
+        </p>
+        <p class="p">
+          If you do not need precise accuracy, you can produce an estimate of the number of distinct values for a column by
+          specifying <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>; a query can contain multiple instances of
+          <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>. To make Impala automatically rewrite
+          <code class="ph codeph">COUNT(DISTINCT)</code> expressions to <code class="ph codeph">NDV()</code>, enable the
+          <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option.
+        </p>
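+        <p class="p">
+          For example, the following sketch (the table and column names are hypothetical) shows the
+          <code class="ph codeph">NDV()</code> and <code class="ph codeph">APPX_COUNT_DISTINCT</code> techniques:
+        </p>
+<pre class="pre codeblock"><code>-- Multiple NDV() calls are allowed in a single query.
+select ndv(c1), ndv(c2) from t1;
+
+-- With this option enabled, Impala rewrites COUNT(DISTINCT) to NDV(),
+-- so multiple such expressions are allowed, at the cost of exact accuracy.
+set APPX_COUNT_DISTINCT=true;
+select count(distinct c1), count(distinct c2) from t1;
+</code></pre>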
+        <p class="p">
+          To produce the same result as multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions, you can use the
+          following technique for queries involving a single table:
+        </p>
+<pre class="pre codeblock"><code>select v1.c1 result1, v2.c1 result2 from
+  (select count(distinct col1) as c1 from t1) v1
+    cross join
+  (select count(distinct col2) as c1 from t1) v2;
+</code></pre>
+        <p class="p">
+          Because <code class="ph codeph">CROSS JOIN</code> is an expensive operation, prefer to use the <code class="ph codeph">NDV()</code>
+          technique wherever practical.
+        </p>
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_create_database.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_create_database.html b/docs/build/html/topics/impala_create_database.html
new file mode 100644
index 0000000..bfee172
--- /dev/null
+++ b/docs/build/html/topics/impala_create_database.html
@@ -0,0 +1,209 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_database"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE DATABASE Statement</title></head><body id="create_database"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">CREATE DATABASE Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Creates a new database.
+    </p>
+
+    <p class="p">
+      In Impala, a database is both:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        A logical construct for grouping together related tables, views, and functions within their own namespace.
+        You might use a separate database for each application, set of related tables, or round of experimentation.
+      </li>
+
+      <li class="li">
+        A physical construct represented by a directory tree in HDFS. Tables (internal tables), partitions, and
+        data files are all located under this directory. You can perform HDFS-level operations such as backing it up and measuring space usage,
+        or remove it with a <code class="ph codeph">DROP DATABASE</code> statement.
+      </li>
+    </ul>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] <var class="keyword varname">database_name</var> [COMMENT '<var class="keyword varname">database_comment</var>']
+  [LOCATION <var class="keyword varname">hdfs_path</var>];</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      A database is physically represented as a directory in HDFS, with a filename extension <code class="ph codeph">.db</code>,
+      under the main Impala data directory. If the associated HDFS directory does not exist, it is created for you.
+      All databases and their associated directories are top-level objects, with no physical or logical nesting.
+    </p>
+
+    <p class="p">
+      After creating a database, to make it the current database within an <span class="keyword cmdname">impala-shell</span> session,
+      use the <code class="ph codeph">USE</code> statement. You can refer to tables in the current database without prepending
+      any qualifier to their names.
+    </p>
+
+    <p class="p">
+      When you first connect to Impala through <span class="keyword cmdname">impala-shell</span>, the database you start in (before
+      issuing any <code class="ph codeph">CREATE DATABASE</code> or <code class="ph codeph">USE</code> statements) is named
+      <code class="ph codeph">default</code>.
+    </p>
+
+    <div class="p">
+        Impala includes another predefined database, <code class="ph codeph">_impala_builtins</code>, that serves as the location
+        for the <a class="xref" href="../shared/../topics/impala_functions.html#builtins">built-in functions</a>. To see the built-in
+        functions, use a statement like the following:
+<pre class="pre codeblock"><code>show functions in _impala_builtins;
+show functions in _impala_builtins like '*<var class="keyword varname">substring</var>*';
+</code></pre>
+      </div>
+
+    <p class="p">
+      After creating a database, your <span class="keyword cmdname">impala-shell</span> session or another
+      <span class="keyword cmdname">impala-shell</span> connected to the same node can immediately access that database. To access
+      the database through the Impala daemon on a different node, issue the <code class="ph codeph">INVALIDATE METADATA</code>
+      statement first while connected to that other node.
+    </p>
+
+    <p class="p">
+      Setting the <code class="ph codeph">LOCATION</code> attribute for a new database is a way to work with sets of files in an
+      HDFS directory structure outside the default Impala data directory, as opposed to setting the
+      <code class="ph codeph">LOCATION</code> attribute for each individual table.
+    </p>
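+    <p class="p">
+      For example, the following statement (the database name and HDFS path are hypothetical)
+      creates a database whose tables reside outside the default Impala data directory:
+    </p>
+<pre class="pre codeblock"><code>-- Tables created in this database are stored under the specified directory.
+create database external_data location '/user/shared/external_data.db';
+</code></pre>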
+
+    <p class="p">
+        If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+        load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+        statement wait before returning, until the new or changed metadata has been received by all the Impala
+        nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+      </p>
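+    <p class="p">
+      For example, a load-balanced session might enable the option before issuing DDL statements
+      (the database name is hypothetical):
+    </p>
+<pre class="pre codeblock"><code>-- Within impala-shell: make each DDL statement in this session wait
+-- until all Impala nodes have received the new metadata.
+set SYNC_DDL=true;
+create database coordinated_db;
+</code></pre>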
+
+    <p class="p">
+        <strong class="ph b">Hive considerations:</strong>
+      </p>
+
+    <p class="p">
+      When you create a database in Impala, the database can also be used by Hive.
+      When you create a database in Hive, issue an <code class="ph codeph">INVALIDATE METADATA</code>
+      statement in Impala to make Impala permanently aware of the new database.
+    </p>
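+    <p class="p">
+      For example, after creating a database in Hive (the name here is hypothetical),
+      refresh Impala's view of the metadata before using it:
+    </p>
+<pre class="pre codeblock"><code>-- In Hive:
+create database hive_created_db;
+
+-- Then in impala-shell:
+invalidate metadata;
+use hive_created_db;
+</code></pre>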
+
+    <p class="p">
+      The <code class="ph codeph">SHOW DATABASES</code> statement lists all databases, or the databases whose name
+      matches a wildcard pattern. <span class="ph">In <span class="keyword">Impala 2.5</span> and higher, the
+      <code class="ph codeph">SHOW DATABASES</code> output includes a second column that displays the associated
+      comment, if any, for each database.</span>
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+
+    <p class="p">
+      To specify that any tables created within a database reside on the Amazon S3 system,
+      you can include an <code class="ph codeph">s3a://</code> prefix on the <code class="ph codeph">LOCATION</code>
+      attribute. In <span class="keyword">Impala 2.6</span> and higher, Impala automatically creates any
+      required folders as the databases, tables, and partitions are created, and removes
+      them when they are dropped.
+    </p>
+
+    <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+        <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+        <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+        as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+        Impala databases, tables, or partitions at them, and manually remove folders when no longer needed.
+        See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, must have write
+      permission for the parent HDFS directory under which the database
+      is located.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <pre class="pre codeblock"><code>create database first_db;
+use first_db;
+create table t1 (x int);
+
+create database second_db;
+use second_db;
+-- Each database has its own namespace for tables.
+-- You can reuse the same table names in each database.
+create table t1 (s string);
+
+create database temp;
+
+-- You can either USE a database after creating it,
+-- or qualify all references to the table name with the name of the database.
+-- Here, tables T2 and T3 are both created in the TEMP database.
+
+create table temp.t2 (x int, y int);
+use temp;
+create table t3 (s string);
+
+-- You cannot drop a database while it is selected by the USE statement.
+drop database temp;
+<em class="ph i">ERROR: AnalysisException: Cannot drop current default database: temp</em>
+
+-- The always-available database 'default' is a convenient one to USE
+-- before dropping a database you created.
+use default;
+
+-- Before dropping a database, first drop all the tables inside it,
+<span class="ph">-- or in <span class="keyword">Impala 2.3</span> and higher use the CASCADE clause.</span>
+drop database temp;
+ERROR: ImpalaRuntimeException: Error making 'dropDatabase' RPC to Hive Metastore:
+CAUSED BY: InvalidOperationException: Database temp is not empty
+show tables in temp;
++------+
+| name |
++------+
+| t3   |
++------+
+
+<span class="ph">-- <span class="keyword">Impala 2.3</span> and higher:</span>
+<span class="ph">drop database temp cascade;</span>
+
+-- Earlier releases:
+drop table temp.t3;
+drop database temp;
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_databases.html#databases">Overview of Impala Databases</a>, <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>,
+      <a class="xref" href="impala_use.html#use">USE Statement</a>, <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>,
+      <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_alter_table.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_alter_table.html b/docs/build/html/topics/impala_alter_table.html
new file mode 100644
index 0000000..5337a50
--- /dev/null
+++ b/docs/build/html/topics/impala_alter_table.html
@@ -0,0 +1,1033 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="alter_table"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ALTER TABLE Statement</title></head><body id="alter_table"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">ALTER TABLE Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      The <code class="ph codeph">ALTER TABLE</code> statement changes the structure or properties of an existing Impala table.
+    </p>
+    <p class="p">
+      In Impala, this is primarily a logical operation that updates the table metadata in the metastore database that Impala
+      shares with Hive. Most <code class="ph codeph">ALTER TABLE</code> operations do not actually rewrite or move the underlying data
+      files. (The <code class="ph codeph">RENAME TO</code> clause is the one exception; it can cause HDFS files to be moved to different paths.)
+      When you do an <code class="ph codeph">ALTER TABLE</code> operation, you typically need to perform corresponding physical filesystem operations,
+      such as rewriting the data files to include extra fields, or converting them to a different file format.
+    </p>
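+    <p class="p">
+      For example, the following sketch (the table and column names are hypothetical) illustrates the
+      distinction between the metadata change and the corresponding physical files:
+    </p>
+<pre class="pre codeblock"><code>-- Metadata-only change; for an internal table, the HDFS directory is also moved.
+alter table analysis_data rename to sessions_data;
+
+-- The new column is recorded in the metastore, but existing data files
+-- are not rewritten; rows from those files show NULL for the new column
+-- until the files are regenerated to include the extra field.
+alter table sessions_data add columns (referrer string);
+</code></pre>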
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE [<var class="keyword varname">old_db_name</var>.]<var class="keyword varname">old_table_name</var> RENAME TO [<var class="keyword varname">new_db_name</var>.]<var class="keyword varname">new_table_name</var>
+
+ALTER TABLE <var class="keyword varname">name</var> ADD COLUMNS (<var class="keyword varname">col_spec</var>[, <var class="keyword varname">col_spec</var> ...])
+ALTER TABLE <var class="keyword varname">name</var> DROP [COLUMN] <var class="keyword varname">column_name</var>
+ALTER TABLE <var class="keyword varname">name</var> CHANGE <var class="keyword varname">column_name</var> <var class="keyword varname">new_name</var> <var class="keyword varname">new_type</var>
+ALTER TABLE <var class="keyword varname">name</var> REPLACE COLUMNS (<var class="keyword varname">col_spec</var>[, <var class="keyword varname">col_spec</var> ...])
+
+ALTER TABLE <var class="keyword varname">name</var> ADD [IF NOT EXISTS] PARTITION (<var class="keyword varname">partition_spec</var>)
+  <span class="ph">[<var class="keyword varname">location_spec</var>]</span>
+  <span class="ph">[<var class="keyword varname">cache_spec</var>]</span>
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> ADD [IF NOT EXISTS] RANGE PARTITION (<var class="keyword varname">kudu_partition_spec</var>)</span>
+
+ALTER TABLE <var class="keyword varname">name</var> DROP [IF EXISTS] PARTITION (<var class="keyword varname">partition_spec</var>)
+  <span class="ph">[PURGE]</span>
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> DROP [IF EXISTS] RANGE PARTITION <var class="keyword varname">kudu_partition_spec</var></span>
+
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> RECOVER PARTITIONS</span>
+
+ALTER TABLE <var class="keyword varname">name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)]
+  SET { FILEFORMAT <var class="keyword varname">file_format</var>
+  | LOCATION '<var class="keyword varname">hdfs_path_of_directory</var>'
+  | TBLPROPERTIES (<var class="keyword varname">table_properties</var>)
+  | SERDEPROPERTIES (<var class="keyword varname">serde_properties</var>) }
+
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> SET COLUMN STATS <var class="keyword varname">colname</var>
+  ('<var class="keyword varname">statsKey</var>'='<var class="keyword varname">val</var>', ...)
+
+statsKey ::= numDVs | numNulls | avgSize | maxSize</span>
+
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)] SET { CACHED IN '<var class="keyword varname">pool_name</var>' <span class="ph">[WITH REPLICATION = <var class="keyword varname">integer</var>]</span> | UNCACHED }</span>
+
+<var class="keyword varname">new_name</var> ::= [<var class="keyword varname">new_database</var>.]<var class="keyword varname">new_table_name</var>
+
+<var class="keyword varname">col_spec</var> ::= <var class="keyword varname">col_name</var> <var class="keyword varname">type_name</var>
+
+<var class="keyword varname">partition_spec</var> ::= <var class="keyword varname">simple_partition_spec</var> | <span class="ph"><var class="keyword varname">complex_partition_spec</var></span>
+
+<var class="keyword varname">simple_partition_spec</var> ::= <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var>
+
+<span class="ph"><var class="keyword varname">complex_partition_spec</var> ::= <var class="keyword varname">comparison_expression_on_partition_col</var></span>
+
+<span class="ph"><var class="keyword varname">kudu_partition_spec</var> ::= <var class="keyword varname">constant</var> <var class="keyword varname">range_operator</var> VALUES <var class="keyword varname">range_operator</var> <var class="keyword varname">constant</var> | VALUE = <var class="keyword varname">constant</var></span>
+
+<span class="ph">cache_spec ::= CACHED IN '<var class="keyword varname">pool_name</var>' [WITH REPLICATION = <var class="keyword varname">integer</var>] | UNCACHED</span>
+
+<span class="ph">location_spec ::= LOCATION '<var class="keyword varname">hdfs_path_of_directory</var>'</span>
+
+<var class="keyword varname">table_properties</var> ::= '<var class="keyword varname">name</var>'='<var class="keyword varname">value</var>'[, '<var class="keyword varname">name</var>'='<var class="keyword varname">value</var>' ...]
+
+<var class="keyword varname">serde_properties</var> ::= '<var class="keyword varname">name</var>'='<var class="keyword varname">value</var>'[, '<var class="keyword varname">name</var>'='<var class="keyword varname">value</var>' ...]
+
+<var class="keyword varname">file_format</var> ::= { PARQUET | TEXTFILE | RCFILE | SEQUENCEFILE | AVRO }
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+      </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">ALTER TABLE</code> statement can
+      change the metadata for tables containing complex types (<code class="ph codeph">ARRAY</code>,
+      <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>).
+      For example, you can use an <code class="ph codeph">ADD COLUMNS</code>, <code class="ph codeph">DROP COLUMN</code>, or <code class="ph codeph">CHANGE</code>
+      clause to modify the table layout for complex type columns.
+      Although Impala queries only work for complex type columns in Parquet tables, the complex type support in the
+      <code class="ph codeph">ALTER TABLE</code> statement applies to all file formats.
+      For example, you can use Impala to update metadata for a staging table in a non-Parquet file format where the
+      data is populated by Hive. Or you can use <code class="ph codeph">ALTER TABLE SET FILEFORMAT</code> to change the format
+      of an existing table to Parquet so that Impala can query it. Remember that changing the file format for a table does
+      not convert the data files within the table; you must prepare any Parquet data files containing complex types
+      outside Impala, and bring them into the table using <code class="ph codeph">LOAD DATA</code> or updating the table's
+      <code class="ph codeph">LOCATION</code> property.
+      See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types.
+    </p>
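+
+    <p class="p">
+      For example, the following sketch (the table name is illustrative) changes a
+      staging table containing complex type columns to the Parquet format so that
+      Impala can query it:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical staging table with complex type columns, populated by Hive.
+-- This changes only the metadata; prepare matching Parquet data files
+-- outside Impala before querying the table.
+alter table staging_events set fileformat parquet;
+</code></pre>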
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Whenever you specify partitions in an <code class="ph codeph">ALTER TABLE</code> statement, through the <code class="ph codeph">PARTITION
+      (<var class="keyword varname">partition_spec</var>)</code> clause, you must include all the partitioning columns in the
+      specification.
+    </p>
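+
+    <p class="p">
+      For example, for a table partitioned by <code class="ph codeph">year</code> and
+      <code class="ph codeph">month</code> (the table and column names are illustrative),
+      the partition clause must mention both columns:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Both partitioning columns are required in the specification;
+-- PARTITION (year = 2016) alone would be rejected.
+alter table logs partition (year = 2016, month = 12) set fileformat parquet;
+</code></pre>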
+
+    <p class="p">
+      Most of the <code class="ph codeph">ALTER TABLE</code> operations work the same for internal tables (managed by Impala) as
+      for external tables (with data files located in arbitrary locations). The exception is renaming a table; for
+      an external table, the underlying data directory is not renamed or moved.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Dropping or altering multiple partitions:</strong>
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.8</span> and higher,
+      the expression for the partition clause with a <code class="ph codeph">DROP</code> or <code class="ph codeph">SET</code>
+      operation can include comparison operators such as <code class="ph codeph">&lt;</code>, <code class="ph codeph">IN</code>,
+      or <code class="ph codeph">BETWEEN</code>, and Boolean operators such as <code class="ph codeph">AND</code>
+      and <code class="ph codeph">OR</code>.
+    </p>
+
+    <p class="p">
+      For example, you might drop a group of partitions corresponding to a particular date
+      range after the data <span class="q">"ages out"</span>:
+    </p>
+
+<pre class="pre codeblock"><code>
+alter table historical_data drop partition (year &lt; 1995);
+alter table historical_data drop partition (year = 1996 and month between 1 and 6);
+
+</code></pre>
+
+    <p class="p">
+      For tables with multiple partition key columns, you can specify multiple
+      conditions separated by commas, and the operation only applies to the partitions
+      that match all the conditions (similar to using an <code class="ph codeph">AND</code> clause):
+    </p>
+
+<pre class="pre codeblock"><code>
+alter table historical_data drop partition (year &lt; 1995, last_name like 'A%');
+
+</code></pre>
+
+    <p class="p">
+      This technique can also be used to change the file format of groups of partitions,
+      as part of an ETL pipeline that periodically consolidates and rewrites the underlying
+      data files in a different file format:
+    </p>
+
+<pre class="pre codeblock"><code>
+alter table fast_growing_data partition (year = 2016, month in (10,11,12)) set fileformat parquet;
+
+</code></pre>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        The extended syntax involving comparison operators and multiple partitions
+        applies to the <code class="ph codeph">SET FILEFORMAT</code>, <code class="ph codeph">SET TBLPROPERTIES</code>,
+        <code class="ph codeph">SET SERDEPROPERTIES</code>, and <code class="ph codeph">SET [UN]CACHED</code> clauses.
+        You can also use this syntax with the <code class="ph codeph">PARTITION</code> clause
+        in the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement, and with the
+        <code class="ph codeph">PARTITION</code> clause of the <code class="ph codeph">SHOW FILES</code> statement.
+        Some forms of <code class="ph codeph">ALTER TABLE</code> still only apply to one partition
+        at a time: the <code class="ph codeph">SET LOCATION</code> and <code class="ph codeph">ADD PARTITION</code>
+        clauses. The <code class="ph codeph">PARTITION</code> clauses in the <code class="ph codeph">LOAD DATA</code>
+        and <code class="ph codeph">INSERT</code> statements also only apply to one partition at a time.
+      </p>
+      <p class="p">
+        A DDL statement that applies to multiple partitions is considered successful
+        (resulting in no changes) even if no partitions match the conditions.
+        The results are the same as if the <code class="ph codeph">IF EXISTS</code> clause was specified.
+      </p>
+      <p class="p">
+        The performance and scalability of this technique are similar to
+        issuing a sequence of single-partition <code class="ph codeph">ALTER TABLE</code>
+        statements in quick succession. To minimize bottlenecks due to
+        communication with the metastore database, or causing other
+        DDL operations on the same table to wait, test the effects of
+        performing <code class="ph codeph">ALTER TABLE</code> statements that affect
+        large numbers of partitions.
+      </p>
+    </div>
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+
+    <p class="p">
+      You can specify an <code class="ph codeph">s3a://</code> prefix on the <code class="ph codeph">LOCATION</code> attribute of a table or partition
+      to make Impala query data from the Amazon S3 filesystem. In <span class="keyword">Impala 2.6</span> and higher, Impala automatically
+      handles creating or removing the associated folders when you issue <code class="ph codeph">ALTER TABLE</code> statements
+      with the <code class="ph codeph">ADD PARTITION</code> or <code class="ph codeph">DROP PARTITION</code> clauses.
+    </p>
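+
+    <p class="p">
+      For example, the following sketch (the bucket and path names are placeholders)
+      adds a partition whose data resides in S3:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- The s3a:// bucket and path are hypothetical; substitute your own.
+alter table sales add partition (year = 2017)
+  location 's3a://impala-demo-bucket/sales/year=2017';
+</code></pre>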
+
+    <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+        <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+        <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+        as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+        Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
+        See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+      </p>
+
+    <p class="p">
+      <strong class="ph b">HDFS caching (CACHED IN clause):</strong>
+    </p>
+
+    <p class="p">
+      If you specify the <code class="ph codeph">CACHED IN</code> clause, any existing or future data files in the table
+      directory or the partition subdirectories are designated to be loaded into memory with the HDFS caching
+      mechanism. See <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a> for details about using the HDFS
+      caching feature.
+    </p>
+
+    <p class="p">
+        In <span class="keyword">Impala 2.2</span> and higher, the optional <code class="ph codeph">WITH REPLICATION</code> clause
+        for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> lets you specify
+        a <dfn class="term">replication factor</dfn>, the number of hosts on which to cache the same data blocks.
+        When Impala processes a cached data block, where the cache replication factor is greater than 1, Impala randomly
+        selects a host that has a cached copy of that data block. This optimization avoids excessive CPU
+        usage on a single host when the same cached data block is processed multiple times.
+        Where practical, specify a value greater than or equal to the HDFS block replication factor.
+      </p>
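+
+    <p class="p">
+      For example, the following sketch (the table and pool names are illustrative)
+      caches a table with a replication factor of 3, then uncaches one partition:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- 'four_gig_pool' is a hypothetical HDFS cache pool name.
+alter table census set cached in 'four_gig_pool' with replication = 3;
+alter table census partition (year = 1960) set uncached;
+</code></pre>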
+
+    <p class="p">
+        If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+        load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+        statement wait before returning, until the new or changed metadata has been received by all the Impala
+        nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+      </p>
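+
+    <p class="p">
+      For example, in <span class="keyword cmdname">impala-shell</span>:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Make the ALTER TABLE wait until all Impala nodes have received
+-- the changed metadata before the statement returns.
+set SYNC_DDL=true;
+alter table t1 add columns (note string);
+</code></pre>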
+
+    <p class="p">
+      The following sections show examples of the use cases for various <code class="ph codeph">ALTER TABLE</code> clauses.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">To rename a table (RENAME TO clause):</strong>
+    </p>
+
+
+
+    <p class="p">
+      The <code class="ph codeph">RENAME TO</code> clause lets you change the name of an existing table, and optionally which
+      database it is located in.
+    </p>
+
+    <p class="p">
+      For internal tables, this operation physically renames the directory within HDFS that contains the data files;
+      the original directory name no longer exists. By qualifying the table names with database names, you can use
+      this technique to move an internal table (and its associated data directory) from one database to another.
+      For example:
+    </p>
+
+<pre class="pre codeblock"><code>create database d1;
+create database d2;
+create database d3;
+use d1;
+create table mobile (x int);
+use d2;
+-- Move table from another database to the current one.
+alter table d1.mobile rename to mobile;
+use d1;
+-- Move table from one database to another.
+alter table d2.mobile rename to d3.mobile;</code></pre>
+
+    <p class="p">
+      For external tables, this operation changes only the table name within the
+      metastore database; the underlying data directory is not renamed or moved.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">To change the physical location where Impala looks for data files associated with a table or
+      partition:</strong>
+    </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)] SET LOCATION '<var class="keyword varname">hdfs_path_of_directory</var>';</code></pre>
+
+    <p class="p">
+      The path you specify is the full HDFS path where the data files reside, or will be created. Impala does not
+      create any additional subdirectory named after the table. Impala does not move any data files to this new
+      location or change any data files that might already exist in that directory.
+    </p>
+
+    <p class="p">
+      To set the location for a single partition, include the <code class="ph codeph">PARTITION</code> clause. Specify all the
+      same partitioning columns for the table, with a constant value for each, to precisely identify the single
+      partition affected by the statement:
+    </p>
+
+<pre class="pre codeblock"><code>create table p1 (s string) partitioned by (month int, day int);
+-- Each ADD PARTITION clause creates a subdirectory in HDFS.
+alter table p1 add partition (month=1, day=1);
+alter table p1 add partition (month=1, day=2);
+alter table p1 add partition (month=2, day=1);
+alter table p1 add partition (month=2, day=2);
+-- Redirect queries, INSERT, and LOAD DATA for one partition
+-- to a specific different directory.
+alter table p1 partition (month=1, day=1) set location '/usr/external_data/new_years_day';
+</code></pre>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        If you are creating a partition for the first time and specifying its location, for maximum efficiency, use
+        a single <code class="ph codeph">ALTER TABLE</code> statement including both the <code class="ph codeph">ADD PARTITION</code> and
+        <code class="ph codeph">LOCATION</code> clauses, rather than separate statements with <code class="ph codeph">ADD PARTITION</code> and
+        <code class="ph codeph">SET LOCATION</code> clauses.
+      </div>
+
+    <p class="p">
+      <strong class="ph b">To automatically detect new partition directories added through Hive or HDFS operations:</strong>
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">RECOVER PARTITIONS</code> clause scans
+      a partitioned table to detect if any new partition directories were added outside of Impala,
+      such as by Hive <code class="ph codeph">ALTER TABLE</code> statements or by <span class="keyword cmdname">hdfs dfs</span>
+      or <span class="keyword cmdname">hadoop fs</span> commands. The <code class="ph codeph">RECOVER PARTITIONS</code> clause
+      automatically recognizes any data files present in these new directories, the same as
+      the <code class="ph codeph">REFRESH</code> statement does.
+    </p>
+
+    <p class="p">
+      For example, here is a sequence of examples showing how you might create a partitioned table in Impala,
+      create new partitions through Hive, copy data files into the new partitions with the <span class="keyword cmdname">hdfs</span>
+      command, and have Impala recognize the new partitions and new data:
+    </p>
+
+    <p class="p">
+      In Impala, create the table, and a single partition for demonstration purposes:
+    </p>
+
+<pre class="pre codeblock"><code>
+
+create database recover_partitions;
+use recover_partitions;
+create table t1 (s string) partitioned by (yy int, mm int);
+insert into t1 partition (yy = 2016, mm = 1) values ('Partition exists');
+show files in t1;
++---------------------------------------------------------------------+------+--------------+
+| Path                                                                | Size | Partition    |
++---------------------------------------------------------------------+------+--------------+
+| /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=1/data.txt | 17B  | yy=2016/mm=1 |
++---------------------------------------------------------------------+------+--------------+
+quit;
+
+</code></pre>
+
+    <p class="p">
+      In Hive, create some new partitions. In a real use case, you might create the
+      partitions and populate them with data as the final stages of an ETL pipeline.
+    </p>
+
+<pre class="pre codeblock"><code>
+
+hive&gt; use recover_partitions;
+OK
+hive&gt; alter table t1 add partition (yy = 2016, mm = 2);
+OK
+hive&gt; alter table t1 add partition (yy = 2016, mm = 3);
+OK
+hive&gt; quit;
+
+</code></pre>
+
+    <p class="p">
+      For demonstration purposes, manually copy data (a single row) into these
+      new partitions, using manual HDFS operations:
+    </p>
+
+<pre class="pre codeblock"><code>
+
+$ hdfs dfs -ls /user/hive/warehouse/recover_partitions.db/t1/yy=2016/
+Found 3 items
+drwxr-xr-x - impala   hive 0 2016-05-09 16:06 /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=1
+drwxr-xr-x - jrussell hive 0 2016-05-09 16:14 /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=2
+drwxr-xr-x - jrussell hive 0 2016-05-09 16:13 /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=3
+
+$ hdfs dfs -cp /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=1/data.txt \
+  /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=2/data.txt
+$ hdfs dfs -cp /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=1/data.txt \
+  /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=3/data.txt
+
+</code></pre>
+
+    <p class="p">
+      In Hive, confirm that the copied data is visible in the new partitions:
+    </p>
+
+<pre class="pre codeblock"><code>
+
+hive&gt; select * from t1;
+OK
+Partition exists  2016  1
+Partition exists  2016  2
+Partition exists  2016  3
+hive&gt; quit;
+
+</code></pre>
+
+    <p class="p">
+      In Impala, initially the partitions and data are not visible.
+      Running <code class="ph codeph">ALTER TABLE</code> with the <code class="ph codeph">RECOVER PARTITIONS</code>
+      clause scans the table data directory to find any new partition directories, and
+      the data files inside them:
+    </p>
+
+<pre class="pre codeblock"><code>
+
+select * from t1;
++------------------+------+----+
+| s                | yy   | mm |
++------------------+------+----+
+| Partition exists | 2016 | 1  |
++------------------+------+----+
+
+alter table t1 recover partitions;
+select * from t1;
++------------------+------+----+
+| s                | yy   | mm |
++------------------+------+----+
+| Partition exists | 2016 | 1  |
+| Partition exists | 2016 | 3  |
+| Partition exists | 2016 | 2  |
++------------------+------+----+
+
+</code></pre>
+
+    <p class="p">
+      <strong class="ph b">To change the key-value pairs of the TBLPROPERTIES and SERDEPROPERTIES fields:</strong>
+    </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>'[, ...]);
+ALTER TABLE <var class="keyword varname">table_name</var> SET SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>'[, ...]);</code></pre>
+
+    <p class="p">
+      The <code class="ph codeph">TBLPROPERTIES</code> clause is primarily a way to associate arbitrary user-specified data items
+      with a particular table.
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">SERDEPROPERTIES</code> clause sets up metadata defining how tables are read or written, needed
+      in some cases by Hive but not used extensively by Impala. You would use this clause primarily to change the
+      delimiter in an existing text table or partition, by setting the <code class="ph codeph">'serialization.format'</code> and
+      <code class="ph codeph">'field.delim'</code> property values to the new delimiter character:
+    </p>
+
+<pre class="pre codeblock"><code>-- This table begins life as pipe-separated text format.
+create table change_to_csv (s1 string, s2 string) row format delimited fields terminated by '|';
+-- Then we change it to a CSV table.
+alter table change_to_csv set SERDEPROPERTIES ('serialization.format'=',', 'field.delim'=',');
+insert overwrite change_to_csv values ('stop','go'), ('yes','no');
+!hdfs dfs -cat 'hdfs://<var class="keyword varname">hostname</var>:8020/<var class="keyword varname">data_directory</var>/<var class="keyword varname">dbname</var>.db/change_to_csv/<var class="keyword varname">data_file</var>';
+stop,go
+yes,no</code></pre>
+
+    <p class="p">
+      Use the <code class="ph codeph">DESCRIBE FORMATTED</code> statement to see the current values of these properties for an
+      existing table. See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for more details about these clauses.
+      See <a class="xref" href="impala_perf_stats.html#perf_table_stats_manual">Setting the NUMROWS Value Manually through ALTER TABLE</a> for an example of using table properties to
+      fine-tune the performance-related table statistics.
+    </p>
+      
+    <p class="p">
+      <strong class="ph b">To manually set or update table or column statistics:</strong>
+    </p>
+
+    <p class="p">
+      Although for most tables the <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+      statement is all you need to keep table and column statistics up to date for a table,
+      sometimes for a very large table or one that is updated frequently, the length of time to recompute
+      all the statistics might make it impractical to run those statements as often as needed.
+      As a workaround, you can use the <code class="ph codeph">ALTER TABLE</code> statement to set table statistics
+      at the level of the entire table or a single partition, or column statistics at the level of
+      the entire table.
+    </p>
+
+    <div class="p">
+      You can set the <code class="ph codeph">numrows</code> value for table statistics by changing the
+      <code class="ph codeph">TBLPROPERTIES</code> setting for a table or partition.
+      For example:
+<pre class="pre codeblock"><code>create table analysis_data stored as parquet as select * from raw_data;
+Inserted 1000000000 rows in 181.98s
+compute stats analysis_data;
+insert into analysis_data select * from smaller_table_we_forgot_before;
+Inserted 1000000 rows in 15.32s
+-- Now there are 1001000000 rows. We can update this single data point in the stats.
+alter table analysis_data set tblproperties('numRows'='1001000000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+<pre class="pre codeblock"><code>-- If the table originally contained 1 million rows, and we add another partition with 30 thousand rows,
+-- change the numRows property for the partition and the overall table.
+alter table partitioned_data partition(year=2009, month=4) set tblproperties ('numRows'='30000', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+      See <a class="xref" href="impala_perf_stats.html#perf_table_stats_manual">Setting the NUMROWS Value Manually through ALTER TABLE</a> for details.
+    </div>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.6</span> and higher, you can use the <code class="ph codeph">SET COLUMN STATS</code> clause
+      to set a specific stats value for a particular column.
+    </p>
+
+    <div class="p">
+        You specify a case-insensitive symbolic name for the kind of statistics:
+        <code class="ph codeph">numDVs</code>, <code class="ph codeph">numNulls</code>, <code class="ph codeph">avgSize</code>, <code class="ph codeph">maxSize</code>.
+        The key names and values are both quoted. This operation applies to an entire table,
+        not a specific partition. For example:
+<pre class="pre codeblock"><code>
+create table t1 (x int, s string);
+insert into t1 values (1, 'one'), (2, 'two'), (2, 'deux');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x      | INT    | -1               | -1     | 4        | 4        |
+| s      | STRING | -1               | -1     | -1       | -1       |
++--------+--------+------------------+--------+----------+----------+
+alter table t1 set column stats x ('numDVs'='2','numNulls'='0');
+alter table t1 set column stats s ('numdvs'='3','maxsize'='4');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x      | INT    | 2                | 0      | 4        | 4        |
+| s      | STRING | 3                | -1     | 4        | -1       |
++--------+--------+------------------+--------+----------+----------+
+</code></pre>
+      </div>
+
+    <p class="p">
+      <strong class="ph b">To reorganize columns for a table:</strong>
+    </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> ADD COLUMNS (<var class="keyword varname">column_defs</var>);
+ALTER TABLE <var class="keyword varname">table_name</var> REPLACE COLUMNS (<var class="keyword varname">column_defs</var>);
+ALTER TABLE <var class="keyword varname">table_name</var> CHANGE <var class="keyword varname">column_name</var> <var class="keyword varname">new_name</var> <var class="keyword varname">new_type</var>;
+ALTER TABLE <var class="keyword varname">table_name</var> DROP <var class="keyword varname">column_name</var>;</code></pre>
+
+    <p class="p">
+      The <var class="keyword varname">column_defs</var> clause is the same as in the <code class="ph codeph">CREATE TABLE</code> statement: the column
+      name, then its data type, then an optional comment. You can add multiple columns at a time. The parentheses
+      are required whether you add a single column or multiple columns. When you replace columns, all the original
+      column definitions are discarded. You might use this technique if you receive a new set of data files with
+      different data types or columns in a different order. (The data files are retained, so if the new columns are
+      incompatible with the old ones, use <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA OVERWRITE</code>
+      to replace all the data before issuing any further queries.)
+    </p>
+
+    <p class="p">
+      For example, here is how you might add columns to an existing table.
+      The first <code class="ph codeph">ALTER TABLE</code> adds two new columns, and the second
+      <code class="ph codeph">ALTER TABLE</code> adds one new column.
+      A single Impala query reads both the old and new data files, containing different numbers of columns.
+      For any columns not present in a particular data file, all the column values are
+      considered to be <code class="ph codeph">NULL</code>.
+    </p>
+
+<pre class="pre codeblock"><code>
+create table t1 (x int);
+insert into t1 values (1), (2);
+
+alter table t1 add columns (s string, t timestamp);
+insert into t1 values (3, 'three', now());
+
+alter table t1 add columns (b boolean);
+insert into t1 values (4, 'four', now(), true);
+
+select * from t1 order by x;
++---+-------+-------------------------------+------+
+| x | s     | t                             | b    |
++---+-------+-------------------------------+------+
+| 1 | NULL  | NULL                          | NULL |
+| 2 | NULL  | NULL                          | NULL |
+| 3 | three | 2016-05-11 11:19:45.054457000 | NULL |
+| 4 | four  | 2016-05-11 11:20:20.260733000 | true |
++---+-------+-------------------------------+------+
+</code></pre>
+
+    <p class="p">
+      You might use the <code class="ph codeph">CHANGE</code> clause to rename a single column, or to treat an existing column as
+      a different type than before, such as to switch between treating a column as <code class="ph codeph">STRING</code> and
+      <code class="ph codeph">TIMESTAMP</code>, or between <code class="ph codeph">INT</code> and <code class="ph codeph">BIGINT</code>. You can only drop a
+      single column at a time; to drop multiple columns, issue multiple <code class="ph codeph">ALTER TABLE</code> statements, or
+      define the new set of columns with a single <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> statement.
+    </p>
+
+    <p class="p">
+      The following examples show some safe operations to drop or change columns. Dropping the final column
+      in a table lets Impala ignore the data without causing any disruption to existing data files. Changing the type
+      of a column works if existing data values can be safely converted to the new type. The type conversion
+      rules depend on the file format of the underlying table. For example, in a text table, the same value
+      can be interpreted as a <code class="ph codeph">STRING</code> or a numeric value, while in a binary format such as
+      Parquet, the rules are stricter and type conversions only work between certain sizes of integers.
+    </p>
+
+<pre class="pre codeblock"><code>
+create table optional_columns (x int, y int, z int, a1 int, a2 int);
+insert into optional_columns values (1,2,3,0,0), (2,3,4,100,100);
+
+-- When the last column in the table is dropped, Impala ignores the
+-- values that are no longer needed. (Dropping A1 but leaving A2
+-- would cause problems, as we will see in a subsequent example.)
+alter table optional_columns drop column a2;
+alter table optional_columns drop column a1;
+
+select * from optional_columns;
++---+---+---+
+| x | y | z |
++---+---+---+
+| 1 | 2 | 3 |
+| 2 | 3 | 4 |
++---+---+---+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+create table int_to_string (s string, x int);
+insert into int_to_string values ('one', 1), ('two', 2);
+
+-- What was an INT column will now be interpreted as STRING.
+-- This technique works for text tables but not other file formats.
+-- The second X represents the new name of the column, which we keep the same.
+alter table int_to_string change x x string;
+
+-- Once the type is changed, we can insert non-integer values into the X column
+-- and treat that column as a string, for example by uppercasing or concatenating.
+insert into int_to_string values ('three', 'trois');
+select s, upper(x) from int_to_string;
++-------+----------+
+| s     | upper(x) |
++-------+----------+
+| one   | 1        |
+| two   | 2        |
+| three | TROIS    |
++-------+----------+
+</code></pre>
+
+    <p class="p">
+      Remember that Impala does not actually do any conversion for the underlying data files as a result of
+      <code class="ph codeph">ALTER TABLE</code> statements. If you use <code class="ph codeph">ALTER TABLE</code> to create a table
+      layout that does not agree with the contents of the underlying files, you must replace the files
+      yourself, such as using <code class="ph codeph">LOAD DATA</code> to load a new set of data files, or
+      <code class="ph codeph">INSERT OVERWRITE</code> to copy from another table and replace the original data.
+    </p>
+
+    <p class="p">
+      The following example shows what happens if you delete the middle column from a Parquet table containing three columns.
+      The underlying data files still contain three columns of data. Because the columns are interpreted based on their positions in
+      the data file instead of the specific column names, a <code class="ph codeph">SELECT *</code> query now reads the first and second
+      columns from the data file, potentially leading to unexpected results or conversion errors.
+      For this reason, if you expect to someday drop a column, declare it as the last column in the table, where its data
+      can be ignored by queries after the column is dropped. Or, re-run your ETL process and create new data files
+      if you drop or change the type of a column in a way that causes problems with existing data files.
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Parquet table showing how dropping a column can produce unexpected results.
+create table p1 (s1 string, s2 string, s3 string) stored as parquet;
+
+insert into p1 values ('one', 'un', 'uno'), ('two', 'deux', 'dos'),
+  ('three', 'trois', 'tres');
+select * from p1;
++-------+-------+------+
+| s1    | s2    | s3   |
++-------+-------+------+
+| one   | un    | uno  |
+| two   | deux  | dos  |
+| three | trois | tres |
++-------+-------+------+
+
+alter table p1 drop column s2;
+-- The S3 column contains unexpected results.
+-- Because S2 and S3 have compatible types, the query reads
+-- values from the dropped S2, because the existing data files
+-- still contain those values as the second column.
+select * from p1;
++-------+-------+
+| s1    | s3    |
++-------+-------+
+| one   | un    |
+| two   | deux  |
+| three | trois |
++-------+-------+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+-- Parquet table showing how dropping a column can produce conversion errors.
+create table p2 (s1 string, x int, s3 string) stored as parquet;
+
+insert into p2 values ('one', 1, 'uno'), ('two', 2, 'dos'), ('three', 3, 'tres');
+select * from p2;
++-------+---+------+
+| s1    | x | s3   |
++-------+---+------+
+| one   | 1 | uno  |
+| two   | 2 | dos  |
+| three | 3 | tres |
++-------+---+------+
+
+alter table p2 drop column x;
+select * from p2;
+WARNINGS: 
+File '<var class="keyword varname">hdfs_filename</var>' has an incompatible Parquet schema for column 'add_columns.p2.s3'.
+Column type: STRING, Parquet schema:
+optional int32 x [i:1 d:1 r:0]
+
+File '<var class="keyword varname">hdfs_filename</var>' has an incompatible Parquet schema for column 'add_columns.p2.s3'.
+Column type: STRING, Parquet schema:
+optional int32 x [i:1 d:1 r:0]
+</code></pre>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.6</span> and higher, if an Avro table is created without column definitions in the
+      <code class="ph codeph">CREATE TABLE</code> statement, and columns are later
+      added through <code class="ph codeph">ALTER TABLE</code>, the resulting
+      table is now queryable. Missing values from the newly added
+      columns now default to <code class="ph codeph">NULL</code>.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">To change the file format that Impala expects data to be in, for a table or partition:</strong>
+    </p>
+
+    <p class="p">
+      Use an <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT</code> clause. You can include an optional <code class="ph codeph">PARTITION
+      (<var class="keyword varname">col1</var>=<var class="keyword varname">val1</var>, <var class="keyword varname">col2</var>=<var class="keyword varname">val2</var>,
+      ...</code> clause so that the file format is changed for a specific partition rather than the entire table.
+    </p>
+
+    <p class="p">
+      Because this operation only changes the table metadata, you must do any conversion of existing data using
+      regular Hadoop techniques outside of Impala. Any new data created by the Impala <code class="ph codeph">INSERT</code>
+      statement will be in the new format. You cannot specify the delimiter for Text files; the data files must be
+      comma-delimited.
+
+    </p>
+
+    <p class="p">
+      To set the file format for a single partition, include the <code class="ph codeph">PARTITION</code> clause. Specify all the
+      same partitioning columns for the table, with a constant value for each, to precisely identify the single
+      partition affected by the statement:
+    </p>
+
+<pre class="pre codeblock"><code>create table p1 (s string) partitioned by (month int, day int);
+-- Each ADD PARTITION clause creates a subdirectory in HDFS.
+alter table p1 add partition (month=1, day=1);
+alter table p1 add partition (month=1, day=2);
+alter table p1 add partition (month=2, day=1);
+alter table p1 add partition (month=2, day=2);
+-- Queries and INSERT statements will read and write files
+-- in this format for this specific partition.
+alter table p1 partition (month=2, day=2) set fileformat parquet;
+</code></pre>
+
+    <p class="p">
+      <strong class="ph b">To add or drop partitions for a table</strong>, the table must already be partitioned (that is, created with a
+      <code class="ph codeph">PARTITIONED BY</code> clause). The partition is a physical directory in HDFS, with a name that
+      encodes a particular column value (the <strong class="ph b">partition key</strong>). The Impala <code class="ph codeph">INSERT</code> statement
+      already creates the partition if necessary, so the <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code> is
+      primarily useful for importing data by moving or copying existing data files into the HDFS directory
+      corresponding to a partition. (You can use the <code class="ph codeph">LOAD DATA</code> statement to move files into the
+      partition directory, or <code class="ph codeph">ALTER TABLE ... PARTITION (...) SET LOCATION</code> to point a partition at
+      a directory that already contains data files.)
+    </p>
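+
+    <p class="p">
+      For example, the following sketch (using a hypothetical table and HDFS path) shows how
+      <code class="ph codeph">SET LOCATION</code> might point a partition at a directory whose data
+      files already exist, avoiding any file movement:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical table and path, for illustration only.
+create table logs (msg string) partitioned by (year int, month int);
+alter table logs add partition (year=2017, month=1);
+-- Reuse data files already present in an HDFS directory.
+alter table logs partition (year=2017, month=1)
+  set location '/user/impala/existing_data/2017/01';
+-- Make the newly associated files visible to Impala.
+refresh logs;
+</code></pre>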
+
+    <p class="p">
+      The <code class="ph codeph">DROP PARTITION</code> clause is used to remove the HDFS directory and associated data files for
+      a particular set of partition key values; for example, if you always analyze the last 3 months' worth of data,
+      at the beginning of each month you might drop the oldest partition that is no longer needed. Removing
+      partitions reduces the amount of metadata associated with the table and the complexity of calculating the
+      optimal query plan, which can simplify and speed up queries on partitioned tables, particularly join queries.
+      Here is an example showing the <code class="ph codeph">ADD PARTITION</code> and <code class="ph codeph">DROP PARTITION</code> clauses.
+    </p>
+
+    <p class="p">
+      To avoid errors while adding or dropping partitions whose existence is not certain,
+      add the optional <code class="ph codeph">IF [NOT] EXISTS</code> clause between the <code class="ph codeph">ADD</code> or
+      <code class="ph codeph">DROP</code> keyword and the <code class="ph codeph">PARTITION</code> keyword. That is, the entire
+      clause becomes <code class="ph codeph">ADD IF NOT EXISTS PARTITION</code> or <code class="ph codeph">DROP IF EXISTS PARTITION</code>.
+      The following example shows how partitions can be created automatically through <code class="ph codeph">INSERT</code>
+      statements, or manually through <code class="ph codeph">ALTER TABLE</code> statements. The <code class="ph codeph">IF [NOT] EXISTS</code>
+      clauses let the <code class="ph codeph">ALTER TABLE</code> statements succeed even if a new requested partition already
+      exists, or a partition to be dropped does not exist.
+    </p>
+
+<p class="p">
+Inserting 2 year values creates 2 partitions:
+</p>
+
+<pre class="pre codeblock"><code>
+create table partition_t (s string) partitioned by (y int);
+insert into partition_t (s,y) values ('two thousand',2000), ('nineteen ninety',1990);
+show partitions partition_t;
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| y     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| 1990  | -1    | 1      | 16B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2000  | -1    | 1      | 13B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| Total | -1    | 2      | 29B  | 0B           |                   |        |                   |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+</code></pre>
+
+<p class="p">
+Without the <code class="ph codeph">IF NOT EXISTS</code> clause, an attempt to add a new partition might fail:
+</p>
+
+<pre class="pre codeblock"><code>
+alter table partition_t add partition (y=2000);
+ERROR: AnalysisException: Partition spec already exists: (y=2000).
+</code></pre>
+
+<p class="p">
+The <code class="ph codeph">IF NOT EXISTS</code> clause makes the statement succeed whether or not there was already a
+partition with the specified key value:
+</p>
+
+<pre class="pre codeblock"><code>
+alter table partition_t add if not exists partition (y=2000);
+alter table partition_t add if not exists partition (y=2010);
+show partitions partition_t;
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| y     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| 1990  | -1    | 1      | 16B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2000  | -1    | 1      | 13B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2010  | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| Total | -1    | 2      | 29B  | 0B           |                   |        |                   |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+</code></pre>
+
+<p class="p">
+Likewise, the <code class="ph codeph">IF EXISTS</code> clause lets <code class="ph codeph">DROP PARTITION</code> succeed whether or not the partition is already
+in the table:
+</p>
+
+<pre class="pre codeblock"><code>
+alter table partition_t drop if exists partition (y=2000);
+alter table partition_t drop if exists partition (y=1950);
+show partitions partition_t;
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| y     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| 1990  | -1    | 1      | 16B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2010  | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| Total | -1    | 1      | 16B  | 0B           |                   |        |                   |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+</code></pre>
+
+    <p class="p"> The optional <code class="ph codeph">PURGE</code> keyword, available in
+      <span class="keyword">Impala 2.3</span> and higher, is used with the <code class="ph codeph">DROP
+        PARTITION</code> clause to remove associated HDFS data files
+      immediately rather than going through the HDFS trashcan mechanism. Use
+      this keyword when dropping a partition if it is crucial to remove the data
+      as quickly as possible to free up space, or if there is a problem with the
+      trashcan, such as the trashcan not being configured or being in a
+      different HDFS encryption zone than the data files. </p>
+
+    
+
+<pre class="pre codeblock"><code>-- Create an empty table and define the partitioning scheme.
+create table part_t (x int) partitioned by (month int);
+-- Create an empty partition into which you could copy data files from some other source.
+alter table part_t add partition (month=1);
+-- After changing the underlying data, issue a REFRESH statement to make the data visible in Impala.
+refresh part_t;
+-- Later, do the same for the next month.
+alter table part_t add partition (month=2);
+
+-- Now you no longer need the older data.
+alter table part_t drop partition (month=1);
+-- If the table was partitioned by month and year, you would issue a statement like:
+-- alter table part_t drop partition (year=2003,month=1);
+-- which would require 12 ALTER TABLE statements to remove a year's worth of data.
+
+-- If the data files for subsequent months were in a different file format,
+-- you could set a different file format for the new partition after you create it:
+alter table part_t add partition (month=3);
+alter table part_t partition (month=3) set fileformat parquet;
+</code></pre>
+
+    <p class="p">
+      The value specified for a partition key can be an arbitrary constant expression, without any references to
+      columns. For example:
+    </p>
+
+<pre class="pre codeblock"><code>alter table time_data add partition (month=concat('Decem','ber'));
+alter table sales_data add partition (zipcode = cast(9021 * 10 as string));</code></pre>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        An alternative way to reorganize a table and its associated data files is to use <code class="ph codeph">CREATE
+        TABLE</code> to create a variation of the original table, then use <code class="ph codeph">INSERT</code> to copy the
+        transformed or reordered data to the new table. The advantage of <code class="ph codeph">ALTER TABLE</code> is that it
+        avoids making a duplicate copy of the data files, allowing you to reorganize huge volumes of data in a
+        space-efficient way using familiar Hadoop techniques.
+      </p>
+    </div>
+
+    <p class="p">
+      <strong class="ph b">To switch a table between internal and external:</strong>
+    </p>
+
+    <div class="p">
+        You can switch a table from internal to external, or from external to internal, by using the <code class="ph codeph">ALTER
+        TABLE</code> statement:
+<pre class="pre codeblock"><code>
+-- Switch a table from internal to external.
+ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='TRUE');
+
+-- Switch a table from external to internal.
+ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='FALSE');
+</code></pre>
+      </div>
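+
+    <p class="p">
+      To verify which kind a table currently is, you can check the <code class="ph codeph">Table Type:</code>
+      field in the output of <code class="ph codeph">DESCRIBE FORMATTED</code>, which shows
+      <code class="ph codeph">MANAGED_TABLE</code> for an internal table and
+      <code class="ph codeph">EXTERNAL_TABLE</code> for an external one. For example, with a hypothetical
+      table <code class="ph codeph">t1</code>:
+    </p>
+
+<pre class="pre codeblock"><code>
+alter table t1 set tblproperties('EXTERNAL'='TRUE');
+describe formatted t1;
+...
+| Table Type:                  | EXTERNAL_TABLE               | NULL       |
+...
+</code></pre>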
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      Most <code class="ph codeph">ALTER TABLE</code> clauses do not actually
+      read or write any HDFS files, and so do not depend on
+      specific HDFS permissions. For example, the <code class="ph codeph">SET FILEFORMAT</code>
+      clause does not actually check the file format of existing data files or
+      convert them to the new format, and the <code class="ph codeph">SET LOCATION</code> clause
+      does not require any special permissions on the new location.
+      (Any permission-related failures would come later, when you
+      actually query or insert into the table.)
+    </p>
+
+
+    <p class="p">
+      In general, <code class="ph codeph">ALTER TABLE</code> clauses that do touch
+      HDFS files and directories require the same HDFS permissions
+      as corresponding <code class="ph codeph">CREATE</code>, <code class="ph codeph">INSERT</code>,
+      or <code class="ph codeph">SELECT</code> statements.
+      The permissions allow
+      the user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, to read or write
+      files or directories, or (in the case of the execute bit) descend into a directory.
+      The <code class="ph codeph">RENAME TO</code> clause requires read, write, and execute permission in the
+      source and destination database directories and in the table data directory,
+      and read and write permission for the data files within the table.
+      The <code class="ph codeph">ADD PARTITION</code> and <code class="ph codeph">DROP PARTITION</code> clauses
+      require write and execute permissions for the associated partition directory.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+
+    <div class="p">
+      Because of the extra constraints and features of Kudu tables, such as the <code class="ph codeph">NOT NULL</code>
+      and <code class="ph codeph">DEFAULT</code> attributes for columns, <code class="ph codeph">ALTER TABLE</code> has specific
+      requirements related to Kudu tables:
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            In an <code class="ph codeph">ADD COLUMNS</code> operation, you can specify the <code class="ph codeph">NULL</code>,
+            <code class="ph codeph">NOT NULL</code>, and <code class="ph codeph">DEFAULT <var class="keyword varname">default_value</var></code>
+            column attributes.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            If you add a column with a <code class="ph codeph">NOT NULL</code> attribute, it must also have a
+            <code class="ph codeph">DEFAULT</code> attribute, so the default value can be assigned to that
+            column for all existing rows.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">DROP COLUMN</code> clause works the same for a Kudu table as for other
+            kinds of tables.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Although you can change the name of a column with the <code class="ph codeph">CHANGE</code> clause,
+            you cannot change the type of a column in a Kudu table.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            You cannot assign the <code class="ph codeph">ENCODING</code>, <code class="ph codeph">COMPRESSION</code>,
+            or <code class="ph codeph">BLOCK_SIZE</code> attributes when adding a column.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            You cannot change the default value, nullability, encoding, compression, or block size
+            of existing columns in a Kudu table.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            You cannot use the <code class="ph codeph">REPLACE COLUMNS</code> clause with a Kudu table.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">RENAME TO</code> clause for a Kudu table only affects the name stored in the
+            metastore database that Impala uses to refer to the table. To change which underlying Kudu
+            table is associated with an Impala table name, you must change the <code class="ph codeph">TBLPROPERTIES</code>
+            property of the table: <code class="ph codeph">SET TBLPROPERTIES('kudu.table_name'='<var class="keyword varname">kudu_tbl_name</var>')</code>.
+            Doing so causes Kudu to change the name of the underlying Kudu table.
+          </p>
+        </li>
+      </ul>
+    </div>
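+
+    <p class="p">
+      The following sketch, using a hypothetical Kudu table <code class="ph codeph">kudu_t1</code>,
+      illustrates some of these rules: a new nullable column needs no default, a new
+      <code class="ph codeph">NOT NULL</code> column must supply one, and <code class="ph codeph">CHANGE</code>
+      can rename a column but must repeat its existing type:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Missing values for the new nullable column default to NULL.
+alter table kudu_t1 add columns (note string null);
+-- A NOT NULL column needs a DEFAULT to fill in existing rows.
+alter table kudu_t1 add columns (score bigint not null default 0);
+-- Rename NOTE to REMARK; the type must stay STRING.
+alter table kudu_t1 change note remark string;
+</code></pre>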
+
+    <p class="p">
+      Kudu tables all use an underlying partitioning mechanism. The partition syntax is different from that for non-Kudu
+      tables. You can use the <code class="ph codeph">ALTER TABLE</code> statement to add and drop <dfn class="term">range partitions</dfn>
+      from a Kudu table. Any new range must not overlap with any existing ranges. Dropping a range removes all the associated
+      rows from the table. See <a class="xref" href="impala_kudu.html#kudu_partitioning">Partitioning for Kudu Tables</a> for details.
+    </p>
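+
+    <p class="p">
+      For example, with a hypothetical Kudu table range-partitioned on an integer date key:
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Add a new, non-overlapping range for January 2017 data.
+alter table kudu_events add range partition 20170101 &lt;= values &lt; 20170201;
+-- Drop an old range; the rows in that range are removed from the table.
+alter table kudu_events drop range partition 20160101 &lt;= values &lt; 20160201;
+</code></pre>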
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>,
+      <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>, <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>,
+      <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>, <a class="xref" href="impala_tables.html#internal_tables">Internal Tables</a>,
+      <a class="xref" href="impala_tables.html#external_tables">External Tables</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_alter_view.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_alter_view.html b/docs/build/html/topics/impala_alter_view.html
new file mode 100644
index 0000000..70cf3a7
--- /dev/null
+++ b/docs/build/html/topics/impala_alter_view.html
@@ -0,0 +1,139 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="alter_view"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ALTER VIEW Statement</title></head><body id="alter_view"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">ALTER VIEW Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Changes the characteristics of a view. The syntax has two forms:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        The <code class="ph codeph">AS</code> clause associates the view with a different query.
+      </li>
+      <li class="li">
+        The <code class="ph codeph">RENAME TO</code> clause changes the name of the view, moves the view to
+        a different database, or both.
+      </li>
+    </ul>
+
+    <p class="p">
+      Because a view is purely a logical construct (an alias for a query) with no physical data behind it,
+      <code class="ph codeph">ALTER VIEW</code> only involves changes to metadata in the metastore database, not any data files
+      in HDFS.
+    </p>
+
+
+
+
+
+
+
+
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>ALTER VIEW [<var class="keyword varname">database_name</var>.]<var class="keyword varname">view_name</var> AS <var class="keyword varname">select_statement</var>
+ALTER VIEW [<var class="keyword varname">database_name</var>.]<var class="keyword varname">view_name</var> RENAME TO [<var class="keyword varname">database_name</var>.]<var class="keyword varname">view_name</var></code></pre>
+
+    <p class="p">
+        <strong class="ph b">Statement type:</strong> DDL
+      </p>
+
+    <p class="p">
+        If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+        load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+        statement wait before returning, until the new or changed metadata has been received by all the Impala
+        nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Security considerations:</strong>
+      </p>
+    <p class="p">
+        If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+        identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+        other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+        therefore no HDFS permissions are required.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>create table t1 (x int, y int, s string);
+create table t2 like t1;
+create view v1 as select * from t1;
+alter view v1 as select * from t2;
+alter view v1 as select x, upper(s) s from t2;</code></pre>
+
+
+
+    <div class="p">
+        To see the definition of a view, issue a <code class="ph codeph">DESCRIBE FORMATTED</code> statement, which shows the
+        query from the original <code class="ph codeph">CREATE VIEW</code> statement:
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create view v1 as select * from t1;
+[localhost:21000] &gt; describe formatted v1;
+Query finished, fetching results ...
++------------------------------+------------------------------+------------+
+| name                         | type                         | comment    |
++------------------------------+------------------------------+------------+
+| # col_name                   | data_type                    | comment    |
+|                              | NULL                         | NULL       |
+| x                            | int                          | None       |
+| y                            | int                          | None       |
+| s                            | string                       | None       |
+|                              | NULL                         | NULL       |
+| # Detailed Table Information | NULL                         | NULL       |
+| Database:                    | views                        | NULL       |
+| Owner:                       | doc_demo                     | NULL       |
+| CreateTime:                  | Mon Jul 08 15:56:27 EDT 2013 | NULL       |
+| LastAccessTime:              | UNKNOWN                      | NULL       |
+| Protect Mode:                | None                         | NULL       |
+| Retention:                   | 0                            | NULL       |
+<strong class="ph b">| Table Type:                  | VIRTUAL_VIEW                 | NULL       |</strong>
+| Table Parameters:            | NULL                         | NULL       |
+|                              | transient_lastDdlTime        | 1373313387 |
+|                              | NULL                         | NULL       |
+| # Storage Information        | NULL                         | NULL       |
+| SerDe Library:               | null                         | NULL       |
+| InputFormat:                 | null                         | NULL       |
+| OutputFormat:                | null                         | NULL       |
+| Compressed:                  | No                           | NULL       |
+| Num Buckets:                 | 0                            | NULL       |
+| Bucket Columns:              | []                           | NULL       |
+| Sort Columns:                | []                           | NULL       |
+|                              | NULL                         | NULL       |
+| # View Information           | NULL                         | NULL       |
+<strong class="ph b">| View Original Text:          | SELECT * FROM t1             | NULL       |
+| View Expanded Text:          | SELECT * FROM t1             | NULL       |</strong>
++------------------------------+------------------------------+------------+
+</code></pre>
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_views.html#views">Overview of Impala Views</a>, <a class="xref" href="impala_create_view.html#create_view">CREATE VIEW Statement</a>,
+      <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file


[12/51] [partial] incubator-impala git commit: IMPALA-4181 [DOCS] Publish rendered Impala documentation to ASF site

Posted by jb...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_proxy.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_proxy.html b/docs/build/html/topics/impala_proxy.html
new file mode 100644
index 0000000..d29dfc6
--- /dev/null
+++ b/docs/build/html/topics/impala_proxy.html
@@ -0,0 +1,396 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="proxy"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala through a Proxy for High Availability</title></head><body id="proxy"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using Impala through a Proxy for High Availability</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      For most clusters that have multiple users and production availability requirements, you might set up a proxy
+      server to relay requests to and from Impala.
+    </p>
+
+    <p class="p">
+      Currently, the Impala statestore mechanism does not include such proxying and load-balancing features. Set up
+      a software package of your choice to perform these functions.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon.
+        The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special
+        requirements for high availability, because problems with those daemons do not result in data loss.
+        If those daemons become unavailable due to an outage on a particular
+        host, you can stop the Impala service, delete the <span class="ph uicontrol">Impala StateStore</span> and
+        <span class="ph uicontrol">Impala Catalog Server</span> roles, add the roles on a different host, and restart the
+        Impala service.
+      </p>
+    </div>
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="proxy__proxy_overview">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Overview of Proxy Usage and Load Balancing for Impala</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        Using a load-balancing proxy server for Impala has the following advantages:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Applications connect to a single well-known host and port, rather than keeping track of the hosts where
+          the <span class="keyword cmdname">impalad</span> daemon is running.
+        </li>
+
+        <li class="li">
+          If any host running the <span class="keyword cmdname">impalad</span> daemon becomes unavailable, application connection
+          requests still succeed because you always connect to the proxy server rather than a specific host running
+          the <span class="keyword cmdname">impalad</span> daemon.
+        </li>
+
+        <li class="li">
+          The coordinator node for each Impala query potentially requires more memory and CPU cycles than the other
+          nodes that process the query. The proxy server can issue queries using round-robin scheduling, so that
+          each connection uses a different coordinator node. This load-balancing technique lets the Impala nodes
+          share this additional work, rather than concentrating it on a single machine.
+        </li>
+      </ul>
+
+      <p class="p">
+        The following setup steps are a general outline that apply to any load-balancing proxy software:
+      </p>
+
+      <ol class="ol">
+        <li class="li">
+          Download the load-balancing proxy software. It should only need to be installed and configured on a
+          single host. Pick a host other than the DataNodes where <span class="keyword cmdname">impalad</span> is running,
+          because the intention is to protect against the possibility of one or more of these DataNodes becoming unavailable.
+        </li>
+
+        <li class="li">
+          Configure the load balancer (typically by editing a configuration file).
+          In particular:
+          <ul class="ul">
+            <li class="li">
+              <p class="p">
+                Set up a port that the load balancer will listen on to relay Impala requests back and forth.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Consider enabling <span class="q">"sticky sessions"</span>. Where practical, enable this setting
+                so that stateless client applications such as <span class="keyword cmdname">impala-shell</span> and Hue
+                are not disconnected from long-running queries. Evaluate whether this setting is
+                appropriate for your combination of workload and client applications.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                For Kerberized clusters, follow the instructions in <a class="xref" href="impala_proxy.html#proxy_kerberos">Special Proxy Considerations for Clusters Using Kerberos</a>.
+              </p>
+            </li>
+          </ul>
+        </li>
+
+        <li class="li">
+          Specify the host and port settings for each Impala node. These are the hosts that the load balancer will
+          choose from when relaying each Impala query. See <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> for when to use
+          port 21000, 21050, or another value depending on what type of connections you are load balancing.
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+            <p class="p">
+              In particular, if you are using Hue or JDBC-based applications,
+              you typically set up load balancing for both ports 21000 and 21050, because
+              these client applications connect through port 21050 while the <span class="keyword cmdname">impala-shell</span>
+              command connects through port 21000.
+            </p>
+          </div>
+        </li>
+
+        <li class="li">
+          Run the load-balancing proxy server, pointing it at the configuration file that you set up.
+        </li>
+
+        <li class="li">
+          For any scripts, jobs, or configuration settings for applications that formerly connected to a specific
+          DataNode to run Impala SQL statements, change the connection information (such as the <code class="ph codeph">-i</code>
+          option in <span class="keyword cmdname">impala-shell</span>) to point to the load balancer instead.
+        </li>
+      </ol>
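+      <p class="p">
+        For example, assuming a hypothetical load balancer listening at
+        <code class="ph codeph">impala-lb.example.com:25003</code>, the <span class="keyword cmdname">impala-shell</span>
+        connection changes from a specific DataNode to the proxy host:
+      </p>
+<pre class="pre codeblock"><code>$ impala-shell -i impala-lb.example.com:25003</code></pre>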
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        The following sections use the HAProxy software as a representative example of a load balancer
+        that you can use with Impala.
+      </div>
+
+    </div>
+
+  </article>
+
+  
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="proxy__proxy_kerberos">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Special Proxy Considerations for Clusters Using Kerberos</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        In a cluster using Kerberos, applications check host credentials to verify that the host they are
+        connecting to is the same one that is actually processing the request, to prevent man-in-the-middle
+        attacks. To establish that the load-balancing proxy server is legitimate, perform these extra Kerberos setup
+        steps:
+      </p>
+
+      <ol class="ol">
+        <li class="li">
+          This section assumes you are starting with a Kerberos-enabled cluster. See
+          <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for instructions for setting up Impala with Kerberos. See
+          <span class="xref">the documentation for your Apache Hadoop distribution</span> for general steps to set up Kerberos.
+        </li>
+
+        <li class="li">
+          Choose the host you will use for the proxy server. Based on the Kerberos setup procedure, it should
+          already have an entry <code class="ph codeph">impala/<var class="keyword varname">proxy_host</var>@<var class="keyword varname">realm</var></code> in
+          its keytab. If not, go back over the initial Kerberos configuration steps for the keytab on each host
+          running the <span class="keyword cmdname">impalad</span> daemon.
+        </li>
+
+        <li class="li">
+          Copy the keytab file from the proxy host to all other hosts in the cluster that run the
+          <span class="keyword cmdname">impalad</span> daemon. (For optimal performance, <span class="keyword cmdname">impalad</span> should be running
+          on all DataNodes in the cluster.) Put the keytab file in a secure location on each of these other hosts.
+        </li>
+
+        <li class="li">
+          Add an entry <code class="ph codeph">impala/<var class="keyword varname">actual_hostname</var>@<var class="keyword varname">realm</var></code> to the keytab on each
+          host running the <span class="keyword cmdname">impalad</span> daemon.
+        </li>
+
+        <li class="li">
+
+          For each <span class="keyword cmdname">impalad</span> node, merge the existing keytab with the proxy's keytab using
+          <span class="keyword cmdname">ktutil</span>, producing a new keytab file. For example:
+          <pre class="pre codeblock"><code>$ ktutil
+  ktutil: read_kt proxy.keytab
+  ktutil: read_kt impala.keytab
+  ktutil: write_kt proxy_impala.keytab
+  ktutil: quit</code></pre>
+
+        </li>
+
+        <li class="li">
+
+          To verify that the keytabs are merged, run the command:
+<pre class="pre codeblock"><code>
+klist -k <var class="keyword varname">keytabfile</var>
+</code></pre>
+          which lists the credentials for both <code class="ph codeph">principal</code> and <code class="ph codeph">be_principal</code> on
+          all nodes.
+        </li>
+
+
+        <li class="li">
+
+          Make sure that the <code class="ph codeph">impala</code> user has permission to read this merged keytab file.
+
+        </li>
+
+        <li class="li">
+          Change the following configuration settings for each host in the cluster that participates
+          in the load balancing:
+          <ul class="ul">
+            <li class="li">
+              In the <span class="keyword cmdname">impalad</span> option definition, add:
+<pre class="pre codeblock"><code>
+--principal=impala/<em class="ph i">proxy_host@realm</em>
+  --be_principal=impala/<em class="ph i">actual_host@realm</em>
+  --keytab_file=<em class="ph i">path_to_merged_keytab</em>
+</code></pre>
+              <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+                Every host has a different <code class="ph codeph">--be_principal</code> value because the actual hostname
+                differs on each host.
+
+                Specify the fully qualified domain name (FQDN) for the proxy host, not the IP
+                address. Use the exact FQDN as returned by a reverse DNS lookup for the associated
+                IP address.
+
+              </div>
+            </li>
+
+            <li class="li">
+              Modify the startup options. See <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for the procedure to modify the startup
+              options.
+            </li>
+          </ul>
+        </li>
+
+        <li class="li">
+          Restart Impala to make the changes take effect. Restart the <span class="keyword cmdname">impalad</span> daemons on all
+          hosts in the cluster, as well as the <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span>
+          daemons.
+        </li>
+
+      </ol>
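+      <p class="p">
+        As a concrete illustration, with a hypothetical proxy host, realm, and keytab path, the resulting
+        startup flags on the host <code class="ph codeph">impala-host-1.example.com</code> might look like:
+      </p>
+<pre class="pre codeblock"><code>--principal=impala/proxy-host.example.com@EXAMPLE.COM
+--be_principal=impala/impala-host-1.example.com@EXAMPLE.COM
+--keytab_file=/etc/impala/conf/proxy_impala.keytab</code></pre>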
+
+
+
+    </div>
+
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="proxy__tut_proxy">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Example of Configuring HAProxy Load Balancer for Impala</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        If you are not already using a load-balancing proxy, you can experiment with
+        <a class="xref" href="http://haproxy.1wt.eu/" target="_blank">HAProxy</a>, a free, open source load
+        balancer. This example shows how you might install and configure that load balancer on a Red Hat Enterprise
+        Linux system.
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            Install the load balancer: <code class="ph codeph">yum install haproxy</code>
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Set up the configuration file: <span class="ph filepath">/etc/haproxy/haproxy.cfg</span>. See the following section
+            for a sample configuration file.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Run the load balancer (on a single host, preferably one not running <span class="keyword cmdname">impalad</span>):
+          </p>
+<pre class="pre codeblock"><code>/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg</code></pre>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            In <span class="keyword cmdname">impala-shell</span>, JDBC applications, or ODBC applications, connect to the listener
+            port of the proxy host, rather than port 21000 or 21050 on a host actually running <span class="keyword cmdname">impalad</span>.
+            The sample configuration file configures HAProxy to listen on port 25003, so you would send all
+            requests to <code class="ph codeph"><var class="keyword varname">haproxy_host</var>:25003</code>.
+          </p>
+        </li>
+      </ul>
+
+      <p class="p">
+        This is the sample <span class="ph filepath">haproxy.cfg</span> used in this example:
+      </p>
+
+<pre class="pre codeblock"><code>global
+    # To have these messages end up in /var/log/haproxy.log you will
+    # need to:
+    #
+    # 1) configure syslog to accept network log events.  This is done
+    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
+    #    /etc/sysconfig/syslog
+    #
+    # 2) configure local2 events to go to the /var/log/haproxy.log
+    #   file. A line like the following can be added to
+    #   /etc/sysconfig/syslog
+    #
+    #    local2.*                       /var/log/haproxy.log
+    #
+    log         127.0.0.1 local0
+    log         127.0.0.1 local1 notice
+    chroot      /var/lib/haproxy
+    pidfile     /var/run/haproxy.pid
+    maxconn     4000
+    user        haproxy
+    group       haproxy
+    daemon
+
+    # turn on stats unix socket
+    #stats socket /var/lib/haproxy/stats
+
+#---------------------------------------------------------------------
+# common defaults that all the 'listen' and 'backend' sections will
+# use if not designated in their block
+#
+# You might need to adjust timing values to prevent timeouts.
+#---------------------------------------------------------------------
+defaults
+    mode                    http
+    log                     global
+    option                  httplog
+    option                  dontlognull
+    option http-server-close
+    option forwardfor       except 127.0.0.0/8
+    option                  redispatch
+    retries                 3
+    maxconn                 3000
+    contimeout 5000
+    clitimeout 50000
+    srvtimeout 50000
+
+#
+# This sets up the admin page for HA Proxy at port 25002.
+#
+listen stats :25002
+    balance
+    mode http
+    stats enable
+    stats auth <var class="keyword varname">username</var>:<var class="keyword varname">password</var>
+
+# This is the setup for Impala. Impala client connect to load_balancer_host:25003.
+# HAProxy will balance connections among the list of servers listed below.
+# The list of Impalad is listening at port 21000 for beeswax (impala-shell) or original ODBC driver.
+# For JDBC or ODBC version 2.x driver, use port 21050 instead of 21000.
+listen impala :25003
+    mode tcp
+    option tcplog
+    balance leastconn
+
+    server <var class="keyword varname">symbolic_name_1</var> impala-host-1.example.com:21000
+    server <var class="keyword varname">symbolic_name_2</var> impala-host-2.example.com:21000
+    server <var class="keyword varname">symbolic_name_3</var> impala-host-3.example.com:21000
+    server <var class="keyword varname">symbolic_name_4</var> impala-host-4.example.com:21000
+
+# Setup for Hue or other JDBC-enabled applications.
+# In particular, Hue requires sticky sessions.
+# The application connects to load_balancer_host:21051, and HAProxy balances
+# connections to the associated hosts, where Impala listens for JDBC
+# requests on port 21050.
+listen impalajdbc :21051
+    mode tcp
+    option tcplog
+    balance source
+    server <var class="keyword varname">symbolic_name_5</var> impala-host-1.example.com:21050
+    server <var class="keyword varname">symbolic_name_6</var> impala-host-2.example.com:21050
+    server <var class="keyword varname">symbolic_name_7</var> impala-host-3.example.com:21050
+    server <var class="keyword varname">symbolic_name_8</var> impala-host-4.example.com:21050
+</code></pre>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        If your JDBC or ODBC application connects to Impala through a load balancer such as
+        <code class="ph codeph">haproxy</code>, be cautious about reusing the connections. If the load balancer has set up
+        connection timeout values, either check the connection frequently so that it never sits idle longer than
+        the load balancer timeout value, or check the connection validity before using it and create a new one if
+        the connection has been closed.
+      </div>
+
+    </div>
+
+  </article>
+
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_query_options.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_query_options.html b/docs/build/html/topics/impala_query_options.html
new file mode 100644
index 0000000..ee27d90
--- /dev/null
+++ b/docs/build/html/topics/impala_query_options.html
@@ -0,0 +1,49 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_abort_on_default_limit_exceeded.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_abort_on_error.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_allow_unsupported_formats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_appx_count_distinct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_batch_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_compression_codec.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_debug_action.html"><meta name="DC.Relation" scheme="URI" 
 content="../topics/impala_default_order_by_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_codegen.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_row_runtime_filtering.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_streaming_preaggregations.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_unsafe_spills.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_exec_single_node_rows_threshold.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain_level.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hbase_cache_blocks.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hbase_caching.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_live_progress.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_live_summary.html"><meta name="DC.Relation" scheme="U
 RI" content="../topics/impala_max_errors.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_io_buffers.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_scan_range_length.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_num_runtime_filters.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mem_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mt_dop.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_num_nodes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_num_scanner_threads.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_optimize_partition_key_scans.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_compression_codec.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_annotate_strings_utf8.html"><meta name="DC.Relation" scheme="URI" content="../topics/
 impala_parquet_fallback_schema_resolution.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_file_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prefetch_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_timeout_s.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_request_pool.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_replica_preference.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_reservation_request_timeout.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_bloom_filter_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_max_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_min_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_mode.html"><meta name="DC.Relation" scheme="URI" content="
 ../topics/impala_runtime_filter_wait_time_ms.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_s3_skip_insert_staging.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_scan_node_codegen_threshold.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_scratch_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schedule_random_replica.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_support_start_over.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_sync_ddl.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_v_cpu_cores.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="query_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Q
 uery Options for the SET Statement</title></head><body id="query_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Query Options for the SET Statement</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      You can specify the following options using the <code class="ph codeph">SET</code> statement, and those settings affect all
+      queries issued from that session.
+    </p>
+
+    <p class="p">
+      Some query options are useful in day-to-day operations for improving usability, performance, or flexibility.
+    </p>
+
+    <p class="p">
+      Other query options control special-purpose aspects of Impala operation and are intended primarily for
+      advanced debugging or troubleshooting.
+    </p>
+
+    <p class="p">
+      Options with Boolean parameters can be set to <code class="ph codeph">1</code> or <code class="ph codeph">true</code> to enable them, or
+      <code class="ph codeph">0</code> or <code class="ph codeph">false</code> to turn them off.
+    </p>
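+    <p class="p">
+      For example, the following <span class="keyword cmdname">impala-shell</span> statements (the hostname in
+      the prompt is hypothetical) show both accepted forms for a Boolean option:
+    </p>
+<pre class="pre codeblock"><code>[impala-host:21000] &gt; SET ABORT_ON_ERROR=true;
+[impala-host:21000] &gt; SET ABORT_ON_ERROR=0;</code></pre>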
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        In Impala 2.0 and later, you can set query options directly through the JDBC and ODBC interfaces by using the
+        <code class="ph codeph">SET</code> statement. Formerly, <code class="ph codeph">SET</code> was only available as a command within the
+        <span class="keyword cmdname">impala-shell</span> interpreter.
+      </p>
+    </div>
+
+
+
+    <p class="p toc"></p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_set.html#set">SET Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_abort_on_default_limit_exceeded.html">ABORT_ON_DEFAULT_LIMIT_EXCEEDED Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_abort_on_error.html">ABORT_ON_ERROR Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_allow_unsupported_formats.html">ALLOW_UNSUPPORTED_FORMATS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_appx_count_distinct.html">APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_batch_size.html">BATCH_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_compression_codec.html">COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="lin
 k ulchildlink"><strong><a href="../topics/impala_debug_action.html">DEBUG_ACTION Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_default_order_by_limit.html">DEFAULT_ORDER_BY_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_codegen.html">DISABLE_CODEGEN Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_row_runtime_filtering.html">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_streaming_preaggregations.html">DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_unsafe_spills.html">DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><str
 ong><a href="../topics/impala_exec_single_node_rows_threshold.html">EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (Impala 2.1 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain_level.html">EXPLAIN_LEVEL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hbase_cache_blocks.html">HBASE_CACHE_BLOCKS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hbase_caching.html">HBASE_CACHING Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_live_progress.html">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_live_summary.html">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_errors.html">MAX_ERRORS Query Option</a></strong>
 <br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_io_buffers.html">MAX_IO_BUFFERS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_scan_range_length.html">MAX_SCAN_RANGE_LENGTH Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mem_limit.html">MEM_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mt_dop.html">MT_DOP Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_num_nodes.html">NUM_NODES Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_num_scanner_threads.html">NUM_SCANNER_THREADS Query Option</a></strong><br></li><li class="link ulchild
 link"><strong><a href="../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_compression_codec.html">PARQUET_COMPRESSION_CODEC Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_annotate_strings_utf8.html">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_fallback_schema_resolution.html">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_file_size.html">PARQUET_FILE_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_prefetch_mode.html">PREFETCH_MODE Query Option (Impala 2.6 or higher only)</a></st
 rong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_query_timeout_s.html">QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_request_pool.html">REQUEST_POOL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_replica_preference.html">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_reservation_request_timeout.html">RESERVATION_REQUEST_TIMEOUT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_bloom_filter_size.html">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_max_size.html">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a></strong><br></li><
 li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_min_size.html">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_mode.html">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_wait_time_ms.html">RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_s3_skip_insert_staging.html">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_scan_node_codegen_threshold.html">SCAN_NODE_CODEGEN_THRESHOLD Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_scratch_limit.html">SCRATCH_LIMIT 
 Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schedule_random_replica.html">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_support_start_over.html">SUPPORT_START_OVER Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_sync_ddl.html">SYNC_DDL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_v_cpu_cores.html">V_CPU_CORES Query Option</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_set.html">SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_query_timeout_s.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_query_timeout_s.html b/docs/build/html/topics/impala_query_timeout_s.html
new file mode 100644
index 0000000..0dff374
--- /dev/null
+++ b/docs/build/html/topics/impala_query_timeout_s.html
@@ -0,0 +1,62 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="query_timeout_s"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</title></head><body id="query_timeout_s"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">QUERY_TIMEOUT_S Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Sets the idle query timeout value for the session, in seconds. Queries that sit idle for longer than the
+      timeout value are automatically cancelled. If the system administrator specified the
+      <code class="ph codeph">--idle_query_timeout</code> startup option, <code class="ph codeph">QUERY_TIMEOUT_S</code> must be smaller than
+      or equal to the <code class="ph codeph">--idle_query_timeout</code> value.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          The timeout clock for queries and sessions only starts ticking when the query or session is idle.
+          For queries, this means the query has results ready but is waiting for a client to fetch the data. A
+          query can run for an arbitrary time without triggering a timeout, because the query is computing results
+          rather than sitting idle waiting for the results to be fetched. The timeout period is intended to prevent
+          unclosed queries from consuming resources and taking up slots in the admission count of running queries,
+          potentially preventing other queries from starting.
+        </p>
+        <p class="p">
+          For sessions, this means that no query has been submitted for some period of time.
+        </p>
+      </div>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>SET QUERY_TIMEOUT_S=<var class="keyword varname">seconds</var>;</code></pre>
+
+
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0 (no timeout if <code class="ph codeph">--idle_query_timeout</code> not in effect; otherwise, use
+      <code class="ph codeph">--idle_query_timeout</code> value)
+    </p>
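+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following hypothetical <span class="keyword cmdname">impala-shell</span> session sketches how you might
+      set this option; the table name and the 600-second value are illustrative only:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set query_timeout_s=600;
+[localhost:21000] &gt; select count(*) from big_table;
+-- If the client leaves the results unfetched for more than 600 seconds,
+-- Impala cancels the query automatically.</code></pre>
+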
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      <a class="xref" href="impala_timeouts.html#timeouts">Setting Timeout Periods for Daemons, Queries, and Sessions</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_rcfile.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_rcfile.html b/docs/build/html/topics/impala_rcfile.html
new file mode 100644
index 0000000..0e2668d
--- /dev/null
+++ b/docs/build/html/topics/impala_rcfile.html
@@ -0,0 +1,246 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="rcfile"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the RCFile File Format with Impala Tables</title></head><body id="rcfile"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using the RCFile File Format with Impala Tables</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Impala supports using RCFile data files.
+    </p>
+
+    <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">RCFile Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+          <tr class="row">
+            <th class="entry nocellnorowborder" id="rcfile__entry__1">
+              File Type
+            </th>
+            <th class="entry nocellnorowborder" id="rcfile__entry__2">
+              Format
+            </th>
+            <th class="entry nocellnorowborder" id="rcfile__entry__3">
+              Compression Codecs
+            </th>
+            <th class="entry nocellnorowborder" id="rcfile__entry__4">
+              Impala Can CREATE?
+            </th>
+            <th class="entry nocellnorowborder" id="rcfile__entry__5">
+              Impala Can INSERT?
+            </th>
+          </tr>
+        </thead><tbody class="tbody">
+          <tr class="row">
+            <td class="entry nocellnorowborder" headers="rcfile__entry__1 ">
+              <a class="xref" href="impala_rcfile.html#rcfile">RCFile</a>
+            </td>
+            <td class="entry nocellnorowborder" headers="rcfile__entry__2 ">
+              Structured
+            </td>
+            <td class="entry nocellnorowborder" headers="rcfile__entry__3 ">
+              Snappy, gzip, deflate, bzip2
+            </td>
+            <td class="entry nocellnorowborder" headers="rcfile__entry__4 ">
+              Yes.
+            </td>
+            <td class="entry nocellnorowborder" headers="rcfile__entry__5 ">
+              No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+              <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+            </td>
+
+          </tr>
+        </tbody></table>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="rcfile__rcfile_create">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Creating RCFile Tables and Loading Data</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        If you do not have an existing data file to use, begin by creating one in the appropriate format.
+      </p>
+
+      <p class="p">
+        <strong class="ph b">To create an RCFile table:</strong>
+      </p>
+
+      <p class="p">
+        In the <code class="ph codeph">impala-shell</code> interpreter, issue a command similar to:
+      </p>
+
+<pre class="pre codeblock"><code>create table rcfile_table (<var class="keyword varname">column_specs</var>) stored as rcfile;</code></pre>
+
+      <p class="p">
+        Because Impala can query some kinds of tables that it cannot currently write to, after creating tables of
+        certain file formats, you might use the Hive shell to load the data. See
+        <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. After loading data into a table through
+        Hive or other mechanism outside of Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+        statement the next time you connect to the Impala node, before querying the table, to make Impala recognize
+        the new data.
+      </p>
+
+      <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+        See <a class="xref" href="impala_known_issues.html#known_issues">Known Issues and Workarounds in Impala</a> for potential compatibility issues with
+        RCFile tables created in Hive 0.12, due to a change in the default RCFile SerDe for Hive.
+      </div>
+
+      <p class="p">
+        For example, here is how you might create some RCFile tables in Impala (by specifying the columns
+        explicitly, or cloning the structure of another table), load data through Hive, and query them through
+        Impala:
+      </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost
+[localhost:21000] &gt; create table rcfile_table (x int) stored as rcfile;
+[localhost:21000] &gt; create table rcfile_clone like some_other_table stored as rcfile;
+[localhost:21000] &gt; quit;
+
+$ hive
+hive&gt; insert into table rcfile_table select x from some_other_table;
+3 Rows loaded to rcfile_table
+Time taken: 19.015 seconds
+hive&gt; quit;
+
+$ impala-shell -i localhost
+[localhost:21000] &gt; select * from rcfile_table;
+Returned 0 row(s) in 0.23s
+[localhost:21000] &gt; -- Make Impala recognize the data loaded through Hive;
+[localhost:21000] &gt; refresh rcfile_table;
+[localhost:21000] &gt; select * from rcfile_table;
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
++---+
+Returned 3 row(s) in 0.23s</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Complex type considerations:</strong>
+        Although you can create tables in this file format using
+        the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+        and <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher,
+        currently, Impala can query these types only in Parquet tables.
+        <span class="ph">
+        The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types.
+        Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher.
+        </span>
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="rcfile__rcfile_compression">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Enabling Compression for RCFile Tables</h2>
+  
+
+    <div class="body conbody">
+
+      <p class="p">
+        
+        You may want to enable compression on existing tables. Enabling compression provides performance gains in
+        most cases and is supported for RCFile tables. For example, to enable Snappy compression, you would specify
+        the following additional settings when loading data through the Hive shell:
+      </p>
+
+<pre class="pre codeblock"><code>hive&gt; SET hive.exec.compress.output=true;
+hive&gt; SET mapred.max.split.size=256000000;
+hive&gt; SET mapred.output.compression.type=BLOCK;
+hive&gt; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive&gt; INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre>
+
+      <p class="p">
+        If you are converting partitioned tables, you must complete additional steps. In such a case, specify
+        additional settings similar to the following:
+      </p>
+
+<pre class="pre codeblock"><code>hive&gt; CREATE TABLE <var class="keyword varname">new_table</var> (<var class="keyword varname">your_cols</var>) PARTITIONED BY (<var class="keyword varname">partition_cols</var>) STORED AS <var class="keyword varname">new_format</var>;
+hive&gt; SET hive.exec.dynamic.partition.mode=nonstrict;
+hive&gt; SET hive.exec.dynamic.partition=true;
+hive&gt; INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> PARTITION(<var class="keyword varname">comma_separated_partition_cols</var>) SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre>
+
+      <p class="p">
+        Remember that Hive does not require you to specify a source format for the data. Consider the case of
+        converting an existing table to a Snappy-compressed RCFile table. Combining the settings outlined
+        previously, you would specify something similar to the following:
+      </p>
+
+<pre class="pre codeblock"><code>hive&gt; CREATE TABLE tbl_rc (int_col INT, string_col STRING) STORED AS RCFILE;
+hive&gt; SET hive.exec.compress.output=true;
+hive&gt; SET mapred.max.split.size=256000000;
+hive&gt; SET mapred.output.compression.type=BLOCK;
+hive&gt; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive&gt; SET hive.exec.dynamic.partition.mode=nonstrict;
+hive&gt; SET hive.exec.dynamic.partition=true;
+hive&gt; INSERT OVERWRITE TABLE tbl_rc SELECT * FROM tbl;</code></pre>
+
+      <p class="p">
+        To complete a similar process for a table that includes partitions, you would specify settings similar to
+        the following:
+      </p>
+
+<pre class="pre codeblock"><code>hive&gt; CREATE TABLE tbl_rc (int_col INT, string_col STRING) PARTITIONED BY (year INT) STORED AS RCFILE;
+hive&gt; SET hive.exec.compress.output=true;
+hive&gt; SET mapred.max.split.size=256000000;
+hive&gt; SET mapred.output.compression.type=BLOCK;
+hive&gt; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive&gt; SET hive.exec.dynamic.partition.mode=nonstrict;
+hive&gt; SET hive.exec.dynamic.partition=true;
+hive&gt; INSERT OVERWRITE TABLE tbl_rc PARTITION(year) SELECT * FROM tbl;</code></pre>
+
+      <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+        <p class="p">
+          The compression type is specified in the following command:
+        </p>
+<pre class="pre codeblock"><code>SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;</code></pre>
+        <p class="p">
+          You could elect to specify alternative codecs such as <code class="ph codeph">GzipCodec</code> here.
+        </p>
+      </div>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="rcfile__rcfile_performance">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala RCFile Tables</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        In general, expect query performance with RCFile tables to be
+        faster than with tables using text data, but slower than with
+        Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+        for information about using the Parquet file format for
+        high-performance analytic queries.
+      </p>
+
+      <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+        For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+        Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+        in the <span class="ph filepath">core-site.xml</span> configuration file determines
+        how Impala divides the I/O work of reading the data files. This configuration
+        setting is specified in bytes. By default, this
+        value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+        as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+        Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+        to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+        Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+        to 268435456 (256 MB) to match the row group size produced by Impala.
+      </p>
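+
+    <p class="p">
+      As a sketch, raising the setting to 128 MB in <span class="ph filepath">core-site.xml</span> might look like
+      the following; choose the value to match the row group size of your files:
+    </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;fs.s3a.block.size&lt;/name&gt;
+  &lt;value&gt;134217728&lt;/value&gt; &lt;!-- 128 * 1024 * 1024 bytes --&gt;
+&lt;/property&gt;</code></pre>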
+
+    </div>
+  </article>
+
+  
+</article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_real.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_real.html b/docs/build/html/topics/impala_real.html
new file mode 100644
index 0000000..f66d313
--- /dev/null
+++ b/docs/build/html/topics/impala_real.html
@@ -0,0 +1,39 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="real"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REAL Data Type</title></head><body id="real"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">REAL Data Type</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      An alias for the <code class="ph codeph">DOUBLE</code> data type. See <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a> for details.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      These examples show that you can use the type names <code class="ph codeph">REAL</code> and <code class="ph codeph">DOUBLE</code>
+      interchangeably; behind the scenes, Impala always treats them as <code class="ph codeph">DOUBLE</code>.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table r1 (x real);
+[localhost:21000] &gt; describe r1;
++------+--------+---------+
+| name | type   | comment |
++------+--------+---------+
+| x    | double |         |
++------+--------+---------+
+[localhost:21000] &gt; insert into r1 values (1.5), (cast (2.2 as double));
+[localhost:21000] &gt; select cast (1e6 as real);
++---------------------------+
+| cast(1000000.0 as double) |
++---------------------------+
+| 1000000                   |
++---------------------------+</code></pre>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_refresh.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_refresh.html b/docs/build/html/topics/impala_refresh.html
new file mode 100644
index 0000000..75ce520
--- /dev/null
+++ b/docs/build/html/topics/impala_refresh.html
@@ -0,0 +1,387 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="refresh"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REFRESH Statement</title></head><body id="refresh"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">REFRESH Statement</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      To accurately respond to queries, the Impala node that acts as the coordinator (the node to which you are
+      connected through <span class="keyword cmdname">impala-shell</span>, JDBC, or ODBC) must have current metadata about those
+      databases and tables that are referenced in Impala queries. If you are not familiar with the way Impala uses
+      metadata and how it shares the same metastore database as Hive, see
+      <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a> for background information.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+<pre class="pre codeblock"><code>REFRESH [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">key_col1</var>=<var class="keyword varname">val1</var> [, <var class="keyword varname">key_col2</var>=<var class="keyword varname">val2</var>...])]</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      Use the <code class="ph codeph">REFRESH</code> statement to load the latest metastore metadata and block location data for
+      a particular table in these scenarios:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        After loading new data files into the HDFS data directory for the table. (Once you have set up an ETL
+        pipeline to bring data into Impala on a regular basis, this is typically the most frequent reason why
+        metadata needs to be refreshed.)
+      </li>
+
+      <li class="li">
+        After issuing <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or other
+        table-modifying SQL statement in Hive.
+      </li>
+    </ul>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        In <span class="keyword">Impala 2.3</span> and higher, the syntax <code class="ph codeph">ALTER TABLE <var class="keyword varname">table_name</var> RECOVER PARTITIONS</code>
+        is a faster alternative to <code class="ph codeph">REFRESH</code> when the only change to the table data is the addition of
+        new partition directories through Hive or manual HDFS operations.
+        See <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for details.
+      </p>
+    </div>
+
+    <p class="p">
+      You only need to issue the <code class="ph codeph">REFRESH</code> statement on the node to which you connect to issue
+      queries. The coordinator node divides the work among all the Impala nodes in a cluster, and sends read
+      requests for the correct HDFS blocks without relying on the metadata on the other nodes.
+    </p>
+
+    <p class="p">
+      <code class="ph codeph">REFRESH</code> reloads the metadata for the table from the metastore database, and does an
+      incremental reload of the low-level block location data to account for any new data files added to the HDFS
+      data directory for the table. It is a low-overhead, single-table operation, specifically tuned for the common
+      scenario where new data files are added to HDFS.
+    </p>
+
+    <p class="p">
+      Only the metadata for the specified table is flushed. The table must already exist and be known to Impala,
+      either because the <code class="ph codeph">CREATE TABLE</code> statement was run in Impala rather than Hive, or because a
+      previous <code class="ph codeph">INVALIDATE METADATA</code> statement caused Impala to reload its entire metadata catalog.
+    </p>
+
+    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
+      <p class="p">
+        The catalog service broadcasts any changed metadata as a result of Impala
+        <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">INSERT</code> and <code class="ph codeph">LOAD DATA</code> statements to all
+        Impala nodes. Thus, the <code class="ph codeph">REFRESH</code> statement is only required if you load data through Hive
+        or by manipulating data files in HDFS directly. See <a class="xref" href="impala_components.html#intro_catalogd">The Impala Catalog Service</a> for
+        more information on the catalog service.
+      </p>
+      <p class="p">
+        Another way to avoid inconsistency across nodes is to enable the
+        <code class="ph codeph">SYNC_DDL</code> query option before performing a DDL statement or an <code class="ph codeph">INSERT</code> or
+        <code class="ph codeph">LOAD DATA</code>.
+      </p>
+      <p class="p">
+        The table name is a required parameter. To flush the metadata for all tables, use the
+        <code class="ph codeph"><a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA</a></code>
+        command.
+      </p>
+      <p class="p">
+      Because <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> only works for tables that the current
+      Impala node is already aware of, when you create a new table in the Hive shell, enter
+      <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">new_table</var></code> before you can see the new table in
+      <span class="keyword cmdname">impala-shell</span>. Once the table is known by Impala, you can issue <code class="ph codeph">REFRESH
+      <var class="keyword varname">table_name</var></code> after you add data files for that table.
+    </p>
+    </div>
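+
+    <p class="p">
+      The Hive-then-Impala sequence described in the preceding note can be sketched as follows;
+      the table name is hypothetical:
+    </p>
+
+<pre class="pre codeblock"><code>hive&gt; create table newly_created_table (x int);
+hive&gt; quit;
+
+$ impala-shell -i localhost
+[localhost:21000] &gt; invalidate metadata newly_created_table;
+[localhost:21000] &gt; -- ... data files added outside Impala ...
+[localhost:21000] &gt; refresh newly_created_table;</code></pre>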
+
+    <p class="p">
+      <code class="ph codeph">INVALIDATE METADATA</code> and <code class="ph codeph">REFRESH</code> are counterparts: <code class="ph codeph">INVALIDATE
+      METADATA</code> waits to reload the metadata when needed for a subsequent query, but reloads all the
+      metadata for the table, which can be an expensive operation, especially for large tables with many
+      partitions. <code class="ph codeph">REFRESH</code> reloads the metadata immediately, but only loads the block location
+      data for newly added data files, making it a less expensive operation overall. If data was altered in some
+      more extensive way, such as being reorganized by the HDFS balancer, use <code class="ph codeph">INVALIDATE
+      METADATA</code> to avoid a performance penalty from reduced local reads. In Impala 1.0, the
+      <code class="ph codeph">REFRESH</code> statement behaved the way <code class="ph codeph">INVALIDATE
+      METADATA</code> does now; since Impala 1.1, <code class="ph codeph">REFRESH</code> is optimized for the
+      common use case of adding new data files to an existing table, which is why the table name argument
+      is now required.
+    </p>
+
+    <p class="p">
+      A metadata update for an <code class="ph codeph">impalad</code> instance <strong class="ph b">is</strong> required if:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        A metadata change occurs.
+      </li>
+
+      <li class="li">
+        <strong class="ph b">and</strong> the change is made through Hive.
+      </li>
+
+      <li class="li">
+        <strong class="ph b">and</strong> the change is made to a metastore database to which clients such as the Impala shell or ODBC directly
+        connect.
+      </li>
+    </ul>
+
+    <p class="p">
+      A metadata update for an Impala node is <strong class="ph b">not</strong> required after you run <code class="ph codeph">ALTER TABLE</code>,
+      <code class="ph codeph">INSERT</code>, or other table-modifying statement in Impala rather than Hive. Impala handles the
+      metadata synchronization automatically through the catalog service.
+    </p>
+
+    <p class="p">
+      Database and table metadata is typically modified by:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        Hive - through <code class="ph codeph">ALTER</code>, <code class="ph codeph">CREATE</code>, <code class="ph codeph">DROP</code> or
+        <code class="ph codeph">INSERT</code> operations.
+      </li>
+
+      <li class="li">
+        Impalad - through <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, and <code class="ph codeph">INSERT</code>
+        operations. <span class="ph">Such changes are propagated to all Impala nodes by the
+        Impala catalog service.</span>
+      </li>
+    </ul>
+
+    <p class="p">
+      <code class="ph codeph">REFRESH</code> causes the metadata for that table to be immediately reloaded. For a huge table,
+      that process could take a noticeable amount of time; but doing the refresh up front avoids an unpredictable
+      delay later, for example if the next reference to the table is during a benchmark test.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Refreshing a single partition:</strong>
+    </p>
+
+    <p class="p">
+      In <span class="keyword">Impala 2.7</span> and higher, the <code class="ph codeph">REFRESH</code> statement can apply to a single partition at a time,
+      rather than the whole table. Include the optional <code class="ph codeph">PARTITION (<var class="keyword varname">partition_spec</var>)</code>
+      clause and specify values for each of the partition key columns.
+    </p>
+
+    <p class="p">
+      The following examples show how to make Impala aware of data added to a single partition, after data is loaded into
+      a partition's data directory using some mechanism outside Impala, such as Hive or Spark. The partition can be one that
+      Impala created and is already aware of, or a new partition created through Hive.
+    </p>
+
+<pre class="pre codeblock"><code>
+impala&gt; create table p (x int) partitioned by (y int);
+impala&gt; insert into p (x,y) values (1,2), (2,2), (2,1);
+impala&gt; show partitions p;
++-------+-------+--------+------+...
+| y     | #Rows | #Files | Size |...
++-------+-------+--------+------+...
+| 1     | -1    | 1      | 2B   |...
+| 2     | -1    | 1      | 4B   |...
+| Total | -1    | 2      | 6B   |...
++-------+-------+--------+------+...
+
+-- ... Data is inserted into one of the partitions by some external mechanism ...
+beeline&gt; insert into p partition (y = 1) values(1000);
+
+impala&gt; refresh p partition (y=1);
+impala&gt; select x from p where y=1;
++------+
+| x    |
++------+
+| 2    | &lt;- Original data created by Impala
+| 1000 | &lt;- Additional data inserted through Beeline
++------+
+
+</code></pre>
+
+    <p class="p">
+      The same applies for tables with more than one partition key column.
+      The <code class="ph codeph">PARTITION</code> clause of the <code class="ph codeph">REFRESH</code>
+      statement must include all the partition key columns.
+    </p>
+
+<pre class="pre codeblock"><code>
+impala&gt; create table p2 (x int) partitioned by (y int, z int);
+impala&gt; insert into p2 (x,y,z) values (0,0,0), (1,2,3), (2,2,3);
+impala&gt; show partitions p2;
++-------+---+-------+--------+------+...
+| y     | z | #Rows | #Files | Size |...
++-------+---+-------+--------+------+...
+| 0     | 0 | -1    | 1      | 2B   |...
+| 2     | 3 | -1    | 1      | 4B   |...
+| Total |   | -1    | 2      | 6B   |...
++-------+---+-------+--------+------+...
+
+-- ... Data is inserted into one of the partitions by some external mechanism ...
+beeline&gt; insert into p2 partition (y = 2, z = 3) values(1000);
+
+impala&gt; refresh p2 partition (y=2, z=3);
+impala&gt; select x from p2 where y=2 and z=3;
++------+
+| x    |
++------+
+| 1    | &lt;- Original data created by Impala
+| 2    | &lt;- Original data created by Impala
+| 1000 | &lt;- Additional data inserted through Beeline
++------+
+
+</code></pre>
+
+    <p class="p">
+      The following examples show how specifying a nonexistent partition does not cause any error,
+      and the order of the partition key columns does not have to match the column order in the table.
+      The partition spec must include all the partition key columns; specifying an incomplete set of
+      columns does cause an error.
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Partition doesn't exist.
+refresh p2 partition (y=0, z=3);
+refresh p2 partition (y=0, z=-1);
+-- Key columns specified in a different order than the table definition.
+refresh p2 partition (z=1, y=0);
+-- Incomplete partition spec causes an error.
+refresh p2 partition (y=0);
+ERROR: AnalysisException: Items in partition spec must exactly match the partition columns in the table definition: default.p2 (1 vs 2)
+
+</code></pre>
+
+    <p class="p">
+        If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+        load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+        statement wait before returning, until the new or changed metadata has been received by all the Impala
+        nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+      </p>
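+    <p class="p">
+      For example, in a load-balanced environment you might enable the option before refreshing a table,
+      so that the statement does not return until all coordinators have received the new metadata.
+      (This is a sketch; the table name <code class="ph codeph">t1</code> is hypothetical.)
+    </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt; set sync_ddl=1;
+[impalad-host:21000] &gt; refresh t1;
+</code></pre>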
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      The following example shows how you might use the <code class="ph codeph">REFRESH</code> statement after manually adding
+      new HDFS data files to the Impala data directory for a table:
+    </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt; refresh t1;
+[impalad-host:21000] &gt; refresh t2;
+[impalad-host:21000] &gt; select * from t1;
+...
+[impalad-host:21000] &gt; select * from t2;
+... </code></pre>
+
+    <p class="p">
+      For more examples of using <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> with a
+      combination of Impala and Hive operations, see <a class="xref" href="impala_tutorial.html#tutorial_impala_hive">Switching Back and Forth Between Impala and Hive</a>.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Related impala-shell options:</strong>
+    </p>
+
+    <p class="p">
+      The <span class="keyword cmdname">impala-shell</span> option <code class="ph codeph">-r</code> issues an <code class="ph codeph">INVALIDATE METADATA</code> statement
+      when starting up the shell, effectively performing a <code class="ph codeph">REFRESH</code> of all tables.
+      Due to the expense of reloading the metadata for all tables, the <span class="keyword cmdname">impala-shell</span> <code class="ph codeph">-r</code>
+      option is not recommended for day-to-day use in a production environment. (This option was mainly intended as a workaround
+      for synchronization issues in very old Impala versions.)
+    </p>
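+    <p class="p">
+      For example, the following invocation reloads metadata for all tables before presenting the
+      shell prompt. (This is a sketch; the hostname is hypothetical.)
+    </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -r -i impalad-host:21000
+</code></pre>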
+
+    <p class="p">
+        <strong class="ph b">HDFS permissions:</strong>
+      </p>
+    <p class="p">
+      The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+      typically the <code class="ph codeph">impala</code> user, must have execute
+      permissions for all the relevant directories holding table data.
+      (A table could have data spread across multiple directories,
+      or in unexpected paths, if it uses partitioning or
+      specifies a <code class="ph codeph">LOCATION</code> attribute for
+      individual partitions or the entire table.)
+      Issues with permissions might not cause an immediate error for this statement,
+      but subsequent statements such as <code class="ph codeph">SELECT</code>
+      or <code class="ph codeph">SHOW TABLE STATS</code> could fail.
+    </p>
+    <p class="p">
+      All HDFS and Sentry permissions and privileges are the same whether you refresh the entire table
+      or a single partition.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">HDFS considerations:</strong>
+      </p>
+
+    <p class="p">
+      The <code class="ph codeph">REFRESH</code> command checks HDFS permissions of the underlying data files and directories,
+      caching this information so that a statement can be cancelled immediately if for example the
+      <code class="ph codeph">impala</code> user does not have permission to write to the data directory for the table. Impala
+      reports any lack of write permissions as an <code class="ph codeph">INFO</code> message in the log file, in case that
+      represents an oversight. If you change HDFS permissions to make data readable or writable by the Impala
+      user, issue another <code class="ph codeph">REFRESH</code> to make Impala aware of the change.
+    </p>
+
+    <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
+        After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+        STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+        table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+        SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+        <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+        are very large, used in join queries, or both.
+      </div>
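+    <p class="p">
+      For example, after loading new data through Hive, you might refresh the table and then update its
+      statistics in one sequence. (This is a sketch; the table name <code class="ph codeph">t1</code> is hypothetical.)
+    </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt; refresh t1;
+[impalad-host:21000] &gt; compute stats t1;
+</code></pre>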
+
+    <p class="p">
+        <strong class="ph b">Amazon S3 considerations:</strong>
+      </p>
+    <p class="p">
+        The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements also cache metadata
+        for tables where the data resides in the Amazon Simple Storage Service (S3).
+        In particular, issue a <code class="ph codeph">REFRESH</code> for a table after adding or removing files
+        in the associated S3 data directory.
+        See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about working with S3 tables.
+      </p>
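+    <p class="p">
+      For example, after copying new data files into the S3 directory for a table, a single
+      <code class="ph codeph">REFRESH</code> makes the new files visible to queries. (This is a sketch;
+      the table name and S3 path are hypothetical.)
+    </p>
+
+<pre class="pre codeblock"><code>$ aws s3 cp new_data.parq s3://impala-demo-bucket/sales/
+[impalad-host:21000] &gt; refresh sales;
+</code></pre>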
+
+    <p class="p">
+        <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Kudu considerations:</strong>
+      </p>
+    <p class="p">
+        Much of the metadata for Kudu tables is handled by the underlying
+        storage layer. Kudu tables have less reliance on the metastore
+        database, and require less metadata caching on the Impala side.
+        For example, information about partitions in Kudu tables is managed
+        by Kudu, and Impala does not cache any block locality metadata
+        for Kudu tables.
+      </p>
+    <p class="p">
+        The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code>
+        statements are needed less frequently for Kudu tables than for
+        HDFS-backed tables. Neither statement is needed when data is
+        added to, removed from, or updated in a Kudu table, even if the changes
+        are made directly to Kudu through a client program using the Kudu API.
+        Run <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+        <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+        for a Kudu table only after making a change to the Kudu table schema,
+        such as adding or dropping a column, by a mechanism other than
+        Impala.
+      </p>
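+    <p class="p">
+      For example, if a column is added to a Kudu table through a non-Impala client, a refresh picks up
+      the new schema. (This is a sketch; the table name is hypothetical.)
+    </p>
+
+<pre class="pre codeblock"><code>-- After a column is added through the Kudu client API, outside Impala:
+[impalad-host:21000] &gt; refresh kudu_metrics;
+</code></pre>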
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a>,
+      <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a>
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_release_notes.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_release_notes.html b/docs/build/html/topics/impala_release_notes.html
new file mode 100644
index 0000000..e36b70f
--- /dev/null
+++ b/docs/build/html/topics/impala_release_notes.html
@@ -0,0 +1,26 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_relnotes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_new_features.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_incompatible_changes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_known_issues.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_fixed_issues.html"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_release_notes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Release Notes</title></head><body id="impala_release_notes"><
 main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Release Notes</h1>
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      These release notes provide information on the <a class="xref" href="impala_new_features.html#new_features">new
+      features</a> and <a class="xref" href="impala_known_issues.html#known_issues">known issues and limitations</a> for
+      Impala versions up to <span class="ph">Impala 2.8.x</span>. For users
+      upgrading from earlier Impala releases, or using Impala in combination with specific versions of other
+      software, <a class="xref" href="impala_incompatible_changes.html#incompatible_changes">Incompatible Changes and Limitations in Apache Impala (incubating)</a> lists any changes to
+      file formats, SQL syntax, or software dependencies to take into account.
+    </p>
+
+    <p class="p">
+      Once you are finished reviewing these release notes, for more information about using Impala, see
+      <a class="xref" href="impala_concepts.html">Impala Concepts and Architecture</a>.
+    </p>
+
+    <p class="p toc"></p>
+  </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_relnotes.html">Impala Release Notes</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_new_features.html">New Features in Apache Impala (incubating)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_incompatible_changes.html">Incompatible Changes and Limitations in Apache Impala (incubating)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_known_issues.html">Known Issues and Workarounds in Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_fixed_issues.html">Fixed Issues in Apache Impala (incubating)</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_relnotes.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_relnotes.html b/docs/build/html/topics/impala_relnotes.html
new file mode 100644
index 0000000..09a20c9
--- /dev/null
+++ b/docs/build/html/topics/impala_relnotes.html
@@ -0,0 +1,26 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="relnotes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Release Notes</title></head><body id="relnotes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Impala Release Notes</h1>
+  
+
+  <div class="body conbody" id="relnotes__relnotes_intro">
+
+    <p class="p">
+      These release notes provide information on the <a class="xref" href="impala_new_features.html#new_features">new
+      features</a> and <a class="xref" href="impala_known_issues.html#known_issues">known issues and limitations</a> for
+      Impala versions up to <span class="ph">Impala 2.8.x</span>. For users
+      upgrading from earlier Impala releases, or using Impala in combination with specific versions of other
+      software, <a class="xref" href="impala_incompatible_changes.html#incompatible_changes">Incompatible Changes and Limitations in Apache Impala (incubating)</a> lists any changes to
+      file formats, SQL syntax, or software dependencies to take into account.
+    </p>
+
+    <p class="p">
+      Once you are finished reviewing these release notes, for more information about using Impala, see
+      <a class="xref" href="impala_concepts.html">Impala Concepts and Architecture</a>.
+    </p>
+
+    <p class="p toc"></p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_replica_preference.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_replica_preference.html b/docs/build/html/topics/impala_replica_preference.html
new file mode 100644
index 0000000..157d21c
--- /dev/null
+++ b/docs/build/html/topics/impala_replica_preference.html
@@ -0,0 +1,45 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="replica_preference"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</title></head><body id="replica_preference"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">REPLICA_PREFERENCE Query Option (<span class="keyword">Impala 2.7</span> or higher only)</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">REPLICA_PREFERENCE</code> query option
+      lets you spread the load more evenly if hotspots and bottlenecks persist. It allows hosts to do local
+      reads, or even remote reads, to retrieve the data for cached blocks when Impala determines that
+      doing all such processing on a particular host would be too expensive.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> numeric (0, 3, 5)
+      or corresponding mnemonic strings (<code class="ph codeph">CACHE_LOCAL</code>, <code class="ph codeph">DISK_LOCAL</code>, <code class="ph codeph">REMOTE</code>).
+      The gaps in the numeric sequence are to accommodate other intermediate
+      values that might be added in the future.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0 (equivalent to <code class="ph codeph">CACHE_LOCAL</code>)
+    </p>
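+    <p class="p">
+      For example, to allow reads from local disk rather than insisting on cached replicas,
+      you might set the option within a session. (This is a sketch.)
+    </p>
+
+<pre class="pre codeblock"><code>set replica_preference=disk_local;
+</code></pre>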
+
+    <p class="p">
+        <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.7.0</span>
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a>, <a class="xref" href="impala_schedule_random_replica.html#schedule_random_replica">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/75c46918/docs/build/html/topics/impala_request_pool.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_request_pool.html b/docs/build/html/topics/impala_request_pool.html
new file mode 100644
index 0000000..7127b0c
--- /dev/null
+++ b/docs/build/html/topics/impala_request_pool.html
@@ -0,0 +1,35 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2017"><meta name="DC.rights.owner" content="(C) Copyright 2017"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="request_pool"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REQUEST_POOL Query Option</title></head><body id="request_pool"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">REQUEST_POOL Query Option</h1>
+  
+  
+
+  <div class="body conbody">
+
+    <p class="p">
+      
+      Specifies the pool or queue to which queries are submitted. This option applies only when the
+      Impala admission control feature is enabled, and names the pool used by requests from Impala to the resource manager.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code>
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> empty (use the user-to-pool mapping defined by an <span class="keyword cmdname">impalad</span> startup option
+      in the Impala configuration file)
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_admission.html">Admission Control and Query Queuing</a>
+    </p>
+
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file