You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:39 UTC

[30/51] [partial] impala git commit: [DOCS] Impala doc site update for 3.0

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hbase.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hbase.html b/docs/build3x/html/topics/impala_hbase.html
new file mode 100644
index 0000000..ef339ea
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hbase.html
@@ -0,0 +1,772 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.
 0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_hbase"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala to Query HBase Tables</title></head><body id="impala_hbase"><main role="main"><article role="article" aria-labelledby="impala_hbase__hbase">
+
+  <h1 class="title topictitle1" id="impala_hbase__hbase">Using Impala to Query HBase Tables</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+
+      You can use Impala to query HBase tables. This capability allows convenient access to a storage system that
+      is tuned for different kinds of workloads than the default with Impala. The default Impala tables use data
+      files stored on HDFS, which are ideal for bulk loads and queries using full-table scans. In contrast, HBase
+      can do efficient queries for data organized for OLTP-style workloads, with lookups of individual rows or
+      ranges of values.
+    </p>
+
+    <p class="p">
+      From the perspective of an Impala user, coming from an RDBMS background, HBase is a kind of key-value store
+      where the value consists of multiple fields. The key is mapped to one column in the Impala table, and the
+      various fields of the value are mapped to the other columns in the Impala table.
+    </p>
+
+    <p class="p">
+      For background information on HBase, see <a class="xref" href="https://hbase.apache.org/book.html" target="_blank">the Apache HBase documentation</a>.
+    </p>
+
+    <p class="p toc inpage"></p>
+  </div>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="impala_hbase__hbase_using">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Overview of Using HBase with Impala</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        When you use Impala with HBase:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          You create the tables on the Impala side using the Hive shell, because the Impala <code class="ph codeph">CREATE
+          TABLE</code> statement currently does not support custom SerDes and some other syntax needed for these
+          tables:
+          <ul class="ul">
+            <li class="li">
+              You designate it as an HBase table using the <code class="ph codeph">STORED BY
+              'org.apache.hadoop.hive.hbase.HBaseStorageHandler'</code> clause on the Hive <code class="ph codeph">CREATE
+              TABLE</code> statement.
+            </li>
+
+            <li class="li">
+              You map these specially created tables to corresponding tables that exist in HBase, with the clause
+              <code class="ph codeph">TBLPROPERTIES("hbase.table.name" = "<var class="keyword varname">table_name_in_hbase</var>")</code> on the
+              Hive <code class="ph codeph">CREATE TABLE</code> statement.
+            </li>
+
+            <li class="li">
+              See <a class="xref" href="#hbase_queries">Examples of Querying HBase Tables from Impala</a> for a full example.
+            </li>
+          </ul>
+        </li>
+
+        <li class="li">
+          You define the column corresponding to the HBase row key as a string with the <code class="ph codeph">#string</code>
+          keyword, or map it to a <code class="ph codeph">STRING</code> column.
+        </li>
+
+        <li class="li">
+          Because Impala and Hive share the same metastore database, once you create the table in Hive, you can
+          query or insert into it through Impala. (After creating a new table through Hive, issue the
+          <code class="ph codeph">INVALIDATE METADATA</code> statement in <span class="keyword cmdname">impala-shell</span> to make Impala aware of
+          the new table.)
+        </li>
+
+        <li class="li">
+          You issue queries against the Impala tables. For efficient queries, use <code class="ph codeph">WHERE</code> clauses to
+          find a single key value or a range of key values wherever practical, by testing the Impala column
+          corresponding to the HBase row key. Avoid queries that do full-table scans, which are efficient for
+          regular Impala tables but inefficient in HBase.
+        </li>
+      </ul>
+
+      <p class="p">
+        To work with an HBase table from Impala, ensure that the <code class="ph codeph">impala</code> user has read/write
+        privileges for the HBase table, using the <code class="ph codeph">GRANT</code> command in the HBase shell. For details
+        about HBase security, see <a class="xref" href="https://hbase.apache.org/book.html#security" target="_blank">the Security chapter in the Apache HBase documentation</a>.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="impala_hbase__hbase_config">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Configuring HBase for Use with Impala</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        HBase works out of the box with Impala. There is no mandatory configuration needed to use these two
+        components together.
+      </p>
+
+      <p class="p">
+        To avoid delays if HBase is unavailable during Impala startup or after an <code class="ph codeph">INVALIDATE
+        METADATA</code> statement, set timeout values similar to the following in
+        <span class="ph filepath">/etc/impala/conf/hbase-site.xml</span>:
+      </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;hbase.client.retries.number&lt;/name&gt;
+  &lt;value&gt;3&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;hbase.rpc.timeout&lt;/name&gt;
+  &lt;value&gt;3000&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="impala_hbase__hbase_types">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Supported Data Types for HBase Columns</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        To understand how Impala column data types are mapped to fields in HBase, you should have some background
+        knowledge about HBase first. You set up the mapping by running the <code class="ph codeph">CREATE TABLE</code> statement
+        in the Hive shell. See
+        <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration" target="_blank">the
+        Hive wiki</a> for a starting point, and <a class="xref" href="#hbase_queries">Examples of Querying HBase Tables from Impala</a> for examples.
+      </p>
+
+      <p class="p">
+        HBase works as a kind of <span class="q">"bit bucket"</span>, in the sense that HBase does not enforce any typing for the
+        key or value fields. All the type enforcement is done on the Impala side.
+      </p>
+
+      <p class="p">
+        For best performance of Impala queries against HBase tables, most queries will perform comparisons in the
+        <code class="ph codeph">WHERE</code> against the column that corresponds to the HBase row key. When creating the table
+        through the Hive shell, use the <code class="ph codeph">STRING</code> data type for the column that corresponds to the
+        HBase row key. Impala can translate conditional tests (through operators such as <code class="ph codeph">=</code>,
+        <code class="ph codeph">&lt;</code>, <code class="ph codeph">BETWEEN</code>, and <code class="ph codeph">IN</code>) against this column into fast
+        lookups in HBase, but this optimization (<span class="q">"predicate pushdown"</span>) only works when that column is
+        defined as <code class="ph codeph">STRING</code>.
+      </p>
+
+      <p class="p">
+        Starting in Impala 1.1, Impala also supports reading and writing to columns that are defined in the Hive
+        <code class="ph codeph">CREATE TABLE</code> statement using binary data types, represented in the Hive table definition
+        using the <code class="ph codeph">#binary</code> keyword, often abbreviated as <code class="ph codeph">#b</code>. Defining numeric
+        columns as binary can reduce the overall data volume in the HBase tables. You should still define the
+        column that corresponds to the HBase row key as a <code class="ph codeph">STRING</code>, to allow fast lookups using
+        those columns.
+      </p>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="impala_hbase__hbase_performance">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Performance Considerations for the Impala-HBase Integration</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        To understand the performance characteristics of SQL queries against data stored in HBase, you should have
+        some background knowledge about how HBase interacts with SQL-oriented systems first. See
+        <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration" target="_blank">the
+        Hive wiki</a> for a starting point; because Impala shares the same metastore database as Hive, the
+        information about mapping columns from Hive tables to HBase tables is generally applicable to Impala too.
+      </p>
+
+      <p class="p">
+        Impala uses the HBase client API via Java Native Interface (JNI) to query data stored in HBase. This
+        querying does not read HFiles directly. The extra communication overhead makes it important to choose what
+        data to store in HBase or in HDFS, and construct efficient queries that can retrieve the HBase data
+        efficiently:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Use HBase table for queries that return a single row or a range of rows, not queries that scan the entire
+          table. (If a query has no <code class="ph codeph">WHERE</code> clause, that is a strong indicator that it is an
+          inefficient query for an HBase table.)
+        </li>
+
+        <li class="li">
+          If you have join queries that do aggregation operations on large fact tables and join the results against
+          small dimension tables, consider using Impala for the fact tables and HBase for the dimension tables.
+          (Because Impala does a full scan on the HBase table in this case, rather than doing single-row HBase
+          lookups based on the join column, only use this technique where the HBase table is small enough that
+          doing a full table scan does not cause a performance bottleneck for the query.)
+        </li>
+      </ul>
+
+      <p class="p">
+        Query predicates are applied to row keys as start and stop keys, thereby limiting the scope of a particular
+        lookup. If row keys are not mapped to string columns, then ordering is typically incorrect and comparison
+        operations do not work. For example, if row keys are not mapped to string columns, evaluating for greater
+        than (&gt;) or less than (&lt;) cannot be completed.
+      </p>
+
+      <p class="p">
+        Predicates on non-key columns can be sent to HBase to scan as <code class="ph codeph">SingleColumnValueFilters</code>,
+        providing some performance gains. In such a case, HBase returns fewer rows than if those same predicates
+        were applied using Impala. While there is some improvement, it is not as great when start and stop rows are
+        used. This is because the number of rows that HBase must examine is not limited as it is when start and
+        stop rows are used. As long as the row key predicate only applies to a single row, HBase will locate and
+        return that row. Conversely, if a non-key predicate is used, even if it only applies to a single row, HBase
+        must still scan the entire table to find the correct result.
+      </p>
+
+      <div class="example"><h3 class="title sectiontitle">Interpreting EXPLAIN Output for HBase Queries</h3>
+
+
+
+        <p class="p">
+          For example, here are some queries against the following Impala table, which is mapped to an HBase table.
+          The examples show excerpts from the output of the <code class="ph codeph">EXPLAIN</code> statement, demonstrating what
+          things to look for to indicate an efficient or inefficient query against an HBase table.
+        </p>
+
+        <p class="p">
+          The first column (<code class="ph codeph">cust_id</code>) was specified as the key column in the <code class="ph codeph">CREATE
+          EXTERNAL TABLE</code> statement; for performance, it is important to declare this column as
+          <code class="ph codeph">STRING</code>. Other columns, such as <code class="ph codeph">BIRTH_YEAR</code> and
+          <code class="ph codeph">NEVER_LOGGED_ON</code>, are also declared as <code class="ph codeph">STRING</code>, rather than their
+          <span class="q">"natural"</span> types of <code class="ph codeph">INT</code> or <code class="ph codeph">BOOLEAN</code>, because Impala can optimize
+          those types more effectively in HBase tables. For comparison, we leave one column,
+          <code class="ph codeph">YEAR_REGISTERED</code>, as <code class="ph codeph">INT</code> to show that filtering on this column is
+          inefficient.
+        </p>
+
+<pre class="pre codeblock"><code>describe hbase_table;
+Query: describe hbase_table
++-----------------------+--------+---------+
+| name                  | type   | comment |
++-----------------------+--------+---------+
+| cust_id               | <strong class="ph b">string</strong> |         |
+| birth_year            | <strong class="ph b">string</strong> |         |
+| never_logged_on       | <strong class="ph b">string</strong> |         |
+| private_email_address | string |         |
+| year_registered       | <strong class="ph b">int</strong>    |         |
++-----------------------+--------+---------+
+</code></pre>
+
+        <p class="p">
+          The best case for performance involves a single row lookup using an equality comparison on the column
+          defined as the row key:
+        </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where cust_id = 'some_user@example.com';
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=1.01GB VCores=1                            |
+| WARNING: The following tables are missing relevant table and/or column statistics. |
+| hbase.hbase_table                                                                  |
+|                                                                                    |
+| 03:AGGREGATE [MERGE FINALIZE]                                                      |
+| |  output: sum(count(*))                                                           |
+| |                                                                                  |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED]                                              |
+| |                                                                                  |
+| 01:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table]                                                  |</strong>
+<strong class="ph b">|    start key: some_user@example.com                                                |</strong>
+<strong class="ph b">|    stop key: some_user@example.com\0                                               |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+        <p class="p">
+          Another type of efficient query involves a range lookup on the row key column, using SQL operators such
+          as greater than (or equal), less than (or equal), or <code class="ph codeph">BETWEEN</code>. This example also includes
+          an equality test on a non-key column; because that column is a <code class="ph codeph">STRING</code>, Impala can let
+          HBase perform that test, indicated by the <code class="ph codeph">hbase filters:</code> line in the
+          <code class="ph codeph">EXPLAIN</code> output. Doing the filtering within HBase is more efficient than transmitting all
+          the data to Impala and doing the filtering on the Impala side.
+        </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where cust_id between 'a' and 'b'
+  and never_logged_on = 'true';
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table]                                                  |</strong>
+<strong class="ph b">|    start key: a                                                                    |</strong>
+<strong class="ph b">|    stop key: b\0                                                                   |</strong>
+<strong class="ph b">|    hbase filters: cols:never_logged_on EQUAL 'true'                                |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+        <p class="p">
+          The query is less efficient if Impala has to evaluate any of the predicates, because Impala must scan the
+          entire HBase table. Impala can only push down predicates to HBase for columns declared as
+          <code class="ph codeph">STRING</code>. This example tests a column declared as <code class="ph codeph">INT</code>, and the
+          <code class="ph codeph">predicates:</code> line in the <code class="ph codeph">EXPLAIN</code> output indicates that the test is
+          performed after the data is transmitted to Impala.
+        </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where year_registered = 2010;
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table]                                                  |</strong>
+<strong class="ph b">|    predicates: year_registered = 2010                                              |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+        <p class="p">
+          The same inefficiency applies if the key column is compared to any non-constant value. Here, even though
+          the key column is a <code class="ph codeph">STRING</code>, and is tested using an equality operator, Impala must scan
+          the entire HBase table because the key column is compared to another column value rather than a constant.
+        </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where cust_id = private_email_address;
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table]                                                  |</strong>
+<strong class="ph b">|    predicates: cust_id = private_email_address                                    |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+        <p class="p">
+          Currently, tests on the row key using <code class="ph codeph">OR</code> or <code class="ph codeph">IN</code> clauses are not
+          optimized into direct lookups either. Such limitations might be lifted in the future, so always check the
+          <code class="ph codeph">EXPLAIN</code> output to be sure whether a particular SQL construct results in an efficient
+          query or not for HBase tables.
+        </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where
+  cust_id = 'some_user@example.com' or cust_id = 'other_user@example.com';
++----------------------------------------------------------------------------------------+
+| Explain String                                                                         |
++----------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE                                                                           |
+| |  output: count(*)                                                                    |
+| |                                                                                      |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table]                                                      |</strong>
+<strong class="ph b">|    predicates: cust_id = 'some_user@example.com' OR cust_id = 'other_user@example.com' |</strong>
++----------------------------------------------------------------------------------------+
+
+explain select count(*) from hbase_table where
+  cust_id in ('some_user@example.com', 'other_user@example.com');
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table]                                                  |</strong>
+<strong class="ph b">|    predicates: cust_id IN ('some_user@example.com', 'other_user@example.com')      |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+        <p class="p">
+          Either rewrite into separate queries for each value and combine the results in the application, or
+          combine the single-row queries using UNION ALL:
+        </p>
+
+<pre class="pre codeblock"><code>select count(*) from hbase_table where cust_id = 'some_user@example.com';
+select count(*) from hbase_table where cust_id = 'other_user@example.com';
+
+explain
+  select count(*) from hbase_table where cust_id = 'some_user@example.com'
+  union all
+  select count(*) from hbase_table where cust_id = 'other_user@example.com';
++------------------------------------------------------------------------------------+
+| Explain String                                                                     |
++------------------------------------------------------------------------------------+
+...
+
+| |  04:AGGREGATE                                                                    |
+| |  |  output: count(*)                                                             |
+| |  |                                                                               |
+<strong class="ph b">| |  03:SCAN HBASE [hbase.hbase_table]                                               |</strong>
+<strong class="ph b">| |     start key: other_user@example.com                                            |</strong>
+<strong class="ph b">| |     stop key: other_user@example.com\0                                           |</strong>
+| |                                                                                  |
+| 10:MERGE                                                                           |
+...
+
+| 02:AGGREGATE                                                                       |
+| |  output: count(*)                                                                |
+| |                                                                                  |
+<strong class="ph b">| 01:SCAN HBASE [hbase.hbase_table]                                                  |</strong>
+<strong class="ph b">|    start key: some_user@example.com                                                |</strong>
+<strong class="ph b">|    stop key: some_user@example.com\0                                               |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+      </div>
+
+      <div class="example"><h3 class="title sectiontitle">Configuration Options for Java HBase Applications</h3>
+
+
+
+        <p class="p"> If you have an HBase Java application that calls the
+            <code class="ph codeph">setCacheBlocks</code> or <code class="ph codeph">setCaching</code>
+          methods of the class <a class="xref" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_blank">org.apache.hadoop.hbase.client.Scan</a>, you can set these same
+          caching behaviors through Impala query options, to control the memory
+          pressure on the HBase RegionServer. For example, when doing queries in
+          HBase that result in full-table scans (which by default are
+          inefficient for HBase), you can reduce memory usage and speed up the
+          queries by turning off the <code class="ph codeph">HBASE_CACHE_BLOCKS</code> setting
+          and specifying a large number for the <code class="ph codeph">HBASE_CACHING</code>
+          setting.
+        </p>
+
+        <p class="p">
+          To set these options, issue commands like the following in <span class="keyword cmdname">impala-shell</span>:
+        </p>
+
+<pre class="pre codeblock"><code>-- Same as calling setCacheBlocks(true) or setCacheBlocks(false).
+set hbase_cache_blocks=true;
+set hbase_cache_blocks=false;
+
+-- Same as calling setCaching(rows).
+set hbase_caching=1000;
+</code></pre>
+
+        <p class="p">
+          Or update the <span class="keyword cmdname">impalad</span> defaults file <span class="ph filepath">/etc/default/impala</span> and
+          include settings for <code class="ph codeph">HBASE_CACHE_BLOCKS</code> and/or <code class="ph codeph">HBASE_CACHING</code> in the
+          <code class="ph codeph">-default_query_options</code> setting for <code class="ph codeph">IMPALA_SERVER_ARGS</code>. See
+          <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details.
+        </p>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+          In Impala 2.0 and later, these options are settable through the JDBC or ODBC interfaces using the
+          <code class="ph codeph">SET</code> statement.
+        </div>
+
+      </div>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="impala_hbase__hbase_scenarios">
+
+    <h2 class="title topictitle2" id="ariaid-title6">Use Cases for Querying HBase through Impala</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following are popular use cases for using Impala to query HBase tables:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          Keeping large fact tables in Impala, and smaller dimension tables in HBase. The fact tables use Parquet
+          or other binary file format optimized for scan operations. Join queries scan through the large Impala
+          fact tables, and cross-reference the dimension tables using efficient single-row lookups in HBase.
+        </li>
+
+        <li class="li">
+          Using HBase to store rapidly incrementing counters, such as how many times a web page has been viewed, or
+          on a social network, how many connections a user has or how many votes a post received. HBase is
+          efficient for capturing such changeable data: the append-only storage mechanism is efficient for writing
+          each change to disk, and a query always returns the latest value. An application could query specific
+          totals like these from HBase, and combine the results with a broader set of data queried from Impala.
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Storing very wide tables in HBase. Wide tables have many columns, possibly thousands, typically
+            recording many attributes for an important subject such as a user of an online service. These tables
+            are also often sparse, that is, most of the columns values are <code class="ph codeph">NULL</code>, 0,
+            <code class="ph codeph">false</code>, empty string, or other blank or placeholder value. (For example, any particular
+            web site user might have never used some site feature, filled in a certain field in their profile,
+            visited a particular part of the site, and so on.) A typical query against this kind of table is to
+            look up a single row to retrieve all the information about a specific subject, rather than summing,
+            averaging, or filtering millions of rows as in typical Impala-managed tables.
+          </p>
+          <p class="p">
+            Or the HBase table could be joined with a larger Impala-managed table. For example, analyze the large
+            Impala table representing web traffic for a site and pick out 50 users who view the most pages. Join
+            that result with the wide user table in HBase to look up attributes of those users. The HBase side of
+            the join would result in 50 efficient single-row lookups in HBase, rather than scanning the entire user
+            table.
+          </p>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+
+
+
+
+
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="impala_hbase__hbase_loading">
+
+    <h2 class="title topictitle2" id="ariaid-title7">Loading Data into an HBase Table</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala <code class="ph codeph">INSERT</code> statement works for HBase tables. The <code class="ph codeph">INSERT ... VALUES</code>
+        syntax is ideally suited to HBase tables, because inserting a single row is an efficient operation for an
+        HBase table. (For regular Impala tables, with data files in HDFS, the tiny data files produced by
+        <code class="ph codeph">INSERT ... VALUES</code> are extremely inefficient, so you would not use that technique with
+        tables containing any significant data volume.)
+      </p>
+
+
+
+      <p class="p">
+        When you use the <code class="ph codeph">INSERT ... SELECT</code> syntax, the result in the HBase table could be fewer
+        rows than you expect. HBase only stores the most recent version of each unique row key, so if an
+        <code class="ph codeph">INSERT ... SELECT</code> statement copies over multiple rows containing the same value for the
+        key column, subsequent queries will only return one row with each key column value:
+      </p>
+
+      <p class="p">
+        Although Impala does not have an <code class="ph codeph">UPDATE</code> statement, you can achieve the same effect by
+        doing successive <code class="ph codeph">INSERT</code> statements using the same value for the key column each time:
+      </p>
+
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="impala_hbase__hbase_limitations">
+
+    <h2 class="title topictitle2" id="ariaid-title8">Limitations and Restrictions of the Impala and HBase Integration</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The Impala integration with HBase has the following limitations and restrictions, some inherited from the
+        integration between HBase and Hive, and some unique to Impala:
+      </p>
+
+      <ul class="ul">
+        <li class="li">
+          <p class="p">
+            If you issue a <code class="ph codeph">DROP TABLE</code> for an internal (Impala-managed) table that is mapped to an
+            HBase table, the underlying table is not removed in HBase. The Hive <code class="ph codeph">DROP TABLE</code>
+            statement also removes the HBase table in this case.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            The <code class="ph codeph">INSERT OVERWRITE</code> statement is not available for HBase tables. You can insert new
+            data, or modify an existing row by inserting a new row with the same key value, but not replace the
+            entire contents of the table. You can do an <code class="ph codeph">INSERT OVERWRITE</code> in Hive if you need this
+            capability.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            If you issue a <code class="ph codeph">CREATE TABLE LIKE</code> statement for a table mapped to an HBase table, the
+            new table is also an HBase table, but inherits the same underlying HBase table name as the original.
+            The new table is effectively an alias for the old one, not a new table with identical column structure.
+            Avoid using <code class="ph codeph">CREATE TABLE LIKE</code> for HBase tables, to avoid any confusion.
+          </p>
+        </li>
+
+        <li class="li">
+          <p class="p">
+            Copying data into an HBase table using the Impala <code class="ph codeph">INSERT ... SELECT</code> syntax might
+            produce fewer new rows than are in the query result set. If the result set contains multiple rows with
+            the same value for the key column, each row supercedes any previous rows with the same key value.
+            Because the order of the inserted rows is unpredictable, you cannot rely on this technique to preserve
+            the <span class="q">"latest"</span> version of a particular key value.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+            Because the complex data types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>)
+            available in <span class="keyword">Impala 2.3</span> and higher are currently only supported in Parquet tables, you cannot
+            use these types in HBase tables that are queried through Impala.
+          </p>
+        </li>
+        <li class="li">
+          <p class="p">
+        The <code class="ph codeph">LOAD DATA</code> statement cannot be used with HBase tables.
+      </p>
+        </li>
+        <li class="li">
+          <p class="p">
+        The <code class="ph codeph">TABLESAMPLE</code> clause of the <code class="ph codeph">SELECT</code>
+        statement does not apply to a table reference derived from a view, a subquery,
+        or anything other than a real base table. This clause only works for tables
+        backed by HDFS or HDFS-like data files, therefore it does not apply to Kudu or
+        HBase tables.
+      </p>
+        </li>
+      </ul>
+    </div>
+  </article>
+
+  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="impala_hbase__hbase_queries">
+
+    <h2 class="title topictitle2" id="ariaid-title9">Examples of Querying HBase Tables from Impala</h2>
+
+    <div class="body conbody">
+
+      <p class="p">
+        The following examples create an HBase table with four column families,
+        create a corresponding table through Hive,
+        then insert and query the table through Impala.
+      </p>
+      <p class="p">
+        In HBase shell, the table
+        name is quoted in <code class="ph codeph">CREATE</code> and <code class="ph codeph">DROP</code> statements. Tables created in HBase
+        begin in <span class="q">"enabled"</span> state; before dropping them through the HBase shell, you must issue a
+        <code class="ph codeph">disable '<var class="keyword varname">table_name</var>'</code> statement.
+      </p>
+
+<pre class="pre codeblock"><code>$ hbase shell
+15/02/10 16:07:45
+HBase Shell; enter 'help&lt;RETURN&gt;' for list of supported commands.
+Type "exit&lt;RETURN&gt;" to leave the HBase Shell
+...
+
+hbase(main):001:0&gt; create 'hbasealltypessmall', 'boolsCF', 'intsCF', 'floatsCF', 'stringsCF'
+0 row(s) in 4.6520 seconds
+
+=&gt; Hbase::Table - hbasealltypessmall
+hbase(main):006:0&gt; quit
+</code></pre>
+
+        <p class="p">
+          Issue the following <code class="ph codeph">CREATE TABLE</code> statement in the Hive shell. (The Impala <code class="ph codeph">CREATE
+          TABLE</code> statement currently does not support the <code class="ph codeph">STORED BY</code> clause, so you switch into Hive to
+          create the table, then back to Impala and the <span class="keyword cmdname">impala-shell</span> interpreter to issue the
+          queries.)
+        </p>
+
+        <p class="p">
+          This example creates an external table mapped to the HBase table, usable by both Impala and Hive. It is
+          defined as an external table so that when dropped by Impala or Hive, the original HBase table is not touched at all.
+        </p>
+
+        <p class="p">
+          The <code class="ph codeph">WITH SERDEPROPERTIES</code> clause
+          specifies that the first column (<code class="ph codeph">ID</code>) represents the row key, and maps the remaining
+          columns of the SQL table to HBase column families. The mapping relies on the ordinal order of the
+          columns in the table, not the column names in the <code class="ph codeph">CREATE TABLE</code> statement.
+          The first column is defined to be the lookup key; the
+          <code class="ph codeph">STRING</code> data type produces the fastest key-based lookups for HBase tables.
+        </p>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+          For Impala with HBase tables, the most important aspect to ensure good performance is to use a
+          <code class="ph codeph">STRING</code> column as the row key, as shown in this example.
+        </div>
+
+<pre class="pre codeblock"><code>$ hive
+...
+hive&gt; use hbase;
+OK
+Time taken: 4.095 seconds
+hive&gt; CREATE EXTERNAL TABLE hbasestringids (
+    &gt;   id string,
+    &gt;   bool_col boolean,
+    &gt;   tinyint_col tinyint,
+    &gt;   smallint_col smallint,
+    &gt;   int_col int,
+    &gt;   bigint_col bigint,
+    &gt;   float_col float,
+    &gt;   double_col double,
+    &gt;   date_string_col string,
+    &gt;   string_col string,
+    &gt;   timestamp_col timestamp)
+    &gt; STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+    &gt; WITH SERDEPROPERTIES (
+    &gt;   "hbase.columns.mapping" =
+    &gt;   ":key,boolsCF:bool_col,intsCF:tinyint_col,intsCF:smallint_col,intsCF:int_col,intsCF:\
+    &gt;   bigint_col,floatsCF:float_col,floatsCF:double_col,stringsCF:date_string_col,\
+    &gt;   stringsCF:string_col,stringsCF:timestamp_col"
+    &gt; )
+    &gt; TBLPROPERTIES("hbase.table.name" = "hbasealltypessmall");
+OK
+Time taken: 2.879 seconds
+hive&gt; quit;
+</code></pre>
+
+        <p class="p">
+          Once you have established the mapping to an HBase table, you can issue DML statements and queries
+          from Impala. The following example shows a series of <code class="ph codeph">INSERT</code>
+          statements followed by a query.
+          The ideal kind of query from a performance standpoint
+          retrieves a row from the table based on a row key
+          mapped to a string column.
+          An initial <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+          statement makes the table created through Hive visible to Impala.
+        </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost -d hbase
+Starting Impala Shell without Kerberos authentication
+Connected to localhost:21000
+...
+Query: use `hbase`
+[localhost:21000] &gt; invalidate metadata hbasestringids;
+Fetched 0 row(s) in 0.09s
+[localhost:21000] &gt; desc hbasestringids;
++-----------------+-----------+---------+
+| name            | type      | comment |
++-----------------+-----------+---------+
+| id              | string    |         |
+| bool_col        | boolean   |         |
+| double_col      | double    |         |
+| float_col       | float     |         |
+| bigint_col      | bigint    |         |
+| int_col         | int       |         |
+| smallint_col    | smallint  |         |
+| tinyint_col     | tinyint   |         |
+| date_string_col | string    |         |
+| string_col      | string    |         |
+| timestamp_col   | timestamp |         |
++-----------------+-----------+---------+
+Fetched 11 row(s) in 0.02s
+[localhost:21000] &gt; insert into hbasestringids values ('0001',true,3.141,9.94,1234567,32768,4000,76,'2014-12-31','Hello world',now());
+Inserted 1 row(s) in 0.26s
+[localhost:21000] &gt; insert into hbasestringids values ('0002',false,2.004,6.196,1500,8000,129,127,'2014-01-01','Foo bar',now());
+Inserted 1 row(s) in 0.12s
+[localhost:21000] &gt; select * from hbasestringids where id = '0001';
++------+----------+------------+-------------------+------------+---------+--------------+-------------+-----------------+-------------+-------------------------------+
+| id   | bool_col | double_col | float_col         | bigint_col | int_col | smallint_col | tinyint_col | date_string_col | string_col  | timestamp_col                 |
++------+----------+------------+-------------------+------------+---------+--------------+-------------+-----------------+-------------+-------------------------------+
+| 0001 | true     | 3.141      | 9.939999580383301 | 1234567    | 32768   | 4000         | 76          | 2014-12-31      | Hello world | 2015-02-10 16:36:59.764838000 |
++------+----------+------------+-------------------+------------+---------+--------------+-------------+-----------------+-------------+-------------------------------+
+Fetched 1 row(s) in 0.54s
+</code></pre>
+
+        <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+        After you create a table in Hive, such as the HBase mapping table in this example, issue an
+        <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> statement the next time you connect to
+        Impala, make Impala aware of the new table. (Prior to Impala 1.2.4, you could not specify the table name if
+        Impala was not aware of the table yet; in Impala 1.2.4 and higher, specifying the table name avoids
+        reloading the metadata for other tables that are not changed.)
+      </div>
+    </div>
+  </article>
+</article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hbase_cache_blocks.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hbase_cache_blocks.html b/docs/build3x/html/topics/impala_hbase_cache_blocks.html
new file mode 100644
index 0000000..27ebee3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hbase_cache_blocks.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hbase_cache_blocks"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>HBASE_CACHE_BLOCKS Query Option</title></head><body id="hbase_cache_blocks"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">HBASE_CACHE_BLOCKS Query Option</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+
+      Setting this option is equivalent to calling the
+        <code class="ph codeph">setCacheBlocks</code> method of the class <a class="xref" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_blank">org.apache.hadoop.hbase.client.Scan</a>, in an HBase Java
+      application. Helps to control the memory pressure on the HBase
+      RegionServer, in conjunction with the <code class="ph codeph">HBASE_CACHING</code> query
+      option. </p>
+
+    <p class="p">
+        <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+        any other value interpreted as <code class="ph codeph">false</code>
+      </p>
+    <p class="p">
+        <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+      </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>,
+      <a class="xref" href="impala_hbase_caching.html#hbase_caching">HBASE_CACHING Query Option</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hbase_caching.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hbase_caching.html b/docs/build3x/html/topics/impala_hbase_caching.html
new file mode 100644
index 0000000..e2082d4
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hbase_caching.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hbase_caching"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>HBASE_CACHING Query Option</title></head><body id="hbase_caching"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">HBASE_CACHING Query Option</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+
+      Setting this option is equivalent to calling the
+        <code class="ph codeph">setCaching</code> method of the class <a class="xref" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_blank">org.apache.hadoop.hbase.client.Scan</a>, in an HBase Java
+      application. Helps to control the memory pressure on the HBase
+      RegionServer, in conjunction with the <code class="ph codeph">HBASE_CACHE_BLOCKS</code>
+      query option. </p>
+
+    <p class="p">
+      <strong class="ph b">Type:</strong> <code class="ph codeph">BOOLEAN</code>
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Default:</strong> 0
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+    <p class="p">
+      <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>,
+      <a class="xref" href="impala_hbase_cache_blocks.html#hbase_cache_blocks">HBASE_CACHE_BLOCKS Query Option</a>
+    </p>
+
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hints.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hints.html b/docs/build3x/html/topics/impala_hints.html
new file mode 100644
index 0000000..7777fa2
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hints.html
@@ -0,0 +1,488 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hints"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Optimizer Hints</title></head><body id="hints"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Optimizer Hints</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+       The Impala SQL supports
+      query hints, for fine-tuning the inner workings of queries. Specify hints
+      as a temporary workaround for expensive queries, where missing statistics
+      or other factors cause inefficient performance. </p>
+
+    <p class="p"> Hints are most often used for the resource-intensive Impala queries,
+      such as: </p>
+
+    <ul class="ul">
+      <li class="li">
+        Join queries involving large tables, where intermediate result sets are transmitted across the network to
+        evaluate the join conditions.
+      </li>
+
+      <li class="li">
+        Inserting into partitioned Parquet tables, where many memory buffers could be allocated on each host to
+        hold intermediate results for each partition.
+      </li>
+    </ul>
+
+    <p class="p">
+        <strong class="ph b">Syntax:</strong>
+      </p>
+
+    <p class="p"> In <span class="keyword">Impala 2.0</span> and higher, you can
+      specify the hints inside comments that use either the <code class="ph codeph">/*
+        */</code> or <code class="ph codeph">--</code> notation. Specify a
+        <code class="ph codeph">+</code> symbol immediately before the hint name. Recently
+      added hints are only available using the <code class="ph codeph">/* */</code> and
+        <code class="ph codeph">--</code> notation. For clarity, the <code class="ph codeph">/* */</code>
+      and <code class="ph codeph">--</code> styles are used in the syntax and examples
+      throughout this section. With the <code class="ph codeph">/* */</code> or
+        <code class="ph codeph">--</code> notation for hints, specify a <code class="ph codeph">+</code>
+      symbol immediately before the first hint name. Multiple hints can be
+      specified separated by commas, for example <code class="ph codeph">/* +clustered,shuffle
+        */</code>
+    </p>
+
+<pre class="pre codeblock"><code>SELECT STRAIGHT_JOIN <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">join_left_hand_table</var>
+  JOIN /* +BROADCAST|SHUFFLE */
+<var class="keyword varname">join_right_hand_table</var>
+<var class="keyword varname">remainder_of_query</var>;
+
+SELECT <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">join_left_hand_table</var>
+  JOIN -- +BROADCAST|SHUFFLE
+<var class="keyword varname">join_right_hand_table</var>
+<var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+  /* +SHUFFLE|NOSHUFFLE */
+  SELECT <var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+  -- +SHUFFLE|NOSHUFFLE
+  SELECT <var class="keyword varname">remainder_of_query</var>;
+
+<span class="ph">
+INSERT /* +SHUFFLE|NOSHUFFLE */
+  <var class="keyword varname">insert_clauses</var>
+  SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">
+INSERT -- +SHUFFLE|NOSHUFFLE
+  <var class="keyword varname">insert_clauses</var>
+  SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">
+UPSERT /* +SHUFFLE|NOSHUFFLE */
+  <var class="keyword varname">upsert_clauses</var>
+  SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">
+UPSERT -- +SHUFFLE|NOSHUFFLE
+  <var class="keyword varname">upsert_clauses</var>
+  SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">SELECT <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">table_ref</var>
+  /* +{SCHEDULE_CACHE_LOCAL | SCHEDULE_DISK_LOCAL | SCHEDULE_REMOTE}
+    [,RANDOM_REPLICA] */
+<var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">INSERT <var class="keyword varname">insert_clauses</var>
+  -- +CLUSTERED
+  SELECT <var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+  /* +CLUSTERED */
+  SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">INSERT -- +CLUSTERED
+  <var class="keyword varname">insert_clauses</var>
+  SELECT <var class="keyword varname">remainder_of_query</var>;
+
+INSERT /* +CLUSTERED */
+  <var class="keyword varname">insert_clauses</var>
+  SELECT <var class="keyword varname">remainder_of_query</var>;
+
+UPSERT -- +CLUSTERED
+  <var class="keyword varname">upsert_clauses</var>
+  SELECT <var class="keyword varname">remainder_of_query</var>;
+
+UPSERT /* +CLUSTERED */
+  <var class="keyword varname">upsert_clauses</var>
+  SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+CREATE /* +SHUFFLE|NOSHUFFLE */
+  <var class="keyword varname">table_clauses</var>
+  AS SELECT <var class="keyword varname">remainder_of_query</var>;
+
+CREATE -- +SHUFFLE|NOSHUFFLE
+  <var class="keyword varname">table_clauses</var>
+  AS SELECT <var class="keyword varname">remainder_of_query</var>;
+
+CREATE /* +CLUSTER|NOCLUSTER */
+  <var class="keyword varname">table_clauses</var>
+  AS SELECT <var class="keyword varname">remainder_of_query</var>;
+
+CREATE -- +CLUSTER|NOCLUSTER
+  <var class="keyword varname">table_clauses</var>
+  AS SELECT <var class="keyword varname">remainder_of_query</var>;
+</code></pre>
+    <p class="p">The square bracket style hints are supported for backward compatibility,
+      but the syntax is deprecated and will be removed in a future release. For
+      that reason, any newly added hints are not available with the square
+      bracket syntax.</p>
+    <pre class="pre codeblock"><code>SELECT STRAIGHT_JOIN <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">join_left_hand_table</var>
+  JOIN [{ /* +BROADCAST */ | /* +SHUFFLE */ }]
+<var class="keyword varname">join_right_hand_table</var>
+<var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+  [{ /* +SHUFFLE */ | /* +NOSHUFFLE */ }]
+  [<span class="ph">/* +CLUSTERED */</span>]
+  SELECT <var class="keyword varname">remainder_of_query</var>;
+
+<span class="ph">
+UPSERT [{ /* +SHUFFLE */ | /* +NOSHUFFLE */ }]
+  [<span class="ph">/* +CLUSTERED */</span>]
+  <var class="keyword varname">upsert_clauses</var>
+  SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+</code></pre>
+
+    <p class="p">
+        <strong class="ph b">Usage notes:</strong>
+      </p>
+
+    <p class="p">
+      With both forms of hint syntax, include the <code class="ph codeph">STRAIGHT_JOIN</code>
+      keyword immediately after the <code class="ph codeph">SELECT</code> and any
+      <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">ALL</code> keywords to prevent Impala from
+      reordering the tables in a way that makes the join-related hints ineffective.
+    </p>
+
+    <p class="p">
+        The <code class="ph codeph">STRAIGHT_JOIN</code> hint affects the join order of table references in the query
+        block containing the hint. It does not affect the join order of nested queries, such as views,
+        inline views, or <code class="ph codeph">WHERE</code>-clause subqueries. To use this hint for performance
+        tuning of complex queries, apply the hint to all query blocks that need a fixed join order.
+      </p>
+
+    <p class="p">
+      To reduce the need to use hints, run the <code class="ph codeph">COMPUTE STATS</code> statement against all tables involved
+      in joins, or used as the source tables for <code class="ph codeph">INSERT ... SELECT</code> operations where the
+      destination is a partitioned Parquet table. Do this operation after loading data or making substantial
+      changes to the data within each table. Having up-to-date statistics helps Impala choose more efficient query
+      plans without the need for hinting. See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details and
+      examples.
+    </p>
+
+    <p class="p">
+      To see which join strategy is used for a particular query, examine the <code class="ph codeph">EXPLAIN</code> output for
+      that query. See <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for details and examples.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Hints for join queries:</strong>
+    </p>
+
+    <p class="p">
+      The <code class="ph codeph">/* +BROADCAST */</code> and <code class="ph codeph">/* +SHUFFLE */</code> hints control the execution strategy for join
+      queries. Specify one of the following constructs immediately after the <code class="ph codeph">JOIN</code> keyword in a
+      query:
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <code class="ph codeph">/* +SHUFFLE */</code> - Makes that join operation use the <span class="q">"partitioned"</span> technique, which divides
+        up corresponding rows from both tables using a hashing algorithm, sending subsets of the rows to other
+        nodes for processing. (The keyword <code class="ph codeph">SHUFFLE</code> is used to indicate a <span class="q">"partitioned join"</span>,
+        because that type of join is not related to <span class="q">"partitioned tables"</span>.) Since the alternative
+        <span class="q">"broadcast"</span> join mechanism is the default when table and index statistics are unavailable, you might
+        use this hint for queries where broadcast joins are unsuitable; typically, partitioned joins are more
+        efficient for joins between large tables of similar size.
+      </li>
+
+      <li class="li">
+        <code class="ph codeph">/* +BROADCAST */</code> - Makes that join operation use the <span class="q">"broadcast"</span> technique that sends the
+        entire contents of the right-hand table to all nodes involved in processing the join. This is the default
+        mode of operation when table and index statistics are unavailable, so you would typically only need it if
+        stale metadata caused Impala to mistakenly choose a partitioned join operation. Typically, broadcast joins
+        are more efficient in cases where one table is much smaller than the other. (Put the smaller table on the
+        right side of the <code class="ph codeph">JOIN</code> operator.)
+      </li>
+    </ul>
+
+    <p class="p">
+      <strong class="ph b">Hints for INSERT ... SELECT and CREATE TABLE AS SELECT (CTAS):</strong>
+    </p>
+    <p class="p" id="hints__insert_hints">
+      When inserting into partitioned tables, such as using the Parquet file
+      format, you can include a hint in the <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT(CTAS)</code>
+      statements to fine-tune the overall performance of the operation and its
+      resource usage.</p>
+    <p class="p">
+      You would only use hints if an <code class="ph codeph">INSERT</code> or
+        <code class="ph codeph">CTAS</code> into a partitioned table was failing due to
+      capacity limits, or if such an operation was succeeding but with
+      less-than-optimal performance.
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <code class="ph codeph">/* +SHUFFLE */</code> and <code class="ph codeph">/* +NOSHUFFLE */</code> Hints
+        <ul class="ul">
+          <li class="li">
+            <code class="ph codeph">/* +SHUFFLE */</code> adds an exchange node, before
+            writing the data, which re-partitions the result of the
+              <code class="ph codeph">SELECT</code> based on the partitioning columns of the
+            target table. With this hint, only one node writes to a partition at
+            a time, minimizing the global number of simultaneous writes and the
+            number of memory buffers holding data for individual partitions.
+            This also reduces fragmentation, resulting in fewer files. Thus it
+            reduces overall resource usage of the <code class="ph codeph">INSERT</code> or
+              <code class="ph codeph">CTAS</code> operation and allows some operations to
+            succeed that otherwise would fail. It does involve some data
+            transfer between the nodes so that the data files for a particular
+            partition are all written on the same node.
+
+            <p class="p">
+              Use <code class="ph codeph">/* +SHUFFLE */</code> in cases where an <code class="ph codeph">INSERT</code>
+              or <code class="ph codeph">CTAS</code> statement fails or runs inefficiently due
+              to all nodes attempting to write data for all partitions.
+            </p>
+
+            <p class="p"> If the table is unpartitioned or every partitioning expression
+              is constant, then <code class="ph codeph">/* +SHUFFLE */</code> will cause every
+              write to happen on the coordinator node.
+            </p>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">/* +NOSHUFFLE */</code> does not add exchange node before
+            inserting to partitioned tables and disables re-partitioning. So the
+            selected execution plan might be faster overall, but might also
+            produce a larger number of small data files or exceed capacity
+            limits, causing the <code class="ph codeph">INSERT</code> or <code class="ph codeph">CTAS</code>
+            operation to fail.
+
+            <p class="p"> Impala automatically uses the <code class="ph codeph">/*
+                +SHUFFLE */</code> method if any partition key column in the
+              source table, mentioned in the <code class="ph codeph">SELECT</code> clause,
+              does not have column statistics. In this case, use the <code class="ph codeph">/*
+                +NOSHUFFLE */</code> hint if you want to override this default
+              behavior.
+            </p>
+          </li>
+
+          <li class="li">
+            If column statistics are available for all partition key columns
+            in the source table mentioned in the <code class="ph codeph">INSERT ...
+              SELECT</code> or <code class="ph codeph">CTAS</code> query, Impala chooses
+            whether to use the <code class="ph codeph">/* +SHUFFLE */</code> or <code class="ph codeph">/*
+              +NOSHUFFLE */</code> technique based on the estimated number of
+            distinct values in those columns and the number of nodes involved in
+            the operation. In this case, you might need the <code class="ph codeph">/* +SHUFFLE
+              */</code> or the <code class="ph codeph">/* +NOSHUFFLE */</code> hint to
+            override the execution plan selected by Impala.
+          </li>
+        </ul>
+      </li>
+
+      <li class="li">
+        <code class="ph codeph">/* +CLUSTERED */</code> and <code class="ph codeph">/* +NOCLUSTERED
+          */</code> Hints
+        <ul class="ul">
+          <li class="li">
+            <code class="ph codeph">/* +CLUSTERED */</code> sorts data by the partition
+            columns before inserting to ensure that only one partition is
+            written at a time per node. Use this hint to reduce the number of
+            files kept open and the number of buffers kept in memory
+            simultaneously. This technique is primarily useful for inserts into
+            Parquet tables, where the large block size requires substantial
+            memory to buffer data for multiple output files at once. This hint
+            is available in <span class="keyword">Impala 2.8</span> or higher.
+
+            <p class="p">
+              Starting in <span class="keyword">Impala 3.0</span>, <code class="ph codeph">/*
+                +CLUSTERED */</code> is the default behavior for HDFS tables.
+            </p>
+          </li>
+
+          <li class="li">
+            <code class="ph codeph">/* +NOCLUSTERED */</code> does not sort by primary key
+            before insert. This hint is available in <span class="keyword">Impala 2.8</span> or higher.
+
+            <p class="p">
+              Use this hint when inserting to Kudu tables.
+            </p>
+
+            <p class="p">
+              In the versions lower than <span class="keyword">Impala 3.0</span>,
+                <code class="ph codeph">/* +NOCLUSTERED */</code> is the default in HDFS
+              tables.
+            </p>
+          </li>
+        </ul>
+      </li>
+    </ul>
+
+    <p class="p">
+      Starting from <span class="keyword">Impala 2.9</span>, <code class="ph codeph">INSERT</code>
+      or <code class="ph codeph">UPSERT</code> operations into Kudu tables automatically have
+      an exchange and sort node added to the plan that partitions and sorts the
+      rows according to the partitioning/primary key scheme of the target table
+      (unless the number of rows to be inserted is small enough to trigger
+      single node execution). Use the<code class="ph codeph"> /* +NOCLUSTERED */</code> and
+        <code class="ph codeph">/* +NOSHUFFLE */</code> hints together to disable partitioning
+      and sorting before the rows are sent to Kudu.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Hints for scheduling of HDFS blocks:</strong>
+    </p>
+
+    <p class="p">
+      The hints <code class="ph codeph">/* +SCHEDULE_CACHE_LOCAL */</code>,
+      <code class="ph codeph">/* +SCHEDULE_DISK_LOCAL */</code>, and
+      <code class="ph codeph">/* +SCHEDULE_REMOTE */</code> have the same effect
+      as specifying the <code class="ph codeph">REPLICA_PREFERENCE</code> query
+      option with the respective option settings of <code class="ph codeph">CACHE_LOCAL</code>,
+      <code class="ph codeph">DISK_LOCAL</code>, or <code class="ph codeph">REMOTE</code>.
+      The hint <code class="ph codeph">/* +RANDOM_REPLICA */</code> is the same as
+      enabling the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option.
+    </p>
+
+    <p class="p">
+      You can use these hints in combination by separating them with commas,
+      for example, <code class="ph codeph">/* +SCHEDULE_CACHE_LOCAL,RANDOM_REPLICA */</code>.
+      See <a class="xref" href="impala_replica_preference.html">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a> and
+      <a class="xref" href="impala_schedule_random_replica.html">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a> for information about how
+      these settings influence the way Impala processes HDFS data blocks.
+    </p>
+
+    <p class="p">
+      Specifying the replica preference as a query hint always overrides the
+      query option setting. Specifying either the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code>
+      query option or the corresponding <code class="ph codeph">RANDOM_REPLICA</code> query hint
+      enables the random tie-breaking behavior when processing data blocks
+      during the query.
+    </p>
+
+    <p class="p">
+      <strong class="ph b">Suggestions versus directives:</strong>
+    </p>
+
+    <p class="p">
+      In early Impala releases, hints were always obeyed and so acted more like directives. Once Impala gained join
+      order optimizations, sometimes join queries were automatically reordered in a way that made a hint
+      irrelevant. Therefore, the hints act more like suggestions in Impala 1.2.2 and higher.
+    </p>
+
+    <p class="p">
+      To force Impala to follow the hinted execution mechanism for a join query, include the
+      <code class="ph codeph">STRAIGHT_JOIN</code> keyword in the <code class="ph codeph">SELECT</code> statement. See
+      <a class="xref" href="impala_perf_joins.html#straight_join">Overriding Join Reordering with STRAIGHT_JOIN</a> for details. When you use this technique, Impala does not
+      reorder the joined tables at all, so you must be careful to arrange the join order to put the largest table
+      (or subquery result set) first, then the smallest, second smallest, third smallest, and so on. This ordering lets Impala do the
+      most I/O-intensive parts of the query using local reads on the DataNodes, and then reduce the size of the
+      intermediate result set as much as possible as each subsequent table or subquery result set is joined.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Restrictions:</strong>
+      </p>
+
+    <p class="p">
+      Queries that include subqueries in the <code class="ph codeph">WHERE</code> clause can be rewritten internally as join
+      queries. Currently, you cannot apply hints to the joins produced by these types of queries.
+    </p>
+
+    <p class="p">
+      Because hints can prevent queries from taking advantage of new metadata or improvements in query planning,
+      use them only when required to work around performance issues, and be prepared to remove them when they are
+      no longer required, such as after a new Impala release or bug fix.
+    </p>
+
+    <p class="p">
+      In particular, the <code class="ph codeph">/* +BROADCAST */</code> and <code class="ph codeph">/* +SHUFFLE */</code> hints are expected to be
+      needed much less frequently in Impala 1.2.2 and higher, because the join order optimization feature in
+      combination with the <code class="ph codeph">COMPUTE STATS</code> statement now automatically choose join order and join
+      mechanism without the need to rewrite the query and add hints. See
+      <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a> for details.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Compatibility:</strong>
+      </p>
+
+    <p class="p">
+      The hints embedded within <code class="ph codeph">--</code> comments are compatible with Hive queries. The hints embedded
+      within <code class="ph codeph">/* */</code> comments or <code class="ph codeph">[ ]</code> square brackets are not recognized by or not
+      compatible with Hive. For example, Hive raises an error for Impala hints within <code class="ph codeph">/* */</code>
+      comments because it does not recognize the Impala hint names.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Considerations for views:</strong>
+      </p>
+
+    <p class="p">
+      If you use a hint in the query that defines a view, the hint is preserved when you query the view. Impala
+      internally rewrites all hints in views to use the <code class="ph codeph">--</code> comment notation, so that Hive can
+      query such views without errors due to unrecognized hint names.
+    </p>
+
+    <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+
+    <p class="p">
+      For example, this query joins a large customer table with a small lookup table of less than 100 rows. The
+      right-hand table can be broadcast efficiently to all nodes involved in the join. Thus, you would use the
+      <code class="ph codeph">/* +broadcast */</code> hint to force a broadcast join strategy:
+    </p>
+
+<pre class="pre codeblock"><code>select straight_join customer.address, state_lookup.state_name
+  from customer join <strong class="ph b">/* +broadcast */</strong> state_lookup
+  on customer.state_id = state_lookup.state_id;</code></pre>
+
+    <p class="p">
+      This query joins two large tables of unpredictable size. You might benchmark the query with both kinds of
+      hints and find that it is more efficient to transmit portions of each table to other nodes for processing.
+      Thus, you would use the <code class="ph codeph">/* +shuffle */</code> hint to force a partitioned join strategy:
+    </p>
+
+<pre class="pre codeblock"><code>select straight_join weather.wind_velocity, geospatial.altitude
+  from weather join <strong class="ph b">/* +shuffle */</strong> geospatial
+  on weather.lat = geospatial.lat and weather.long = geospatial.long;</code></pre>
+
+    <p class="p">
+      For joins involving three or more tables, the hint applies to the tables on either side of that specific
+      <code class="ph codeph">JOIN</code> keyword. The <code class="ph codeph">STRAIGHT_JOIN</code> keyword ensures that joins are processed
+      in a predictable order from left to right. For example, this query joins
+      <code class="ph codeph">t1</code> and <code class="ph codeph">t2</code> using a partitioned join, then joins that result set to
+      <code class="ph codeph">t3</code> using a broadcast join:
+    </p>
+
+<pre class="pre codeblock"><code>select straight_join t1.name, t2.id, t3.price
+  from t1 join <strong class="ph b">/* +shuffle */</strong> t2 join <strong class="ph b">/* +broadcast */</strong> t3
+  on t1.id = t2.id and t2.id = t3.id;</code></pre>
+
+
+
+    <p class="p">
+        <strong class="ph b">Related information:</strong>
+      </p>
+
+    <p class="p">
+      For more background information about join queries, see <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a>. For
+      performance considerations, see <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>.
+    </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_identifiers.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_identifiers.html b/docs/build3x/html/topics/impala_identifiers.html
new file mode 100644
index 0000000..267b91d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_identifiers.html
@@ -0,0 +1,110 @@
+<!DOCTYPE html
+  SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="identifiers"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Identifiers</title></head><body id="identifiers"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+  <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Identifiers</h1>
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+      Identifiers are the names of databases, tables, or columns that you specify in a SQL statement. The rules for
+      identifiers govern what names you can give to things you create, the notation for referring to names
+      containing unusual characters, and other aspects such as case sensitivity.
+    </p>
+
+    <ul class="ul">
+      <li class="li">
+        <p class="p">
+        The minimum length of an identifier is 1 character.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+        The maximum length of an identifier is currently 128 characters, enforced by the metastore database.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+        An identifier must start with an alphabetic character. The remainder can contain any combination of
+        alphanumeric characters and underscores. Quoting the identifier with backticks has no effect on the allowed
+        characters in the name.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+        An identifier can contain only ASCII characters.
+        </p>
+      </li>
+
+      <li class="li">
+        <p class="p">
+        To use an identifier name that matches one of the Impala reserved keywords (listed in
+        <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>), surround the identifier with <code class="ph codeph">``</code>
+        characters (backticks). Quote the reserved word even if it is part of a fully qualified name.
+        The following example shows how a reserved word can be used as a column name if it is quoted
+        with backticks in the <code class="ph codeph">CREATE TABLE</code> statement, and how the column name
+        must also be quoted with backticks in a query:
+        </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; create table reserved (`data` string);
+
+[localhost:21000] &gt; select data from reserved;
+ERROR: AnalysisException: Syntax error in line 1:
+select data from reserved
+       ^
+Encountered: DATA
+Expected: ALL, CASE, CAST, DISTINCT, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, STRAIGHT_JOIN, TRUE, IDENTIFIER
+CAUSED BY: Exception: Syntax error
+
+[localhost:21000] &gt; select reserved.data from reserved;
+ERROR: AnalysisException: Syntax error in line 1:
+select reserved.data from reserved
+                ^
+Encountered: DATA
+Expected: IDENTIFIER
+CAUSED BY: Exception: Syntax error
+
+[localhost:21000] &gt; select reserved.`data` from reserved;
+
+[localhost:21000] &gt;
+</code></pre>
+
+        <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+          Because the list of reserved words grows over time as new SQL syntax is added,
+          consider adopting coding conventions (especially for any automated scripts
+          or in packaged applications) to always quote all identifiers with backticks.
+          Quoting all identifiers protects your SQL from compatibility issues if
+          new reserved words are added in later releases.
+        </div>
+
+      </li>
+
+      <li class="li">
+        <p class="p">
+        Impala identifiers are always case-insensitive. That is, tables named <code class="ph codeph">t1</code> and
+        <code class="ph codeph">T1</code> always refer to the same table, regardless of quote characters. Internally, Impala
+        always folds all specified table and column names to lowercase. This is why the column headers in query
+        output are always displayed in lowercase.
+        </p>
+      </li>
+    </ul>
+
+    <p class="p">
+      See <a class="xref" href="impala_aliases.html#aliases">Overview of Impala Aliases</a> for how to define shorter or easier-to-remember aliases if the
+      original names are long or cryptic identifiers.
+      <span class="ph"> Aliases follow the same rules as identifiers when it comes to case
+        insensitivity. Aliases can be longer than identifiers (up to the maximum length of a Java string) and can
+        include additional characters such as spaces and dashes when they are quoted using backtick characters.
+        </span>
+    </p>
+
+    <p class="p">
+        Another way to define different names for the same tables or columns is to create views. See
+        <a class="xref" href="../shared/../topics/impala_views.html#views">Overview of Impala Views</a> for details.
+      </p>
+  </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>